Hello everyone, this is @Drink less cold water G!
If you haven't seen the previous issue, you may want to read it first:
Pandas Data Analysis Tutorial (1): Series and DataFrame
In the last issue we briefly covered Series and DataFrame, the two most commonly used Pandas data structures. That raises a question: given data in one of these two structures, how do I extract just the part I want?
This question is critical for the later work of reading and manipulating Excel files, since the contents of an Excel file look a lot like a DataFrame. This section records how to read, select, and filter data in these data structures.
In a sense, a DataFrame is just a Series with one more dimension, so reading from a Series simply takes one fewer argument. The reading methods are essentially the same, so this article mainly records how to read from a DataFrame.
Because Pandas is built on NumPy, Pandas indexing works much like NumPy array indexing. Recall, however, that a NumPy array is indexed by integer position, whereas a Pandas index can also consist of strings. This is the main difference.
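As a small illustration of that difference, here is a minimal sketch. The array `arr` and the Series `s` are made-up examples and do not come from the previous issue.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
arr = np.array([10, 20, 30])
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

first_from_array = arr[0]    # NumPy: look up by integer position
first_from_series = s['a']   # Pandas: look up by string label
print(first_from_array, first_from_series)
```

Both lookups reach the same element; only the kind of key differs.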
This article records two main approaches: plain indexing and loc/iloc selection. First, create a DataFrame to use as the example throughout:
>> import numpy as np
>> import pandas as pd
>> table = pd.DataFrame(np.arange(16).reshape(4,4),
                        index=range(1,5), columns=['A','B','C','D'])
>> table
    A   B   C   D
1   0   1   2   3
2   4   5   6   7
3   8   9  10  11
4  12  13  14  15
Plain indexing is mainly used to select the data of a single column or a list of columns. In addition, a slice or a Boolean array can be used to select rows. To select columns:
table['C']           # select column 'C'; returns a Series
table[['C','D']]     # select columns 'C' and 'D'; returns a DataFrame
A slice or a Boolean array selects the specified rows:
table[1:4]           # select rows 2 through 4; positions count from 0, not from our own labels 1-4; left-closed, right-open
table[table['C']>7]  # select the rows where column C is greater than 7
Note: a Series can also be sliced by label, as in obj['b':'d'], which takes the rows from label b through label d, including both b and d (assuming the index runs from 'a' to 'd'). In other words, a label slice is closed on both ends.
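The note above can be checked with a short sketch. The Series `obj` is a hypothetical example whose index runs from 'a' to 'd', as the note assumes.

```python
import pandas as pd

# Hypothetical Series with string labels 'a' through 'd'
obj = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])

by_label = obj['b':'d']       # label slice: includes both 'b' and 'd'
by_position = obj.iloc[1:3]   # positional slice: the end position is excluded

print(list(by_label.index))     # labels kept by the label slice
print(list(by_position.index))  # labels kept by the positional slice
```

The label slice keeps three rows while the positional slice keeps two, which is the closed versus half-open behavior described above.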
loc and iloc are the two Pandas indexers dedicated to selection, and they are more widely used than plain indexing. loc indexes by axis label; iloc indexes by integer position.
Let's look at iloc first:
>> table.iloc[[2,3],[3,0,1]]
    D   A   B
3  11   8   9
4  15  12  13
This code selects the 3rd row (position 2) and the 4th row (position 3), taking the 4th, 1st, and 2nd columns. Note that the output follows the order of the indices you give.
loc does the same job as iloc. The difference is that the indices given to loc need not be integers: the arguments are treated as the labels you defined, not as positions from 0 to N-1. In this example the column labels are ['A','B','C','D'], so we can write:
>> table.loc[[2,3],['D','A','B']]
    D  A  B
2   7  4  5
3  11  8  9
This time the indices are interpreted as user-defined labels, no longer as positions from 0 to N-1.
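To make the contrast concrete, the sketch below selects the same cells twice on this article's `table`, once by position with iloc and once by label with loc. The variable names `by_position` and `by_label` are only for illustration.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=range(1, 5), columns=['A', 'B', 'C', 'D'])

# Positions 2 and 3 correspond to the row labels 3 and 4
by_position = table.iloc[[2, 3], [3, 0, 1]]
by_label = table.loc[[3, 4], ['D', 'A', 'B']]

print(by_position.equals(by_label))
```

Because the row labels start at 1 while positions start at 0, the row arguments differ by one between the two calls.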
Of course, both indexers can also be used with slices:
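table.loc[1:3,:'C']  # rows labeled 1 through 3, columns up to and including 'C'
table.iloc[:,3:]     # all rows, columns from position 3 onward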
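A quick way to see what these two slices return is to inspect their shapes and labels. This is a minimal sketch on the same `table`; the names `label_slice` and `position_slice` are only for illustration.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=range(1, 5), columns=['A', 'B', 'C', 'D'])

label_slice = table.loc[1:3, :'C']   # label slices include both endpoints
position_slice = table.iloc[:, 3:]   # positional slices exclude the end

print(label_slice.shape, list(label_slice.columns))
print(position_slice.shape, list(position_slice.columns))
```

The loc slice keeps the rows labeled 1, 2, and 3 and the columns A through C, while the iloc slice keeps every row but only the last column.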
These operations are easy to understand. We can also pass in Boolean arrays for conditional filtering. Here is a slightly more complicated example:
>> table.loc[2:4,['A','C','D']][(table.A>4) & (table.C>4)]  # think about what this statement means
    A   C   D
3   8  10  11
4  12  14  15
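The statement above filters a sub-frame with a mask built from the full table, so Pandas has to align the mask to the sub-frame's index, and it may emit a warning about reindexing when it does. One way to get the same rows without that alignment step is sketched below. It is an alternative, not the original statement, and the names `sub` and `result` are only for illustration.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=range(1, 5), columns=['A', 'B', 'C', 'D'])

# Step 1: take the rows labeled 2 through 4 and the columns A, C, D
sub = table.loc[2:4, ['A', 'C', 'D']]

# Step 2: build the mask from the sub-frame itself, so the indexes already match
result = sub[(sub['A'] > 4) & (sub['C'] > 4)]
print(result)
```

Row 2 is dropped because its value in column A is 4, which is not greater than 4.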
There are other indexing methods as well, such as the at/iat indexers, the _get_value method, and the ix indexer (careful: ix was deprecated and has been removed as of pandas 1.0). On the whole, loc/iloc can do everything these methods do, while these methods cannot do everything loc/iloc can, so they are used rather rarely. at/iat and _get_value are all used to fetch the value of a single cell, and the difference between at and iat is the same as the difference between loc and iloc.
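For single-cell access, a minimal sketch on the same `table` shows at and iat next to loc. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=range(1, 5), columns=['A', 'B', 'C', 'D'])

value_at = table.at[2, 'C']     # by label: row label 2, column 'C'
value_iat = table.iat[1, 2]     # by position: row 1, column 2
value_loc = table.loc[2, 'C']   # loc reaches the same cell

print(value_at, value_iat, value_loc)
```

All three expressions point at the same cell, the one in the row labeled 2 and column C.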
Next time? I don't know what I'll write next time; the topics are random.
When reprinting, please credit the author and link to the source.