您现在的位置：程式師世界 >> 編程語言 > >> 更多編程語言 >> Python

[pandas] data selection

編輯：Python

Pandas Common data selection operations

operation

grammar Select column df[' Name '] or df. Name Use slice to select rows df[5:10] Select rows by index df.loc[label] Select rows by numeric index df.iloc[loc] Filter rows with expressions

df[bool_vec]

Data selection , That is to filter the data according to certain conditions , adopt Pandas The method provided can simulate Excel Filter data , It can flexibly meet the query requirements of various data

1. Select column

import pandas as pd
df = pd.DataFrame([['liver','E',89,21,24,64],
['Arry','C',36,37,37,57],
['Ack','A',57,60,18,84],
['Eorge','C',93,96,71,78],
['Oah','D',65,49,61,86]
],
columns = ['name','team','Q1','Q2','Q3','Q4'])
# The following two methods return 'name' Columns of data , The data type obtained is Series
res1 = df['name']
res2 = df.name
res3 = df.Q1
type(res3) # pandas.core.series.Series
#--------------------------------------------------------------------
print(res1 == res2)
# Output results :
# 0 True
# 1 True
# 2 True
# 3 True
# 4 True
# Name: name, dtype: bool
#--------------------------------------------------------------------

res1

res2

res3

2. Use slice to select rows ( section [])

We can use slicing like a list Select some rows The data of ( Index value from 0 Start ), However, it is not supported to index only one piece of data , It should be noted that ,Pandas Use sliced logic and Python The logic of the list is the same , The index value on the right is not included

import pandas as pd
df = pd.DataFrame([['liver','E',89,21,24,64],
['Arry','C',36,37,37,57],
['Ack','A',57,60,18,84],
['Eorge','C',93,96,71,78],
['Oah','D',65,49,61,86]
],
columns = ['name','team','Q1','Q2','Q3','Q4'])
# The first two lines of data
res1 = df[:2]
# The last two lines of data
res2 = df[3:]
# All the data ( Not recommended )
res3 = df[:]
# Take... In steps
res4 = df[:5:2]
# Inversion order
res5 = df[::-1]
# Report errors
res6 = df[2]

res1

res2

res3

res4

res5

If there is a list of column names in the slice ( form : df[[' Name ',' Name ',...]]), You can filter out multiple columns of data

import pandas as pd
df = pd.DataFrame([['liver','E',89,21,24,64],
['Arry','C',36,37,37,57],
['Ack','A',57,60,18,84],
['Eorge','C',93,96,71,78],
['Oah','D',65,49,61,86]
],
columns = ['name','team','Q1','Q2','Q3','Q4'])
# Screening 'name','team' Two column data
res = df[['name','team']]
# The difference is , If there is only one column ( Format : df[[' Name ']]), It would be a DataFrame:
res1 = df[['name']]
type(res1) # pandas.core.frame.DataFrame
res2 = df['name']
type(res2) # pandas.core.series.Series

res

res1

res2

3. Select rows by index ( By axis label .loc)

loc: works on labels in the index

By axis label .lochttps://blog.csdn.net/Hudas/article/details/123096447?spm=1001.2014.3001.5502

4. Select rows by numeric index ( Index by number .iloc)

iloc: works on the positions in the index(so it only takes integers)

Index by number .ilochttps://blog.csdn.net/Hudas/article/details/123096447?spm=1001.2014.3001.5502

5. Filter rows with expressions

import pandas as pd
df = pd.DataFrame([['liver','E',89,21,24,64],
['Arry','C',36,37,37,57],
['Ack','A',57,60,18,84],
['Eorge','C',93,96,71,78],
['Oah','D',65,49,61,86]
],
columns = ['name','team','Q1','Q2','Q3','Q4'])
# Q1 be equal to 36
res1 = df[df['Q1'] == 36]
# Q1 It's not equal to 36
res2 = df[~(df['Q1'] == 36)]
# The name is 'Eorge'
res3 = df[df['name'] == 'Eorge']
# Screening Q1 Greater than Q2 The line record of
res4 = df[df.Q1 > df.Q2]