
pandas - common usage of data selection

Editor: Python

When working with pandas, you often need to run statistical calculations on a particular row, a particular column, or the data that meets some condition.
This article summarizes the common ways to select data in pandas, including the loc and iloc indexers.
First, read the data:

import pandas as pd

# Reading .xlsx files requires the openpyxl package to be installed
df = pd.read_excel('zpxx.xlsx')

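1. Getting elements, index, and column names

The basic DataFrame attributes values, index, and columns return the elements, the row index, and the column names, respectively.

print('Elements:\n', df.values)  # returns a 2-D NumPy array (not a list)

print('Index:\n', df.index)  # returns the row index; convert with list() to get a plain list

print('Column names:\n', df.columns)  # returns the column labels; convert with list() to get a plain list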
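As a minimal sketch of what these three attributes return, the snippet below builds a small DataFrame by hand. The column names and values are made up for illustration and only stand in for the contents of zpxx.xlsx.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for zpxx.xlsx
df = pd.DataFrame({
    "keyword": ["python", "java", "sql"],
    "education": ["bachelor", "master", "bachelor"],
})

values = df.values              # 2-D NumPy array holding the cell values
index_list = list(df.index)     # row index converted to a plain list
column_list = list(df.columns)  # column labels converted to a plain list

print(type(values).__name__, values.shape)
print(index_list)
print(column_list)
```

Note that values is an array, so it supports NumPy-style indexing such as values[0, 1].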

2. Selecting rows

(1) The head() and tail() methods

The DataFrame head() and tail() methods return several consecutive rows from the start or the end of the data. By default they return the first or last 5 rows; pass a number to get that many rows instead.
By default:

print('First 5 rows (default):\n', df.head())
print('Last 5 rows (default):\n', df.tail())

Specifying the number of rows:

print('First 3 rows:\n', df.head(3))

Specifying the number of rows for a single column:

print('First 3 rows of the 【關鍵詞】 column:\n', df['關鍵詞'].head(3))
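The following sketch runs head() and tail() on a small hand-built DataFrame, so the row counts can be checked without the original spreadsheet. The column names are hypothetical. It also shows a negative argument, which pandas interprets as "all rows except the last n" for head().

```python
import pandas as pd

# Hypothetical 8-row sample; column names are illustrative only
df = pd.DataFrame({
    "keyword": list("abcdefgh"),
    "row_number": list(range(8)),
})

first_default = df.head()                # first 5 rows
last_default = df.tail()                 # last 5 rows
first_three = df.head(3)                 # first 3 rows
col_first_three = df["keyword"].head(3)  # head() also works on a single column (Series)
all_but_last_two = df.head(-2)           # negative n: everything except the last 2 rows

print(first_three)
print(all_but_last_two)
```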

(2) Slicing

Format: df[m:n], where m and n are row positions. The interval is closed on the left and open on the right, so row m is included and row n is not.

print('Rows 2 through 6:\n', df[1:6])
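To make the half-open interval concrete, here is a small sketch on made-up data with the default integer index. It checks that df[1:6] picks positions 1 through 5.

```python
import pandas as pd

# Hypothetical sample with the default 0..7 index
df = pd.DataFrame({"keyword": list("abcdefgh")})

sliced = df[1:6]               # positions 1, 2, 3, 4, 5 (position 6 is excluded)
same_with_iloc = df.iloc[1:6]  # the equivalent explicit positional slice

print(sliced)
```

With the default index this gives the 2nd through 6th rows, five rows in total.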

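3. Selecting columns
(1) Dictionary-style access by key
Select one column: df['column_name']
Select several columns: df[['column_1', 'column_2', 'column_3']]

Selecting one column:

print('The 【采集時間】 column:\n', df['采集時間'])

Selecting several columns:

print('Several columns:\n', df[['關鍵詞', '采集時間']])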
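One detail worth seeing in code is the return type: a single key gives a Series, while a list of keys gives a DataFrame, even when the list has only one name. The sketch below uses hypothetical column names and values.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only
df = pd.DataFrame({
    "keyword": ["python", "java"],
    "collected_at": ["day 1", "day 2"],
    "education": ["bachelor", "master"],
})

one_col = df["keyword"]                     # single key: returns a Series
one_col_frame = df[["keyword"]]             # list with one key: returns a DataFrame
two_cols = df[["keyword", "collected_at"]]  # list of keys: returns a DataFrame

print(type(one_col).__name__)
print(type(one_col_frame).__name__)
print(list(two_cols.columns))
```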

(2) Attribute-style access
Usage: df.column_name
This is best avoided, because a column name can easily be confused with one of the DataFrame's built-in attribute or method names.

print('The 【采集時間】 column:\n', df.采集時間)
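The name clash can be illustrated with a column called count, which is also the name of a DataFrame method. This is a sketch on made-up data. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with a column whose name matches a DataFrame method
df = pd.DataFrame({
    "keyword": ["python", "java"],
    "count": [10, 20],
})

by_attr = df.keyword  # fine: no attribute or method is called "keyword"
clash = df.count      # this is the DataFrame.count method, not the column
by_key = df["count"]  # bracket access always returns the column

print(callable(clash))
print(list(by_key))
```

Bracket access works for every column name, which is why it is the safer default.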

4. Selecting rows and columns with loc and iloc
(1) Using loc
Syntax: df.loc[row labels or condition, column labels]
loc selects from a DataFrame by index label, so what you pass must be index labels (or a boolean condition); a single label that is not in the index raises a KeyError. The row indexer cannot be left empty; use : to select all rows.
First usage, with both row and column indexers:

print('The whole 【采集時間】 column:\n', df.loc[:, '采集時間'])  # loc usage

print('The 【采集時間】 of the first 5 rows:\n', df.loc[:4, '采集時間'])  # loc usage

Note: when the row indexer is a slice, both ends are inclusive. The ':4' above covers row labels 0 through 4, including both endpoints.

print('The 【采集時間】 of the 3rd row:\n', df.loc[2, '采集時間'])  # loc usage
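The inclusive endpoint is the detail that most often surprises people, so here is a small sketch that compares a loc slice with a plain positional slice on made-up data. The column name and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample with the default 0..5 index
df = pd.DataFrame({"collected_at": ["d0", "d1", "d2", "d3", "d4", "d5"]})

by_label = df.loc[:4, "collected_at"]  # labels 0 through 4, both ends included: 5 rows
by_plain_slice = df[:4]                # positions 0 through 3, end excluded: 4 rows
single = df.loc[2, "collected_at"]     # one row label and one column label: a scalar

print(len(by_label), len(by_plain_slice))
print(single)
```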

Second usage, with row labels only:
Note: when the row indexer is a slice, both ends are inclusive.

print('The first row:\n', df.loc[0])

print('The 1st and 4th rows:\n', df.loc[[0, 3]])

print('The first 3 rows:\n', df.loc[0:2])
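Because loc works on labels rather than positions, the same three forms also apply to a non-numeric index. The sketch below assumes a hypothetical string index (r1 to r4) and made-up columns, and shows what each form returns.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {
        "keyword": ["python", "java", "sql", "go"],
        "education": ["bachelor", "master", "bachelor", "college"],
    },
    index=["r1", "r2", "r3", "r4"],
)

one_row = df.loc["r1"]           # single label: a Series indexed by column name
two_rows = df.loc[["r1", "r4"]]  # list of labels: a DataFrame with those rows
row_range = df.loc["r1":"r3"]    # label slice: both ends included, so 3 rows

print(one_row)
print(two_rows)
print(row_range)
```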

Third usage, passing a condition:

print('Rows where 【學歷】 equals 本科:\n', df.loc[df['學歷'] == '本科', ['學歷', '所在地']])
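Conditions can also be combined. The sketch below, on made-up data with hypothetical English column names, shows a single condition, two conditions joined with &, and a membership test with isin. Each comparison needs its own parentheses when joined with & or |.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only
df = pd.DataFrame({
    "education": ["bachelor", "master", "bachelor", "college"],
    "location": ["Beijing", "Shanghai", "Shenzhen", "Beijing"],
})

# A condition is a boolean Series with one True/False per row
mask = df["education"] == "bachelor"
bachelors = df.loc[mask, ["education", "location"]]

# Combine conditions with & (and) or | (or)
bachelor_in_beijing = df.loc[
    (df["education"] == "bachelor") & (df["location"] == "Beijing")
]

# isin() tests membership in a list of values
degree_locations = df.loc[df["education"].isin(["bachelor", "master"]), "location"]

print(bachelors)
print(bachelor_in_beijing)
print(list(degree_locations))
```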

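(2) Using iloc
Syntax: df.iloc[row positions, column positions]
Unlike loc, iloc selects data by position. It takes integer positions, for example df.iloc[1], df.iloc[1, 2], df.iloc[:4, 3], and df.iloc[1, [1, 2, 5]].

print('First 4 rows of the 【關鍵詞】 column:\n', df.iloc[:4, 0])  # iloc usage

Note: here ':4' means row positions [0, 4), counting from 0, closed on the left and open on the right; '0' means that the 【關鍵詞】 column is in the first position.
Overall, loc is more flexible and makes the code more readable.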
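The difference between positions and labels is easiest to see when the index is not 0, 1, 2, and so on. The following sketch assumes a hypothetical index of 10, 20, 30, 40 and a made-up column.

```python
import pandas as pd

# Hypothetical data whose row labels differ from the row positions
df = pd.DataFrame(
    {"keyword": ["python", "java", "sql", "go"]},
    index=[10, 20, 30, 40],
)

by_position = df.iloc[:2, 0]       # positions 0 and 1 (end excluded)
by_label = df.loc[:20, "keyword"]  # labels up to and including 20
cell = df.iloc[1, 0]               # second row, first column
picked = df.iloc[[0, 3], [0]]      # lists of positions return a DataFrame

# Position 0 exists, but there is no row labelled 0, so loc raises KeyError
try:
    df.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(list(by_position.index), list(by_label.index))
print(cell, label_zero_found)
```

Here both slices happen to return the same two rows, but for different reasons: iloc stops before position 2, while loc stops at label 20.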

5. Selecting data with ix

The ix indexer accepted both index labels and index positions.
Syntax: df.ix[row labels, positions, or condition, column labels or positions]
Note: when index labels and positions partially overlap, ix tries labels first.
ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0; use loc and iloc instead.
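Code that relied on ix to mix positions and labels in one call, for example "the first two rows of one named column", needs a small translation step. The sketch below shows two possible rewrites on made-up data. The index and column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {
        "keyword": ["python", "java", "sql", "go"],
        "education": ["bachelor", "master", "bachelor", "college"],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Option 1: turn row positions into labels, then select with loc
via_loc = df.loc[df.index[:2], "education"]

# Option 2: turn the column label into a position, then select with iloc
education_pos = df.columns.get_loc("education")
via_iloc = df.iloc[:2, education_pos]

print(via_loc)
print(via_loc.equals(via_iloc))
```

Both options return the same Series; which one reads better depends on whether the surrounding code thinks mainly in labels or in positions.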

Those are the common ways to select data in pandas.