
Common functions of pandas Library: select rows and columns, and remove duplicates


Common pandas functions

1. Select rows and columns
- 1.1 Create sample data
- 1.1 Select rows that meet a condition
- 1.2 Select with loc
  - 1.2.1 Select rows
  - 1.2.2 Select columns
- 1.3 Select with iloc
  - 1.3.1 Select rows
  - 1.3.2 Select columns
- 1.4 Select a column directly by name
- 1.5 Select a block of data
2. Remove duplicates

1. Select rows and columns

1.1 Create sample data

import pandas as pd
import numpy as np
value = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]])
index = np.array([10, 20, 30])
column = np.array(['u', 'i', 'r', 'time'])
df = pd.DataFrame(data=value, index=index, columns=column)
'''
    u  i  r  time
10  1  2  3     4
20  2  3  4     5
30  3  4  5     6
'''

This section mainly covers loc and iloc. Older versions of pandas also had ix, but ix was deprecated and has since been removed (as of pandas 1.0). The deprecation warning read:

.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

As the warning says, loc indexes by label (the row index label or the column name), while iloc indexes by integer position. For example, df.loc[:, 'u'] and df.iloc[:, 0] select the same column.
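
As a quick check of that equivalence, here is a minimal sketch that rebuilds the sample frame from section 1.1 and compares the two accessors. The variable names by_label and by_position are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample frame as in section 1.1
df = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]),
    index=[10, 20, 30],
    columns=['u', 'i', 'r', 'time'],
)

by_label = df.loc[:, 'u']      # column chosen by its name
by_position = df.iloc[:, 0]    # column chosen by its integer position

# Both accessors return the same Series
print(by_label.equals(by_position))   # True

# The same holds for rows: the label 10 sits at position 0
print(df.loc[10].equals(df.iloc[0]))  # True
```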

1.1 Select rows that meet a condition

# Filter with a Boolean condition: select the rows whose time column is greater than 4
data = df[df.time > 4]
'''
    u  i  r  time
20  2  3  4     5
30  3  4  5     6
'''
# Several conditions can be combined with &; here we additionally require r (the score) to equal 5
data = df[(df.time > 4) & (df.r == 5)]
'''
    u  i  r  time
30  3  4  5     6
'''
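
The filter works because the comparison itself produces a Boolean Series, which is then used to pick rows. The sketch below makes that mask explicit and also shows the | (or) and ~ (not) operators, which the text above does not cover. The variable names are illustrative, and the frame is rebuilt from section 1.1.

```python
import numpy as np
import pandas as pd

# Same sample frame as in section 1.1
df = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]),
    index=[10, 20, 30],
    columns=['u', 'i', 'r', 'time'],
)

mask = df['time'] > 4            # Boolean Series aligned with df's index
print(mask.tolist())             # [False, True, True]

# | means "or"; each comparison needs its own parentheses
either = df[(df['time'] > 5) | (df['u'] == 1)]
print(either.index.tolist())     # [10, 30]

# ~ negates a mask
not_first = df[~(df['u'] == 1)]
print(not_first.index.tolist())  # [20, 30]
```

Bracket access such as df['time'] behaves the same as df.time here, but it also works for column names that are not valid Python identifiers or that clash with DataFrame attributes.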

1.2 Select with loc

1.2.1 Select rows

# Select the first row (its index label is 10)
data = df.loc[10]
'''
u       1
i       2
r       3
time    4
Name: 10, dtype: int32
'''
# Select the second row and every row after it (the second row's index label is 20)
data = df.loc[20:30]
# or: data = df.loc[20:]
'''
    u  i  r  time
20  2  3  4     5
30  3  4  5     6
'''

1.2.2 Select columns

# Select the first column (its column name is 'u')
data = df.loc[:, 'u']
'''
10    1
20    2
30    3
Name: u, dtype: int32
'''
# Select the second through fourth columns (the second column is 'i', the fourth is 'time')
data = df.loc[:, 'i':'time']
'''
    i  r  time
10  2  3     4
20  3  4     5
30  4  5     6
'''

1.3 Select with iloc

1.3.1 Select rows

# Select the first row
data = df.iloc[0]
'''
u       1
i       2
r       3
time    4
Name: 10, dtype: int32
'''
# Select the second row and every row after it
data = df.iloc[1:3]
# or: data = df.iloc[1:]
'''
    u  i  r  time
20  2  3  4     5
30  3  4  5     6
'''
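
One detail worth spelling out when comparing the two row-slicing examples: a loc slice includes both endpoint labels, whereas an iloc slice follows the usual Python rule and excludes the stop position. A minimal sketch on the sample frame from section 1.1 (variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Same sample frame as in section 1.1
df = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]),
    index=[10, 20, 30],
    columns=['u', 'i', 'r', 'time'],
)

label_slice = df.loc[20:30]      # label slice: 20 and 30 are both included
position_slice = df.iloc[1:3]    # position slice: position 3 is excluded
print(label_slice.index.tolist())     # [20, 30]
print(position_slice.index.tolist())  # [20, 30]

# A slice over neighbouring rows makes the difference visible
print(df.loc[10:20].index.tolist())   # [10, 20]  (two rows)
print(df.iloc[0:1].index.tolist())    # [10]      (one row)
```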

1.3.2 Select columns

# Select the first column
data = df.iloc[:, 0]
'''
10    1
20    2
30    3
Name: u, dtype: int32
'''
# Select the second through fourth columns
data = df.iloc[:, 1:4]
'''
    i  r  time
10  2  3     4
20  3  4     5
30  4  5     6
'''

1.4 Select a column directly by name

# Select the first column (its column name is 'u')
data = df['u']
'''
10    1
20    2
30    3
Name: u, dtype: int32
'''
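
The same bracket syntax also accepts a list of column names. The sketch below, which goes slightly beyond the original example, shows that a single name returns a Series while a list returns a DataFrame. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample frame as in section 1.1
df = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]),
    index=[10, 20, 30],
    columns=['u', 'i', 'r', 'time'],
)

one = df['u']                 # a single name gives a Series
several = df[['u', 'time']]   # a list of names gives a DataFrame

print(type(one).__name__)         # Series
print(type(several).__name__)     # DataFrame
print(several.columns.tolist())   # ['u', 'time']
```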

1.5 Select a block of data

# With loc
data = df.loc[10:20, 'u':'r']
'''
    u  i  r
10  1  2  3
20  2  3  4
'''
# With iloc
data = df.iloc[0:2, 0:3]
'''
    u  i  r
10  1  2  3
20  2  3  4
'''
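
Slices only cover contiguous blocks. As a small extension of the example above, both accessors also accept lists, which allows picking rows and columns that are not adjacent. This is a sketch on the sample frame from section 1.1; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample frame as in section 1.1
df = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]),
    index=[10, 20, 30],
    columns=['u', 'i', 'r', 'time'],
)

# First and third rows, first and fourth columns
block_by_label = df.loc[[10, 30], ['u', 'time']]
block_by_position = df.iloc[[0, 2], [0, 3]]

print(block_by_label.equals(block_by_position))  # True
print(block_by_label.values.tolist())            # [[1, 4], [3, 6]]
```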

2. Remove duplicates

import pandas as pd
# The data has four columns, named u, i, r and time
# u i r time
# 1 3 4 1
# 2 1 5 2
# 3 1 5 3
# 1 3 4 2
# 1 3 4 1
df = pd.read_csv('rating.txt', names=['u', 'i', 'r', 'time'])

The first and fifth records are identical, so the dataset needs to be de-duplicated. The main tool for this is drop_duplicates():

data = df.drop_duplicates()
#    u  i  r  time
# 0  1  3  4     1
# 1  2  1  5     2
# 2  3  1  5     3
# 3  1  3  4     2
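
To see which rows drop_duplicates() treats as repeats, the related duplicated() method returns the underlying Boolean mask. The sketch below builds the frame inline instead of reading rating.txt, on the assumption that the file holds exactly the five rows listed in the comments above.

```python
import pandas as pd

# Inline stand-in for rating.txt (assumed to contain these five rows)
df = pd.DataFrame(
    [[1, 3, 4, 1], [2, 1, 5, 2], [3, 1, 5, 3], [1, 3, 4, 2], [1, 3, 4, 1]],
    columns=['u', 'i', 'r', 'time'],
)

dup_mask = df.duplicated()   # True for rows that repeat an earlier row
print(dup_mask.tolist())     # [False, False, False, False, True]

# drop_duplicates() keeps exactly the rows where the mask is False
print(df[~dup_mask].equals(df.drop_duplicates()))  # True
```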

If we ignore time, however, the first, fourth and fifth records are all duplicates of one another. Combining the column selection shown earlier with drop_duplicates() removes them:

data = df.loc[:, ['u', 'i', 'r']].drop_duplicates()
#    u  i  r
# 0  1  3  4
# 1  2  1  5
# 2  3  1  5
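
Selecting columns first discards the time column from the result. If time should be kept, an alternative to the approach above is the subset parameter of drop_duplicates(), which compares only the named columns. This sketch again uses an inline frame assumed to match rating.txt; the variable names are illustrative.

```python
import pandas as pd

# Inline stand-in for rating.txt (assumed to contain these five rows)
df = pd.DataFrame(
    [[1, 3, 4, 1], [2, 1, 5, 2], [3, 1, 5, 3], [1, 3, 4, 2], [1, 3, 4, 1]],
    columns=['u', 'i', 'r', 'time'],
)

# Compare only u, i and r, but keep every column in the result
deduped = df.drop_duplicates(subset=['u', 'i', 'r'])
print(deduped.index.tolist())     # [0, 1, 2]
print(deduped.columns.tolist())   # ['u', 'i', 'r', 'time']

# keep='last' retains the last occurrence of each group instead of the first
last = df.drop_duplicates(subset=['u', 'i', 'r'], keep='last')
print(last.index.tolist())        # [1, 2, 4]
print(last['time'].tolist())      # [2, 3, 1]
```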
