
A collection of problems encountered while learning pandas (continuously updated)


Contents

Preface
1、Reading a CSV file with pandas
2、Column operations in pandas (adding, deleting and modifying, including conditional changes)
- 1、Adding a column of data to a DataFrame (adding a new field)
- 2、Deleting a column from a DataFrame
- 3、Deleting data in a DataFrame column that contains a given value
- 4、Modifying all the data in a DataFrame column
- 5、Modifying the data in a DataFrame column that meets a specified condition
- 6、Skipping the header when reading a CSV file (not reading the first line)
3、Row operations in pandas (adding, deleting and modifying, including conditional changes)
- 1、Deleting rows that contain specified text
- 2、Deleting duplicate rows
4、Creating a new empty DataFrame
5、Getting the properties of a DataFrame
- 1、Getting the number of rows in a DataFrame
6、Working with date values in a DataFrame
- 1、Finding the data within a specified date range and counting it
7、Merging multiple DataFrames (CSV files)
- 1、Using the merge function
- 2、Using the concat function
8、Converting between txt and CSV files
- 1、Converting a txt file to CSV
- 2、Converting a CSV file to txt
9、Renaming columns and changing the order of fields
10、Extracting rows that meet a condition and writing them to another CSV file
Summary

Preface

I never studied pandas systematically; I picked it up bit by bit while using it. This post is a record of the problems I ran into while processing data with pandas, kept for future reference so that I don't have to search for the same answers again. It is updated continuously. Sections that are currently empty cover problems I have already met but haven't had time to write up; they will be filled in later, along with problems I haven't met yet, so the "continuously updated" in the title is not an empty promise. If you run into a problem of your own, leave a comment in the comments section and we can make progress together!

The examples in this article are based on a sample CSV file (the screenshot from the original post is not reproduced here).

1、Reading a CSV file with pandas

''' CSV files I saved with WPS show up as garbled text in PyCharm '''
import pandas as pd
# The usual cause of the garbled text is that the CSV was saved in the ANSI (system code page) encoding.
# Fix 1: open the CSV in Notepad and save it again as UTF-8.
# Fix 2: read it with pandas as ANSI, then save it as UTF-8 so that it displays normally
# (after that, take care not to read the file as ANSI again).
# Note: "ANSI" is only accepted on Windows, where Python treats it as an alias for the
# system code page codec (mbcs). On other systems, name the codec explicitly,
# for example "gbk" for files saved on Simplified Chinese Windows.
df = pd.read_csv(filePath, encoding="ANSI")
df.to_csv(filePath, encoding="UTF-8", index=False)
# If there are many files to convert, use glob to change the encoding in a batch.
import glob
# FilePath is a glob pattern: if the files are all in a folder named data,
# FilePath is "data\\*.csv", where *.csv matches a CSV file with any name.
for file in glob.glob(FilePath):
    df = pd.read_csv(file, encoding="ANSI")
    df.to_csv(file, encoding="UTF-8", index=False)  # index=False is a boolean; do not put it in quotes

Example: reading the sample file directly with

df = pd.read_csv("StudentList.csv")

gives a wrong result, whereas reading it with encoding="ANSI" gives the correct one.
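The batch conversion described above can be tried without touching real data. The following is a minimal sketch, assuming the source files are GBK-encoded (the usual "ANSI" code page on Simplified Chinese Windows); the helper name `convert_to_utf8`, the temporary folder and the sample rows are all made up for illustration.

```python
import glob
import os
import tempfile

import pandas as pd


def convert_to_utf8(pattern, source_encoding="gbk"):
    """Re-save every CSV matching `pattern` as UTF-8 and return the converted paths."""
    converted = []
    for path in glob.glob(pattern):
        df = pd.read_csv(path, encoding=source_encoding)
        df.to_csv(path, encoding="utf-8", index=False)
        converted.append(path)
    return converted


# Build a throwaway GBK-encoded file so the sketch runs on any system.
workdir = tempfile.mkdtemp()
sample_path = os.path.join(workdir, "StudentList.csv")
pd.DataFrame({"ID": [1, 2], "name": ["张三", "李四"]}).to_csv(
    sample_path, encoding="gbk", index=False
)

converted_paths = convert_to_utf8(os.path.join(workdir, "*.csv"))

# After the conversion the file reads cleanly as UTF-8.
utf8_df = pd.read_csv(sample_path, encoding="utf-8")
print(utf8_df)
```

Naming the codec explicitly ("gbk" here) keeps the script portable, since the "ANSI" alias depends on the Windows system code page.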

2、Column operations in pandas (adding, deleting and modifying, including conditional changes)

1、Adding a column of data to a DataFrame (adding a new field)

df = pd.read_csv(file, encoding="UTF-8")
# Store the new data in a list. Its length must equal the number of rows currently in df;
# otherwise pandas raises an error and the column is not added.
nList = [1, 2, 3, 4, 5, 6]  # assuming the original df has 6 rows, not counting the header row
# The new field is named "newField"
df["newField"] = nList
# Save
df.to_csv(file, encoding="UTF-8", index=False)

Example: add a new column named "score" to the sample file.
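To see the length rule in action, here is a small sketch on an in-memory DataFrame. The column names and values are invented stand-ins for the sample file, not its real contents.

```python
import pandas as pd

# Hypothetical stand-in for the sample file: three rows, two columns.
df = pd.DataFrame({"ID": [1, 2, 3], "name": ["Ann", "Bob", "Cid"]})

# A list whose length matches the row count becomes the new column.
df["score"] = [90, 85, 77]

# A list of the wrong length is rejected with a ValueError and nothing is added.
try:
    df["bad"] = [1, 2]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

# Deriving the values from len(df) avoids the mismatch altogether.
df["rank"] = list(range(1, len(df) + 1))

print(df)
print("error:", length_error)
```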

2、Deleting a column from a DataFrame

df = pd.read_csv(file, encoding="UTF-8")
# Suppose the column to delete is named "name"
del df["name"]
# Save
df.to_csv(file, encoding="UTF-8", index=False)

Example: delete the "ID" column from the sample file.
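Besides `del`, pandas also offers `DataFrame.drop(columns=...)`, which returns a new DataFrame instead of changing the original and can remove several columns at once. A short sketch with made-up data:

```python
import pandas as pd

# Invented data for illustration only.
df = pd.DataFrame({"ID": [1, 2], "name": ["Ann", "Bob"], "score": [90, 85]})

# drop returns a new DataFrame; df itself is left unchanged.
without_id = df.drop(columns=["ID"])

# Several columns can be removed in one call.
without_many = df.drop(columns=["ID", "score"])

# errors="ignore" skips column names that do not exist instead of raising KeyError.
safe = df.drop(columns=["missing"], errors="ignore")

print(without_id)
```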

3、Deleting data in a DataFrame column that contains a given value
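One possible approach, sketched under the assumption that the goal is to remove the rows whose column equals a given value: build a boolean mask and keep only the rows that do not match. The data below is invented.

```python
import pandas as pd

# Invented data for illustration only.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid", "Bob"], "score": [90, 85, 77, 60]})

# Keep every row whose "name" is not "Bob".
kept = df[df["name"] != "Bob"]

# isin handles several values at once; ~ inverts the mask.
kept_many = df[~df["name"].isin(["Ann", "Cid"])]

print(kept)
```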

4、Modifying all the data in a DataFrame column
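A minimal sketch of one way to do this: assign a new expression back to the column, which rewrites every value in it. The column names and values are assumptions made for the example.

```python
import pandas as pd

# Invented data for illustration only.
df = pd.DataFrame({"name": ["ann", "bob"], "score": [90, 85]})

# Arithmetic on a column applies to every row.
df["score"] = df["score"] + 5

# String columns can be transformed through the .str accessor.
df["name"] = df["name"].str.title()

print(df)
```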

5、Modifying the data in a DataFrame column that meets a specified condition
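A common way to change only the matching rows is `df.loc[condition, column] = value`. The sketch below uses invented data and an arbitrary threshold of 60.

```python
import pandas as pd

# Invented data for illustration only.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "score": [90, 55, 40]})

# Raise every score below 60 to 60; other rows are untouched.
df.loc[df["score"] < 60, "score"] = 60

# The condition and the modified column do not have to be the same column.
df.loc[df["name"] == "Ann", "name"] = "Anna"

print(df)
```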

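6、Skipping the header when reading a CSV file (not reading the first line)

# header=None on its own still reads the first line, as an ordinary data row;
# skiprows=1 is what actually skips it. The columns are then numbered 0, 1, 2, ...
df = pd.read_csv(filePath, header=None, skiprows=1)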
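The effect of `header` and `skiprows` is easy to check on a small in-memory CSV. The text and the replacement column names below are made up for illustration.

```python
from io import StringIO

import pandas as pd

# A tiny CSV with one header line and two data lines.
csv_text = "id,name\n1,Ann\n2,Bob\n"

# header=None alone: the header line is kept as the first data row.
header_as_data = pd.read_csv(StringIO(csv_text), header=None)

# header=None plus skiprows=1: the first line is dropped, columns are numbered 0, 1.
data_only = pd.read_csv(StringIO(csv_text), header=None, skiprows=1)

# header=0 plus names=...: the original header is discarded and replaced.
named = pd.read_csv(
    StringIO(csv_text), header=0, names=["student_id", "student_name"]
)

print(len(header_as_data), len(data_only), list(named.columns))
```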

3、Row operations in pandas (adding, deleting and modifying, including conditional changes)

1、Deleting rows that contain specified text

def dropText():
    df = pd.read_csv(filePath)
    # Check the FieldName column: if a row's value contains the text "yourtext", delete that row.
    # na=False treats missing values as "does not contain", so they do not break the filter.
    df = df.drop(df[df["FieldName"].str.contains("yourtext", na=False)].index)
    df.to_csv(filePath, index=False, encoding="UTF-8")
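The same filter can be tried on an in-memory DataFrame. In this sketch the data is invented; `regex=False` is an optional choice that makes the search text literal, so characters such as "." or "(" are not interpreted as a pattern.

```python
import pandas as pd

# Invented data, including a missing value.
df = pd.DataFrame({"FieldName": ["spam offer", "hello", None, "big SPAM"]})

# True where the text contains "spam" (case-sensitive); missing values count as False.
mask = df["FieldName"].str.contains("spam", na=False, regex=False)

# Keeping the rows where the mask is False is equivalent to dropping the matches.
cleaned = df[~mask]

print(cleaned)
```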

2、Deleting duplicate rows

def dropDuplicate():
    df = pd.read_csv(filePath)
    # Rows are duplicates when their FieldName values are equal;
    # keep='last' keeps the last row of each group of duplicates.
    df.drop_duplicates('FieldName', keep='last', inplace=True)
    df.to_csv(filePath, index=False, encoding="UTF-8")
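The `keep` argument decides which row of each duplicate group survives. A short comparison on invented data:

```python
import pandas as pd

# "a" appears twice in FieldName; the value column shows which row survives.
df = pd.DataFrame({"FieldName": ["a", "b", "a"], "value": [1, 2, 3]})

keep_last = df.drop_duplicates("FieldName", keep="last")    # keeps the later "a"
keep_first = df.drop_duplicates("FieldName", keep="first")  # keeps the earlier "a"
keep_none = df.drop_duplicates("FieldName", keep=False)     # drops every duplicated row

print(keep_last)
```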

4、Creating a new empty DataFrame
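A minimal sketch of one way to do this: pass only the column names to `pd.DataFrame`. When rows arrive later, collecting them in a plain list and building the DataFrame once is usually simpler than growing an empty one. The column names are assumptions for the example.

```python
import pandas as pd

# An empty DataFrame that already has its column names.
empty_df = pd.DataFrame(columns=["id", "name", "age"])
print(empty_df.empty, len(empty_df))

# Collect rows as dictionaries, then build the DataFrame in one step.
rows = [
    {"id": 1, "name": "Ann", "age": 20},
    {"id": 2, "name": "Bob", "age": 31},
]
filled_df = pd.DataFrame(rows, columns=["id", "name", "age"])
print(filled_df)
```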

5、Getting the properties of a DataFrame

1、Getting the number of rows in a DataFrame

df = pd.read_csv(filePath, encoding="UTF-8")
print(len(df))

6、Working with date values in a DataFrame

1、Finding the data within a specified date range and counting it
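One possible approach, as a sketch: convert the column with `pd.to_datetime`, then filter with `Series.between`, which includes both endpoints by default. The column name "date" and the values are invented.

```python
import pandas as pd

# Invented data: dates stored as text, as they would be after read_csv.
df = pd.DataFrame(
    {
        "date": ["2021-01-05", "2021-02-10", "2021-03-15", "2021-04-20"],
        "value": [1, 2, 3, 4],
    }
)

# Convert the text to real datetime values so that comparisons are by date.
df["date"] = pd.to_datetime(df["date"])

start = pd.Timestamp("2021-02-01")
end = pd.Timestamp("2021-03-31")

# Rows whose date falls inside [start, end], and how many there are.
in_range = df[df["date"].between(start, end)]
count = len(in_range)

print(in_range)
print(count)
```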

7、Merging multiple DataFrames (CSV files)

1、Using the merge function
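A minimal sketch of `pd.merge`, which joins two DataFrames side by side on a shared key column. The tables and the key name "id" are assumptions made for the example.

```python
import pandas as pd

# Two invented tables that share the key column "id".
students = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cid"]})
scores = pd.DataFrame({"id": [2, 3, 4], "score": [85, 77, 60]})

# Inner join: only ids present in both tables.
inner = pd.merge(students, scores, on="id", how="inner")

# Left join: every student is kept; a missing score becomes NaN.
left = pd.merge(students, scores, on="id", how="left")

print(inner)
print(left)
```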

2、Using the concat function

# Result: the rows are stacked vertically, and columns with the same field name are lined up.
# The new file contains every column name found in any of the CSV files.
# If the files' field names or field counts do not match, the values with no corresponding field are left empty.
df1 = pd.read_csv(FilePath_1)
df2 = pd.read_csv(FilePath_2)
df3 = pd.read_csv(FilePath_3)
df = pd.concat([df1, df2, df3])  # add more DataFrames to the list as needed
df.to_csv(FilePath, encoding='utf-8', index=False)
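The behavior with mismatched columns can be checked on two small in-memory DataFrames. The data is invented; `ignore_index=True` is an optional extra that renumbers the rows from 0.

```python
import pandas as pd

# Two invented tables: both have "id", but only one has "name" and only one has "age".
df1 = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
df2 = pd.DataFrame({"id": [3, 4], "age": [20, 31]})

# Rows are stacked; a column missing from one input is filled with NaN for its rows.
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
```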

8、Converting between txt and CSV files

1、Converting a txt file to CSV

df = pd.read_csv(TxtFilePath, delimiter="\t", header=None)  # the txt file has no line of field names
df.columns = ['id', 'name', 'age']  # list of field names; three are given here
df.to_csv(CsvFilePath, encoding='utf-8', index=False)
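The conversion can be tried end to end with a temporary file. In this sketch the file names and the three sample rows are made up, and the field names are passed through `names=` at read time, which is equivalent to assigning `df.columns` afterwards.

```python
import os
import tempfile

import pandas as pd

# Create a small tab-separated txt file with no header line.
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "people.txt")
csv_path = os.path.join(workdir, "people.csv")
with open(txt_path, "w", encoding="utf-8") as fh:
    fh.write("1\tAnn\t20\n2\tBob\t31\n3\tCid\t27\n")

# Read it with a tab separator and supply the field names directly.
df = pd.read_csv(txt_path, sep="\t", header=None, names=["id", "name", "age"])

# Write it back out as a comma-separated file with a header line.
df.to_csv(csv_path, encoding="utf-8", index=False)

with open(csv_path, encoding="utf-8") as fh:
    csv_lines = fh.read().splitlines()
print(csv_lines)
```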