
Pandas common operations


First, create a sample DataFrame:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
print(df)

   a  b
0  1  1
1  2  2
2  3  3

1. Rename the header (columns)

1) Rename all columns

For example, to rename a and b to A and B:

df.columns = ['A','B']
print(df)

Result:

   A  B
0  1  1
1  2  2
2  3  3

2) Rename only specific columns

For example, to rename a to A (starting again from the original DataFrame with columns a and b):

df.rename(columns={'a': 'A'}, inplace=True)
print(df)

Result:

   A  b
0  1  1
1  2  2
2  3  3

Other operations

The lists below use these placeholders:

df: any pandas DataFrame object
s: any pandas Series object
raw: a row label
col: a column label

Import the required packages:

import pandas as pd
import numpy as np

1. Import data

pd.read_csv(filename_path): import data from a CSV file
pd.read_table(filename_path): import data from a delimited text file
pd.read_excel(filename_path): import data from an Excel file
pd.read_sql(query, connection_object): import data from a SQL table or database
pd.read_json(json_string): import data from a JSON-formatted string (recent pandas versions expect a path or file-like object, so wrap a literal string in io.StringIO)
pd.read_html(url): parse a URL, string, or HTML file and extract the tables it contains (returns a list of DataFrames)
pd.read_clipboard(): read the contents of the clipboard and parse them like read_table()
pd.DataFrame(dict): build a DataFrame from a dictionary; each key is a column name and each value is that column's data
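
As a minimal sketch of the import calls above, the example below reads CSV text from memory instead of from disk. The CSV content and the variable names (csv_text, df_csv, df_dict) are made up for illustration; io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,score\nalice,90\nbob,85\n"

# read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same table built from a dictionary: keys become column names.
df_dict = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

print(df_csv)
print(df_dict)
```

With a real file, the only change would be passing the path, for example pd.read_csv('data.csv'), where data.csv is a placeholder name.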

2. Export data

df.to_csv(filename_path): export data to a CSV file
df.to_excel(filename_path): export data to an Excel file
df.to_sql(table_name, connection_object): export data to a SQL table
df.to_json(filename_path): export data to a text file in JSON format
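
The export methods can be tried without touching the file system: when no path is given, to_csv and to_json return a string. The sketch below relies on that behavior; the sample DataFrame and variable names are illustrative only.

```python
import io
import json
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

# Likewise, to_json returns a JSON string when no path is given.
json_text = df.to_json(orient="records")

# Round trip: reading the CSV text back yields the same values.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(json.loads(json_text))
```

Passing index=False keeps the row index out of the CSV, which is usually what you want when the index is just the default 0..n-1 range.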

3. Create test data

pd.DataFrame(np.random.rand(20,5)): create a DataFrame of random numbers with 20 rows and 5 columns
pd.Series(my_list): create a Series from the iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
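
A short sketch that puts the three calls above together. It uses NumPy's Generator API with an arbitrarily chosen seed (an assumption, so the numbers are reproducible) rather than np.random.rand; the shape and the date index follow the list above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# 20 rows x 5 columns of random floats in [0, 1).
df = pd.DataFrame(rng.random((20, 5)))

# A Series built from a plain Python list.
s = pd.Series([10, 20, 30])

# Replace the default 0..19 index with 20 consecutive daily dates.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])

print(df.shape)
print(df.index[0], df.index[-1])
```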

4. View and inspect data

df.head(n): view the first n rows of the DataFrame (without an argument, the first 5 rows)
df.tail(n): view the last n rows of the DataFrame (without an argument, the last 5 rows)
df.shape: view the number of rows and columns (shape is an attribute, so it takes no parentheses)
df.info(): view the index, data types, and memory usage
df.describe(): view summary statistics for the numeric columns
s.value_counts(dropna=False): view the unique values of a Series and their counts, including missing values
df.apply(pd.Series.value_counts): view the unique values and counts for every column of the DataFrame
df.dtypes: view the data type of each column (for a single column, df['two'].dtypes shows the type of column "two")
df.isnull(): view missing values; missing entries show as True and all others as False (for a single column, df['two'].isnull() shows the missing values in column "two")
df.values: view the values in the table as an array
df.columns: view the column names
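
The inspection calls above can be tried on a small table with a few missing values. The column names "one" and "two" and the data are invented for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "two": ["x", "y", "x", None, "x", "y"],
})

print(df.head())                             # first 5 rows by default
print(df.shape)                              # attribute, no parentheses
print(df.dtypes)                             # one dtype per column
print(df.isnull().sum())                     # number of missing values per column
print(df["two"].value_counts(dropna=False))  # counts, including the missing entry
```

Chaining isnull() with sum() is a common way to turn the True/False table into a per-column count of missing values.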

5. Data selection

(For detailed usage, see https://www.cnblogs.com/luckyplj/p/13274662.html)

df.isin([5]): return a Boolean DataFrame marking which values equal 5
df[col].isin([5]): return a Boolean Series marking which values in column col equal 5
df[col]: select one column by name and return it as a Series
df[[col1, col2]]: select several columns and return them as a DataFrame
s.iloc[0]: select data by position
s.loc['index_one']: select data by index label
df.loc[:, 'reviews']: get the data of one column. The first argument, :, means all rows; the second argument is the column name, here reviews
df.loc[[0, 2], ['customername', 'reviews', 'review_fenci']]: select several rows and columns. The list [0, 2] selects the rows labeled 0 and 2; the second list selects the three columns customername, reviews, and review_fenci
df.iloc[0, :]: return the first row
df.iloc[0, 0]: return the first element of the first row
df.ix[0] or df.ix[raw]: ix selected rows by either position or label; it was removed in pandas 1.0, so use iloc or loc instead

Note: loc selects by row and column labels (user-defined row names and column names);

iloc selects by integer row and column position.
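
To make the difference between loc and iloc concrete, the sketch below uses a DataFrame with string row labels, so labels and positions cannot be confused. The labels r0..r2 and the data are hypothetical; the column names echo the ones used in the list above.

```python
import pandas as pd

df = pd.DataFrame(
    {"customername": ["ann", "ben", "cat"], "reviews": ["good", "bad", "ok"]},
    index=["r0", "r1", "r2"],
)

by_label = df.loc["r1", "reviews"]   # row label 'r1', column label 'reviews'
by_position = df.iloc[1, 1]          # second row, second column

# Several rows and columns at once, selected by label.
subset = df.loc[["r0", "r2"], ["customername", "reviews"]]

first_row = df.iloc[0, :]            # first row as a Series

# Boolean mask: which reviews are in the given list?
mask = df["reviews"].isin(["good", "ok"])

print(by_label, by_position)
print(subset)
print(df[mask])
```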

6. Data cleaning

df.columns = ['a','b','c']: rename the columns
pd.isnull(df): check for null values in the DataFrame and return a Boolean array
pd.notnull(df): check for non-null values in the DataFrame and return a Boolean array
df.dropna(): drop all rows that contain null values
df.dropna(axis=1): drop all columns that contain null values
df.dropna(axis=1, thresh=n): drop all columns that have fewer than n non-null values
df.fillna(x): replace all null values in the DataFrame with x (note: fillna() returns the filled result and leaves the original unchanged; to modify the original DataFrame, set inplace to True, as in df.fillna(0, inplace=True))
mydf['column_name'] = mydf['column_name'].fillna(0): fill the null values of one column with 0
s.astype(float): convert the data type of a Series to float
df[col].astype(float): convert the data type of one DataFrame column to float
s.replace(1, 'first'): replace every value equal to 1 with 'first' (this replaces values, not column names or index labels)
s.replace([1,3], ['one','three']): replace 1 with 'one' and 3 with 'three'
df[col] = df[col].replace(1, 1.0): replace the value 1 in column col with 1.0 (assigning the result back is more reliable than calling replace(..., inplace=True) on a selected column)
df.replace([1,3], ['one','three']): apply the same replacement across the whole DataFrame
df.rename(columns=lambda x: x + 1): rename columns in bulk (this particular lambda only works when the column labels are numbers)
df.rename(columns={'old_name': 'new_name'}): rename selected columns
df.set_index('column_one'): use column_one as the index
df.rename(index=lambda x: x + 1): rename index labels in bulk
df[col] = df[col].str.upper() or df[col].str.lower(): convert a column to upper or lower case
df[col] = df[col].map(str.strip): strip leading and trailing whitespace from a column of strings
df.drop_duplicates(subset=col, keep='first', inplace=False): drop duplicate rows

Note: drop_duplicates removes rows that are duplicated in the specified columns of a DataFrame and returns a DataFrame.

subset: column label or sequence of labels, optional. The columns to compare; all columns by default
keep: {'first', 'last', False}, default 'first'. 'first' keeps the first occurrence of each duplicate, 'last' keeps the last, and False drops all of them
inplace: boolean, default False. Whether to modify the original data directly or return a modified copy
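
The cleaning calls above are sketched below on a small, deliberately messy table. The column names and values are invented; each step returns a new object, so the original df stays unchanged. The whitespace is stripped with .str.strip() here, an alternative to map(str.strip) that tolerates missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": [" ann ", "ben", "ben", None],
    "score": [1.0, np.nan, np.nan, 4.0],
})

dropped = df.dropna()                      # keep only rows with no missing values
filled = df.fillna({"score": 0})           # fill missing scores with 0, leave name alone
deduped = df.drop_duplicates(subset="name", keep="first")  # drop the second 'ben'
stripped = df["name"].str.strip()          # remove surrounding whitespace
as_int = filled["score"].astype(int)       # safe once the NaN values are filled

print(dropped)
print(filled)
print(deduped)
print(stripped.tolist())
print(as_int.tolist())
```

Filling before converting matters: astype(int) raises an error on a column that still contains NaN.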

7. Data processing

df[df[col] > 0.5]: select the rows where the value in column col is greater than 0.5
df.sort_values(col1): sort the data by column col1, ascending by default
df.sort_values(col2, ascending=False): sort the data by column col2 in descending order
df.sort_values([col1,col2], ascending=[True,False]): sort first by col1 in ascending order, then by col2 in descending order
df.groupby(col): return a GroupBy object grouped by column col
df.groupby([col1,col2]): return a GroupBy object grouped by several columns
df.groupby(col1)[col2].mean(): return the mean of column col2 within each col1 group
df.pivot_table(index=col1, values=[col2,col3], aggfunc='max'): create a pivot table grouped by col1 that holds the maximum of col2 and col3
df.groupby(col1).agg('mean'): return the mean of every column within each col1 group (np.mean can be passed instead, but recent pandas versions may warn about it)
df.apply(np.mean): apply the function np.mean to each column of the DataFrame
df.apply(np.max, axis=1): apply the function np.max to each row of the DataFrame
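
A minimal sketch of filtering, sorting, grouping, and pivoting as listed above. The team/points/assists table is invented for illustration, and the row-wise maximum uses a small lambda instead of np.max to keep the example free of version-dependent warnings.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "points": [1, 3, 2, 6],
    "assists": [5, 7, 4, 8],
})

# Filter: rows with more than 2 points.
high = df[df["points"] > 2]

# Sort by team ascending, then by points descending within each team.
ordered = df.sort_values(["team", "points"], ascending=[True, False])

# Group by team and average the points column.
mean_points = df.groupby("team")["points"].mean()

# Pivot table: maximum of points and assists per team.
pivot = df.pivot_table(index="team", values=["points", "assists"], aggfunc="max")

# Apply a function to each row (axis=1) of the numeric columns.
row_max = df[["points", "assists"]].apply(lambda row: row.max(), axis=1)

print(high)
print(ordered)
print(mean_points)
print(pivot)
print(row_max.tolist())
```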

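8. Data merging

df1.append(df2): append the rows of df2 to the end of df1 (DataFrame.append was removed in pandas 2.0; use pd.concat([df1, df2]) instead)
pd.concat([df1, df2], axis=1): append the columns of df2 to the right of df1
df1.join(df2, on=col1, how='inner'): perform a SQL-style join between df1 and df2, matching column col1 of df1 against the index of df2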
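
The three ways of combining tables are sketched below on two tiny DataFrames. The key column and its values are hypothetical. Note that join matches against the index of the right-hand table, which is why df2 is re-indexed by key first; add_suffix is used only to keep the column names distinct in the side-by-side result.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["k1", "k2"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["k2", "k3"], "y": [20, 30]})

# Stack rows (the replacement for the removed df1.append(df2)).
stacked = pd.concat([df1, df2], ignore_index=True)

# Place the tables side by side, aligned on the row index.
side_by_side = pd.concat([df1, df2.add_suffix("_2")], axis=1)

# SQL-style inner join: df1's 'key' column against df2's index.
joined = df1.join(df2.set_index("key"), on="key", how="inner")

print(stacked)
print(side_by_side)
print(joined)
```

Only k2 appears in both tables, so the inner join returns a single row.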

9. Data statistics

df.describe(): view summary statistics for the numeric columns
df.mean(): return the mean of each column
df.corr(): return the correlation coefficients between columns
df.count(): return the number of non-null (non-NaN) values in each column
df.max(): return the maximum value of each column
df.min(): return the minimum value of each column
df.median(): return the median of each column
df.std(): return the standard deviation of each column
df.sum(): return the sum of each column
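
A quick check of how these statistics treat missing values, on an invented two-column table. By default each method works column by column and skips NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, np.nan], "b": [2.0, 4.0, 6.0, 8.0]})

print(df.count())    # non-null values per column
print(df.sum())      # column sums, NaN skipped
print(df.mean())     # column means, NaN skipped
print(df.median())   # column medians
print(df.corr())     # pairwise correlation, using rows where both values exist
```

Because column a has one missing value, count() reports 3 for a and 4 for b, and the mean of a is taken over the three values that are present.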
