
Chapter 2: Advanced Python for Artificial Intelligence - The pandas Library

Editor: Python
  • Difference between NumPy and pandas
NumPy: processes numerical data
pandas: also handles strings, time data, and other non-numeric types

1、Pandas overview

  • Origin of the name pandas: panel data

Pandas is a powerful toolkit for analyzing structured data. It is built on NumPy and provides high-level data structures and data manipulation tools.

1、 It is built on NumPy, which provides efficient, high-performance array and matrix operations
2、 It provides data cleaning functions
3、 It is used in data mining and data analysis
4、 It provides a large number of functions and methods for processing data quickly and conveniently

2、Pandas data structures

2.1、Series introduction

Series: a one-dimensional labeled data object that can hold any data type (int, str, float, Python object). Its data labels are called the index.

  • Similar to a one-dimensional array, with optional labels, e.g. index=["name", "age", "class"]
  • It consists of data and an index
    • The index is on the left (index), the data is on the right (values)
    • If no index is specified, a default integer index is created automatically
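
The structure described above can be illustrated with a minimal sketch. The values and labels here are hypothetical and chosen only for illustration; the point is the default integer index versus explicit labels.

```python
import pandas as pd

# No index given: pandas creates a default integer index 0..n-1.
auto = pd.Series([10, 20, 30])

# Explicit labels (hypothetical values, chosen only for illustration).
labeled = pd.Series(["Li", 18, "Class 3"], index=["name", "age", "class"])

print(list(auto.index))      # the labels shown on the left
print(list(labeled.index))
print(list(labeled.values))  # the data shown on the right
```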

2.2、Creating a Series

(1) Create from a list

  • Example
import numpy as np
import pandas as pd

# 1. Create from a list
s1 = pd.Series([1,2,3,4,5])
s1

(2) Create from an array

  • Example
# 2. Create from a NumPy array
arr1 = np.arange(1,6)
s2 = pd.Series(arr1)
print(s2)

(3) Create from a dictionary

  • Example
# 3. Create from a dictionary; 'sex' has no matching key, so its value is NaN
info = {'name': 'Lining', 'age': 18, 'class': 'Class three'}
s3 = pd.Series(info, index=['name','age','class','sex'])
s3

2.3、Using a Series

(1) Checking for null values

  • Example
# isnull and notnull detect missing values
s3.isnull()

(2) Getting data

Ways to get data: through the index and values attributes, by position, or by label name
  • Example
# 1. Get the index and the values
print(s3.index)
print(s3.values)
# 2. Get data by position (the end position is excluded)
print(s3[1:3])
# 3. Get data by label name (the end label is included)
print(s3['age':'class'])

  • Notes
The difference between label slices and positional slices:
Label slice: includes the end element
Positional slice: does not include the end element
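
The difference between the two kinds of slice can be seen in a small sketch. It uses the explicit accessors loc (labels) and iloc (positions), which make the intent unambiguous. The data is hypothetical.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Label slice: the end label "d" (position 3) is included.
by_label = s.loc["b":"d"]
# Positional slice: the end position 3 is excluded.
by_position = s.iloc[1:3]

print(by_label.tolist())
print(by_position.tolist())
```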

(3) The correspondence between index and data

The correspondence between each index label and its value is preserved by operations such as arithmetic, filtering, and sorting
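
As a rough illustration, the following sketch applies three common operations to a small Series with hypothetical data. In each case the labels stay attached to their values.

```python
import pandas as pd

s = pd.Series([3, -1, 2], index=["x", "y", "z"])

doubled = s * 2            # arithmetic keeps each label attached to its value
positive = s[s > 0]        # filtering keeps the labels of the surviving values
ordered = s.sort_values()  # sorting moves labels together with their values

print(doubled["y"])
print(list(positive.index))
print(list(ordered.index))
```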

(4) The name attribute

  • Example
s3.name = "temp" # name of the Series object
s3.index.name = 'values' # name of the object's index
s3

3、DataFrame data structure

3.1、DataFrame overview

A DataFrame is a tabular data structure. It holds an ordered collection of columns, and each column can have a different value type. A DataFrame has both a row index and a column index, and it can be viewed as a dictionary of Series that share the same row index. The data is stored in a two-dimensional structure.

  • Similar to a multidimensional array or tabular data (such as an Excel sheet or a DataFrame in R)
  • Each column can have a different data type
  • The index consists of a column index and a row index
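
The "dictionary of Series" view can be sketched as follows. The column names and values are hypothetical and serve only to show that each column has its own type while all columns share one row index.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
rows = ["r1", "r2"]
df = pd.DataFrame({
    "name": pd.Series(["Ann", "Bob"], index=rows),
    "age": pd.Series([30, 25], index=rows),
    "score": pd.Series([88.5, 92.0], index=rows),
})

print(df.dtypes)              # each column has its own dtype
print(type(df["age"]))        # selecting one column returns a Series
print(df.loc["r2", "score"])  # a cell is addressed by row label and column label
```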

3.2、Creating a DataFrame

  • Example
# Build a DataFrame from a dictionary of lists, tuples, or arrays
data = {
    'a': [1,2,3,4],
    'b': (5,6,7,8),
    'c': np.arange(9,13)}
frame = pd.DataFrame(data)
# Related attributes
print(frame.index)
print(frame.columns)
print(frame.values)

4、 Index-related operations

4.1、 Overview of Index objects

1、 The indexes of Series and DataFrame are all Index objects
2、 An Index object is immutable, which keeps the data safe
  • Example
ps = pd.Series(range(5))
# Named df rather than pd, so the pandas module is not overwritten
df = pd.DataFrame(np.arange(9).reshape(3,3),index = ['a','b','c'],columns = ['A','B','C'])
type(ps.index)

  • Notes
Common index types:
1、Index - general index
2、Int64Index - integer index (removed in pandas 2.0, where a plain Index with an integer dtype is used instead)
3、MultiIndex - hierarchical index
4、DatetimeIndex - timestamp index
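
A minimal sketch of what immutability means in practice: assigning to a single element of an Index raises a TypeError, while replacing the whole index with a new one is allowed. The labels are hypothetical.

```python
import pandas as pd

s = pd.Series(range(3), index=["a", "b", "c"])

try:
    s.index[0] = "z"   # in-place modification of an Index element is rejected
    modified = True
except TypeError:
    modified = False
print(modified)

# Replacing the whole index with a new one is allowed.
s.index = ["x", "y", "z"]
print(list(s.index))
```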

4.2、 Basic index operations

(1) Reindexing

reindex: creates a new object whose data conforms to the new index; labels that are not in the original index get NaN
  • Example
s = pd.Series(np.random.randn(5),index=['a','b','c','d','e'])
print(s)
s.reindex(['e','b','f','d'])  # 'f' is a new label, so its value is NaN

(2) Adding data

1、 Add data to the original data structure
2、 Add data to a new data structure
  • Example - adding data to a Series
s = pd.Series(np.random.randn(5),index=['a','b','c','d','e'])
s['f'] = 100
print(s)

  • Example - adding a column to a DataFrame
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(1,10).reshape(3,3),index=['a','b','c'],columns = ['A1','B1','C1'])
print(df)
print("======")
df['D1'] = np.arange(100,103)
df2 = df  # df2 is another name for the same object, not a copy
print(df2)

  • Example - adding a row to a DataFrame
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(1,10).reshape(3,3),index=['a','b','c'],columns = ['A1','B1','C1'])
print(df)
print("======")
df.loc['d'] = np.arange(100,103)
df2 = df  # df2 is another name for the same object, not a copy
print(df2)

(3) Deleting data

1、del: deletes in place and changes the original object
2、drop: deletes data along an axis and returns a new object
  • Example - Series data
# Delete
ps = pd.Series(np.random.randn(5),index=['a','b','c','d','e'])
del ps['e']
print(ps)
ps2 = ps.drop(['a','b'])
print(ps2)

  • Example - DataFrame data
import numpy as np
import pandas as pd
# Named df rather than pd, so the pandas module is not overwritten
df = pd.DataFrame(np.random.randn(9).reshape(3,3),columns=['a','b','c'])
print(df)
# Delete a column
df1 = df.drop(['c'],axis=1)
print(df1)
# Delete a row
df2 = df.drop(2)
print(df2)

(4) Modifying data

1、 Modify a column: obj[column] or obj.column
2、 Modify a row: label index loc
  • Example
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(9).reshape(3,3),columns=['a','b','c'])
print(df)
# Modify columns
df['a'] = 12
df.b = 22
print(df)
# Modify a row
df.loc[0] = 100
print(df)

(5) Querying data

1、 Row index
2、 Slice index: positional slice, label slice
3、 Non-contiguous index
  • Example
import numpy as np
import pandas as pd
ps = pd.Series(np.random.randn(5),index=['a','b','c','d','e'])
# Row index
print(ps['a'])
# Positional slice index, end position excluded
print(ps[1:3])
# Label slice index, end label included
print(ps['a':'c'])
# Non-contiguous index
print(ps[['a','c']])
# Boolean index
print(ps[ps>0])

4.3、 Advanced indexing

1、loc label index: indexes by label name, e.g. df.loc[2:3,'a']
2、iloc positional index: indexes by integer position
3、ix mixed label and positional index: for reference only; it was removed in pandas 1.0, so use loc or iloc instead
  • Example
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(9).reshape(3,3),index = [7,8,9],columns=['a','b','c'])
# Label index - the first argument selects rows, the second selects columns
print(df.loc[7:8,'a'])
# Positional index - two arguments: row positions, then column positions
print(df.iloc[0:2,0:2])

5、Pandas operations

5.1、 Arithmetic operations

  • Note: when pandas performs an operation on two objects, it aligns them by index and then applies the operation to each aligned pair. Labels that have no match in the other object are filled with NaN.

  • Example
import numpy as np
import pandas as pd
s1 = pd.Series(np.arange(5),index=['a','b','c','d','e'])
s2 = pd.Series(np.arange(5,10),index=['a','b','c','d','e'])
print(s1)
print(s2)
print(s1+s2)
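
The example above uses two Series with identical indexes, so no NaN appears. The following sketch, with hypothetical data, shows what happens when the indexes only partly overlap, and how the fill_value parameter of Series.add can supply a default for the missing side.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["a", "b", "c"])
b = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels "a" and "d" exist in only one operand, so the result there is NaN.
total = a + b
# fill_value=0 treats the missing partner as 0 instead.
filled = a.add(b, fill_value=0)

print(total)
print(filled)
```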

5.3、 Mixed operations

Mixed operations between DataFrame and Series: by default, the index of the Series is matched against the column index of the DataFrame and the operation is broadcast down the rows. Passing axis='index' matches on the row index instead and broadcasts across the columns.
  • Example
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(9).reshape(3,3),index = ['A','B','C'],columns=['A','B','C'])
ds = df.iloc[0]
# Match on the column labels, broadcast down the rows
print(df-ds)
# Match on the row labels, broadcast across the columns
print(df.sub(ds,axis = 'index'))

  • Notes
Rule of operation: values are combined by matching index labels

5.4、 Function application

(1) The apply function

apply: applies a function to each row or each column
  • Example
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(9).reshape(3,3),index = ['A','B','C'],columns=['A','B','C'])
f = lambda x:x.max()
# Default axis=0: the function receives each column
print(df.apply(f))
# axis=1: the function receives each row
print(df.apply(f,axis=1))

(2) The applymap function

applymap: applies a function to every element (since pandas 2.1 it is deprecated in favor of DataFrame.map)
  • Example
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(9).reshape(3,3),index = ['A','B','C'],columns=['A','B','C'])
f = lambda x:x**2
print(df.applymap(f))

(3) Sorting

Sort by index: sort_index(ascending, axis)
Sort by value: sort_values(by, ascending, axis)
  • Example
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(9).reshape(3,3),index = ['B','D','C'],columns=['A','C','B'])
# Sort by column labels (axis=1) in descending order
print(df.sort_index(ascending=False,axis=1))
# Sort rows by the values in column 'A'
print(df.sort_values(by = 'A'))

(4) Unique values and membership

Function name: description
unique: returns an array of the distinct values (duplicates removed)
value_counts: returns a Series containing each value and its number of occurrences
isin: checks whether each value is present in a given collection and returns boolean values
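
The three functions can be tried on a small Series. This is only a sketch with hypothetical data.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "a", "b"])

uniques = s.unique()        # distinct values, in order of first appearance
counts = s.value_counts()   # Series mapping each value to its number of occurrences
mask = s.isin(["a", "c"])   # boolean Series, True where the value is in the list

print(list(uniques))
print(counts["a"], counts["b"], counts["c"])
print(mask.tolist())
```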

(5) Handling missing values

  • Example
import pandas as pd
import numpy as np
df = pd.DataFrame([np.random.randn(3),[1,2,np.nan],[np.nan,4,np.nan]])
# 1. Check for missing values
print(df.isnull())
# 2. Drop missing data; rows are dropped by default
print(df.dropna())
# 3. Fill in missing data
print(df.fillna(-100))

6、 Hierarchical index

  • Hierarchical index: when passing the index argument, pass a list made up of two sub-lists. The first list is the outer index and the second list is the inner index.

  • Purpose: with multiple index levels, higher-dimensional data can be represented in a Series or DataFrame.

  • Example

import pandas as pd
import numpy as np
ser_obj = pd.Series(np.random.randn(12),index=[
['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
])
# Select a subset
# With two index levels, the outer label alone selects all rows under it, e.g. ser_obj['a'].
# To select by the inner level as well, pass two elements: the outer label first, then the inner label.
print(ser_obj['a',1])
# Swap the inner and outer levels
print(ser_obj.swaplevel())

7、Pandas statistical calculation

Statistical calculation: computed column by column by default
  • Example
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(32).reshape(8,4))
# Sum of each column
df.sum()

  • Commonly used statistical functions
Mean: np.mean()
Sum: np.sum()
Median: np.median()
Maximum: np.max()
Minimum: np.min()
Count: np.size()
Variance: np.var()
Standard deviation: np.std()
Product: np.prod()
Covariance: np.cov(x, y)
Skewness: skew(x)
Kurtosis: kurt(x)
Normality test: normaltest(np.array(x))
Quartiles: np.quantile(x, q=[0.25, 0.5, 0.75]), which uses linear interpolation by default
Quartiles: describe() - shows the values at the 25%, 50%, and 75% positions
Correlation coefficient (Pearson / Spearman / Kendall): x.corr(method="pearson")
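
A few of these functions are sketched below on hypothetical data. The sketch assumes that the skewness and kurtosis entries correspond to the pandas methods Series.skew and Series.kurt; the source does not say which library they come from. Note also that pandas and NumPy use different default denominators for variance.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

print(np.mean(x), np.median(x), x.size)
print(np.quantile(x, [0.25, 0.5, 0.75]))  # quartiles, linear interpolation by default
print(x.var(), x.std())                   # pandas default: ddof=1 (sample statistics)
print(np.var(x.to_numpy()))               # NumPy default: ddof=0 (population variance)
print(x.skew(), x.kurt())                 # pandas skewness and excess kurtosis
print(x.corr(y, method="pearson"))        # "spearman" and "kendall" are also accepted
print(x.describe())                       # count, mean, std, min, quartiles, max
```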

8、 Reading and storing data

8.1、 Reading and writing text files

  • Read a CSV file with read_csv(filepath_or_buffer, usecols, encoding). filepath_or_buffer: file path or file-like object; usecols: the columns to read; encoding: the character encoding of the file

  • Example

data = pd.read_csv('D:/jupyter_notebook/bfms_w2_out.csv',encoding='utf8')
data.head()
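
Because the example above depends on a local file, here is a self-contained sketch that reads from an in-memory buffer instead. The CSV content is hypothetical. The encoding parameter is omitted because it only matters when reading bytes from a file path. The sketch also shows to_csv for writing, which returns a string when no path is given.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (hypothetical data).
csv_text = "id,name,score\n1,Ann,88.5\n2,Bob,92.0\n"

# usecols keeps only the listed columns.
data = pd.read_csv(io.StringIO(csv_text), usecols=["id", "score"])
print(data.head())

# Writing back: index=False omits the row index from the output.
out = data.to_csv(index=False)
print(out)
```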

9、 Joining and merging data

9.1、 Joining data

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None)
left: the DataFrame on the left side of the merge
right: the DataFrame on the right side of the merge
how: the merge method; 'inner' by default, other options include 'outer', 'left', 'right'
on: the column name(s) to join on; they must exist in both DataFrames. If on is not given, the intersection of the column names of left and right is used as the join key
left_on: column(s) of the left DataFrame used as the join key
right_on: column(s) of the right DataFrame used as the join key
  • Inner join (inner): keeps only the keys that appear in both tables
  • Example
import pandas as pd
import numpy as np
left = pd.DataFrame({
'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({
'key': ['K0', 'K1', 'K2', 'K3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
pd.merge(left,right,on='key') # Specify 'key' as the join key
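
In the example above both tables contain the same keys, so every value of how would give the same rows. The following sketch uses hypothetical tables whose keys only partly overlap and whose key columns have different names, to show how, left_on, and right_on together.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"rkey": ["K1", "K2", "K3"], "B": ["B1", "B2", "B3"]})

# inner: only keys present in both tables (K1, K2)
inner = pd.merge(left, right, left_on="key", right_on="rkey", how="inner")
# left: every key of the left table; unmatched rows get NaN in the right-hand columns
left_join = pd.merge(left, right, left_on="key", right_on="rkey", how="left")
# outer: the union of keys from both tables (K0, K1, K2, K3)
outer = pd.merge(left, right, left_on="key", right_on="rkey", how="outer")

print(len(inner), len(left_join), len(outer))
```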

9.2、 Merging data

concat: concatenates objects along a specified axis, either vertically (rows) or horizontally (columns)
  • Example
df1 = pd.DataFrame(np.arange(6).reshape(3,2),index=list('abc'),columns=['one','two'])
df2 = pd.DataFrame(np.arange(4).reshape(2,2)+5,index=list('ac'),columns=['three','four'])
print(df1)
print(df2)
pd.concat([df1,df2],axis='columns') # axis='columns' (same as axis=1) joins side by side

9.3、 Reshaping data

stack: converts data from a "table structure" into a "curly bracket" (nested, hierarchical) structure by moving the column index into the innermost level of the row index
unstack: converts data from the "curly bracket" structure back into a "table structure" by moving one level of the row index into the column index

  • Example
import numpy as np
import pandas as pd
df_obj = pd.DataFrame(np.random.randint(0,10, (5,2)), columns=['data1', 'data2'])
print(df_obj)
print("stack")
stacked = df_obj.stack()
print(stacked)
print("unstack")
# By default, operates on the innermost index level
print(stacked.unstack())
# Use level to specify which index level to operate on
print(stacked.unstack(level=0))

