
Python data analysis 03 pandas

Editor: Python

Contents

1. Basic concepts of pandas
1.1 Creating a Series
1.2 DataFrame: a two-dimensional array with rows and columns
2. Selection: picking data out of Series and DataFrame instances
2.1 Series: indexing by label or by position
2.2 Series attributes: iloc and loc (indexing by row)
3. Indexing a DataFrame
3.1 Indexing by row or by column
3.2 Reading multiple rows and columns: the loc method
3.3 Two-dimensional selection
4. Missing values and automatic data alignment
4.1 Series
4.2 DataFrame
4.3 Filling NaN values
5. Summary statistics
6. Merging and grouping data
6.1 Two ways to combine DataFrames
6.1.1 Simple concatenation: concat
6.1.2 Joining on a shared column: merge
6.2 Grouping by a column with groupby, similar to SQL GROUP BY
7. Time series processing
7.1 Arithmetic with time differences
7.2 pandas and datetime
7.3 Generating date ranges with date_range


1. Basic concepts of pandas

pandas is a data analysis library that builds on NumPy and adds higher-level features: automatic data alignment, time series support, flexible handling of missing data, and more.
Series and DataFrame are the core data structures, and most pandas functionality revolves around them.
A Series is a sequence of values with an index. It can be thought of as a one-dimensional array: a single column of data plus an index, and the index can be customized.

1.1 Creating a Series

import pandas as pd
s1 = pd.Series([1,2,3,4,5])  # no index given, so pandas numbers the rows 0 to 4
print(s1)
"""
0    1
1    2
2    3
3    4
4    5
dtype: int64
"""
import pandas as pd
s2 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])  # custom index labels
print(s2)
"""
a    1
b    2
c    3
d    4
e    5
dtype: int64
"""

1.2 DataFrame: a two-dimensional array with rows and columns

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,4),index=['a','b','c','d'],columns=['A','B','C','D'])
print(df)
"""
          A         B         C         D
a  0.341299 -1.501784  1.069910  0.879989
b  0.416756  1.066293  0.569988  2.745966
c  0.711972 -0.336308 -0.006444  1.322002
d  2.217314 -0.281477 -0.706486  0.117150
"""
The DataFrame above is created by passing the row index (index) and the column labels (columns). Both can be read back through df.index and df.columns:
df.index
Out[12]: Index(['a', 'b', 'c', 'd'], dtype='object')
df.columns
Out[13]: Index(['A', 'B', 'C', 'D'], dtype='object')

2. Selection: picking data out of Series and DataFrame instances

2.1 Series: indexing by label or by position

import pandas as pd
import numpy as np
s2 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
# An integer key on a labeled Series falls back to position.
# Recent pandas releases deprecate this fallback; s2.iloc[0] is the explicit form.
print(s2[0])
print('_______')
print(s2[0:3])      # positional slice: the end position is excluded
print(s2['a'])      # single label
print("________")
print(s2['a':'c'])  # label slice: the end label is included
"""
1
_______
a    1
b    2
c    3
dtype: int64
1
________
a    1
b    2
c    3
dtype: int64
"""

2.2 Series attributes: iloc and loc (indexing by row)

import pandas as pd
import numpy as np
s2 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
print(s2.iloc[0:3]) # select by integer position (end position excluded)
print("--------------")
print(s2.loc['a':'c']) # select by custom index label (end label included)
"""
a    1
b    2
c    3
dtype: int64
--------------
a    1
b    2
c    3
dtype: int64
"""

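The difference between loc and iloc is easiest to see when the index labels are themselves integers that do not match the positions. The following is a minimal sketch using a made-up Series (the values and labels are chosen only for illustration):

```python
import pandas as pd

# Hypothetical Series: labels 1..4, so label and position differ by one.
s = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])

by_label = s.loc[1]       # the row whose label is 1
by_position = s.iloc[1]   # the row at position 1 (the second row)

label_slice = s.loc[1:3]      # labels 1, 2 and 3: the end label is included
position_slice = s.iloc[1:3]  # positions 1 and 2: the end position is excluded

print(by_label, by_position)
print(label_slice.tolist(), position_slice.tolist())
```

Using loc or iloc explicitly avoids any doubt about whether an integer means a label or a position.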
3. Indexing a DataFrame

Select a column by its label:
df.A
df['A']
Select a row:
df.loc['a'] # by the custom index label
df.iloc[0] # by integer position
Select all rows and several columns:
df.loc[:,['B','C','D']]
Two-dimensional selection:
A single point: df.loc['a','A']
A block: df.loc['a':'b','A':'C']

3.1 Indexing by row or by column

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,4),index=['a','b','c','d'],columns=['A','B','C','D'])
# Retrieve data by column
print(df.A) # column label as an attribute
print("-----")
print(df['A']) # column label as a key
"""
a   -0.931263
b   -0.648751
c    0.438436
d   -1.481929
Name: A, dtype: float64
-----
a   -0.931263
b   -0.648751
c    0.438436
d   -1.481929
Name: A, dtype: float64
"""

3.2 Reading multiple rows and columns: the loc method

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,4),index=['a','b','c','d'],columns=['A','B','C','D'])
print(df)
print("-----")
print(df.loc[:,['B','C','D']]) # all rows (the bare colon), columns B, C and D
"""
          A         B         C         D
a -1.205197 -0.375471  0.115681  0.111243
b -0.329662  0.001292 -0.540496 -1.274938
c -0.285998  0.122846 -0.738836  0.213211
d -1.479184  0.251340  0.322654 -0.745249
-----
          B         C         D
a -0.375471  0.115681  0.111243
b  0.001292 -0.540496 -1.274938
c  0.122846 -0.738836  0.213211
d  0.251340  0.322654 -0.745249
"""

3.3 Two-dimensional selection

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,4),index=['a','b','c','d'],columns=['A','B','C','D'])
print(df)
print("-----")
print(df.loc['a','A']) # a single point: row 'a', column 'A'
print("----")
print(df.loc['a':'b','A':'C']) # a block: rows 'a' to 'b', columns 'A' to 'C', ends included
"""
          A         B         C         D
a -0.234136 -0.458588  0.672268 -0.749685
b  0.462632  0.681731  1.438152 -0.073641
c -0.649510  0.443019  0.361910  0.589839
d -2.194516 -1.881632 -0.470177  2.606073
-----
-0.23413573419505523
----
          A         B         C
a -0.234136 -0.458588  0.672268
b  0.462632  0.681731  1.438152
"""

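The same point and block can also be selected by integer position with iloc. The sketch below swaps the random data for np.arange so the result is reproducible; that substitution is only for illustration:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random DataFrame: values 0..15.
df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  index=['a', 'b', 'c', 'd'], columns=['A', 'B', 'C', 'D'])

point_by_label = df.loc['a', 'A']
point_by_position = df.iloc[0, 0]

# Label slices include both ends; position slices exclude the end.
block_by_label = df.loc['a':'b', 'A':'C']
block_by_position = df.iloc[0:2, 0:3]

print(point_by_label, point_by_position)
print(block_by_label.equals(block_by_position))
```

Note that 'a':'b' covers two rows while the equivalent position slice is 0:2, because only label slices include their end point.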
4. Missing values and automatic data alignment

Automatic alignment makes it possible to do arithmetic on objects that have different indexes. pandas matches values by label, and any label that appears in only one operand yields NaN in the result.

4.1 Series

import pandas as pd
import numpy as np
s1 = pd.Series([1,2,3,4], index=['a','b','c','d'])
s2 = pd.Series([2,3,4,5], index=['b','c','d','e'])
print(s1+s2)  # 'a' and 'e' exist in only one Series, so they become NaN
'''
a    NaN
b    4.0
c    6.0
d    8.0
e    NaN
dtype: float64
'''

4.2 DataFrame

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(9).reshape(3,3),columns=list('ABC'),index=list('abc'))
df2 = pd.DataFrame(np.arange(12).reshape(3,4),columns=list('ABCE'),index=list('bcd'))
print(df1+df2)  # alignment happens on both the row index and the column labels
'''
      A     B     C   E
a   NaN   NaN   NaN NaN
b   3.0   5.0   7.0 NaN
c  10.0  12.0  14.0 NaN
d   NaN   NaN   NaN NaN
'''

4.3 Filling NaN values

Use the add method with fill_value, as in df1.add(df2, fill_value=0). A value that is missing from one operand is treated as 0, while a cell that is missing from both operands stays NaN.
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(9).reshape(3,3),columns=list('ABC'),index=list('abc'))
df2 = pd.DataFrame(np.arange(12).reshape(3,4),columns=list('ABCE'),index=list('bcd'))
print(df1+df2)
print('------')
print(df1.add(df2, fill_value=0))
'''
      A     B     C   E
a   NaN   NaN   NaN NaN
b   3.0   5.0   7.0 NaN
c  10.0  12.0  14.0 NaN
d   NaN   NaN   NaN NaN
------
      A     B     C     E
a   0.0   1.0   2.0   NaN
b   3.0   5.0   7.0   3.0
c  10.0  12.0  14.0   7.0
d   8.0   9.0  10.0  11.0
'''
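
Filling before the addition and filling after it give different results. As a rough illustration, the sketch below reuses the two Series from section 4.1 and compares add with fill_value against calling fillna on the finished sum (fillna is not used in the original article; it is introduced here only for contrast):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([2, 3, 4, 5], index=['b', 'c', 'd', 'e'])

# Missing operands are replaced by 0 before adding, so 'a' and 'e' keep their value.
filled_before = s1.add(s2, fill_value=0)

# The sum is computed first; the resulting NaN entries are then overwritten with 0.
filled_after = (s1 + s2).fillna(0)

print(filled_before.tolist())
print(filled_after.tolist())
```

Which one is appropriate depends on whether a missing label should count as zero or whether the whole result for that label should be discarded.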

5. Summary statistics

As with NumPy arrays, Series and DataFrame objects provide statistical methods such as mean, variance and sum. The describe method returns the most common statistics in one table. The figures below are what describe returns for the 3x3 df1 defined in section 4.2:
         A    B    C
count  3.0  3.0  3.0    number of values
mean   3.0  4.0  5.0    mean
std    3.0  3.0  3.0    standard deviation
min    0.0  1.0  2.0    minimum
25%    1.5  2.5  3.5    25th percentile
50%    3.0  4.0  5.0    50th percentile (median)
75%    4.5  5.5  6.5    75th percentile
max    6.0  7.0  8.0    maximum
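
The article shows only the resulting table, so the call that produces it is sketched here, assuming the data is the df1 from section 4.2 (its values reproduce the figures above):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('ABC'), index=list('abc'))

stats = df1.describe()   # count, mean, std, min, quartiles and max per column
print(stats)

col_mean = df1.mean()    # mean of each column
col_var = df1.var()      # sample variance of each column (divides by n - 1)
col_sum = df1.sum()      # sum of each column
print(col_mean.tolist(), col_var.tolist(), col_sum.tolist())
```

Each of these methods works column by column by default; passing axis=1 computes the statistic per row instead.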

6. Merging and grouping data

6.1 Two ways to combine DataFrames

6.1.1 Simple concatenation: concat

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,3))
df2 = pd.DataFrame(np.random.randn(3,3),index=[5,6,7])
print(pd.concat([df1,df2]))  # stacks df2 below df1 and keeps the original index labels
"""
          0         1         2
0  1.236067  0.751290  0.358762
1 -1.605407 -1.296070 -0.167892
2  1.403888  1.962560  0.766084
5 -1.118603  0.845264 -0.890752
6 -1.209584  0.006337  0.310854
7  2.104464 -0.157647 -1.805883
"""

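concat keeps the original index labels, which can produce duplicates when both inputs use the default index. The sketch below, built on small made-up frames, shows the ignore_index and axis parameters that control this:

```python
import pandas as pd

# Hypothetical frames used only for illustration.
df1 = pd.DataFrame({'x': [1, 2]})
df2 = pd.DataFrame({'x': [3, 4]})
df3 = pd.DataFrame({'y': [5, 6]})

stacked = pd.concat([df1, df2])                        # index labels repeat: 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh index: 0, 1, 2, 3
side_by_side = pd.concat([df1, df3], axis=1)           # columns placed next to each other

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(side_by_side)
```
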
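6.1.2 Joining on a shared column: merge

import pandas as pd
df1 = pd.DataFrame({'user_id':[5248,13],'course':[12,45],'minutes':[9,36]})
df2 = pd.DataFrame({'course':[12,5], 'name':['Numpy','Pandas']})
# pd.merge takes the two DataFrames as separate arguments, not as a list.
# By default it joins on the shared column 'course' and keeps only matching rows.
print(pd.merge(df1,df2))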
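
The default join keeps only the rows whose key appears in both frames. The how parameter changes that; the sketch below reuses the two frames above and names the join column explicitly with on:

```python
import pandas as pd

df1 = pd.DataFrame({'user_id': [5248, 13], 'course': [12, 45], 'minutes': [9, 36]})
df2 = pd.DataFrame({'course': [12, 5], 'name': ['Numpy', 'Pandas']})

inner = pd.merge(df1, df2, on='course')               # only course 12 is in both
left = pd.merge(df1, df2, on='course', how='left')    # every row of df1; missing names are NaN
outer = pd.merge(df1, df2, on='course', how='outer')  # every course from either frame

print(inner)
print(left)
print(outer)
```

A left join is a common choice when df1 is the main table and df2 only supplies extra attributes.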

6.2 Grouping by a column with groupby, similar to SQL GROUP BY

import pandas as pd
df1 = pd.DataFrame({'user_id':[5248,13,5348],'course':[12,45,23],'minutes':[9,36,45]})
a = df1[['user_id','minutes']].groupby('user_id').sum() # keep user_id and minutes, group by user_id, sum the minutes per user; the result is sorted by user_id
print(a)
"""
         minutes
user_id         
13            36
5248           9
5348          45
"""

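In the example above every user_id occurs once, so the sums equal the original values. A sketch with repeated keys (the extra rows are made up for illustration) shows the aggregation at work, along with agg for computing several statistics at once:

```python
import pandas as pd

# Hypothetical log in which two users appear more than once.
df = pd.DataFrame({'user_id': [5248, 13, 5248, 13, 5348],
                   'minutes': [9, 36, 21, 4, 45]})

total = df.groupby('user_id')['minutes'].sum()
summary = df.groupby('user_id')['minutes'].agg(['sum', 'mean', 'count'])

print(total)
print(summary)
```
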
7. Time series processing

The standard datetime module provides these types:
datetime.datetime: a date together with a time of day
datetime.date: a calendar date
datetime.timedelta: the difference between two dates or times

7.1 Arithmetic with time differences

from datetime import datetime, timedelta
d1 = datetime(2020,3,15)
delta = timedelta(days=10) # a span of 10 days
print(d1+delta)
"""
2020-03-25 00:00:00
"""

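The reverse operation also works: subtracting one datetime from another yields a timedelta. A minimal sketch, with dates chosen only for illustration:

```python
from datetime import datetime, timedelta

start = datetime(2020, 3, 15)
end = datetime(2020, 3, 25, 12, 0)

gap = end - start                      # a timedelta of 10 days and 12 hours
earlier = start - timedelta(weeks=1)   # one week before start

print(gap.days, gap.total_seconds())
print(earlier)
```
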
7.2 pandas and datetime

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
dates = [datetime(2020,3,15),datetime(2020,3,16),datetime(2020,3,17),datetime(2020,3,18)]
ts = pd.Series(np.random.randn(4),index=dates) # use the dates list as the index of ts
print(ts)
print('------')
print(dates)
print('------')
print(ts.index[0])
"""
2020-03-15   -0.185834
2020-03-16   -2.075404
2020-03-17   -1.093103
2020-03-18    0.171173
dtype: float64
------
[datetime.datetime(2020, 3, 15, 0, 0), datetime.datetime(2020, 3, 16, 0, 0), datetime.datetime(2020, 3, 17, 0, 0), datetime.datetime(2020, 3, 18, 0, 0)]
------
2020-03-15 00:00:00
"""
A value in ts can be looked up by its index entry, given as an index value, a date string or a datetime object:
ts[ts.index[0]] # ts.index[0] is the first index value
ts['2020/3/15']
ts['3/15/2020']
ts[datetime(2020,3,15)]
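
Date strings can also select a range of entries rather than a single one. The sketch below uses a small deterministic series (np.arange instead of random values, purely for illustration) and goes through loc:

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(['2020-03-15', '2020-03-16', '2020-03-17', '2020-03-18'])
ts = pd.Series(np.arange(4), index=dates)

window = ts.loc['2020-03-16':'2020-03-17']  # both end dates are included
march = ts.loc['2020-03']                   # partial date string: every entry in March 2020

print(window)
print(len(march))
```
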

7.3 Generating date ranges with date_range

pandas can generate a range of dates with the pd.date_range function. It accepts these parameters:
start: the first date of the range
end: the last date of the range
periods: the number of dates to generate
freq: the spacing between dates, for example:
D - every calendar day
H - every hour
M - the last day of every month
5D - every 5 days
MS - the first day of every month
BM - the last business day of every month
1h30min - every 1 hour 30 minutes
Newer pandas releases deprecate the aliases H, M and BM in favor of h, ME and BME.
pd.date_range('2020-1-1','2021',freq='MS')
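
A short sketch of how these parameters combine. It restricts itself to the aliases MS, 5D and 1h30min from the list above; the start dates and period counts are arbitrary:

```python
import pandas as pd

# start + end + freq: the first day of every month from 2020-01-01 to 2021-01-01.
monthly = pd.date_range('2020-1-1', '2021', freq='MS')

# start + periods + freq: four dates spaced five days apart.
five_days = pd.date_range(start='2020-01-01', periods=4, freq='5D')

# Combined alias: three timestamps spaced 1 hour 30 minutes apart.
ninety_min = pd.date_range(start='2020-01-01', periods=3, freq='1h30min')

print(len(monthly), monthly[-1])
print([d.day for d in five_days])
print(ninety_min[1], ninety_min[2])
```

Either end or periods fixes the length of the range; in the usual case only one of the two is given alongside start and freq.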

