
Python data analysis 03: pandas

Editor: Python

Contents

1. Basic concepts of pandas
1.1 Series
1.2 DataFrame: a two-dimensional array with rows and columns
2. Selection: picking data out of Series and DataFrame objects
2.1 Series: selecting by label or by position
2.2 Series attributes: iloc and loc (row-based indexing)
3. Indexing a DataFrame
3.1 Selecting by row or by column
3.2 Selecting multiple rows and columns with loc
3.3 Two-dimensional selection
4. Automatic data alignment and missing values
4.1 Series
4.2 DataFrame
4.3 Filling NaN
5. Statistics
6. Merging and grouping data
6.1 Two ways to combine DataFrames
6.1.1 Simple concatenation: concat
6.1.2 Merging on a common column name: merge
6.2 Grouping by a column, like SQL GROUP BY
7. Time series
7.1 Arithmetic with time differences
7.2 pandas and datetime
7.3 Generating date ranges with date_range

1. Basic concepts of pandas

pandas:
A data analysis library built on NumPy that adds higher-level features: automatic data alignment, time series support, flexible handling of missing data, and more.
Series and DataFrame are its core data structures; most pandas functionality revolves around these two.
A Series is a sequence of values, comparable to a one-dimensional array: it holds one column of data plus an index, and the index can be customized.

1.1 Series

import pandas as pd
s1 = pd.Series([1,2,3,4,5])
print(s1)
"""
0 1
1 2
2 3
3 4
4 5
dtype: int64
"""

import pandas as pd
s2 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
print(s2)
"""
a 1
b 2
c 3
d 4
e 5
dtype: int64
"""

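The index can also come from the keys of a dict. A minimal sketch (the values here are made up for illustration): the keys become the index labels and the dict values become the data.

```python
import pandas as pd

# Dict keys become the index labels, dict values become the data
s3 = pd.Series({'a': 1, 'b': 2, 'c': 3})

print(s3.index)   # the index labels
print(s3.values)  # the underlying NumPy array of values
print(s3['b'])    # look up a single value by its label
```
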
1.2 DataFrame: a two-dimensional array with rows and columns

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,4),index=['a','b','c','d'],columns=['A','B','C','D'])
print(df)
"""
A B C D
a 0.341299 -1.501784 1.069910 0.879989
b 0.416756 1.066293 0.569988 2.745966
c 0.711972 -0.336308 -0.006444 1.322002
d 2.217314 -0.281477 -0.706486 0.117150
"""

Create a DataFrame object by specifying its row index (index) and its column labels (columns); both can be accessed through df.index and df.columns:

df.index
Out[12]: Index(['a', 'b', 'c', 'd'], dtype='object')
df.columns
Out[13]: Index(['A', 'B', 'C', 'D'], dtype='object')

2. Selection: picking data out of Series and DataFrame objects

2.1 Series: selecting by label or by position

import pandas as pd
s2 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
print(s2.iloc[0]) # by position; plain s2[0] on a labeled Series is deprecated in recent pandas
print('_______')
print(s2[0:3]) # an integer slice is positional: stop position 3 is excluded
print(s2['a']) # by label
print("________")
print(s2['a':'c']) # a label slice includes the stop label 'c'
"""
1
_______
a 1
b 2
c 3
dtype: int64
1
________
a 1
b 2
c 3
dtype: int64
"""

2.2 Series attributes: iloc and loc (row-based indexing)

import pandas as pd
import numpy as np
s2 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
print(s2.iloc[0:3]) # by integer position (the default index)
print("--------------")
print(s2.loc['a':'c']) # by custom index label
"""
a 1
b 2
c 3
dtype: int64
--------------
a 1
b 2
c 3
dtype: int64
"""

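The two slices above return the same rows, but their endpoints follow different rules. A minimal sketch, reusing the same s2: iloc excludes the stop position, as Python lists do, while loc includes the stop label.

```python
import pandas as pd

s2 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# iloc: integer positions; the stop position is excluded (like Python lists)
by_pos = s2.iloc[1:3]        # positions 1 and 2 -> labels 'b', 'c'

# loc: labels; the stop label is included
by_label = s2.loc['b':'d']   # labels 'b', 'c', 'd'

print(by_pos)
print(by_label)
```
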
3. Indexing a DataFrame

Columns, by label:
df.A
df['A']
Rows:
df.loc['a'] # by custom index label
df.iloc[0] # by default integer position
Multiple rows and columns:
df.loc[:,['B','C','D']]
Two-dimensional selection:
single value (point): df.loc['a','A']
block: df.loc['a':'b','A':'C']

3.1 Selecting by row or by column

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,4),index=['a','b','c','d'],columns=['A','B','C','D'])
# Select a column by its label
print(df.A) # attribute access
print("-----")
print(df['A']) # bracket access, which also works for column names that are not valid Python identifiers
"""
a -0.931263
b -0.648751
c 0.438436
d -1.481929
Name: A, dtype: float64
-----
a -0.931263
b -0.648751
c 0.438436
d -1.481929
Name: A, dtype: float64
"""

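The code above only selects a column. Rows can be selected with loc (by label) or iloc (by position), as listed at the start of section 3. A minimal sketch, using a deterministic np.arange frame instead of random numbers so the values are predictable:

```python
import numpy as np
import pandas as pd

# Deterministic data so the selected values can be checked
df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  index=['a', 'b', 'c', 'd'], columns=['A', 'B', 'C', 'D'])

row_by_label = df.loc['a']   # row 'a' as a Series indexed by column names
row_by_pos = df.iloc[0]      # the same row, selected by position
rows = df.loc[['a', 'c']]    # a list of labels returns a DataFrame

print(row_by_label)
print(rows)
```
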
3.2 Selecting multiple rows and columns with loc

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,4),index=['a','b','c','d'],columns=['A','B','C','D'])
print(df)
print("-----")
print(df.loc[:,['B','C','D']]) # all rows (:), columns B, C and D by label
"""
A B C D
a -1.205197 -0.375471 0.115681 0.111243
b -0.329662 0.001292 -0.540496 -1.274938
c -0.285998 0.122846 -0.738836 0.213211
d -1.479184 0.251340 0.322654 -0.745249
-----
B C D
a -0.375471 0.115681 0.111243
b 0.001292 -0.540496 -1.274938
c 0.122846 -0.738836 0.213211
d 0.251340 0.322654 -0.745249
"""

3.3 Two-dimensional selection

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,4),index=['a','b','c','d'],columns=['A','B','C','D'])
print(df)
print("-----")
print(df.loc['a','A']) # a single value: row 'a', column 'A'
print("----")
print(df.loc['a':'b','A':'C']) # a block: rows 'a' to 'b', columns 'A' to 'C', both ends included
"""
A B C D
a -0.234136 -0.458588 0.672268 -0.749685
b 0.462632 0.681731 1.438152 -0.073641
c -0.649510 0.443019 0.361910 0.589839
d -2.194516 -1.881632 -0.470177 2.606073
-----
-0.23413573419505523
----
A B C
a -0.234136 -0.458588 0.672268
b 0.462632 0.681731 1.438152
"""

4. Automatic data alignment and missing values

Arithmetic between objects with different indexes is aligned by label. A label present in only one operand has no partner, so the result holds NaN there, and that NaN propagates through later operations.

4.1 Series

import pandas as pd
import numpy as np
s1 = pd.Series([1,2,3,4], index=['a','b','c','d'])
s2 = pd.Series([2,3,4,5], index=['b','c','d','e'])
print(s1+s2)
'''
a NaN
b 4.0
c 6.0
d 8.0
e NaN
dtype: float64
'''

4.2 DataFrame

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(9).reshape(3,3),columns=list('ABC'),index=list('abc'))
df2 = pd.DataFrame(np.arange(12).reshape(3,4),columns=list('ABCE'),index=list('bcd'))
print(df1+df2)
'''
A B C E
a NaN NaN NaN NaN
b 3.0 5.0 7.0 NaN
c 10.0 12.0 14.0 NaN
d NaN NaN NaN NaN
'''

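4.3 Filling NaN

Arithmetic methods such as add accept a fill_value argument. It substitutes for a value missing from only one operand before the operation; a cell missing from both operands stays NaN:

df1.add(df2, fill_value=0)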

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(9).reshape(3,3),columns=list('ABC'),index=list('abc'))
df2 = pd.DataFrame(np.arange(12).reshape(3,4),columns=list('ABCE'),index=list('bcd'))
print(df1+df2)
print('------')
print(df1.add(df2, fill_value=0))
'''
A B C E
a NaN NaN NaN NaN
b 3.0 5.0 7.0 NaN
c 10.0 12.0 14.0 NaN
d NaN NaN NaN NaN
------
A B C E
a 0.0 1.0 2.0 NaN
b 3.0 5.0 7.0 3.0
c 10.0 12.0 14.0 7.0
d 8.0 9.0 10.0 11.0
'''
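
fill_value is not the same as calling fillna on the result afterwards. A minimal sketch with the same df1 and df2: fillna(0) also replaces cells that were missing from both frames (row a, column E) and discards the values that only df2 had (row d).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('ABC'), index=list('abc'))
df2 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCE'), index=list('bcd'))

# fill_value only substitutes for a value missing from ONE side
added = df1.add(df2, fill_value=0)

# fillna replaces every NaN in the result, including cells missing from both sides
filled_after = (df1 + df2).fillna(0)

print(added)
print(filled_after)
```
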

5. Statistics

Like NumPy arrays, Series and DataFrame objects support many statistical methods (mean, variance, sum and so on), and the describe method reports the common statistics in one call. For the df1 used in section 4, df1.describe() gives:

         A    B    C
count  3.0  3.0  3.0    number of non-missing values
mean   3.0  4.0  5.0    mean
std    3.0  3.0  3.0    standard deviation
min    0.0  1.0  2.0    minimum
25%    1.5  2.5  3.5    25th percentile
50%    3.0  4.0  5.0    50th percentile (median)
75%    4.5  5.5  6.5    75th percentile
max    6.0  7.0  8.0    maximum
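
The individual statistics are also available as methods, each computed per column by default. A minimal sketch on the same 3x3 df1; note that var and std use the sample formula (ddof=1), which is why the standard deviation of 0, 3, 6 is 3.0.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('ABC'), index=list('abc'))

means = df1.mean()     # per-column mean
variances = df1.var()  # per-column sample variance (ddof=1)
sums = df1.sum()       # per-column sum
summary = df1.describe()

print(means, variances, sums, summary, sep='\n\n')
```
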

6. Merging and grouping data

6.1 Two ways to combine DataFrames

6.1.1 Simple concatenation: concat

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,3))
df2 = pd.DataFrame(np.random.randn(3,3),index=[5,6,7])
print(pd.concat([df1,df2])) # stacks the frames vertically (axis=0) and keeps their original index labels
"""
0 1 2
0 1.236067 0.751290 0.358762
1 -1.605407 -1.296070 -0.167892
2 1.403888 1.962560 0.766084
5 -1.118603 0.845264 -0.890752
6 -1.209584 0.006337 0.310854
7 2.104464 -0.157647 -1.805883
"""

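Because concat keeps the original index labels by default, duplicate labels can appear. A minimal sketch with two small made-up frames: ignore_index=True renumbers the rows, and axis=1 places the frames side by side, aligned on the index.

```python
import pandas as pd

top = pd.DataFrame({'x': [1, 2]})
bottom = pd.DataFrame({'x': [3, 4]})

# Default: original index labels are kept, so 0 and 1 appear twice
stacked = pd.concat([top, bottom])

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([top, bottom], ignore_index=True)

# axis=1 joins the frames column-wise, aligning rows on the index
side_by_side = pd.concat([top, bottom.rename(columns={'x': 'y'})], axis=1)

print(stacked, renumbered, side_by_side, sep='\n\n')
```
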
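6.1.2 Merging on a common column name: merge

import pandas as pd
df1 = pd.DataFrame({'user_id':[5248,13],'course':[12,45],'minutes':[9,36]})
df2 = pd.DataFrame({'course':[12,5], 'name':['Numpy','Pandas']})
# merge takes the two frames as separate arguments and joins them on their shared column, 'course'
print(pd.merge(df1, df2)) # only course 12 is in both frames: one row with user_id 5248, minutes 9, name Numpy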
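
By default merge performs an inner join, keeping only keys found in both frames. A minimal sketch of naming the key explicitly with on= and of the how= parameter, using the same two frames: how='left' keeps every row of df1 and fills the unmatched name with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'user_id': [5248, 13], 'course': [12, 45], 'minutes': [9, 36]})
df2 = pd.DataFrame({'course': [12, 5], 'name': ['Numpy', 'Pandas']})

# Inner join (the default): only courses present in both frames
inner = pd.merge(df1, df2, on='course')

# Left join: every row of df1; course 45 has no match, so 'name' is NaN
left = pd.merge(df1, df2, on='course', how='left')

print(inner)
print(left)
```
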

6.2 Grouping by a column, like SQL GROUP BY

import pandas as pd
df1 = pd.DataFrame({'user_id':[5248,13,5348],'course':[12,45,23],'minutes':[9,36,45]})
a = df1[['user_id','minutes']].groupby('user_id').sum() # keep user_id and minutes, group by user_id and sum minutes per user; groups are sorted by user_id
print(a)
"""
minutes
user_id
13 36
5248 9
5348 45
"""

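With unique user IDs, as above, each group holds a single row, so the sum equals the original value. A minimal sketch with a made-up log in which one user appears twice, also showing agg for computing several statistics per group at once:

```python
import pandas as pd

# Made-up log in which user 5248 appears twice
log = pd.DataFrame({'user_id': [5248, 13, 5248],
                    'course': [12, 45, 23],
                    'minutes': [9, 36, 45]})

# Total minutes per user
total = log.groupby('user_id')['minutes'].sum()

# Several statistics per user in one call
stats = log.groupby('user_id')['minutes'].agg(['sum', 'mean', 'count'])

print(total)
print(stats)
```
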
7. Time series

Python's datetime module provides the basic time types:
datetime.datetime represents a point in time (a date plus a time of day)
datetime.date represents a calendar day
datetime.timedelta represents a time difference

7.1 Arithmetic with time differences

from datetime import datetime, timedelta
d1 = datetime(2020,3,15)
delta = timedelta(days=10) # a time difference of 10 days
print(d1+delta)
"""
2020-03-25 00:00:00
"""

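The arithmetic also works the other way: subtracting two datetime objects yields a timedelta, and subtracting a timedelta moves a datetime backwards. A minimal sketch (the second date is made up for illustration):

```python
from datetime import datetime, timedelta

d1 = datetime(2020, 3, 15)
d2 = datetime(2020, 4, 1, 12, 0)

# datetime - datetime -> timedelta
gap = d2 - d1
print(gap)                  # 17 days, 12:00:00
print(gap.days)             # 17 (whole days only)
print(gap.total_seconds())  # 1512000.0

# datetime - timedelta -> datetime
earlier = d1 - timedelta(weeks=1)
print(earlier)              # 2020-03-08 00:00:00
```
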
7.2 pandas and datetime

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
dates = [datetime(2020,3,15),datetime(2020,3,16),datetime(2020,3,17),datetime(2020,3,18)]
ts = pd.Series(np.random.randn(4),index=dates) # use dates as the index of ts; pandas converts it to a DatetimeIndex
print(ts)
print('------')
print(dates)
print('------')
print(ts.index[0])
"""
2020-03-15 -0.185834
2020-03-16 -2.075404
2020-03-17 -1.093103
2020-03-18 0.171173
dtype: float64
------
[datetime.datetime(2020, 3, 15, 0, 0), datetime.datetime(2020, 3, 16, 0, 0), datetime.datetime(2020, 3, 17, 0, 0), datetime.datetime(2020, 3, 18, 0, 0)]
------
2020-03-15 00:00:00
"""

Ways to get the value for a given date in ts:
ts[ts.index[0]] # ts.index[0] is the first index value (a Timestamp)
ts['2020/3/15'] # date strings are parsed into timestamps
ts['3/15/2020']
ts[datetime(2020,3,15)]
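
A runnable version of these lookups, as a sketch: it uses fixed values instead of random numbers so the results can be checked, and adds a string slice, which a DatetimeIndex also supports.

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2020, 3, 15), datetime(2020, 3, 16),
         datetime(2020, 3, 17), datetime(2020, 3, 18)]
# Fixed values so the lookups below are predictable
ts = pd.Series([0.5, 1.5, 2.5, 3.5], index=dates)

v1 = ts[ts.index[0]]            # by the index value itself (a Timestamp)
v2 = ts['2020-03-15']           # a date string is parsed into a timestamp
v3 = ts[datetime(2020, 3, 15)]  # by a plain datetime object

# String slicing on a DatetimeIndex: everything from March 16 onwards
from_16th = ts['2020-03-16':]

print(v1, v2, v3)
print(from_16th)
```
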

7.3 Generating date ranges with date_range

pandas can generate a range of dates with the pd.date_range function. Its main parameters are:
start: the start of the range
end: the end of the range (included if it falls on the frequency)
periods: how many dates to generate
freq: the spacing between dates, for example:
D - every day
H - every hour
M - the last day of every month
5D - every 5 days
MS - the first day of every month
BM - the last business day of every month
1h30min - every 1 hour 30 minutes
In pandas 2.2 and later, H, M and BM are deprecated in favor of h, ME and BME.
pd.date_range('2020-1-1','2021',freq='MS')
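
A minimal sketch of the parameter combinations: start plus end, or start plus periods, each with a freq. The month-start example above yields 13 dates because the end, 2021-01-01, is itself a month start and is included.

```python
import pandas as pd

# start + end + freq: month starts from 2020-01-01 through 2021-01-01
month_starts = pd.date_range('2020-1-1', '2021', freq='MS')

# start + periods + freq: five dates, 5 days apart
every_5_days = pd.date_range('2020-01-01', periods=5, freq='5D')

# A combined frequency: every 1 hour 30 minutes
every_90_min = pd.date_range('2020-01-01', periods=3, freq='1h30min')

print(month_starts)
print(every_5_days)
print(every_90_min)
```
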

Copyright © 程式師世界 All Rights Reserved