pandas is short for "Python data analysis". As the name suggests, it is used for data analysis. It is an extension built on NumPy and is widely used in academia, finance, statistics, and other data analysis fields.
Its main advantage is that the elements in a table can be heterogeneous (they can be of different types).
Reading and saving files (excel, csv)
Building tables
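import numpy as np
import pandas as pd
import ssl
# Skip HTTPS certificate verification so the sample URL below can be fetched (not safe for untrusted sources)
ssl._create_default_https_context = ssl._create_unverified_context
Reading files:
From a URL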
data = pd.read_csv(
'https://labfile.oss.aliyuncs.com/courses/1283/adult.data.csv')
print(data.head())
From a local file
data = pd.read_csv("./test1.txt", sep=' ') # Read a space-separated file into a pandas table; all other options keep their defaults
data = pd.read_csv("./test1.txt", sep=' ', header=None, index_col=False, dtype=np.float64)
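The local-file options above can be tried without any file on disk. This is only a minimal sketch: the sample text and the names `raw`, `table`, and `saved` are made up for illustration, and an in-memory buffer stands in for `./test1.txt`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample standing in for ./test1.txt: space-separated, no header row.
raw = "1 2 3\n4 5 6\n"

# header=None makes pandas number the columns 0, 1, 2 instead of
# treating the first line as column names.
table = pd.read_csv(io.StringIO(raw), sep=" ", header=None, dtype=np.float64)
print(table)

# Saving goes the other way round: to_csv writes the table back out as text
# (pass a file path as the first argument to write to disk instead).
saved = table.to_csv(sep=" ", header=False, index=False)
print(saved)
```

Without `header=None`, the first line of the file would be consumed as column names and the table would have one row less.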
Row index and column index
----------------------------------
data = pd.DataFrame(np.arange(12, 24).reshape((3, 4)), index=["a", "b", "c"], columns=["A", "B", "C", "D"])
out:
A B C D
a 12 13 14 15
b 16 17 18 19
c 20 21 22 23
----------------------------------
# 1. Selecting with data[]:
data[...] # Rows
----------------------
...
----------------------
data[{row position 1}:{row position 2}] # Multiple rows, sliced by position
data['{row label 1}':'{row label 2}'] # Multiple rows, sliced by label
----------------------
in: data[0:3]
out:
A B C D
a 12 13 14 15
b 16 17 18 19
c 20 21 22 23
----------------------
----------------------
in: data['a':'c']
out:
A B C D
a 12 13 14 15
b 16 17 18 19
c 20 21 22 23
----------------------
data['{column name}'] # A single column
----------------------
in: data['A']
out:
a 12
b 16
c 20
----------------------
data[['{column name 1}','{column name 2}']] # Multiple columns
----------------------
in: data[['A','B']]
out:
A B
a 12 13
b 16 17
c 20 21
----------------------
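What data[] returns depends on what is put inside the brackets, which is easy to mix up. A small sketch using the same table as above (the variable names `rows`, `column`, and `columns` are only for illustration):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12, 24).reshape((3, 4)),
                    index=["a", "b", "c"], columns=["A", "B", "C", "D"])

rows = data[0:2]             # a slice inside [] selects rows (here 'a' and 'b')
column = data["A"]           # a single name selects one column, returned as a Series
columns = data[["A", "B"]]   # a list of names selects columns, returned as a DataFrame

print(type(column).__name__)
print(type(columns).__name__)
```

A single name is always looked up among the columns, so `data['a']` raises a KeyError here even though 'a' is a valid row label.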
# 2. Selecting by index label:
data.loc['{row label}'] # A single row
----------------------
in: data.loc['a']
out:
A 12
B 13
C 14
D 15
----------------------
data.loc[['{row label 1}','{row label 2}']] # Multiple rows
----------------------
in: data.loc[['a','b']]
out:
A B C D
a 12 13 14 15
b 16 17 18 19
----------------------
# 3. Selecting by position (integer index):
data.iloc[{row position}] # A single row
----------------------
in: data.iloc[1]
out:
A 16
B 17
C 18
D 19
----------------------
data.iloc[[{row position 1},{row position 2}]] # Multiple rows
----------------------
in: data.iloc[[1,2]]
out:
A B C D
b 16 17 18 19
c 20 21 22 23
----------------------
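One difference between the two accessors is worth seeing side by side: a label slice with loc includes its end label, while a position slice with iloc stops before its end position. A short sketch on the same table (the names `by_label` and `by_position` are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12, 24).reshape((3, 4)),
                    index=["a", "b", "c"], columns=["A", "B", "C", "D"])

by_label = data.loc["a":"b"]     # label slice: both 'a' and 'b' are included
by_position = data.iloc[0:1]     # position slice: stops before position 1

print(by_label.index.tolist())
print(by_position.index.tolist())
```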
Slicing and selecting rows and columns
data = pd.DataFrame(np.arange(12, 24).reshape((3, 4)), index=["a", "b", "c"], columns=["A", "B", "C", "D"])
# The four expressions below all return the same result
print(data.loc['a':'b','A':'B']) # Slice by label (end label included)
print(data.loc[['a','b'],['A','B']]) # Select by label
print(data.iloc[0:2,0:2]) # Slice by position (end position excluded)
print(data.iloc[[0,1],[0,1]]) # Select by position
out:
A B
a 12 13
b 16 17
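The claim that the four forms agree can be checked in code instead of by eye. A minimal sketch, using `DataFrame.equals` for the comparison; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12, 24).reshape((3, 4)),
                    index=["a", "b", "c"], columns=["A", "B", "C", "D"])

by_label_slice = data.loc["a":"b", "A":"B"]
by_label_list = data.loc[["a", "b"], ["A", "B"]]
by_pos_slice = data.iloc[0:2, 0:2]
by_pos_list = data.iloc[[0, 1], [0, 1]]

# equals() compares shape, labels, dtypes, and values in one call.
all_equal = all(by_label_slice.equals(other)
                for other in (by_label_list, by_pos_slice, by_pos_list))
print(all_equal)

# One row label plus one column label returns a single value.
single = data.loc["a", "B"]
print(single)
```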
data = pd.DataFrame(data={input data}, index=[{list of row labels}], columns=[{list of column names}])
print(data.index) # Get the row labels
print(data.columns) # Get the column names
print(data.values) # Get the values (the underlying data matrix)
print(data.describe()) # Get common statistics: count, mean, standard deviation, min, quartiles (50% is the median), max
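A concrete run of these attributes, as a sketch on the 3x4 example table used earlier (the name `stats` is only for illustration):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(data=np.arange(12, 24).reshape((3, 4)),
                    index=["a", "b", "c"], columns=["A", "B", "C", "D"])

print(data.index.tolist())     # row labels
print(data.columns.tolist())   # column names
print(data.values.shape)       # the values come back as a NumPy array

# describe() returns a DataFrame, so single statistics can be picked out with loc.
stats = data.describe()
print(stats.loc["50%"])        # the 50% row holds the median of each column
```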
data = pd.DataFrame(data={input data}, index=[{list of row labels}], columns=[{list of column names}])
# Use one of the two lines below; running both overwrites the sums, and the mean would then include the 'total' row
data.loc['total'] = data.apply(lambda x: x.sum()) # Add a 'total' row holding the sum of each column
data.loc['total'] = data.apply(lambda x: x.mean()) # Add a 'total' row holding the mean of each column
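If both a sum row and a mean row are wanted, one way is to compute both before adding either, so that neither statistic includes the other. A sketch under that assumption; the row label 'mean' is a choice made here and does not come from the text above.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12, 24).reshape((3, 4)),
                    index=["a", "b", "c"], columns=["A", "B", "C", "D"])

# Compute both statistics on the original three rows first.
totals = data.sum()
means = data.mean()

# Assigning to a label that does not exist yet appends a new row.
data.loc["total"] = totals
data.loc["mean"] = means   # 'mean' is a hypothetical label chosen for this sketch
print(data)
```

`data.sum()` and `data.mean()` give the same per-column results as `data.apply(lambda x: x.sum())` and `data.apply(lambda x: x.mean())`.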
# Return the rows in which a column takes one of the given values
data[data['{column name}'].isin([{column values}])]
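Filled in with the example table, the filter might look like the sketch below; the values 12 and 20 and the names `mask` and `matches` are picked only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12, 24).reshape((3, 4)),
                    index=["a", "b", "c"], columns=["A", "B", "C", "D"])

# isin() builds a boolean Series: True where column A is 12 or 20.
mask = data["A"].isin([12, 20])

# Indexing with that Series keeps only the rows marked True.
matches = data[mask]
print(matches)
```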