
[pandas] Concatenating data with pd.concat()

Editor: Python

pandas can concatenate data vertically (stacking rows) and horizontally (placing columns side by side). The result is a new object, a Series or a DataFrame.

Concatenation is one of the most common ways to combine several datasets.

pd.concat() is the dedicated function for concatenating data. It can work along rows or columns, and you can choose how labels on the other (non-concatenation) axis are combined, for example as a union or an intersection.

[Figure: pd.concat() combining DataFrames]

Syntax

pd.concat(objs, axis=0, join='outer', ignore_index=False,
keys=None, levels=None, names=None, sort=False,
verify_integrity=False, copy=True)

Parameters

objs: the objects to concatenate, a sequence (such as a list) or a mapping of DataFrame or Series objects. This is the only required parameter.

axis: the axis to concatenate along. The default, 0, stacks the objects by rows (appending rows); axis=1 concatenates by columns (appending columns).

join: how labels on the other axis are combined: 'outer' (the default) keeps their union, 'inner' keeps their intersection.

ignore_index: if True, discard the index labels along the concatenation axis and number the result 0, 1, ..., n-1; the default, False, keeps the original labels.

keys: a sequence used as the outermost level of a hierarchical index (MultiIndex), assigning one key to each input object.

levels: the specific levels (unique values) to use for that MultiIndex; inferred from keys when None.

names: names for the levels of the resulting hierarchical index.

sort: sort the non-concatenation axis if it is not already aligned (relevant when join='outer').

verify_integrity: if True, check whether the new concatenated axis contains duplicate labels and raise a ValueError if it does.

copy: if False, avoid copying data unnecessarily.
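A short sketch of keys and names working together, using two small frames shaped like the ones in the examples below. The level names 'source' and 'row' are arbitrary choices for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
df2 = pd.DataFrame({'x': [5, 6], 'y': [7, 8]})

# keys labels each input; names names the levels of the resulting MultiIndex
# ('source' and 'row' are illustrative names, not required values)
res = pd.concat([df1, df2], keys=['a', 'b'], names=['source', 'row'])
print(res)

# The outer level lets you pull back the rows that came from one input
print(res.loc['b'])
```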

1. Concatenating by rows

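Row-wise concatenation with pd.concat() covers what df.append() used to do. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat() is now the way to do this.

The ignore_index and sort parameters work the same way as they did in df.append(), and axis defaults to 0, so the objects are stacked by rows.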

import pandas as pd
df1 = pd.DataFrame({'x':[1,2],'y':[3,4]})
df2 = pd.DataFrame({'x':[5,6],'y':[7,8]})
res1 = pd.concat([df1,df2])
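# The same effect with df.append(), which only exists in pandas < 2.0
try:
    res2 = df1.append(df2)
except AttributeError:
    # DataFrame.append() was removed in pandas 2.0; fall back to pd.concat()
    res2 = pd.concat([df1, df2])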

df1

df2

res1

res2
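Because both inputs carry the default index 0, 1, the row-wise result above repeats those labels. A minimal sketch of how ignore_index and verify_integrity treat that duplication, reusing the same df1 and df2:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
df2 = pd.DataFrame({'x': [5, 6], 'y': [7, 8]})

# Default: the original row labels are kept, so 0 and 1 each appear twice
kept = pd.concat([df1, df2])
print(kept.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True discards the old labels and renumbers the rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# verify_integrity=True refuses to build a result with duplicate labels
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError as err:
    print('ValueError:', err)
```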

2. Concatenating by columns

To place several DataFrames side by side, pass axis=1. The columns of each object are appended to the right of the previous ones, rows are aligned on their index labels, and any position with no matching label is filled with NaN.

import pandas as pd
df1 = pd.DataFrame({'x':[1,2],'y':[3,4]})
df2 = pd.DataFrame({'x':[5,6,0],'y':[7,8,0]})
res = pd.concat([df1,df2], axis=1)

df1

df2

res

In this example df2 has one more row than df1, so in the last row (index 2) the columns that come from df1 are NaN.
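The NaN fill follows index labels, not row positions. A small sketch with a hypothetical df2 whose index starts at 1 and which has a differently named column 'z' (chosen only to avoid duplicate column names):

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})      # index labels 0, 1
df2 = pd.DataFrame({'z': [5, 6]}, index=[1, 2])     # index labels 1, 2

# Rows are matched on index labels: only label 1 exists in both inputs,
# so label 0 has no 'z' value and label 2 has no 'x'/'y' values (NaN)
res = pd.concat([df1, df2], axis=1)
print(res)
```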

3. Keeping only the intersection

In both examples above, the result contains the union of the labels on the non-concatenation axis (the default is join='outer').

To keep only the intersection, pass join='inner'.

import pandas as pd
df1 = pd.DataFrame({'x':[1,2],'y':[3,4]})
df2 = pd.DataFrame({'x':[5,6,0],'y':[7,8,0]})
# Concatenate by columns, keeping only the index labels both frames share
# With join='inner' the third row is dropped, because df1 has no row with label 2
res = pd.concat([df1,df2], axis=1, join='inner')

df1

df2

res
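join always applies to the axis you are not concatenating along. A sketch with row-wise concatenation (the default axis=0) and a hypothetical df3 whose columns only partly overlap df1's, so the inner join trims columns rather than rows:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
df3 = pd.DataFrame({'y': [7, 8], 'z': [9, 0]})  # hypothetical frame for illustration

# axis=0 with join='outer' (default): union of columns, gaps filled with NaN
outer = pd.concat([df1, df3])
# axis=0 with join='inner': only the columns both frames share are kept
inner = pd.concat([df1, df3], join='inner')

print(outer.columns.tolist())  # ['x', 'y', 'z']
print(inner.columns.tolist())  # ['y']
```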

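Extension

The reindex() method can produce the same result. Reindexing on df1.index keeps only df1's row labels, which in this example are exactly the labels the two DataFrames share.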

# Two equivalent approaches
res1 = pd.concat([df1,df2],axis=1).reindex(df1.index)
res2 = pd.concat([df1,df2.reindex(df1.index)],axis=1)

res1

res2
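A quick sketch comparing both reindex() approaches with the join='inner' result from section 3. The values match, but in res1 the outer join runs first, so df1's columns are briefly padded with NaN and upcast to float64; res2 reindexes df2 before concatenating and keeps the integer dtype:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
df2 = pd.DataFrame({'x': [5, 6, 0], 'y': [7, 8, 0]})

inner = pd.concat([df1, df2], axis=1, join='inner')
res1 = pd.concat([df1, df2], axis=1).reindex(df1.index)
res2 = pd.concat([df1, df2.reindex(df1.index)], axis=1)

# Same values in all three results
print((res1.to_numpy() == inner.to_numpy()).all())  # True
print((res2.to_numpy() == inner.to_numpy()).all())  # True

# Compare the dtypes: res1's first two columns (from df1) are float64
print(res1.dtypes.tolist())
print(res2.dtypes.tolist())
```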

4. Concatenating with a Series

import pandas as pd
z = pd.Series([9,9],name='z')
df = pd.DataFrame({'x':[1,2],'y':[3,4]})
# The Series becomes a new column, labelled with the Series name 'z'
res = pd.concat([df,z],axis=1)

res
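Two related cases, sketched under the assumption that the Series are plain default-indexed ones: a Series without a name gets a default integer column label, and Series concatenated along rows produce one longer Series:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# An unnamed Series has no name to use as a column label, so pandas
# assigns a default integer label (0 for the first unnamed Series)
unnamed = pd.Series([9, 9])
with_unnamed = pd.concat([df, unnamed], axis=1)
print(with_unnamed.columns.tolist())  # ['x', 'y', 0]

# Series concatenated along rows (axis=0) give a single, longer Series
s1 = pd.Series([1, 2])
s2 = pd.Series([3, 4])
stacked = pd.concat([s1, s2], ignore_index=True)
print(stacked.tolist())  # [1, 2, 3, 4]
```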

5. Specifying keys (hierarchical index)

import pandas as pd
df1 = pd.DataFrame({'x':[1,2],'y':[3,4]})
df2 = pd.DataFrame({'x':[5,6],'y':[7,8]})
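# keys labels each input as the outer level of a row MultiIndex
res1 = pd.concat([df1,df2], keys=['a','b'])
# Passing a dict: its keys are used in place of the keys argument
# (named frames rather than dict, to avoid shadowing the built-in)
frames = {'a':df1, 'b':df2}
res2 = pd.concat(frames)
# Column-wise concatenation: keys become the outer level of the column MultiIndex
res3 = pd.concat([df1,df2], axis=1, keys=['a','b'])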

df1