您现在的位置：程式師世界 >> 編程語言 > >> 更多編程語言 >> Python

How does pandas change from one line to another (splicing and aggregating text)?

編輯：Python

1. Scenario introduction

Sometimes when performing summary calculation , We need to convert the data before processing into the data after processing .
Before processing ：

After processing ：

2. Implementation method

Prepare the data in advance ：

import pandas as pd
# Prepare the data 
df = pd.DataFrame({
' full name ': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
' department ':[' The sales department ', ' The sales department ', ' The sales department ', ' The sales department ', ' The personnel department ', ' The personnel department ', ' The personnel department '],
' Management area ':[' south China ', ' The north China ', ' Central China ', ' East China ', ' south China ', ' The north China ', ' Central China ']})
df

2.1 Implementation method basic edition

result=df.groupby(df[' full name ']).agg( Department =(' department ',lambda x:','.join(x.unique())),
Managed area =(' Management area ',lambda x:','.join(x.unique()))).reset_index()

2.2 Further exploration

according to 2.1 In this way, we can achieve our expectations , But when there are many aggregated fields , We will face the trouble of writing multiple anonymous functions , Not very convenient , So how to solve this problem ？

def string_concat(column_name,sep=','):
return sep.join(column_name.unique())
result=df.groupby(df[' full name ']).agg( Department =(' department ',string_concat),
Managed area =(' Management area ',string_concat)).reset_index()
result

2.3 Upgrade again

be based on 2.2 We solved the problem of writing anonymous functions repeatedly , But in reality , If we use different delimiters for multiple columns , It's not very humanized , Then how to solve this problem ？

def custome_str_cat(sep='|'):
def str_cat(column_name):
return sep.join(column_name.unique())
if sep:
return str_cat
result=df.groupby(df[' full name ']).agg( Department =(' department ', custome_str_cat('*')), Managed area =(' Management area ',
custome_str_cat()), Managed area 2=(' Management area ', custome_str_cat('—'))).reset_index()

The effect diagram after implementation is as follows ：

3. At the end

The above is about the aggregate function of text class , amount to postgresql In the database string_agg function ,Oracle Medium wm_concat function ,MySQL Medium GROUP_CONCAT function . So how to be in pandas Turn one row into multiple rows ？ That is, the processed data in this article will be transformed into the data before processing , You can refer to this blog , Portal ：https://blog.csdn.net/qq_41780234/article/details/121623812?spm=1001.2014.3001.5502