實際應用中通常需要基於數據框已有列,計算產生新的數據列。本文通過基於計算三列的結果增加列示例進行講解。
df['all_matching'] = df.apply(lambda x: x.col1 == x.col2 == x.col3, axis = 1)
上述代碼創建新all_matching 列,它的值為布爾類型,當三列相等為True,否則為False。
首先定義示例數據:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'A': [4, 0, 3, 3, 6, 8, 7, 9, 12],
'B': [4, 2, 3, 5, 6, 4, 7, 7, 12],
'C': [4, 0, 3, 5, 5, 10, 7, 9, 12]})
#view DataFrame
print(df)
# A B C
# 0 4 4 4
# 1 0 2 0
# 2 3 3 3
# 3 3 5 5
# 4 6 6 5
# 5 8 4 10
# 6 7 7 7
# 7 9 7 9
# 8 12 12 12
下面基於A,B,C三列創建新的列all_matching :
#create new column that displays whether or not all column values match
df['all_matching'] = df.apply(lambda x: x.A == x.B == x.C, axis = 1)
#view updated DataFrame
print(df)
# A B C all_matching
# 0 4 4 4 True
# 1 0 2 0 False
# 2 3 3 3 True
# 3 3 5 5 False
# 4 6 6 5 False
# 5 8 4 10 False
# 6 7 7 7 True
# 7 9 7 9 False
# 8 12 12 12 True
既然apply參數為lambda表達式,當然可以自定義計算邏輯:
df['sum'] = df.apply(lambda x : x.A+x.B+x.C, axis=1)
df['avg'] = df.apply(lambda x : (x.A+x.B+x.C)/3.0, axis=1)
print(df)
# A B C sum avg
# 0 4 4 4 12 4.000000
# 1 0 2 0 2 0.666667
# 2 3 3 3 9 3.000000
# 3 3 5 5 13 4.333333
# 4 6 6 5 17 5.666667
# 5 8 4 10 22 7.333333
# 6 7 7 7 21 7.000000
# 7 9 7 9 25 8.333333
# 8 12 12 12 36 12.000000
下面示例計算列名稱包括Sales的列:
import pandas as pd
df = pd.DataFrame.from_dict({
'Name': ['Nik', 'Kate', 'Kevin', 'Evan', 'Jane', 'Kyra', 'Melissa'],
'January_Sales': [90, 95, 75, 93, 60, 85, 75],
'February_Sales': [95, 95, 75, 65, 50, 85, 100],
'March_Sales': [100, 95, 50, 75, 90, 50, 80],
'Some Random Number': [1,2,3,4,5,6,7]
})
print(df.head())
# 過濾出需要計算的列
sales_columns = [col for col in df.columns if 'Sales' in col]
print(sales_columns)
# Returns: ['January_Sales', 'February_Sales', 'March_Sales']
# 增加計算條件
df['Total Sales'] = df[sales_columns].sum(axis=1)
print(df.head())
返回結果:
Name January_Sales February_Sales March_Sales TotalSales
0 Nik 90 95 100 285
1 Kate 95 95 95 285
2 Kevin 75 75 50 200
3 Evan 93 65 75 233
4 Jane 60 50 90 200
有時我們需要增加匯總行,下面示例如何增加匯總行:
# 定義元組列表
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
('Rabi', 2122, 3022, 3456, 3111, 2109),
('Abhi', np.NaN, 2334, 2077, np.NaN, 3122),
('Naresh', 3050, 3050, 2010, 2122, 1111),
('Suman', 2023, 2232, 3050, 2123, 1099),
('Viroj', 2050, 2510, np.NaN, 3012, 2122),
('Nabin', 4000, 2000, 2050, np.NaN, 2111)]
# 通過元組列表創建數據框
test = pd.DataFrame(salary_of_employees,
columns=['Name', 'Jan', 'Feb', 'March', 'April', 'May'])
# 設置數據框索引列
test.set_index('Name', inplace=True)
# 計算匯總行作為數據框新的行
total = test.sum()
total.name = 'Total'
# 給數據框追加新行
test = test.append(total.transpose())
print(test)
運行結果:
Jan Feb March April May
Name
Amit 2000.0 2050.0 1099.0 2134.0 2111.0
Rabi 2122.0 3022.0 3456.0 3111.0 2109.0
Abhi NaN 2334.0 2077.0 NaN 3122.0
Naresh 3050.0 3050.0 2010.0 2122.0 1111.0
Suman 2023.0 2232.0 3050.0 2123.0 1099.0
Viroj 2050.0 2510.0 NaN 3012.0 2122.0
Nabin 4000.0 2000.0 2050.0 NaN 2111.0
Total 15245.0 17198.0 13742.0 12502.0 13785.0