
[Python] 11 ways to sum two columns in pandas, compared


Official account: Youer Cottage
Author: Peter
Editor: Peter

Hello everyone, I am Peter~

This article compares 11 ways to sum two columns of a pandas DataFrame:

direct_add
for_iloc
iloc_sum
iat
apply (specified columns only)
apply (whole DataFrame)
numpy_array
iterrows
zip
assign
sum


Simulating the data

To make the differences clear, we simulate 50,000 rows of data with 4 fields:

import pandas as pd
import numpy as np
data = pd.DataFrame({
    "A":np.random.uniform(1,1000,50000), 
    "B":np.random.uniform(1,1000,50000),
    "C":np.random.uniform(1,1000,50000),
    "D":np.random.uniform(1,1000,50000)
})
data

11 Functions

Below, 11 different functions each add columns A and C and store the sum in a new column E.

Method 1: Direct addition

Add the two columns of df directly:

def fun1(df):
    df["E"] = df["A"] + df["C"]

Method 2: for loop + iloc positional indexing

A for loop that reads each row's values with iloc:

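def fun2(df):
    df["E"] = 0.0                      # create column E before filling it row by row
    e_pos = df.columns.get_loc("E")    # integer position of E, needed by iloc
    for i in range(len(df)):
        df.iloc[i, e_pos] = df.iloc[i, 0] + df.iloc[i, 2]  # iloc[i, 0] is column A, iloc[i, 2] is column C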
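Writing the loop result into a single cell matters: assigning a scalar to df["E"] broadcasts it to every row, so a loop of the form df["E"] = df.iloc[i, 0] + df.iloc[i, 2] overwrites the whole column on each pass and keeps only the last row's sum. A minimal sketch on a small hypothetical 3-row frame (columns A, B, C, so positions 0 and 2 are A and C):

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [0.0, 0.0, 0.0], "C": [10.0, 20.0, 30.0]})

# Buggy pattern: each iteration assigns one scalar to the entire column E
for i in range(len(df)):
    df["E"] = df.iloc[i, 0] + df.iloc[i, 2]
buggy = df["E"].tolist()
print(buggy)  # [33.0, 33.0, 33.0]: only the last row's sum survives

# Correct pattern: write into row i of column E only
e_pos = df.columns.get_loc("E")
for i in range(len(df)):
    df.iloc[i, e_pos] = df.iloc[i, 0] + df.iloc[i, 2]
fixed = df["E"].tolist()
print(fixed)  # [11.0, 22.0, 33.0]
```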

Method 3: iloc + sum

Use iloc to select the columns to sum, for all rows:

0: the first column, A
2: the third column, C

def fun3(df):
    df["E"] = df.iloc[:, [0, 2]].sum(axis=1)  # axis=1 sums across the selected columns, giving one value per row
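The axis argument is easy to mix up. A small sketch on a hypothetical two-row frame: axis=1 collapses the columns and returns one sum per row (what this method needs), while the default axis=0 collapses the rows and returns one sum per column.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [100, 200], "C": [10, 20]})

# axis=1: for each row, add up the selected columns -> one value per row
row_sums = df.iloc[:, [0, 2]].sum(axis=1)
# axis=0 (the default): for each column, add up its rows -> one value per column
col_sums = df.iloc[:, [0, 2]].sum(axis=0)

print(row_sums.tolist())  # [11, 22]
print(col_sums.tolist())  # [3, 30]
```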

Method 4: for loop + iat

A for loop with iat, analogous to for + iloc:

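def fun4(df):
    df["E"] = 0.0                      # create column E before filling it row by row
    e_pos = df.columns.get_loc("E")    # integer position of E, needed by iat
    for i in range(len(df)):
        df.iat[i, e_pos] = df.iat[i, 0] + df.iat[i, 2]  # write into row i of column E only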
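Both loop methods hardcode positions 0 and 2, which silently breaks if the column order changes. A sketch (hypothetical data) that looks the positions up by label with Index.get_loc and then uses iat, which accepts exactly one integer row position and one integer column position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [0.0, 0.0], "C": [5.0, 7.0], "D": [0.0, 0.0]})

# Resolve integer positions from column labels instead of hardcoding 0 and 2
a_pos = df.columns.get_loc("A")
c_pos = df.columns.get_loc("C")

df["E"] = np.nan  # the target column must exist before iat can write into it
e_pos = df.columns.get_loc("E")

for i in range(len(df)):
    df.iat[i, e_pos] = df.iat[i, a_pos] + df.iat[i, c_pos]

print(df["E"].tolist())  # [6.0, 9.0]
```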

Method 5: apply (only the two columns)

Apply a row-wise function to columns A and C only:

def fun5(df):
    df["E"] = df[["A","C"]].apply(lambda x: x["A"] + x["C"], axis=1)

Method 6: apply (whole DataFrame)

Call apply on the entire DataFrame:

def fun6(df):
    df["E"] = df.apply(lambda x: x["A"] + x["C"], axis=1)
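With axis=1, apply calls the function once per row and passes that row in as a Series whose index is the column labels. Building a Series for every row is a well-known reason apply(axis=1) is slow; when apply runs on the whole DataFrame, each row Series also carries columns B and D even though only A and C are used. A sketch on hypothetical data that reports which labels each row Series contains:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0], "D": [7.0, 8.0]})

# Each call receives one row as a Series; join its index labels into a string
part = df[["A", "C"]].apply(lambda row: ",".join(row.index), axis=1)
whole = df.apply(lambda row: ",".join(row.index), axis=1)

print(part.tolist())   # ['A,C', 'A,C']
print(whole.tolist())  # ['A,B,C,D', 'A,B,C,D']
```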

Method 7: NumPy arrays

Add the underlying NumPy arrays directly:

def fun7(df):
    df["E"] = df["A"].values + df["C"].values
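One semantic difference between Method 1 and Method 7: adding two Series aligns them on index labels, whereas .values drops the index and NumPy adds element by element by position. Inside one DataFrame the indexes already match, so both give the same column E; skipping the alignment step is plausibly part of why the NumPy version is faster. A sketch with two hypothetical Series whose indexes differ:

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
c = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3])

# pandas aligns on labels: only labels 1 and 2 exist in both, the rest become NaN
aligned = a + c
# .values discards the index, so NumPy pairs elements purely by position
positional = a.values + c.values

print(aligned.tolist())     # [nan, 12.0, 23.0, nan]
print(positional.tolist())  # [11.0, 22.0, 33.0]
```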

Method 8: iterrows

iterrows() iterates over the DataFrame row by row:

def fun8(df):
    sums = []
    for _, row in df.iterrows():
        sums.append(row["A"] + row["C"])  # each row is a separate Series; writing to it would not change df
    df["E"] = sums
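The rows yielded by iterrows are Series objects built for the loop, and the pandas documentation warns that writing to them is not guaranteed to affect the DataFrame. Adding a new label such as "E" to a row only enlarges that row Series, so df never gains a column E. A sketch on hypothetical data showing the problem and one way around it (collect the sums, then assign the column once):

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "C": [10.0, 20.0]})

for _, row in df.iterrows():
    row["E"] = row["A"] + row["C"]  # adds label "E" to the row Series only
has_e = "E" in df.columns
print(has_e)  # False: df gained no column E

# Collect the per-row sums, then assign the whole column in one step
df["E"] = [row["A"] + row["C"] for _, row in df.iterrows()]
print(df["E"].tolist())  # [11.0, 22.0]
```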

Method 9: zip

zip pairs up the values of columns A and C element by element:

def fun9(df):
    df["E"] = [i+j for i,j in zip(df["A"], df["C"])]

Method 10: assign

The assign method derives the new field E:

def fun10(df):
    return df.assign(E=df["A"] + df["C"])  # assign returns a new DataFrame; df itself is unchanged
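Unlike the other methods, assign does not modify the DataFrame in place: it returns a new DataFrame with the extra column, so the result has to be kept. A sketch on hypothetical data; the last line shows that assign also accepts a callable, which receives the DataFrame being assigned to and is handy in method chains:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "C": [10.0, 20.0]})

df.assign(E=df["A"] + df["C"])          # the returned copy is discarded
has_e = "E" in df.columns
print(has_e)                            # False: df was not modified

df = df.assign(E=df["A"] + df["C"])     # keep the returned DataFrame
print(df["E"].tolist())                 # [11.0, 22.0]

# A callable is evaluated on the DataFrame that assign is called on
chained = pd.DataFrame({"A": [1.0], "C": [2.0]}).assign(E=lambda d: d["A"] + d["C"])
print(chained["E"].tolist())            # [3.0]
```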

Method 11: sum

Call sum on the selected columns A and C:

def fun11(df):
    df["E"] = df[["A","C"]].sum(axis=1)
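The simulated data has no missing values, so every method produces the same column E. With real data they can differ: the + operator propagates NaN, while sum(axis=1) skips NaN by default (skipna=True). A sketch on hypothetical data; passing skipna=False makes sum behave like +:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan], "C": [10.0, 20.0]})

plus = df["A"] + df["C"]                             # NaN propagates
summed = df[["A", "C"]].sum(axis=1)                  # skipna=True: NaN is ignored
strict = df[["A", "C"]].sum(axis=1, skipna=False)    # NaN propagates, like +

print(plus.tolist())    # [11.0, nan]
print(summed.tolist())  # [11.0, 20.0]
print(strict.tolist())  # [11.0, nan]
```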

Results

Call the 11 functions and compare their speed.

Take the mean time of each method and convert everything to the same unit, microseconds (us):

Method                       Mean time   Time (us)
Direct addition              626 us      626
for + iloc                   9.61 s      9610000
iloc + sum                   1.42 ms     1420
iat                          9.2 s       9200000
apply (specified columns)    666 ms      666000
apply (all columns)          697 ms      697000
numpy                        216 us      216
iterrows                     3.29 s      3290000
zip                          17.9 ms     17900
assign                       888 us      888
sum(axis=1)                  1.33 ms     1330
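The timing code itself is not shown. The following is a minimal sketch of one way to measure a mean time per call with the standard-library timeit module, covering only three of the fast methods; the helper names make_data, mean_time_us, direct_add, numpy_array and sum_axis1 are hypothetical, and the data uses NumPy's default_rng with a fixed seed instead of np.random.uniform so runs are reproducible. Absolute numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd


def make_data(n=50_000, seed=0):
    # Same shape as the article's data: 4 columns of uniform values in [1, 1000)
    rng = np.random.default_rng(seed)
    return pd.DataFrame({col: rng.uniform(1, 1000, n) for col in "ABCD"})


def direct_add(df):
    df["E"] = df["A"] + df["C"]


def numpy_array(df):
    df["E"] = df["A"].values + df["C"].values


def sum_axis1(df):
    df["E"] = df[["A", "C"]].sum(axis=1)


def mean_time_us(func, df, number=20, repeat=5):
    # timeit.repeat returns a list of total seconds, one entry per run of `number` calls
    runs = timeit.repeat(lambda: func(df), number=number, repeat=repeat)
    return sum(runs) / len(runs) / number * 1e6  # mean seconds per call -> microseconds


if __name__ == "__main__":
    data = make_data()
    for f in (direct_add, numpy_array, sum_axis1):
        print(f"{f.__name__:12s} {mean_time_us(f, data):10.1f} us")
```

The loop-based methods are left out on purpose: at several seconds per call, repeating them many times would make the benchmark very slow.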

result = pd.DataFrame({"methods":["direct_add","for_iloc","iloc_sum","iat","apply_part","apply_all",
                                  "numpy_array","iterrows","zip","assign","sum"],
                      "time":[626,9610000,1420,9200000,666000,697000,216,3290000,17900,888,1330]})  # mean time in us
result

Visualize the results in descending order:

result.sort_values("time",ascending=False,inplace=True)
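import plotly.express as px  # the standalone plotly_express package now ships inside plotly as plotly.express
fig = px.bar(result, x="methods", y="time", color="time", log_y=True)  # log scale: times range from 216 us to 9.6 s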
fig.show()

From the results we can see:

Plain for loops are the most time-consuming and NumPy arrays take the least time: for + iloc (9,610,000 us) is more than 40,000 times slower than the NumPy version (216 us), mainly because NumPy arrays use vectorized operations.
The sum function with axis=1 also performs well (1.33 ms), far faster than the loop-based and apply-based methods.
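The "more than 40,000 times" figure can be checked directly from the measured times. A short sketch that rebuilds the results table (times in us, as listed in the results) and computes how many times slower each method is than the fastest one; the slowdown column name is hypothetical:

```python
import pandas as pd

# Mean times in microseconds, as listed in the results
result = pd.DataFrame({
    "methods": ["direct_add", "for_iloc", "iloc_sum", "iat", "apply_part", "apply_all",
                "numpy_array", "iterrows", "zip", "assign", "sum"],
    "time": [626, 9610000, 1420, 9200000, 666000, 697000, 216, 3290000, 17900, 888, 1330],
})

# Ratio of each method's time to the fastest method's time
result["slowdown"] = result["time"] / result["time"].min()
result = result.sort_values("slowdown", ascending=False)
print(result.round(1).to_string(index=False))
```

for_iloc comes out roughly 44,491 times slower than numpy_array (9,610,000 / 216).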

Summary: save time wherever you can by using built-in pandas or NumPy functions as much as possible instead of explicit loops.
