Similarities and differences between itertools.groupby and pandas groupby

Editor: Python

Background

Recently I ran into a bug in old code that had worked fine for years but broke under a new business requirement. After tracking it down, I found the cause: itertools.groupby does not behave the way I had assumed, and in particular it does not behave like pandas groupby. I couldn't find a comparison of the two online, so I wrote up their similarities and differences.

Similarities between itertools.groupby and pandas groupby

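# itertools.groupby
from itertools import groupby
a = groupby([1, 1, 2, 3])
for key, group in a:
    # group is an iterator over the consecutive items equal to key
    print(key, len(list(group)))
# Output: 1 2, then 2 1, then 3 1
# pandas groupby
import pandas
b = pandas.Series([0, 1, 2, 3], index=[1, 1, 2, 3], name='b').groupby(level=0)
print(b.apply(len))  # group sizes: 1 -> 2, 2 -> 1, 3 -> 1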

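Both functions group elements by a key (here, the element value for itertools and the index label for pandas). When the data is already sorted by that key, the two give the same result: key 1 forms one group of two elements, and keys 2 and 3 form one group each.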
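To make the "same result on sorted data" observation checkable, the sketch below turns both groupings into a plain dict of group sizes and compares them. The helper names `itertools_group_sizes` and `pandas_group_sizes` are hypothetical, introduced only for this illustration; they are not part of either library.

```python
from itertools import groupby

import pandas as pd


def itertools_group_sizes(data):
    """Hypothetical helper: map each key to its group size via itertools.groupby.

    If a key occurs in several separate runs, later runs overwrite earlier
    ones in the dict, so this is only a fair comparison for sorted input.
    """
    return {key: len(list(group)) for key, group in groupby(data)}


def pandas_group_sizes(data):
    """Hypothetical helper: map each key to its group size via pandas groupby."""
    # Use the data as the index, as in the example above, and group on it.
    s = pd.Series(range(len(data)), index=data)
    return {key: int(n) for key, n in s.groupby(level=0).size().items()}


sorted_data = [1, 1, 2, 3]
print(itertools_group_sizes(sorted_data))  # {1: 2, 2: 1, 3: 1}
print(pandas_group_sizes(sorted_data))     # {1: 2, 2: 1, 3: 1}
```

On unsorted input such as `[1, 2, 3, 1]`, the two helpers would disagree, because itertools.groupby emits key 1 twice.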

Differences between itertools.groupby and pandas groupby

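# itertools.groupby
from itertools import groupby
a = groupby([1, 2, 3, 1])
for key, group in a:
    # a new group starts whenever the value changes
    print(key, len(list(group)))
# Output: 1 1, then 2 1, then 3 1, then 1 1 (key 1 appears twice)
# pandas groupby
import pandas
b = pandas.Series([0, 1, 2, 3], index=[1, 2, 3, 1], name='b').groupby(level=0)
print(b.apply(len))  # group sizes: 1 -> 2, 2 -> 1, 3 -> 1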

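As the output shows, when the data is not sorted, the two groupings differ.
pandas groupby gives the same result as before: all elements with the same key end up in one group no matter where they appear, so key 1 still has two elements (and the group keys are sorted by default). itertools.groupby, by contrast, starts a new group every time the key changes, so it only merges adjacent equal values; its result looks like the input with consecutive duplicates collapsed. The same value, when separated by other values, lands in separate groups: here key 1 is reported twice, once at the start and once at the end, each with a single element.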
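A minimal sketch of the behaviour described above, assuming plain Python lists as input. It shows that the keys emitted by itertools.groupby are the input with adjacent duplicates collapsed, and one common way to get pandas-style grouping out of itertools.groupby: sort by the same key first (the Python docs note that the iterable generally needs to be sorted on the same key function). The helper `group_all` is hypothetical, and the sort-then-group idiom is a standard workaround, not necessarily the fix used in the original code.

```python
from itertools import groupby

import pandas as pd

data = [1, 2, 3, 1]

# itertools.groupby starts a new group every time the key changes, so the
# keys it yields are the input with adjacent duplicates collapsed.
runs = [(key, len(list(group))) for key, group in groupby(data)]
print(runs)  # [(1, 1), (2, 1), (3, 1), (1, 1)]

# Sorting first puts equal keys next to each other, which reproduces the
# "one group per distinct key" result that pandas gives.
by_key = [(key, len(list(group))) for key, group in groupby(sorted(data))]
print(by_key)  # [(1, 2), (2, 1), (3, 1)]

# pandas merges all equal keys regardless of position (keys sorted by default).
s = pd.Series(range(len(data)), index=data)
print(s.groupby(level=0).size().tolist())  # [2, 1, 1]


def group_all(items, key):
    """Hypothetical helper: group every item by key(item), wherever it occurs.

    The sort must use the SAME key function as groupby; sorting by anything
    else can still leave equal keys in separate runs.
    """
    ordered = sorted(items, key=key)
    return {k: list(g) for k, g in groupby(ordered, key=key)}


print(group_all(["apple", "bob", "avocado"], key=lambda w: w[0]))
# {'a': ['apple', 'avocado'], 'b': ['bob']}
```

Sorting costs O(n log n) and requires the keys to be mutually comparable; if that is a problem, accumulating groups in a dict avoids the sort entirely.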