Grouping a dataset and applying a function to each group, whether an aggregation or a transformation, is often a critical part of data analysis. After a dataset has been loaded, merged, and prepared, the next step is usually to compute group statistics or build pivot tables. pandas provides a flexible and efficient groupby facility that lets you slice, dice, and summarize datasets in a natural way.
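As a minimal sketch of the difference between aggregation and transformation, consider the toy example below. The DataFrame, its column names (`key`, `value`), and the helper `peak_to_peak` are made up for illustration and do not come from the text.

```python
import pandas as pd

# Hypothetical toy data: two groups, "a" and "b"
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Group the "value" column by the labels in "key"
grouped = df.groupby("key")["value"]

# Aggregation: each group is reduced to a single value
group_means = grouped.mean()

# Transformation: the result keeps the shape and index of the input
demeaned = grouped.transform(lambda x: x - x.mean())

# Any function that reduces a Series to a scalar can be used to aggregate
def peak_to_peak(arr):
    return arr.max() - arr.min()

group_ranges = grouped.agg(peak_to_peak)

print(group_means)
print(demeaned)
print(group_ranges)
```

Here `group_means` and `group_ranges` have one row per group, while `demeaned` has one row per original observation.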
One reason relational databases and SQL (Structured Query Language) are so popular is the ease with which data can be joined, filtered, transformed, and aggregated. However, query languages such as SQL are quite limited in the kinds of group operations they can perform. As this chapter shows, the expressiveness of Python and pandas lets us perform far more complex group operations, using any function that accepts a pandas object or a NumPy array. In this chapter, you will learn:
Note: Aggregating time series data (a specialized use of groupby) is also called resampling; this book covers it separately in Chapter 11.
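To make the link between resampling and groupby concrete, here is a small sketch. The series, its dates, and the 2-day bin width are arbitrary choices made for illustration only.

```python
import pandas as pd

# Hypothetical daily series covering six days
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Resampling: aggregate the series into 2-day bins
resampled = ts.resample("2D").sum()

# The same binning expressed as a groupby on a time-based grouper
grouped = ts.groupby(pd.Grouper(freq="2D")).sum()

print(resampled)
print(grouped)
```

Both calls bucket the observations into the same time bins and sum within each bin, which is why resampling can be viewed as a time-aware form of grouping.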
Hadley Wickham (author of many popular R packages) coined a term for describing group operations: "split-apply-combine" (split, apply,