pandas is the library of choice for the rest of this book. It contains data structures and data manipulation tools that make data cleaning and analysis faster and simpler. pandas is often used alongside other tools, such as the numerical computing libraries NumPy and SciPy, the analysis libraries statsmodels and scikit-learn, and the data visualization library matplotlib. pandas is built on NumPy arrays, and in particular it favors array-based functions and data processing without for loops.
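As a rough illustration of what "processing data without for loops" means, here is a minimal sketch. The variable names and values are made up for the example; only the pandas and NumPy calls themselves are real.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only for illustration.
prices = pd.Series([1.5, 2.0, 3.25])

# Arithmetic applies to every element at once; no explicit loop is written.
doubled = prices * 2

# Reductions are methods on the object itself.
total = prices.sum()

# NumPy functions accept a Series and return a Series.
roots = np.sqrt(prices)
```

The same element-wise style carries over to DataFrame columns, which is why most pandas code rarely needs an explicit Python loop.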
Although pandas adopts much of NumPy's coding style, the biggest difference between the two is that pandas is designed for working with tabular and heterogeneous data, whereas NumPy is best suited to homogeneous numerical array data.
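The contrast can be sketched as follows. The column names and values here are hypothetical and exist only to show that a NumPy array carries a single dtype while each DataFrame column keeps its own.

```python
import numpy as np
import pandas as pd

# A NumPy array holds one dtype for all of its elements.
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
arr_dtype = arr.dtype

# A DataFrame is a table whose columns may each have a different type.
# Column names and values are made up for illustration.
frame = pd.DataFrame({
    "name": ["a", "b"],        # text
    "score": [0.5, 1.5],       # floating point
    "passed": [False, True],   # boolean
})
column_dtypes = frame.dtypes   # one dtype per column
```

Inspecting `column_dtypes` shows a separate entry for each column, which is the tabular, mixed-type case pandas is designed around.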
Since pandas became open source in 2010, it has gradually grown into a very large library that is used in many real-world applications. The developer community now has 800 independent contributors, who contribute to the project while solving their own day-to-day data problems.
Throughout the rest of this book, I use the following import convention for pandas:
In [1]: import pandas as pd
So whenever you see pd. in code, it refers to pandas. Because Series and DataFrame are used so often, it is more convenient to import them into the local namespace as well:
In [