Reading guide: This article introduces the 10 Python libraries most commonly used in data analysis.
Author: Terence Shin
Translated by: Mika
Source: CDA Data Analyst (ID: cdacdacda)
Learning data analysis is by no means easy, and there are countless tools and resources available. As a result, it can be hard to figure out which skills to learn and which tools to use.
In this article, we introduce the 10 Python libraries most commonly used in data analysis. How many of them have you used?
01 Pandas
In a data analyst's daily work, 70% to 80% of the time goes into understanding and cleaning data, that is, data exploration and data wrangling (munging).
Pandas is mainly used for data analysis and is one of the most commonly used Python libraries. It provides some of the most useful tools for exploring, cleaning, and analyzing data. With Pandas, you can load, prepare, manipulate, and analyze all kinds of structured data.
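A minimal sketch of the load, clean, and analyze cycle described above. The CSV contents and column names are hypothetical, and the median-fill cleaning rule is just one illustrative choice.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file
raw_csv = io.StringIO(
    "city,sales,returns\n"
    "Beijing,120,3\n"
    "Shanghai,,5\n"
    "Beijing,80,2\n"
    "Shenzhen,95,\n"
)

# Load: read the CSV into a DataFrame
df = pd.read_csv(raw_csv)

# Clean: fill missing sales with the column median, missing returns with 0
df["sales"] = df["sales"].fillna(df["sales"].median())
df["returns"] = df["returns"].fillna(0).astype(int)

# Analyze: total sales and returns per city
summary = df.groupby("city")[["sales", "returns"]].sum()
print(summary)
```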
02 NumPy
NumPy is mainly used for its support of N-dimensional arrays. Operations on these multidimensional arrays are often cited as up to 50 times faster than the same operations on Python lists, which has made NumPy a favorite of many data scientists.
NumPy also underpins other libraries; TensorFlow, for example, works directly with NumPy arrays in its tensor computations. NumPy provides fast, precompiled functions for numerical routines that would be difficult to implement by hand. For efficiency, NumPy uses array-oriented (vectorized) computation, so a single operation applies to an entire array without explicit Python loops.
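The contrast between list-based and array-oriented computation can be sketched as follows. The sizes and values are arbitrary, and no timing is measured here; the point is only that the NumPy version expresses the whole operation at once.

```python
import numpy as np

# A Python list needs an explicit loop (here a comprehension) for element-wise math
values = list(range(1_000_000))
squared_list = [v * v for v in values]

# A NumPy array applies the same operation to every element in compiled code
arr = np.arange(1_000_000, dtype=np.int64)
squared_arr = arr * arr

# N-dimensional arrays: reshape into 2-D and reduce along an axis
matrix = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
column_sums = matrix.sum(axis=0)     # [3, 5, 7]
row_means = matrix.mean(axis=1)      # [1.0, 4.0]
```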
03 Scikit-learn
Scikit-learn is arguably the most important machine learning library in Python. After cleaning and processing your data with Pandas or NumPy, you can use Scikit-learn to build machine learning models, since it contains a large number of tools for predictive modeling and analysis.
Scikit-learn has many advantages. For example, you can use it to build several types of machine learning models, both supervised and unsupervised, cross-validate a model's accuracy, and perform feature importance analysis.
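A short sketch touching each capability listed above, using the Iris dataset bundled with Scikit-learn. The choice of a random forest and k-means is illustrative, not the only option.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Built-in Iris dataset: 150 samples, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)

# Supervised model: a random forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Cross-validation: accuracy on each of 5 folds
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print("Mean CV accuracy:", scores.mean())

# Feature importance: fit on all data, then read impurity-based importances
clf.fit(X, y)
importances = clf.feature_importances_
print("Feature importances:", importances)

# Unsupervised model: k-means clustering into 3 groups, ignoring the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
```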
04 Gradio
Gradio lets you build and deploy a web application for a machine learning model in as little as three lines of code. It serves a similar purpose to Streamlit or Flask, but deploying a model with it is much faster and easier.
Gradio's main advantages are:
It enables further model validation: specifically, you can interactively test different inputs to the model
It is easy to demo
It is easy to implement and distribute: anyone can access the web app through a public link.
05 TensorFlow
TensorFlow is one of the most popular Python libraries for implementing neural networks. It uses multidimensional arrays, also known as tensors, and can perform multiple operations on a given input.
Because it is highly parallel by nature, it can train multiple neural networks across multiple GPUs, producing efficient and scalable models. This feature of TensorFlow is sometimes referred to as pipelining.
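A small sketch of tensors and operations on them. The values are arbitrary; the last lines only report whether a GPU is visible, since TensorFlow places operations on a GPU by default when one is available. Multi-GPU training goes through `tf.distribute` strategies, which are not shown.

```python
import tensorflow as tf

# Tensors are multidimensional arrays with a fixed dtype and shape
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # shape (2, 2)
b = tf.constant([[1.0], [1.0]])            # shape (2, 1)

# Several operations applied to the same input tensor
product = tf.matmul(a, a)         # [[7, 10], [15, 22]]
mat_vec = tf.matmul(a, b)         # [[3], [7]]
total = tf.reduce_sum(a)          # 10.0
activated = tf.nn.relu(a - 2.5)   # element-wise: [[0, 0], [0.5, 1.5]]

# Report how many GPUs TensorFlow can see
gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", len(gpus))
```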
06 Keras
Keras is mainly used to create deep learning models, especially neural networks. It is built on top of backends such as TensorFlow and Theano and makes it simple to build neural networks. However, because Keras relies on its backend infrastructure to create the computational graph, it can be relatively slow compared with other libraries.
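A minimal sketch of how compactly Keras declares and trains a network, here using the Keras bundled with TensorFlow. The toy regression data and the layer sizes are hypothetical choices for illustration only.

```python
import numpy as np
from tensorflow import keras

# Hypothetical toy data: y = x1 + 2*x2 - x3 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)).astype("float32")
weights = np.array([1.0, 2.0, -1.0], dtype="float32")
noise = 0.1 * rng.normal(size=200).astype("float32")
y = (X @ weights + noise).reshape(-1, 1)

# A small fully connected network, declared layer by layer
model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])

# The backend (TensorFlow here) executes the training loop
model.compile(optimizer="adam", loss="mse")
history = model.fit(X, y, epochs=20, batch_size=32, verbose=0)

predictions = model.predict(X[:5], verbose=0)
print(predictions.shape)
```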
07 SciPy
SciPy is mainly used for its scientific functions and for mathematical functions derived from NumPy. The library provides statistical, optimization, and signal processing functions. For solving differential equations and performing optimization, it also includes functions for computing integrals numerically. SciPy's strengths include:
Multidimensional image processing
The ability to compute Fourier transforms and solve differential equations
Optimized algorithms that perform linear algebra computations robustly and efficiently
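The following sketch exercises each item in the list above on small problems with known answers. The specific functions and signals are illustrative choices, not the only SciPy routines for these tasks.

```python
import numpy as np
from scipy import fft, integrate, linalg, ndimage, optimize

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2
area, _err = integrate.quad(np.sin, 0, np.pi)

# Differential equation: dy/dt = -y with y(0) = 1, so y(1) = e**-1
sol = integrate.solve_ivp(lambda t, y: -y, t_span=(0, 1), y0=[1.0],
                          rtol=1e-8, atol=1e-10)
y_at_1 = sol.y[0, -1]

# Optimization: (x - 3)**2 + 1 has its minimum at x = 3
res = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)

# Linear algebra: solve A @ x = b (the solution is [2, 3])
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

# Fourier transform: a 5 Hz sine sampled at 100 Hz peaks at 5 Hz
fs = 100
t = np.arange(0, 1, 1 / fs)
spectrum = np.abs(fft.rfft(np.sin(2 * np.pi * 5 * t)))
freqs = fft.rfftfreq(len(t), d=1 / fs)
peak_freq = freqs[np.argmax(spectrum)]

# Multidimensional image processing: Gaussian blur of a single bright pixel
image = np.zeros((5, 5))
image[2, 2] = 1.0
blurred = ndimage.gaussian_filter(image, sigma=1)
```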
08 Statsmodels
Statsmodels is a library that excels at core statistics. This versatile library combines functionality from many Python libraries: it takes graphical features and functions from Matplotlib, uses Pandas for data handling and Patsy for R-like formulas, and is built on NumPy and SciPy.
In particular, it is very useful for creating statistical models such as OLS (ordinary least squares) and for performing statistical tests.
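A minimal OLS sketch using the R-style formula interface, followed by a hypothesis test on the fitted model. The data are simulated and hypothetical, with a true intercept of 2 and slope of 3.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: y depends linearly on x, plus noise
rng = np.random.default_rng(42)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=100)})
df["y"] = 2.0 + 3.0 * df["x"] + rng.normal(0, 1, size=100)

# R-style formula: regress y on x with an intercept
model = smf.ols("y ~ x", data=df).fit()
print(model.summary())

# Estimated coefficients
intercept, slope = model.params["Intercept"], model.params["x"]

# A statistical test on the fitted model: is the slope equal to 3?
t_test = model.t_test("x = 3")
print(t_test)
```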
09 Plotly
Plotly is an essential tool for building visualizations. It is very powerful, easy to use, and produces interactive visualizations.
Plotly is also used with Dash, a tool for building dynamic dashboards from Plotly visualizations. Dash is a web-based Python interface that removes the need for JavaScript in this kind of analytical web application, and lets you create plots both online and offline.
10 Seaborn
Seaborn is built on top of Matplotlib and can create many different kinds of visualizations.
One of Seaborn's most important features is creating amplified data visualizations. These can reveal correlations that were not obvious at first, helping data professionals understand their models more accurately.
Seaborn also offers customizable themes and interfaces, and produces well-designed data visualizations for better data reporting.
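A sketch of both points above: a theme applied once, and a correlation heatmap that makes a hidden relationship stand out. The dataset is simulated and hypothetical, with column `b` deliberately built from `a`. The Agg backend is selected so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Customizable theme: applies a consistent style to all later plots
sns.set_theme(style="whitegrid", palette="muted")

# Hypothetical dataset with a hidden relationship between columns a and b
rng = np.random.default_rng(1)
df = pd.DataFrame({"a": rng.normal(size=200), "c": rng.normal(size=200)})
df["b"] = 0.8 * df["a"] + 0.2 * rng.normal(size=200)

# Correlation heatmap: the strong a-b relationship stands out visually
corr = df.corr()
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="vlag", vmin=-1, vmax=1)
ax.set_title("Correlation matrix")
plt.tight_layout()
plt.savefig("correlations.png")
```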
Reference link :
https://www.kdnuggets.com/2021/03/top-10-python-libraries-2021.html