Hello everyone, this is Wang Feng, a programmer.
Many people use the Pandas library in Python to process Excel data. At a high level, data processing can be divided into 3 stages: data reading, data processing, and data output.
Most newcomers get stuck at the very first step: reading the data.
Today, let's learn together the 6 ways of reading Excel files that Pandas officially recommends.
This article has 3 parts: installing pandas and generating an Excel file, reading the source code, and the 6 ways to read Excel.
If you are an experienced Python user, you can jump directly to Part 3.
If you are new to Python or have only just started with Pandas, I suggest you start from Part 1.
All the code below can be copied and pasted directly.
Before we start, you need two things. First, the Pandas library: you have to install it before you can use it, which is easy to understand. Second, an Excel file: to make sure your results match this article, I suggest you use the same Excel file. How do you install Pandas, and how do you get the Excel file? We do each one automatically with a single line of code. After all, we are an office-automation community; if these steps couldn't be automated, that would be a bit much, wouldn't it?
In your computer's terminal, run the following command and pandas will be installed automatically (the command installs the python-office package, which we will also use to generate an Excel file identical to the one in this article):
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-office -U
You don't need to hunt around for an Excel file either. A function we introduced earlier comes in handy here: 1 line of code automatically generates an Excel file filled with simulated data.
All the later Excel examples and demonstrations will use this automatic generation method. Of course, you could also create a file by hand, but what about when we want to learn how to process an Excel file with 100,000 rows?
Whether you build it by hand or download it from Baidu Cloud, that is an extremely slow process.
With the generation method below, simulating an Excel file with 100,000+ rows takes just a moment. Be sure to try it; it opens up a whole new world.
import office

# Generate an Excel file of simulated data with 3 columns and 5 rows
office.excel.fake2excel(columns=['name', 'company_prefix', 'job'], rows=5)
Run the code above in PyCharm and it will generate an Excel file exactly like the one used in this article, as shown in the figure below.
Learning Pandas is actually simple. You don't need to rush around online: the creators and developers have documented what every function does in comments in the source code.
How do you find the pandas source code?
Once pandas is installed, open its source code and see which reading methods pandas recommends. The pandas source path is: D:\<your Python installation directory>\Lib\site-packages\pandas\
After opening it, you will see several subdirectories under the pandas folder, as shown in the figure below. The Excel-reading function we need is in pandas\io\excel\_base.py, at roughly lines 290 to 350 of that file (the exact lines differ between pandas versions), as shown in the figure below.
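If you'd rather not dig through folders, Python can tell you where read_excel lives. Here is a minimal sketch using the standard-library inspect module: it prints the source file and the line where the read_excel definition starts in your installed pandas, so the line number may not match the one quoted above.

```python
import inspect

import pandas as pd

# inspect.unwrap removes any decorator wrappers, so we get the file
# where read_excel itself is defined rather than a decorator helper
read_excel_func = inspect.unwrap(pd.read_excel)
source_file = inspect.getsourcefile(read_excel_func)
_, start_line = inspect.getsourcelines(read_excel_func)

print(source_file)  # a path ending in pandas/io/excel/_base.py (backslashes on Windows)
print(start_line)   # first line of the definition; varies by pandas version
```

Calling help(pd.read_excel) prints the same docstring from that file, including its Examples section.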
Now that we've found the source code, here comes the question: what does it tell us?
Using the pandas source code above, let's go through the 6 ways of reading Excel one by one.
Method 1: set an index column. This way of reading suits Excel data that has a column of serial numbers.
import pandas as pd

# index_col=0 uses the 1st column as the index column
pd.read_excel('fake2excel.xlsx', index_col=0)
The result is shown in the figure below:
This doesn't fit our file, so we can modify the call as follows: don't specify an index column.
The code and result are as follows:
# index_col=None: no column is used as the index; rows are numbered 0, 1, 2, ...
pd.read_excel('fake2excel.xlsx', index_col=None)
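To see the difference between index_col=0 and index_col=None without depending on fake2excel.xlsx, here is a minimal sketch that builds a tiny workbook in memory. It assumes openpyxl is installed (pandas uses it as its .xlsx engine); the "no" and "name" columns and their values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical workbook with a serial-number column, built in memory
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame({"no": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]}).to_excel(
        writer, index=False
    )
data = buffer.getvalue()

# index_col=0: the "no" column becomes the row index
with_index = pd.read_excel(io.BytesIO(data), index_col=0)
# index_col=None: every column stays a column; rows are numbered 0, 1, 2
no_index = pd.read_excel(io.BytesIO(data), index_col=None)

print(with_index.index.tolist(), with_index.columns.tolist())  # [1, 2, 3] ['name']
print(no_index.index.tolist(), no_index.columns.tolist())      # [0, 1, 2] ['no', 'name']
```

Wrapping the bytes in a fresh io.BytesIO for each read keeps the calls independent of the buffer's read position.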
Method 2: read a specific sheet. As the name suggests, this reads the contents of a chosen worksheet.
# sheet_name='Sheet2' reads the sheet named Sheet2 (an integer selects by 0-based position, so sheet_name=0 is the first sheet)
with open('fake2excel.xlsx', 'rb') as f:
    df = pd.read_excel(f, sheet_name='Sheet2')
We added a Sheet2 to the file; the result is shown in the figure below:
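sheet_name accepts a sheet name, a 0-based position, or None for every sheet at once. The sketch below assumes openpyxl is installed and builds a hypothetical two-sheet workbook in memory so you can compare the three forms.

```python
import io

import pandas as pd

# Hypothetical two-sheet workbook built in memory
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
data = buffer.getvalue()

by_name = pd.read_excel(io.BytesIO(data), sheet_name="Sheet2")  # by sheet name
by_position = pd.read_excel(io.BytesIO(data), sheet_name=1)     # 0-based: 2nd sheet
all_sheets = pd.read_excel(io.BytesIO(data), sheet_name=None)   # dict: name -> DataFrame

print(by_name.equals(by_position))  # True
print(list(all_sheets))             # ['Sheet1', 'Sheet2']
```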
Method 3: read data that has no column names (no header row).
# header=None: the first row is read as data, not as column names; columns are numbered 0, 1, 2, ...
pd.read_excel('fake2excel.xlsx', index_col=None, header=None)
The result is shown in the figure below:
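Without header=None, pandas uses the first row as column names, so the first data row disappears into the header. This minimal sketch (assuming openpyxl is installed, with made-up data) writes a sheet that has no header row and reads it both ways.

```python
import io

import pandas as pd

# Hypothetical sheet whose first row is already data (no header row)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame([["Ann", 30], ["Bob", 25]]).to_excel(writer, header=False, index=False)
data = buffer.getvalue()

default = pd.read_excel(io.BytesIO(data))                 # 1st row consumed as header
no_header = pd.read_excel(io.BytesIO(data), header=None)  # every row kept as data

print(len(default), len(no_header))  # 1 2
print(no_header.columns.tolist())    # [0, 1]
```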
Method 4: specify column data types. This is for advanced users, when data processing demands high precision or speed.
# dtype: force the 'age' column to be read as float
pd.read_excel('fake2excel.xlsx', index_col=0, dtype={'age': float})
The result is shown in the figure below:
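dtype takes a dict mapping column names to types. A common pairing is float for numeric columns and str for ID-like columns you don't want treated as numbers. The sketch below assumes openpyxl is installed; the "id" and "age" columns are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data with an ID column and an age column
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame({"id": [1, 2], "age": [30, 25]}).to_excel(writer, index=False)
data = buffer.getvalue()

typed = pd.read_excel(io.BytesIO(data), dtype={"id": str, "age": float})

print(typed["age"].tolist())  # [30.0, 25.0]
print(typed["id"].tolist())   # ['1', '2']
```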
Method 5: treat specific values as missing. When is this useful? For example, when collecting information you might find that someone entered a negative age; you can then automatically clear that age and ask them to fill it in again.
# na_values: read the value "Pang Qiang" in the 'name' column as missing (NaN)
pd.read_excel('fake2excel.xlsx', index_col=None, na_values={'name': "Pang Qiang"})
The result is shown in the figure below:
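na_values only matches exact values, so a rule such as "age is negative" from the scenario above can't be expressed with it directly. This minimal sketch (assuming openpyxl is installed, with hypothetical sign-up data) uses na_values for the exact match and then pandas' Series.mask for the negative-age rule after reading.

```python
import io

import pandas as pd

# Hypothetical sign-up data with one unwanted name and one negative age
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame(
        {"name": ["Ann", "Pang Qiang", "Bob"], "age": [30, -5, 25]}
    ).to_excel(writer, index=False)
data = buffer.getvalue()

# na_values: exact matches in the "name" column are read as NaN
df = pd.read_excel(io.BytesIO(data), na_values={"name": ["Pang Qiang"]})
print(df["name"].isna().tolist())  # [False, True, False]

# A condition like "age < 0" is applied after reading: mask sets those cells to NaN
df["age"] = df["age"].mask(df["age"] < 0)
print(df["age"].tolist())          # [30.0, nan, 25.0]
```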
Method 6: handle comment lines. Not only Python code can have comments; Excel files can too. Many people have never used this feature; if you have, tell us in the comments why you add comments to your Excel files.
pandas provides a way to handle comment lines in Excel.
# comment='#': anything from '#' to the end of a row is ignored
pd.read_excel('fake2excel.xlsx', index_col=None, comment='#')
The result is shown in the figure below:
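When a row's first cell starts with the comment character, the rest of that row is ignored and the now-empty row is dropped. Here is a minimal sketch assuming openpyxl is installed; the note row is hypothetical.

```python
import io

import pandas as pd

# Hypothetical sheet where the middle row is a note starting with "#"
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame(
        {"name": ["Ann", "# checked by hand", "Bob"], "age": [30, 0, 25]}
    ).to_excel(writer, index=False)
data = buffer.getvalue()

plain = pd.read_excel(io.BytesIO(data))                   # note row kept as data
commented = pd.read_excel(io.BytesIO(data), comment="#")  # note row skipped

print(len(plain), len(commented))  # 3 2
print(commented["name"].tolist())  # ['Ann', 'Bob']
```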
As a Python programmer, you should make a habit of reading source code to understand the principles and logic behind it.
I've been using pandas a lot lately, and since pandas can also handle Excel, I'll keep publishing articles on using pandas in the near future.
What would you like to see next? Let me know in the comments.