Recently I noticed that many friends around me no longer enjoy using pandas and have switched to other data-manipulation libraries. As a data worker, pandas is practically all I talk about, so I wrote this series to help more people fall in love with pandas.
Recently I was reading a book on data processing with pandas, published in 2020. One section works with the sales dates of an online retailer and derives, for each value in the date column, the first day of that month. The data is read as follows:
import pandas as pd
# As in the book, parse InvoiceDate as a datetime column
df = pd.read_csv('Online_Retail.csv.zip', parse_dates=['InvoiceDate'])
df = df.dropna().copy()
PS: the data can be downloaded from GitHub:
https://github.com/lk-itween/FunnyCodeRepository/raw/main/PandasSaved/data/Online_Retail.csv.zip
After dropping rows with missing values, df.shape is (406829, 9).
As everyone knows, in the current calendar every month starts on the 1st. There are many ways to get that date; this article covers a few of them.
The code in the book's example splits out the year and month and combines them with day 1 into a new date:
from datetime import datetime

def get_month_start(x):
    # Build a new datetime on day 1 of the same year and month
    return datetime(x.year, x.month, 1)

df['MonthStart'] = df['InvoiceDate'].map(get_month_start)
pandas also has built-in offsets for date arithmetic, so you don't need to write your own logic to get month-start and month-end dates. There are some pitfalls to watch for, though; below are the cases that came up during the demonstration, along with their fixes.
from pandas.tseries.offsets import MonthBegin, MonthEnd
# Build a small demo sample
df2 = pd.to_datetime(['2022-9-1', '2022-9-2', '2022-9-29', '2022-9-30',
                      '2022-10-1', '2022-10-2', '2022-10-30', '2022-10-31']).to_frame(name='date')
First, set the offset parameter n to 0, i.e. ask for the start and end of the current month. The figure clearly shows that MonthBegin(0) gives the correct month start only when the date is already the 1st; every other date is rolled forward to the start of the next month. MonthEnd(0), on the other hand, returns the correct month end for every date.
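A minimal sketch of the n=0 case on the demo dates, re-created here so the snippet runs on its own; the column names begin_n0 and end_n0 are only illustrative:

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

# Same demo dates as above: the first two and last two days of September and October 2022
df2 = pd.to_datetime(['2022-9-1', '2022-9-2', '2022-9-29', '2022-9-30',
                      '2022-10-1', '2022-10-2', '2022-10-30', '2022-10-31']).to_frame(name='date')

# n=0: a date already on a month start/end stays put; any other date rolls forward
df2['begin_n0'] = df2['date'] + MonthBegin(0)
df2['end_n0'] = df2['date'] + MonthEnd(0)
print(df2.reset_index(drop=True))
```

In the output, only 2022-09-01 and 2022-10-01 keep their own month start in begin_n0, while end_n0 is correct on every row.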
Next, set n to 1 to get dates in the next month. The result is as follows:
Now MonthBegin(1) correctly returns the start of the next month for every date, whereas MonthEnd(1) returns the end of the next month only when the date is already a month end; for any other date it returns the end of the current month.
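The same sketch with n=1 (again self-contained; begin_n1 and end_n1 are illustrative names):

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

df2 = pd.to_datetime(['2022-9-1', '2022-9-2', '2022-9-29', '2022-9-30',
                      '2022-10-1', '2022-10-2', '2022-10-30', '2022-10-31']).to_frame(name='date')

# n=1: MonthBegin always moves to next month's 1st;
# MonthEnd moves to the *next* month end, which is this month's end unless the date is already one
df2['begin_n1'] = df2['date'] + MonthBegin(1)
df2['end_n1'] = df2['date'] + MonthEnd(1)
print(df2.reset_index(drop=True))
```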
So how do we get the cases that came out wrong? (The ones already correct need no further work.) From the two experiments above, "start of next month" (MonthBegin with n=1) and "end of this month" (MonthEnd with n=0) are always correct, so each remaining target can be reached from one of these correct results by adding or subtracting one more offset.
Start of the current month:
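A sketch of this, assuming the add-then-subtract idea described above: MonthBegin() (n=1) always lands on the next month's 1st, so subtracting one MonthBegin() steps back to the current month's 1st. The column name this_month_start is illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

df2 = pd.to_datetime(['2022-9-1', '2022-9-2', '2022-9-29', '2022-9-30',
                      '2022-10-1', '2022-10-2', '2022-10-30', '2022-10-31']).to_frame(name='date')

# +MonthBegin() -> next month's 1st (always correct); -MonthBegin() -> back to this month's 1st
df2['this_month_start'] = df2['date'] + MonthBegin() - MonthBegin()
print(df2.reset_index(drop=True))
```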
End of next month:
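Similarly, a sketch for the end of next month, assuming we start from the always-correct MonthEnd(0) (end of this month) and then add one MonthEnd(), which moves a month-end date to the following month's end. The column name next_month_end is illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

df2 = pd.to_datetime(['2022-9-1', '2022-9-2', '2022-9-29', '2022-9-30',
                      '2022-10-1', '2022-10-2', '2022-10-30', '2022-10-31']).to_frame(name='date')

# +MonthEnd(0) -> end of this month (always correct); +MonthEnd() -> end of next month
df2['next_month_end'] = df2['date'] + MonthEnd(0) + MonthEnd()
print(df2.reset_index(drop=True))
```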
The values in df['InvoiceDate'] also carry a time of day, and adding the month-begin/month-end offsets keeps that time rather than dropping it. So first truncate to the date with .dt.floor('D'), then apply the offsets:
df['InvoiceDate'].dt.floor('D') + MonthBegin() - MonthBegin()
Compared with the row-by-row map approach from the book, this vectorized conversion takes far less time.
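To check this on your own machine, here is a rough timing sketch. It does not use the original CSV; instead it builds a hypothetical stand-in Series of 50,000 random timestamps (with times of day). Timings depend on hardware and pandas version, so the snippet only checks that both approaches give the same dates.

```python
import time
from datetime import datetime

import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Hypothetical stand-in for df['InvoiceDate']: random nanosecond timestamps in an assumed range
rng = np.random.default_rng(0)
lo = pd.Timestamp('2010-12-01').value
hi = pd.Timestamp('2011-12-10').value
s = pd.Series(pd.to_datetime(rng.integers(lo, hi, size=50_000)))

def get_month_start(x):
    # The book's row-by-row approach: build a new datetime for day 1 of the month
    return datetime(x.year, x.month, 1)

t0 = time.perf_counter()
by_map = pd.to_datetime(s.map(get_month_start))
t1 = time.perf_counter()
# Vectorized: drop the time of day, jump to next month's 1st, then step back one month
vectorized = s.dt.floor('D') + MonthBegin() - MonthBegin()
t2 = time.perf_counter()

print(f'map: {t1 - t0:.4f}s  vectorized: {t2 - t1:.4f}s')
```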
(Manual watermark: original article by CSDN author The fate of the sleepers, https://blog.csdn.net/weixin_46281427?spm=1011.2124.3001.5343, published on the WeChat official account A11Dot)
pandas can also generate period data, and dates can of course be converted to periods. Since what we need here is the start and end date of each month, we convert the dates to monthly periods.
df3 = pd.period_range('2021-10', '2022-05', freq='M').to_frame(name='date')
This generates a set of monthly periods. Data that is already of datetime type can be converted with the .dt.to_period method.
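For example, a datetime Series can be turned into monthly periods like this (the sample timestamps are made up for illustration):

```python
import pandas as pd

# Made-up timestamps, including times of day
s = pd.Series(pd.to_datetime(['2010-12-01 08:26:00', '2011-01-15 12:00:00', '2011-12-09 12:50:00']))

# 'M' = monthly periods; the time of day is irrelevant once the value is a month
monthly = s.dt.to_period('M')
print(monthly)
```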
Compared with the .dt accessor on datetime data, the .dt accessor on period data additionally provides start_time and end_time, which return the start and the end of each period; for monthly periods, that is the month-start and month-end of each date.
Because end_time returns the very last instant of the period (23:59:59.999999999, i.e. nanosecond precision), use floor to truncate it to the date.
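A minimal sketch of both properties on the df3 periods from above (re-created so the snippet is self-contained), with floor applied to end_time:

```python
import pandas as pd

df3 = pd.period_range('2021-10', '2022-05', freq='M').to_frame(name='date')

month_start = df3['date'].dt.start_time       # midnight on the 1st of each month
month_end_raw = df3['date'].dt.end_time       # last nanosecond of each month
month_end = month_end_raw.dt.floor('D')       # truncated to the date
print(pd.DataFrame({'start': month_start, 'end': month_end}))
```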
You can also use .dt.asfreq to get the month-start and month-end dates. It takes two parameters:
freq : str  # target frequency, e.g. 'A' for year, 'M' for month, 'D' for day
how : str, {'E', 'S'}
    # end: 'E', 'END', or 'FINISH'
    # start: 'S', 'START', or 'BEGIN'
To convert the monthly periods to daily periods and get the first day of the month, set:
df3['date'].dt.asfreq('D', how='S')
To get the last day of the month instead, use the following; the parameter name is optional, and the default is the end:
df3['date'].dt.asfreq('D', 'E')
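Putting the asfreq calls side by side on df3 (re-created here) makes it visible that the results are still periods, now with daily frequency:

```python
import pandas as pd

df3 = pd.period_range('2021-10', '2022-05', freq='M').to_frame(name='date')

first_day = df3['date'].dt.asfreq('D', how='S')   # period[D]: first day of each month
last_day = df3['date'].dt.asfreq('D', 'E')        # period[D]: last day of each month
last_day_default = df3['date'].dt.asfreq('D')     # how defaults to 'E'
print(pd.DataFrame({'first': first_day, 'last': last_day}))
```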
Comparing the time taken by the two approaches on the sample data:
asfreq appears to take less time than start_time. Note, however, that the two return different types: asfreq returns periods (period[D]), whereas start_time returns timestamps (datetime64), so the attributes and methods available under .dt differ between the two results. If you need datetime output, replace asfreq with to_timestamp, which takes the same freq and how parameters (although its how defaults to 'start' rather than 'E'). It takes a little longer, and the result is similar to start_time.
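A rough timing sketch of the three options, using a hypothetical sample of 120,000 monthly periods; absolute numbers will vary by machine and pandas version, so only the result types and values are checked:

```python
import timeit

import pandas as pd

# Hypothetical sample: 240 months, each repeated 500 times
periods = pd.Series(pd.period_range('2000-01', periods=240, freq='M').repeat(500))

via_start_time = periods.dt.start_time                    # datetime64 timestamps
via_asfreq = periods.dt.asfreq('D', how='S')              # still periods, dtype period[D]
via_to_timestamp = periods.dt.to_timestamp(how='start')   # datetime64 timestamps

for name, func in [('start_time', lambda: periods.dt.start_time),
                   ('asfreq', lambda: periods.dt.asfreq('D', how='S')),
                   ('to_timestamp', lambda: periods.dt.to_timestamp(how='start'))]:
    seconds = timeit.timeit(func, number=5)
    print(f'{name:>12}: {seconds:.4f}s for 5 runs')
```

Converting the asfreq result with .dt.to_timestamp() gives back the same timestamps as start_time, so the choice mainly comes down to whether you want periods or timestamps downstream.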
Note:
The data used in this article is already of datetime type. If your dates are a Series of strings, first convert them with pd.to_datetime(s, format=...), setting format to match the strings, and then try the methods described in this article.
The source data can be obtained from the link at the beginning of the article.
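As a sketch of that note, assuming dates stored as day/month/year strings with a time of day (a hypothetical format chosen only for illustration):

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Hypothetical string dates in day/month/year order
s = pd.Series(['24/06/2022 10:15', '03/07/2022 08:00'])
dates = pd.to_datetime(s, format='%d/%m/%Y %H:%M')   # parse with an explicit format

# Once parsed, the methods above apply unchanged
month_start = dates.dt.floor('D') + MonthBegin() - MonthBegin()
month_end = dates.dt.to_period('M').dt.end_time.dt.floor('D')
print(pd.DataFrame({'date': dates, 'start': month_start, 'end': month_end}))
```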
Using examples, this article has walked through several ways to get the month-start and month-end dates in pandas; there are of course other ways as well. pandas processes dates in a vectorized way, which is simpler and more efficient than constructing datetime objects one at a time. As mentioned in earlier articles in this series, vectorized operations generally run much faster than applying a user-defined function element by element.
The sudden rain and strong wind make people fall down , God must be impressed .
Made on June 24, 2002
Full project download link : D
This article was first publish