As the saying goes for Excel data tables: what has long been united must divide, and what has long been divided must unite. Splitting and merging Excel data tables is a common task in everyday office work. Doing it by hand is not difficult, but when there is a lot of data the repetitive steps quickly become tedious. With Python's pandas library, splitting and merging Excel data tables can be automated. Below are some practical code snippets that I use. (If there is a better way, criticism and corrections are welcome.)
Data exported from a data platform (such as a questionnaire platform) is usually in list form: each row is one record, so with a lot of data the table becomes very "long". Sometimes a summary table has to be split into separate Excel files according to the distinct values in a certain column.
import pandas as pd

def to_excelByColName(sourceDf, colName, outPath, excelName):
    '''
    Vertical split: split one worksheet into multiple Excel files.
    Split the data by the distinct values in the specified column and save each part as its own Excel file.
    sourceDf: the original DataFrame
    colName: the name of the column to split on
    outPath: the output directory
    excelName: file name appended to each column value, including the .xlsx suffix
    '''
    colNameList = sourceDf[colName].drop_duplicates().tolist()
    for eachColName in colNameList:
        # Each file is named <column value><excelName>; str() guards against non-string values
        sourceDf[sourceDf[colName] == eachColName].to_excel(
            '/'.join([outPath, str(eachColName) + excelName]), index=False)
For example: split a summary table of 1,000 students in 20 classes into 20 Excel files, one per class.
Calling the to_excelByColName function looks like this:
to_excelByColName(sourceDf=sourceDf, colName="class", outPath="./class data sheets", excelName="generated data table.xlsx")
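The same split can also be expressed with groupby, which walks the table once instead of filtering it once per value. The following is only a sketch under stated assumptions: split_by_column and the sample data are hypothetical names made up for illustration, and the step that writes files is left as a comment so the example runs without an Excel engine installed.

```python
import pandas as pd

def split_by_column(df, col_name):
    """Return {value: sub-DataFrame} for each distinct value in col_name.

    Hypothetical helper for illustration. sort=False keeps the values in
    order of first appearance. Rows whose key is missing (NaN) are skipped,
    because groupby drops NaN keys by default.
    """
    return {key: group for key, group in df.groupby(col_name, sort=False)}

# Made-up sample data standing in for the student summary table
students = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "class": ["6-1", "6-2", "6-1", "6-2"],
})

parts = split_by_column(students, "class")
for class_name, part in parts.items():
    # To write one file per class, call for example:
    # part.to_excel(f"{class_name} data sheet.xlsx", index=False)
    print(class_name, len(part))
```

Keeping the pieces in a dictionary first makes it easy to inspect or filter them before anything is written to disk.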
def to_excelByColNameWithSheets(sourceDf, colName, outPath):
    '''
    Vertical split: split one worksheet into multiple sheets of one file.
    Split the data by the distinct values in the specified column and save the parts as multiple sheets of a single Excel file.
    sourceDf: the original DataFrame
    colName: the name of the column to split on
    outPath: the output path, including the file name with the .xlsx suffix
    '''
    colNameList = sourceDf[colName].drop_duplicates().tolist()
    # The with-block saves and closes the file on exit; writer.save() was removed in pandas 2.0
    with pd.ExcelWriter(outPath) as writer:
        for eachColName in colNameList:
            # Sheet names must be strings, so convert non-string values
            sourceDf[sourceDf[colName] == eachColName].to_excel(writer, sheet_name=str(eachColName))
For example: split a summary table of 1,000 students in 20 classes into 20 sheets of a single Excel file, one sheet per class.
Calling the to_excelByColNameWithSheets function looks like this:
to_excelByColNameWithSheets(sourceDf=sourceDf, colName="class", outPath="./class data sheets/generated data table.xlsx")
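Because the sheet names come straight from the column values, it helps to remember that Excel limits sheet names to 31 characters and rejects the characters [ ] : * ? / and \. The helper below is a minimal sketch, not part of the original code: safe_sheet_name is a hypothetical name, and it covers only these two rules.

```python
import re

def safe_sheet_name(value, max_len=31):
    """Turn an arbitrary value into a string usable as an Excel sheet name.

    Hypothetical helper for illustration: replaces the characters Excel
    rejects in sheet names with "_" and truncates to 31 characters.
    """
    name = re.sub(r"[\[\]:*?/\\]", "_", str(value))
    # Fall back to a fixed name if nothing is left
    return name[:max_len] or "Sheet"

print(safe_sheet_name("6/1: A*"))
print(len(safe_sheet_name("x" * 40)))
```

Note that truncation can make two different values collide on the same sheet name, so values that share a long common prefix need extra handling.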
When processing data, you sometimes need to add several auxiliary columns, which makes the table "wider" and "wider". In the end only a few key columns are needed, so the data has to be split horizontally, that is, some columns are extracted and saved as a separate data table. For a horizontal split, it is enough to pass a list of column names to the DataFrame.
For example, to keep only the name and class fields of the data table, you can write:
df1 = sourceDf[["name", "class"]]
df1.to_excel("data sheet with names and classes only.xlsx")
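Selecting with a list of column names raises a KeyError if any name is misspelled or absent. As a small sketch, assuming a hypothetical helper select_columns and made-up sample data, the check can be made explicit so the message names the missing columns:

```python
import pandas as pd

def select_columns(df, wanted):
    """Return a copy of df restricted to the wanted columns.

    Hypothetical helper for illustration: fails early with a message
    that lists the columns that were not found.
    """
    missing = [c for c in wanted if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return df[wanted].copy()

# Made-up wide table with two auxiliary columns
wide = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "class": ["6-1", "6-2"],
    "score": [90, 85],
    "helper": [1, 2],
})

narrow = select_columns(wide, ["name", "class"])
print(narrow.columns.tolist())
```

Returning a copy means later edits to the narrow table do not touch the original wide one.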
Data tables that share the same structure can be concatenated vertically so that they can be processed together.
def readExcelFilesByNames(fpath, fileNameList=None, header=0):
    '''
    Vertical merge: merge multiple Excel files into one table.
    Read the specified Excel files under the given path and merge them into a single DataFrame.
    Every Excel file must use the same table layout.
    1. fpath: required; the directory containing the Excel files, without a file name
    2. fileNameList: list of the Excel file names to read
    3. header: index of the row to use as the column names
    '''
    # Read every file first, then concatenate once, instead of growing the table file by file
    dfList = [pd.read_excel('/'.join([fpath, fileName]), header=header)
              for fileName in (fileNameList or [])]
    # pd.concat raises on an empty list, so return an empty DataFrame in that case
    return pd.concat(dfList) if dfList else pd.DataFrame()
For example: merge the Excel files of 20 classes into one data table.
Calling the readExcelFilesByNames function looks like this:
fileNameList = [
    "Class 6-1 data sheet.xlsx", "Class 6-2 data sheet.xlsx", "Class 6-3 data sheet.xlsx", "Class 6-4 data sheet.xlsx",
    "Class 6-5 data sheet.xlsx", "Class 6-6 data sheet.xlsx", "Class 6-7 data sheet.xlsx", "Class 6-8 data sheet.xlsx",
    "Class 6-9 data sheet.xlsx", "Class 6-10 data sheet.xlsx", "Class 6-11 data sheet.xlsx", "Class 6-12 data sheet.xlsx",
    "Class 6-13 data sheet.xlsx", "Class 6-14 data sheet.xlsx", "Class 6-15 data sheet.xlsx", "Class 6-16 data sheet.xlsx",
    "Class 6-17 data sheet.xlsx", "Class 6-18 data sheet.xlsx", "Class 6-19 data sheet.xlsx", "Class 6-20 data sheet.xlsx",
]
readExcelFilesByNames(fpath="./class data sheets", fileNameList=fileNameList)
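Typing twenty file names by hand is error-prone, so the list can also be collected from the folder itself. This is a sketch using only the standard library; list_excel_files is a hypothetical helper, and the demo runs on a temporary folder of empty placeholder files rather than real workbooks.

```python
from pathlib import Path
import tempfile

def list_excel_files(folder, pattern="*.xlsx"):
    """Return the sorted names of the Excel files in folder.

    Hypothetical helper for illustration. Names starting with "~$" are
    skipped: Excel creates such lock files while a workbook is open.
    """
    return sorted(
        p.name for p in Path(folder).glob(pattern)
        if p.is_file() and not p.name.startswith("~$")
    )

# Demo on a throwaway folder holding empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Class 6-1 data sheet.xlsx", "Class 6-2 data sheet.xlsx",
                 "~$Class 6-1 data sheet.xlsx", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = list_excel_files(tmp)

print(found)
```

The resulting list can be passed as fileNameList. The sort is plain string order, so "Class 6-10" comes before "Class 6-2"; that affects only the row order of the merged table.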
def readExcelBySheetsNames(fpath, header=0, prefixStr="", sheetNameStr="sheetName", prefixNumStr="prefixNum"):
    '''
    Vertical merge: merge multiple sheets into one table.
    Read every sheet of the Excel file and return them merged into a single DataFrame.
    Every sheet must use the same table layout.
    1. fpath: required; the path to the Excel file, including the file name
    2. Two new columns, named sheetName and prefixNum by default, are added to ease later processing:
       the sheetName column holds the name of the sheet each row came from
       the prefixNum column holds prefixStr followed by the sheet's running number
    3. header: index of the row to use as the column names
    '''
    outfd = pd.DataFrame()
    # The with-block closes the workbook even if a sheet fails to parse
    with pd.ExcelFile(fpath) as xl:
        # xl.sheet_names lists the names of all sheets in the Excel file
        for num, sheetName in enumerate(xl.sheet_names, start=1):
            data = xl.parse(sheetName, header=header)
            # Add the sheet-name column and the counter column
            data[sheetNameStr] = sheetName
            data[prefixNumStr] = prefixStr + str(num)
            # Append to the result; dropna() discards rows that contain any missing value
            outfd = pd.concat([outfd, data.dropna()])
    return outfd
Calling readExcelBySheetsNames looks like this:
readExcelBySheetsNames(fpath="./class data sheets/summary data sheet.xlsx", sheetNameStr="sheet name", prefixNumStr="sheet number")
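pandas can also read all sheets in one call: pd.read_excel(path, sheet_name=None) returns a dictionary that maps each sheet name to its DataFrame. The sketch below shows only the merging step on such a dictionary. The dictionary is built by hand from made-up data, so the example runs without any Excel file, and merge_sheets is a hypothetical helper name.

```python
import pandas as pd

def merge_sheets(sheets, sheet_col="sheetName"):
    """Stack a {sheet name: DataFrame} dict into one table.

    Hypothetical helper for illustration: adds a column recording
    which sheet each row came from, then renumbers the rows.
    """
    tagged = [df.assign(**{sheet_col: name}) for name, df in sheets.items()]
    return pd.concat(tagged, ignore_index=True)

# Stand-in for the result of pd.read_excel(path, sheet_name=None)
sheets = {
    "6-1": pd.DataFrame({"name": ["Ann", "Cid"], "score": [90, 85]}),
    "6-2": pd.DataFrame({"name": ["Bob"], "score": [70]}),
}

combined = merge_sheets(sheets)
print(combined)
```

Unlike the function above, this sketch keeps rows with missing values; add a dropna() call if they should be discarded.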
Merging different Excel worksheets horizontally mainly means joining them on certain key columns (such as name or ID number). In pandas this is done with the merge method. It is very convenient to use, but a full explanation would be long, so I will cover it in detail in a follow-up post.
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
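As a brief illustration of the signature, here is a minimal sketch with made-up data: a left join on the name column, with indicator=True so that an extra _merge column shows which rows found a match. The table contents and variable names are assumptions for demonstration only.

```python
import pandas as pd

# Made-up tables: class membership and exam scores, both keyed by name
info = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "class": ["6-1", "6-2", "6-1"],
})
scores = pd.DataFrame({
    "name": ["Ann", "Cid", "Eve"],
    "score": [90, 85, 70],
})

# how="left" keeps every row of info; rows without a match get NaN in score
merged = info.merge(scores, how="left", on="name", indicator=True)
print(merged)
```

Here Bob has no score, so his row is marked left_only, while Eve does not appear at all because she is missing from the left table.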
The way of handling Excel files with Python described in this article relies mainly on the pandas library and is aimed mainly at list-style data tables. List-style data tables are described in detail in the following article:
https://www.cnblogs.com/wansq/p/15869594.html
Splitting a data table mainly involves saving files (writing); for the program this is the output stage.
Merging data tables mainly involves opening files (reading); for the program this is the input stage.
The code above pays off when a large number of tables have to be split and merged repeatedly. For an occasional, small split or merge, a few mouse clicks may well be faster.
No technique is good or bad in itself; what matters is using it flexibly!
Follow the official account "I don't know" and reply "Opening and closing" to download the code for this article, ready to use out of the box!