程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

Office efficiency has taken off! Python has finally freed my hands

編輯:Python
Recently, I received a paid Q & A in Zhihu , Although the paid Q & a function has been opened , But I haven't answered the questioner's question for a long time .
Due to limited time and energy , I can't spare the whole time to answer the questioner's questions , And don't want to fool the students who ask questions in a few words , Just don't answer .
however , A few days ago, a classmate paid me to consult ” How to use Python hold 3 individual PDF The files are merged together in cross order ?“
You bet ,PDF、Word As a document format often encountered in work and study , It's hard to avoid dealing with them , today , Let's introduce how to use Python Complete common PDF、Word Editing function , You don't have to pay for this simple thing anymore !

PDF file

PDF Is a portable file format , It contains text 、 Images 、 Charts, etc. .
Unlike plain text files , It is a method that contains ".pdf" File with extension , from Adobe The company invented .
This type of file is compatible with any platform such as software 、 Hardware has nothing to do with the operating system .
Installation kit
You need to install one called pypdf2 Software package , It can handle files with extensions ".pdf " The file of :
pip install pypdf2

Installation successful , You'll see the following :
Read PDF File and extract data
We can only pdf Extract text content from file , because PyPDF2 There is a limitation when extracting multimedia content ,logo、 Pictures, etc. cannot be extracted from .

In the code above import The statement gets PyPDF2 modular . You need to use open('pdfFileName' , 'openMode'), among pdfFilename It's the name of the file ,openMode yes rb, That is, only binary formats are read .

PyPDF2 There is one named  "PdfFileReader" Methods , It receives newly created objects  "pdfFileObject". You can start from now on  "pdfFileObject " The access name in is  "numPages" Properties of , It can return the total number of pages .

You can use pdfReaderObject Inside 'getPage(0)' Method to get the 1 page . Then store the results in 'firstPageObject' in , By using 'extractText()' Method can print out all the text in the specific page .


The code above gives pdf All text of the file . however , The image is not displayed on the terminal , This is useful pyPDF2 Yes, we can't get it .
Merge PDF
You will put two different pdf Merge the files into one pdf file , First you need to get 2 One for testing PDF file .

We need to go from PyPDF2 Import PdfFileMerger modular , It can be used to merge pdf file .
Appoint 'path', It indicates the path of the folder where the file is located . in addition , To merge pdf The file is contained in 'pdf_files' List of .

First , Need to pass through PdfFileMerger Create a merge object , Then the traversal of each file in the list , The merger is through 'append' Method to pass the path and file .
Last , By using 'merger.write()' You can get the final output , Here you can get the merged content and new PDF file name .

The image above shows a 'merged.pdf', It consists of 'test.pdf' and 'test-1.pdf' Merge the contents of .

Word file

Word The file is defined by the... At the end of the file name ".docx " The extension consists of . These files do not contain only text like plain text files , It includes rich text files . Rich text files contain different structures of files , These structures have sizes 、 alignment 、 Color 、 picture 、 Font, etc. .
If you have one for processing Word Document application , That would be the best . Apply to Windows and Mac The popular application of the operating system is Microsoft Word, But it's a paid subscription Software .
Of course , There is also a free alternative , Such as  "LibreOffice", It is a pre installed in Linux Applications in . These applications can be found in Windows and Mac Download from the operating system .
this paper , How to pass Python Free operation Word file .
Installation kit
You need to install one called  "python-docx" Software package , It can handle files with extensions ".docx " Of word file .
edit Word file

You can see in the first line above  "document" The module is from  "docx " Imported from package .
The second line of code passes Document Object generates a new word file .
Use 'document.save()', The file name is saved as 'first.docx'.
Add the title
The above code contains a Document() Open a new file ,document.save('addHeader.docx') Used to create a new edit docx file .

You can go through add_heading('text,' level=number) Method to add a title , This method takes the text as the title , The title level is from 0 To 4 Start .

The output given by the above code is a newly created 'addedHeader.docx' file , among 0 The title of the level is the horizontal line below the text , and 1 The title of the level is the main title .
similarly , Other titles are subtitles , The font size decreases in turn .
Add paragraph

The above code contains a Document(), It opens a new document file ,document.save('addParagraph.docx') Used to create a new edit docx file . You can go through add_paragraph('text,' style='required_style') Method to add a title , This method receives text , meanwhile style Is an optional parameter , have access to 'List Number' and 'List Bullet'.

The output given by the above code is a newly created addedParagraph.docx file , There is a simple paragraph on the first line .
Again , There is a title , Below it is an ordered list , Contains a number 1 and 2 Project .
Add images

The above code contains a Document(), It creates a new document file ,document.save('addPicture.docx') Used to create a new edit docx file .
You can use add_picture() To add pictures , The first parameter it contains is cat-1.jpeg Is the path of the picture of the cat .

Width and height are optional parameters , The default is 72 dp, But we used... For our purposes Inches.
The output given by the above code is a newly created addedPicture.docx file , It contains an image of a cat , The width and height of the image are 1.25 Inch .
Read Word file
Next , We use Python Read a word file .

The first line of code starts with docx Import in module Document, Used to transfer the required document files , And create an object .obtainText It's a function , Receive the file fullText.docx. The loop is for each paragraph , These paragraphs are composed of document.parages visit , And use append Method is inserted into an empty list .
Last , This function returns a value in ” Another line “ List of ending paragraphs .

The output above shows that there are no styles 、 Plain text in color .
Next , You can free your hands , Use Python Done automatically PDF、Word Document operation !
  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved