You need to crawl when you are a reptile recently pdf The paging files are merged into one pdf file , Just thinking about python Is there any library that can implement . Through a simple search , Found out Pypdf2.
The installation procedure is simple :
pip install pypdf2
Find... By querying the document pypdf2 It provides a lot of , Only two of them are needed to meet my needs , namely PdfFileMerger
, PdfFileReader
,
from PyPDF2 import PdfFileMerger, PdfFileReader # introduce
file_merger = PdfFileMerger(strict=False) # Initialize and set non strict checks
target_path = 'X:/XXX/temp' # Merge pdf In the directory
path='X:/XXX/XXX' # Output directory after merging
pdf_lst = [f for f in os.listdir(target_path) if f.endswith('.pdf')]# Read pdf
pdf_lst = [os.path.join(target_path, filename) for filename in pdf_lst]# Complete the file address
for pdf in pdf_lst:
file_merger.append(PdfFileReader(pdf), 'tag')
file_merger.addMetadata(
{
u'/Title': u'my title', u'/Creator': u'creator', '/Subject': 'subjects'})# completion pdf Information
with open(PATH, 'wb+') as fa:
file_merger.write(fa) # Write the merged pdf
Through the above code, you can achieve pdf The merger of .
PdfFileMerger()
There is a parameter ,strict(bool) – Determine whether the user should be warned of a problem , The default value is True
. I merged pdf A problem was detected in , Not set to strict=False
Can't merge successfully .
PdfFileMerger
Of append()
Methods also have parameters , The first parameter is the file object , The second parameter is bookmark , In order to facilitate browsing, I crawled the bookmark with a crawler , The second parameter is used to set bookmarks in the merge .
PdfFileMerger
Of addMetadata()
Method can add metadata , Easy pdf Follow up management . The parameters of this method must be set using tuples .
Official documents :PyPDF2 Documentation — PyPDF2 1.26.0 documentation (pythonhosted.org)