Hi everyone, this is the Demon King~
PyCharm: try to keep your version the same as mine~
requests >>> pip install requests
parsel >>> pip install parsel
Press Win + R, type cmd and click OK, then enter the installation command pip install <module name> (e.g. pip install requests) and press Enter.
Or, in PyCharm, click Terminal and enter the installation command there.
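Before moving on, you can check from Python itself whether both modules were installed for the interpreter you are running. This is a minimal sketch using the standard library's `importlib.util.find_spec`; `check_modules` is a hypothetical helper name, and it assumes the import names match the pip package names (which holds for requests and parsel).

```python
import importlib.util


def check_modules(names):
    """Return {module_name: True/False} depending on whether Python can find it."""
    # find_spec returns None when a top-level module cannot be found
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == '__main__':
    for name, ok in check_modules(['requests', 'parsel']).items():
        print(f'{name}: {"installed" if ok else "missing, run pip install " + name}')
```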
If the installation fails:
Solution: set the environment variables.
Solution: the network connection timed out, so you need to switch to a mirror source,
for example: pip3 install -i https://pypi.doubanio.com/simple/ <module name>
Solution: you may have more than one Python installed (install either Anaconda or Python, not both), so uninstall one of them,
or the Python interpreter in PyCharm has not been set:
click the gear icon and choose Add,
then add the Python installation path.
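With several Pythons on one machine, it helps to know which interpreter a script actually runs under, and to make pip install into exactly that interpreter by calling it as `python -m pip`. The sketch below only builds the command; `pip_install_command` is a hypothetical helper, and the mirror URL is the Douban one from the example above.

```python
import sys


def pip_install_command(module, index_url=None):
    """Build a pip command bound to the interpreter running this script."""
    # sys.executable is the full path of the current Python interpreter
    cmd = [sys.executable, '-m', 'pip', 'install']
    if index_url:
        cmd += ['-i', index_url]  # optional mirror source
    cmd.append(module)
    return cmd


if __name__ == '__main__':
    # Compare this path with the interpreter configured in PyCharm
    print(sys.executable)
    print(' '.join(pip_install_command('requests', 'https://pypi.doubanio.com/simple/')))
```

To actually run the installation, pass the list to `subprocess.run(cmd, check=True)`.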
Choose File >>> Settings >>> Plugins.
Click Marketplace and type the name of the plugin you want to install, e.g. type "translation" for a translation plugin or "Chinese" for the Chinese language plugin.
Select the plugin and click Install.
After it installs successfully, a prompt to restart PyCharm will pop up; click OK, and the plugin takes effect after the restart.
Clarify the requirements:
Use the developer tools to capture and analyze packets: find where the manhua data comes from, starting from the <url address> of one manhua image, then find where all the manhua content of this chapter comes from.
Send a request: send a request to the url of the image data packet you just found.
Get data: get the response data returned by the server.
Parse data: extract the url addresses of all manhua images.
Save data: save the manhua content to a local folder.
To collect multiple chapters: find the url addresses of more manhua data packets → analyze how the request url parameters change → it is the chapter ID that changes.
So we only need to get the IDs of all manhua chapters → analyze the directory page (the list page) to find them.
Get data: get the response data returned by the server.
Parse data: extract all manhua chapter IDs as well as the manhua title.
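To see the "chapter ID changes" idea concretely: between chapters only `chapter_id` differs, while the other parameters stay the same. A sketch with `urllib.parse.urlencode`; `BASE_URL` is a hypothetical placeholder because the real address was removed, and the second chapter ID is made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real address was removed by the author
BASE_URL = 'https://example.com/chapter/content/v1/'


def chapter_url(chapter_id, fixed_params):
    """Build the request url for one chapter; only chapter_id varies."""
    params = {'chapter_id': chapter_id, **fixed_params}
    return BASE_URL + '?' + urlencode(params)


fixed = {'comic_id': '211471', 'format': '1'}
for cid in ['996914', '996915']:  # second ID is hypothetical
    print(chapter_url(cid, fixed))
```

In the tutorial's code, `requests.get(..., params=data)` does this encoding for you, so you never build the query string by hand.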
Because of the review mechanism, I removed some parts of the website addresses; you can add them back yourselves, it's easy.
There are also two words that I replaced with pinyin; you can change them back to the original words~
If you're a little lazy or can't change them yourself, you can also message me privately and I'll send it to you~
```python
# Import the data request module
import requests
# Import the formatted output module
import pprint
# Import the data parsing module
import parsel
# Import the file operation module
import os

# Directory (list) page of the manhua; the address was removed by the author, fill it in yourself
link = ''

# headers: request header camouflage
headers = {
    # user-agent: the user agent, represents the basic identity of the browser
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
}

response_1 = requests.get(url=link, headers=headers)
# print(response_1.text)

# Parse data: pick the parsing method that fits the data you got (an HTML page here, so CSS selectors)
selector = parsel.Selector(response_1.text)
lis = selector.css('.chapter__list-box .j-chapter-item')
name = selector.css('.de-info__box .comic-title::text').get()

# Folder named after the manhua; os.path.join(name, '') adds the trailing separator on any OS
filename = os.path.join(name, '')
if not os.path.exists(filename):
    os.mkdir(filename)

# Reverse the chapter list (presumably the page lists the newest chapter first)
for li in list(reversed(lis)):
    chapter_id = li.css('a::attr(data-chapterid)').get()
    chapter_title = li.css('a::text').getall()[-1].strip()
    print(chapter_id, chapter_title)
    # The following steps (request, parse, save) all run inside this for loop
```
Send a request: simulate a browser and send a request to the url address.
Everything after the question mark belongs to the request parameters of this url; you can pass them separately as a dictionary.
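The standard library can split a copied URL into the address and a parameter dictionary, which is the same shape as the `data` dict used later. A sketch with `urllib.parse.urlsplit` and `parse_qsl`; `query_to_dict` is a hypothetical helper and the example.com URL stands in for the real one.

```python
from urllib.parse import urlsplit, parse_qsl


def query_to_dict(full_url):
    """Split a URL into (address before '?', dict of query parameters)."""
    parts = urlsplit(full_url)
    base = f'{parts.scheme}://{parts.netloc}{parts.path}'
    return base, dict(parse_qsl(parts.query))


base, data = query_to_dict(
    'https://example.com/chapter/content/v1/?chapter_id=996914&comic_id=211471&format=1'
)
print(base)
print(data)
```

If a parameter appears more than once, `dict()` keeps only its last value.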
To simulate a browser with Python code, you need headers (request headers), which you can copy and paste from the developer tools.
user-agent: the user agent, which represents the basic identity of the browser.
How to quickly convert them in batches:
Select the content to convert, press Ctrl + R, turn on regular expressions, enter the patterns below and click Replace All.
Find: (.*?): (.*)    Replace with: '$1': '$2',
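If you would rather do the header conversion in Python than in PyCharm's Replace dialog, `re.sub` accepts the same pattern; only the group references are written `\1` and `\2` instead of `$1` and `$2`. Without `re.MULTILINE` flags involved, `.` still does not match a newline, so each header line is converted on its own. `headers_to_dict_lines` is a hypothetical helper name.

```python
import re

RAW_HEADERS = """accept: application/json
user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64)"""


def headers_to_dict_lines(raw):
    """Turn 'name: value' lines into "'name': 'value'," lines."""
    # Same pattern as in PyCharm; Python writes the groups as \1 and \2
    return re.sub(r'(.*?): (.*)', r"'\1': '\2',", raw)


print(headers_to_dict_lines(RAW_HEADERS))
```

Wrap the result in `headers = { ... }` and you have the dictionary used in the code.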
Request url → copy and paste it from the developer tools:
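```python
# Full request address seen in the developer tools (domain partly removed by the author):
# https://comic..com/chapter/content/v1/?chapter_id=996914&comic_id=211471&format=1&quality=1&sign=c2f14c1bdb0505254416907f504b4e03&type=1&uid=55123713
# Put only the part before the question mark here; the parameters go into the data dict
url = ''
```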
Request parameters (everything after the question mark) → copy and paste:
```python
data = {
    'chapter_id': chapter_id,  # the parameter that changes from chapter to chapter
    'comic_id': '211471',
    'format': '1',
    'quality': '1',
    'sign': 'c2f14c1bdb0505254416907f504b4e03',
    'type': '1',
    'uid': '55123713',
}
```
Request headers to disguise the Python code as a browser → copy and paste:
```python
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
}
```
```python
# Send a request: query parameters go in params, request headers in headers
response = requests.get(url=url, params=data, headers=headers)
# <Response [200]>: the response object; status code 200 means the request succeeded
print(response)
# Get data: get the response data returned by the server
# response.text   -> text data (data type: str)
# response.json() -> JSON data (data type: dict)
print(response.json())
```
Pick the parsing method that fits the data you got: here it is a dictionary, so extract the content by key-value pairs.
Use the content to the left of the colon [key] to extract the content to the right of the colon [value]; key-value pairs are separated by commas.
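Here is the same key-by-key extraction on a small hypothetical JSON body. Its shape ('data' → 'page' → list of dicts with 'image') is assumed from the keys the tutorial's code reads, not taken from the real site. `json.loads` turns the text into a dict, roughly what `response.json()` does for you; `image_urls` is a hypothetical helper.

```python
import json

# Hypothetical response body shaped like the keys the tutorial reads
SAMPLE_JSON = '''
{"data": {"page": [{"image": "https://example.com/1.jpg"},
                   {"image": "https://example.com/2.jpg"}]}}
'''


def image_urls(body):
    """Extract every image url from the JSON text."""
    result = json.loads(body)        # text -> dict
    pages = result['data']['page']   # key 'data' -> dict, key 'page' -> list
    return [page['image'] for page in pages]


print(image_urls(SAMPLE_JSON))
```

If a key might be missing, `dict.get('key')` returns `None` instead of raising `KeyError`.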
```python
# Take the image list out of the JSON dict by key
image_list = response.json()['data']['page']  # list
num = 1
# A for loop takes the elements out of the list (a "box" for things) one by one
for image in image_list:
    img_url = image['image']
    print(img_url)
```
We also need to send a request to each image url and get its data; response.content returns binary data.
```python
    # Still inside the for image in image_list loop
    img_content = requests.get(url=img_url, headers=headers).content
    # Save data: pictures, shipin, audio and other specific formats (zip, ppt...) are binary, so use content
    # mode='wb': w = write, b = binary, i.e. write in binary mode
    with open(filename + chapter_title + str(num) + '.jpg', mode='wb') as f:
        f.write(img_content)
    num += 1
```
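One thing the save step does not handle: the chapter title goes straight into the file name, and Windows does not allow `\ / : * ? " < > |` in file names (and `/` is a directory separator on Linux too), so a title such as "Chapter 1: Start?" would make `open()` fail. A hedged sketch of a sanitizer plus the same binary write; `safe_name` and `save_image` are hypothetical helpers, and the bytes are fake image data.

```python
import os
import re
import tempfile


def safe_name(title):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()


def save_image(folder, chapter_title, num, content):
    """Write binary content to folder/<title><num>.jpg and return the path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f'{safe_name(chapter_title)}{num}.jpg')
    with open(path, mode='wb') as f:  # wb: write in binary mode
        f.write(content)
    return path


with tempfile.TemporaryDirectory() as tmp:
    p = save_image(os.path.join(tmp, 'Demo Comic'), 'Chapter 1: Start?', 1, b'fake image bytes')
    print(os.path.basename(p))
```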
There is no shortcut to success, and no highway to happiness.
All success comes from tireless effort and striving; all happiness comes from ordinary struggle and persistence.
(Inspirational quote)
That's the end of this article~ Anyone interested can give it a try.
Your support is my biggest motivation!! Remember to like, favorite and share~ You're welcome to read my previous articles~
author : The devil will not cry
Game programming: a favorite for game developers~