These days a great deal of data comes from mobile apps, and much of it is quite useful once processed. In this post I will grab the hero portraits from the popular game King of Glory and download them to local disk.
Environment: Windows/Linux
Language: Python
Version: 3.7
Modules/frameworks: scrapy, os
1. Use the packet-capture tool Fiddler to capture the phone app's traffic. There is plenty of material online about configuring and using Fiddler, so I won't cover that here.
2. Find the URL in the capture tool
3. Fetch the response body
4. Parse out the data
5. Extract the image info and save the files
1. Create a project
scrapy startproject King_Fight
The exact path isn't worth detailing here; I ran the command from PowerShell opened inside a folder I had created in advance.
Then create a new .py file under the spiders folder and name it spider.py.
2. Start the capture tool
Note that the phone and the PC must be on the same network segment. Open Fiddler, then open the hero screen in the app; if Fiddler is configured correctly you will see the corresponding traffic refresh. Click the relevant entry to view its URL. The URL I captured is:
start_urls = ['http://gamehelper.gm825.com/wzry/hero/list?channel_id=90009a&app_id=h9044j&game_id=7622&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80&vcode=13.0.4.0&version_code=13040&cuid=8025FD949C93FC66D1DDB6BAC65203D7&ovr=8.0.0&device=Xiaomi_MI+6&net_type=1&client_id=&info_ms=&info_ma=xA9SDhIYZnQ7DOL9HYU%2FDTmfXcpNZC9piF6I%2BbRM5q4%3D&mno=0&info_la=jUm4EMrshA%2BjgQriNYPOaw%3D%3D&info_ci=jUm4EMrshA%2BjgQriNYPOaw%3D%3D&mcc=0&clientversion=13.0.4.0&bssid=9XEEdN1xCIRfdgHQ8NQ4DlZl%2By%2BL8gXiWPRLzJYCKss%3D&os_level=26&os_id=0d62e3f861713d92&resolution=1080_1920&dpi=480&client_ip=192.168.1.61&pdunid=bbbb5488']
You are welcome to reuse this URL directly.
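Long capture URLs like this are easier to read once you split the query string apart. Here is a small stdlib sketch that inspects a few of the real parameters (the full URL carries many more, mostly device fingerprinting):

```python
from urllib.parse import urlparse, parse_qs

# A trimmed copy of the captured URL, keeping only a few of its real parameters
url = ('http://gamehelper.gm825.com/wzry/hero/list'
       '?channel_id=90009a&app_id=h9044j&game_id=7622'
       '&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80')

params = parse_qs(urlparse(url).query)
print(params['game_id'][0])    # 7622
print(params['game_name'][0])  # the percent-encoding decodes to the game's Chinese name
```

This makes it obvious which endpoint is being hit (`/wzry/hero/list`) and which values identify the game, which is handy when a capture URL stops working and you need to diff it against a fresh one.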
3. Declare the fields to scrape in items.py
import scrapy

class WangzSpiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    image_urls = scrapy.Field()   # list of portrait image URLs
    images = scrapy.Field()       # matching hero names, used as file names later
4. Write the spider that fetches the page
import scrapy
from scrapy import Request
import json
import os

from WangZ_Spider.items import WangzSpiderItem


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    # allowed_domains = ['wanz.com']
    start_urls = ['http://gamehelper.gm825.com/wzry/hero/list?channel_id=90009a&app_id=h9044j&game_id=7622&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80&vcode=13.0.4.0&version_code=13040&cuid=8025FD949C93FC66D1DDB6BAC65203D7&ovr=8.0.0&device=Xiaomi_MI+6&net_type=1&client_id=&info_ms=&info_ma=xA9SDhIYZnQ7DOL9HYU%2FDTmfXcpNZC9piF6I%2BbRM5q4%3D&mno=0&info_la=jUm4EMrshA%2BjgQriNYPOaw%3D%3D&info_ci=jUm4EMrshA%2BjgQriNYPOaw%3D%3D&mcc=0&clientversion=13.0.4.0&bssid=9XEEdN1xCIRfdgHQ8NQ4DlZl%2By%2BL8gXiWPRLzJYCKss%3D&os_level=26&os_id=0d62e3f861713d92&resolution=1080_1920&dpi=480&client_ip=192.168.1.61&pdunid=bbbb5488']

    item = WangzSpiderItem()

    # Headers copied from the packet capture, so the API sees us as the app
    headers = {
        'Accept-Charset': 'UTF-8',
        'Accept-Encoding': 'gzip, deflate',
        'Content-Type': 'application/x-www-form-urlencoded',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Dalvik/2.1.0 (Linux; U; Android 8.0.0; MI 6 MIUI/V10.0.2.0.OCACNFH)',
        'Host': 'gamehelper.gm825.com',
        'Connection': 'Keep-Alive',
    }

    def start_requests(self):
        # Issue the request ourselves so the captured headers are attached;
        # re-yielding the same URL from parse() would be caught by the dupe filter.
        yield Request(url=self.start_urls[0], headers=self.headers,
                      method='GET', callback=self.get_data)

    def get_data(self, response):
        print(response.text)
Don't run it just yet. Anyone who has used Scrapy knows you can type scrapy crawl spider on the command line to run it; this time we will launch it with a click instead.
5. Create start.py
Create this file at the same level as the project folder (if it sits anywhere else this won't work), then put the following in it:
from scrapy import cmdline
cmdline.execute('scrapy crawl spider'.split())
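The .split() is there because cmdline.execute expects an argv-style list of tokens, not a single command string:

```python
# cmdline.execute wants an argv-style list; str.split produces exactly that
argv = 'scrapy crawl spider'.split()
print(argv)  # ['scrapy', 'crawl', 'spider']
```

Passing the raw string instead of the list is a common beginner mistake with this pattern.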
Then, in PyCharm (the steps below are PyCharm's), open Run -> Edit Configurations, click +, choose Python, and point the configuration at the start.py file.
Click the green arrow in the upper right and it starts running; at this point you should see the page's raw string data in the output.
6. Cleaning up the data
Parse the string as JSON into a dictionary, which makes the fields easy to pull out:
def get_data(self, response):
    result = json.loads(response.text)   # the body is a JSON string
    print(result)
    heroes = result["list"]              # the hero entries live under "list"
    print(len(heroes))
    covers = [hero["cover"] for hero in heroes]      # portrait image URLs
    names = [hero["name"] for hero in heroes]        # hero names
    hero_ids = [hero["hero_id"] for hero in heroes]
    print(covers)
    print(names)
    print(hero_ids)
    self.item['image_urls'] = covers
    self.item['images'] = names
    yield self.item
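To see what get_data is doing without running the crawler, here is a minimal offline sketch against a hypothetical trimmed payload (the field names match the real response; the values and URLs are made up):

```python
import json

# Hypothetical two-hero sample; the real "list" has one entry per hero
sample = json.dumps({"list": [
    {"hero_id": "105", "name": "Lian Po", "cover": "http://example.com/105.png"},
    {"hero_id": "106", "name": "Xiao Qiao", "cover": "http://example.com/106.png"},
]})

heroes = json.loads(sample)["list"]
covers = [h["cover"] for h in heroes]   # image URLs handed to the pipeline
names = [h["name"] for h in heroes]     # names used for the files on disk
print(names)  # ['Lian Po', 'Xiao Qiao']
```

The two list comprehensions are the whole "parsing" step: once json.loads has done its work, everything is ordinary dictionary and list access.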
That wraps up the spider part.
7. Downloading the images
Once the data is parsed out, what you see is a pile of http://...........png strings. That actually means it worked: those are the images, and all that's left is to download them.
First, configure settings.py:
ITEM_PIPELINES = {
    'WangZ_Spider.pipelines.WangzSpiderPipeline': 300,
}
IMAGE_STORE = 'E:/python_project/King_Fight/WangZ_Spider/Image'
IMAGE_URLS_FILE = 'image_urls'
IMAGE_RESULT_FILED = 'images'
IMAGE_THUMBS = {
    'small': (80, 80),
    'big': (240, 240),
}
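A note on these names: IMAGE_STORE and friends are custom keys read by the hand-written pipeline below, not settings Scrapy itself recognizes. If you would rather lean on Scrapy's stock image handling, the built-in ImagesPipeline uses its own setting names. A sketch of that variant (thumbnail sizes are just examples; the built-in pipeline also requires Pillow):

```python
# settings.py variant using Scrapy's built-in ImagesPipeline instead of a custom one
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = 'E:/python_project/King_Fight/WangZ_Spider/Image'
IMAGES_THUMBS = {            # optional thumbnail generation
    'small': (80, 80),
    'big': (240, 240),
}
# The built-in pipeline reads item['image_urls'] and fills item['images'],
# which is exactly the field pair declared in WangzSpiderItem.
```

With that variant you would not need the custom pipeline at all, though you lose control over the file names.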
Then write pipelines.py to download the images:
import os

import requests

from .settings import IMAGE_STORE


class WangzSpiderPipeline(object):
    def process_item(self, item, spider):
        images = []
        dir_path = '{}'.format(IMAGE_STORE)
        # create the target folder the first time through
        if not os.path.exists(dir_path) and len(item['image_urls']) != 0:
            os.mkdir(dir_path)
        for jpg_url, name, num in zip(item['image_urls'], item['images'], range(100)):
            file_name = name + str(num) + '.png'
            file_path = os.path.join(dir_path, file_name)
            images.append(file_path)
            if os.path.exists(file_path):
                continue  # already downloaded, skip it
            req = requests.get(url=jpg_url)
            with open(file_path, 'wb') as f:
                f.write(req.content)
        return item
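The save-and-skip logic in process_item can be exercised on its own. Here is a self-contained sketch of the same idea, using a temp directory and fake bytes instead of real downloads (save_png is a hypothetical helper, not part of the project):

```python
import os
import tempfile

def save_png(dir_path, name, num, data):
    """Write image bytes as <name><num>.png, skipping files that already exist."""
    os.makedirs(dir_path, exist_ok=True)
    file_path = os.path.join(dir_path, '{}{}.png'.format(name, num))
    if not os.path.exists(file_path):
        with open(file_path, 'wb') as f:
            f.write(data)
    return file_path

tmp = tempfile.mkdtemp()
first = save_png(tmp, 'LianPo', 0, b'\x89PNG fake bytes')
again = save_png(tmp, 'LianPo', 0, b'ignored')  # second call is a no-op
print(os.path.basename(first))  # LianPo0.png
```

The existence check is what makes re-running the crawler cheap: heroes already on disk are never fetched again.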
Finally, run start.py again and check the Image folder for the results.