These days a great deal of data comes from mobile apps, and much of it is quite useful once processed. In this post I will grab the hero portraits from the popular game King of Glory and download them to local disk.
Environment: Windows/Linux
Language: Python
Version: 3.7
Modules/frameworks: scrapy, os
1. Use the packet-capture tool Fiddler to capture the phone app's traffic. There is plenty of material online about configuring and using Fiddler, so I won't cover that here.
2. Find the URL in the capture tool
3. Fetch the response body
4. Parse out the data
5. Extract the image info and save the files
1. Create a project
scrapy startproject King_Fight
The exact path isn't worth detailing here; I ran the command from PowerShell opened inside a folder I had created in advance.
Then create a new .py file under the spiders folder and name it spider.py.
2. Start the capture tool
Note that the phone and the PC must be on the same network segment. Open Fiddler, then open the hero screen in the app; if Fiddler is configured correctly you will see the corresponding traffic refresh. Click the relevant entry to view its URL. The URL I captured is:
start_urls = ['http://gamehelper.gm825.com/wzry/hero/list?channel_id=90009a&app_id=h9044j&game_id=7622&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80&vcode=13.0.4.0&version_code=13040&cuid=8025FD949C93FC66D1DDB6BAC65203D7&ovr=8.0.0&device=Xiaomi_MI+6&net_type=1&client_id=&info_ms=&info_ma=xA9SDhIYZnQ7DOL9HYU%2FDTmfXcpNZC9piF6I%2BbRM5q4%3D&mno=0&info_la=jUm4EMrshA%2BjgQriNYPOaw%3D%3D&info_ci=jUm4EMrshA%2BjgQriNYPOaw%3D%3D&mcc=0&clientversion=13.0.4.0&bssid=9XEEdN1xCIRfdgHQ8NQ4DlZl%2By%2BL8gXiWPRLzJYCKss%3D&os_level=26&os_id=0d62e3f861713d92&resolution=1080_1920&dpi=480&client_ip=192.168.1.61&pdunid=bbbb5488']
You are welcome to reuse this URL directly.
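Long capture URLs like this are easier to read once you split the query string apart. Here is a small stdlib sketch that inspects a few of the real parameters (the full URL carries many more, mostly device fingerprinting):

```python
from urllib.parse import urlparse, parse_qs

# A trimmed copy of the captured URL, keeping only a few of its real parameters
url = ('http://gamehelper.gm825.com/wzry/hero/list'
       '?channel_id=90009a&app_id=h9044j&game_id=7622'
       '&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80')

params = parse_qs(urlparse(url).query)
print(params['game_id'][0])    # 7622
print(params['game_name'][0])  # the percent-encoding decodes to the game's Chinese name
```

This makes it obvious which endpoint is being hit (`/wzry/hero/list`) and which values identify the game, which is handy when a capture URL stops working and you need to diff it against a fresh one.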
3. Declare the fields to scrape in items.py
import scrapy

class WangzSpiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    image_urls = scrapy.Field()   # list of portrait image URLs
    images = scrapy.Field()       # matching hero names, used as file names later
4. Write the spider that fetches the page
import scrapy
from scrapy import Request
import json
import os

from WangZ_Spider.items import WangzSpiderItem


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    # allowed_domains = ['wanz.com']
    start_urls = ['http://gamehelper.gm825.com/wzry/hero/list?channel_id=90009a&app_id=h9044j&game_id=7622&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80&vcode=13.0.4.0&version_code=13040&cuid=8025FD949C93FC66D1DDB6BAC65203D7&ovr=8.0.0&device=Xiaomi_MI+6&net_type=1&client_id=&info_ms=&info_ma=xA9SDhIYZnQ7DOL9HYU%2FDTmfXcpNZC9piF6I%2BbRM5q4%3D&mno=0&info_la=jUm4EMrshA%2BjgQriNYPOaw%3D%3D&info_ci=jUm4EMrshA%2BjgQriNYPOaw%3D%3D&mcc=0&clientversion=13.0.4.0&bssid=9XEEdN1xCIRfdgHQ8NQ4DlZl%2By%2BL8gXiWPRLzJYCKss%3D&os_level=26&os_id=0d62e3f861713d92&resolution=1080_1920&dpi=480&client_ip=192.168.1.61&pdunid=bbbb5488']

    item = WangzSpiderItem()

    # Headers copied from the packet capture, so the API sees us as the app
    headers = {
        'Accept-Charset': 'UTF-8',
        'Accept-Encoding': 'gzip, deflate',
        'Content-Type': 'application/x-www-form-urlencoded',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Dalvik/2.1.0 (Linux; U; Android 8.0.0; MI 6 MIUI/V10.0.2.0.OCACNFH)',
        'Host': 'gamehelper.gm825.com',
        'Connection': 'Keep-Alive',
    }

    def start_requests(self):
        # Issue the request ourselves so the captured headers are attached;
        # re-yielding the same URL from parse() would be caught by the dupe filter.
        yield Request(url=self.start_urls[0], headers=self.headers,
                      method='GET', callback=self.get_data)

    def get_data(self, response):
        print(response.text)
Don't run it just yet. Anyone who has used Scrapy knows you can type scrapy crawl spider on the command line to run it; this time we will launch it with a click instead.
5. Create start.py
Create this file at the same level as the project folder (if it sits anywhere else this won't work), then put the following in it:
from scrapy import cmdline
cmdline.execute('scrapy crawl spider'.split())
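The .split() is there because cmdline.execute expects an argv-style list of tokens, not a single command string:

```python
# cmdline.execute wants an argv-style list; str.split produces exactly that
argv = 'scrapy crawl spider'.split()
print(argv)  # ['scrapy', 'crawl', 'spider']
```

Passing the raw string instead of the list is a common beginner mistake with this pattern.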
Then, in PyCharm (the steps below are PyCharm's), open Run -> Edit Configurations, click +, choose Python, and point the configuration at the start.py file.
Click the green arrow in the upper right and it starts running; at this point you should see the page's raw string data in the output.
6. Cleaning up the data
Parse the string as JSON into a dictionary, which makes the fields easy to pull out:
def get_data(self, response):
    result = json.loads(response.text)   # the body is a JSON string
    print(result)
    heroes = result["list"]              # the hero entries live under "list"
    print(len(heroes))
    covers = [hero["cover"] for hero in heroes]      # portrait image URLs
    names = [hero["name"] for hero in heroes]        # hero names
    hero_ids = [hero["hero_id"] for hero in heroes]
    print(covers)
    print(names)
    print(hero_ids)
    self.item['image_urls'] = covers
    self.item['images'] = names
    yield self.item
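To see what get_data is doing without running the crawler, here is a minimal offline sketch against a hypothetical trimmed payload (the field names match the real response; the values and URLs are made up):

```python
import json

# Hypothetical two-hero sample; the real "list" has one entry per hero
sample = json.dumps({"list": [
    {"hero_id": "105", "name": "Lian Po", "cover": "http://example.com/105.png"},
    {"hero_id": "106", "name": "Xiao Qiao", "cover": "http://example.com/106.png"},
]})

heroes = json.loads(sample)["list"]
covers = [h["cover"] for h in heroes]   # image URLs handed to the pipeline
names = [h["name"] for h in heroes]     # names used for the files on disk
print(names)  # ['Lian Po', 'Xiao Qiao']
```

The two list comprehensions are the whole "parsing" step: once json.loads has done its work, everything is ordinary dictionary and list access.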
That wraps up the spider part.
7. Downloading the images
Once the data is parsed out, what you see is a pile of http://...........png strings. That actually means it worked: those are the images, and all that's left is to download them.
First, configure settings.py:
ITEM_PIPELINES = {
    'WangZ_Spider.pipelines.WangzSpiderPipeline': 300,
}
IMAGE_STORE = 'E:/python_project/King_Fight/WangZ_Spider/Image'
IMAGE_URLS_FILE = 'image_urls'
IMAGE_RESULT_FILED = 'images'
IMAGE_THUMBS = {
    'small': (80, 80),
    'big': (240, 240),
}
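A note on these names: IMAGE_STORE and friends are custom keys read by the hand-written pipeline below, not settings Scrapy itself recognizes. If you would rather lean on Scrapy's stock image handling, the built-in ImagesPipeline uses its own setting names. A sketch of that variant (thumbnail sizes are just examples; the built-in pipeline also requires Pillow):

```python
# settings.py variant using Scrapy's built-in ImagesPipeline instead of a custom one
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = 'E:/python_project/King_Fight/WangZ_Spider/Image'
IMAGES_THUMBS = {            # optional thumbnail generation
    'small': (80, 80),
    'big': (240, 240),
}
# The built-in pipeline reads item['image_urls'] and fills item['images'],
# which is exactly the field pair declared in WangzSpiderItem.
```

With that variant you would not need the custom pipeline at all, though you lose control over the file names.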
Then write pipelines.py to download the images:
import os

import requests

from .settings import IMAGE_STORE


class WangzSpiderPipeline(object):
    def process_item(self, item, spider):
        images = []
        dir_path = '{}'.format(IMAGE_STORE)
        # create the target folder the first time through
        if not os.path.exists(dir_path) and len(item['image_urls']) != 0:
            os.mkdir(dir_path)
        for jpg_url, name, num in zip(item['image_urls'], item['images'], range(100)):
            file_name = name + str(num) + '.png'
            file_path = os.path.join(dir_path, file_name)
            images.append(file_path)
            if os.path.exists(file_path):
                continue  # already downloaded, skip it
            req = requests.get(url=jpg_url)
            with open(file_path, 'wb') as f:
                f.write(req.content)
        return item
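The save-and-skip logic in process_item can be exercised on its own. Here is a self-contained sketch of the same idea, using a temp directory and fake bytes instead of real downloads (save_png is a hypothetical helper, not part of the project):

```python
import os
import tempfile

def save_png(dir_path, name, num, data):
    """Write image bytes as <name><num>.png, skipping files that already exist."""
    os.makedirs(dir_path, exist_ok=True)
    file_path = os.path.join(dir_path, '{}{}.png'.format(name, num))
    if not os.path.exists(file_path):
        with open(file_path, 'wb') as f:
            f.write(data)
    return file_path

tmp = tempfile.mkdtemp()
first = save_png(tmp, 'LianPo', 0, b'\x89PNG fake bytes')
again = save_png(tmp, 'LianPo', 0, b'ignored')  # second call is a no-op
print(os.path.basename(first))  # LianPo0.png
```

The existence check is what makes re-running the crawler cheap: heroes already on disk are never fetched again.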
Finally, run start.py again and check the Image folder for the results.