1. Dynamic packet-capture demo
2. Parsing JSON data
3. Using the requests module
4. Saving to CSV
Install the requests module with: pip install requests
Use the browser's developer tools to capture traffic and work out where the data you want comes from.
Start the analysis from the second page of results:
Open the developer tools
Click through to the second page
Click the search button and search for a piece of the content shown on the page
Inspect the response data returned by the server
Then write code that simulates the browser sending that request to obtain the data
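The response found this way is JSON, so once it is parsed it can be walked like an ordinary dictionary. The snippet below is only a minimal sketch: the sample payload is made up, and it assumes the results sit under data and then searchResult, which is the path the code later in this article reads. The helper name extract_shops is hypothetical.

```python
import json

# Made-up sample shaped like the fields this article reads;
# the real response contains many more keys.
sample_text = '''
{"data": {"searchResult": [
    {"id": 1001, "title": "Shop A", "avgprice": 120, "avgscore": 4.5},
    {"id": 1002, "title": "Shop B", "avgprice": 80, "avgscore": 4.0}
]}}
'''


def extract_shops(payload):
    """Return (id, title, avgprice) for each entry under data -> searchResult."""
    results = payload.get('data', {}).get('searchResult', [])
    return [(shop['id'], shop['title'], shop['avgprice']) for shop in results]


payload = json.loads(sample_text)  # the same kind of dict that response.json() returns
shops = extract_shops(payload)
for shop_id, title, price in shops:
    print(shop_id, title, price)
```

Using .get() with defaults means an unexpected or empty payload yields an empty list instead of a KeyError.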
# 1. Import the modules
import requests  # HTTP request module (third-party)
import pprint  # Pretty-printing module
import csv  # Built-in CSV module
import time
import re
def get_shop_info(html_url):
    """Fetch a shop's detail page and return [phone, opening hours, address]."""
    # url = 'https://www.meituan.com/xiuxianyule/193306807/'
    headers = {
'Cookie': '_lxsdk_cuid=17e102d3914c8-000093bbbb0ed8-4303066-1fa400-17e102d3914c8; __mta=48537241.1640948906361.1640948906361.1640948906361.1; _hc.v=e83bebb5-d6ee-d90e-dd4b-4f2124f8f982.1640951715; ci=70; rvct=70; mt_c_token=2Tmbj8_Qihel3QR9oEXS4nEpnncAAAAABBEAAB9N2m2JXSE0N6xtRrgG6ikfQZQ3NBdwyQdV9vglW8XGMaIt38Lnu1_89Kzd0vMKEQ; iuuid=3C2110909379198F1809F560B5E33A58B83485173D8286ECD2C7F8AFFCC724B4; isid=2Tmbj8_Qihel3QR9oEXS4nEpnncAAAAABBEAAB9N2m2JXSE0N6xtRrgG6ikfQZQ3NBdwyQdV9vglW8XGMaIt38Lnu1_89Kzd0vMKEQ; logintype=normal; cityname=%E9%95%BF%E6%B2%99; _lxsdk=3C2110909379198F1809F560B5E33A58B83485173D8286ECD2C7F8AFFCC724B4; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; latlng=28.302546%2C112.868692; ci3=70; uuid=f7c4d3664ab34f13ad7f.1650110501.1.0.0; mtcdn=K; lt=9WbeLmhHHLhTVpnVu264fUCMYeIAAAAAQREAAKnrFL00wW5eC7mPjhHwIZwkUL11aa7lM7wOfgoO53f0uJpjKSRpO6LwCBDd9Fm-wA; u=266252179; n=qSP946594369; token2=9WbeLmhHHLhTVpnVu264fUCMYeIAAAAAQREAAKnrFL00wW5eC7mPjhHwIZwkUL11aa7lM7wOfgoO53f0uJpjKSRpO6LwCBDd9Fm-wA; unc=qSP946594369; firstTime=1650118043342; _lxsdk_s=18032a80c4c-4d4-d30-e8f%7C%7C129',
'Referer': 'https://chs.meituan.com/',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    # print(response.text)
    phone = re.findall('"phone":"(.*?)"', response.text)[0]
    # The page source contains a literal backslash followed by n, not a real newline,
    # so the backslash has to be escaped in the string passed to replace()
    openTime = re.findall('"openTime":"(.*?)"', response.text)[0].replace('\\n', '')
    address = re.findall('"address":"(.*?)"', response.text)[0]
    shop_info = [phone, openTime, address]
    return shop_info
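Indexing re.findall(...)[0] raises IndexError whenever a detail page lacks one of the fields. One possible way to guard against that is sketched below; first_match is a hypothetical helper, not part of the original script, and the sample text is made up.

```python
import re


def first_match(pattern, text, default=''):
    """Return the first captured group of pattern in text, or default if there is no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else default


# Made-up snippet; in this raw string the \n is two literal characters, not a newline.
sample_html = r'{"phone":"0731-0000000","openTime":"Mon-Sun\n10:00-22:00","address":"Example Road 1"}'

phone = first_match(r'"phone":"(.*?)"', sample_html)
open_time = first_match(r'"openTime":"(.*?)"', sample_html).replace('\\n', ' ')
missing = first_match(r'"fax":"(.*?)"', sample_html, default='N/A')
print(phone, open_time, missing)
```

With a helper like this, a shop without a phone number produces an empty cell in the CSV rather than stopping the whole crawl.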
# Create the output file; encoding='utf-8' sets the file encoding.
# If the text looks garbled when the file is opened (for example in Excel), use encoding='utf-8-sig' instead
# mode='w' overwrites the existing file; mode='a' appends to it
f = open("The ultimate version of the invincible man's Secret.csv", mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
' Shop name ',
' Per capita consumption ',
' minimum consumption ',
' Business circle ',
' Store type ',
' score ',
' Telephone ',
' Business Hours ',
' Address ',
' latitude ',
' longitude ',
' Details page ',
])
csv_writer.writeheader() # Write header
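As a small self-contained illustration of the same csv.DictWriter pattern, the sketch below writes two made-up rows to a temporary file and reads them back. The column names and the file path are assumptions for the demo only. It uses a with block so the file is closed automatically, and 'utf-8-sig' so that Excel recognizes the encoding.

```python
import csv
import os
import tempfile

fieldnames = ['name', 'phone']  # hypothetical columns for the demo
rows = [
    {'name': 'Shop A', 'phone': '0731-0000000'},
    {'name': 'Shop B', 'phone': ''},
]

path = os.path.join(tempfile.mkdtemp(), 'demo.csv')  # throwaway location

# 'utf-8-sig' writes a BOM so Excel detects UTF-8;
# newline='' prevents blank lines between rows on Windows.
with open(path, mode='w', encoding='utf-8-sig', newline='') as demo_file:
    writer = csv.DictWriter(demo_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to check what was written
with open(path, encoding='utf-8-sig', newline='') as demo_file:
    saved = [dict(row) for row in csv.DictReader(demo_file)]
print(saved)
```

Every key of a row passed to DictWriter must appear in fieldnames, otherwise writerow raises ValueError.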
# html_url = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/70?uuid=f7c4d3664ab34f13ad7f.1650110501.1.0.0&userid=266252179&limit=32&offset=64&cateId=-1&q=%E4%BC%9A%E6%89%80&token=9WbeLmhHHLhTVpnVu264fUCMYeIAAAAAQREAAKnrFL00wW5eC7mPjhHwIZwkUL11aa7lM7wOfgoO53f0uJpjKSRpO6LwCBDd9Fm-wA'
# 1. Send a request to the URL found in the analysis above.
#    Turning the pages shows how the request URL changes: only the offset parameter differs, growing by 32 per page
for page in range(0, 321, 32):  # offsets 0, 32, 64, 96, ... 320
    time.sleep(1.5)  # Wait 1.5 seconds between requests
    url = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/70'
    # PyCharm tip: press Ctrl + R to open find-and-replace, select the text to convert,
    # and use a regular expression to batch-replace it (handy for turning copied parameters into dict entries)
    data = {
        'uuid': 'f7c4d3664ab34f13ad7f.1650110501.1.0.0',
        'userid': '266252179',
        'limit': '32',
        'offset': page,
        'cateId': '-1',
        'q': '会所',  # search keyword ("clubhouse"); matches the URL-encoded q value in the captured request
        'token': '9WbeLmhHHLhTVpnVu264fUCMYeIAAAAAQREAAKnrFL00wW5eC7mPjhHwIZwkUL11aa7lM7wOfgoO53f0uJpjKSRpO6LwCBDd9Fm-wA',
    }
    # headers disguise the Python code as a browser
    # User-Agent: the browser's basic identity information; sending it is the simplest way to avoid being identified as a crawler
    # Referer: used for anti-hotlinking checks; tells the server which page the request was made from
    headers = {
        'Referer': 'https://chs.meituan.com/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36'
    }
    response = requests.get(url=url, params=data, headers=headers)
    # print(response)  # <Response [403]> means access was denied (for example by the Referer check); <Response [200]> means the request succeeded
    # 2. Get the data: response.text returns the body as a string; response.json() returns it parsed into a dict
    # print(response.json())
    # pprint.pprint(response.json())  # the instructor's Python version is 3.8
    # 3. Parse the data: pull out the fields we need
    searchResult = response.json()['data']['searchResult']
    for index in searchResult:  # Take the items out of the list one by one
        # pprint.pprint(index)
        href = f'https://www.meituan.com/xiuxianyule/{index["id"]}/'
        shop_info = get_shop_info(href)
        title = index['title']  # Shop name
        price = index['avgprice']  # Per capita consumption
        lowest_price = index['lowestprice']  # Minimum consumption
        area = index['areaname']  # Business circle
        shop_type = index['backCateName']  # Store type
        score = index['avgscore']  # Score
        latitude = index['latitude']  # Latitude
        longitude = index['longitude']  # Longitude
        # PyCharm shortcuts: Ctrl + D duplicates the current line,
        # Tab indents the selected lines, Shift + Tab removes one level of indentation
        dit = {
            ' Shop name ': title,
            ' Per capita consumption ': price,
            ' minimum consumption ': lowest_price,
            ' Business circle ': area,
            ' Store type ': shop_type,
            ' score ': score,
            ' Telephone ': shop_info[0],
            ' Business Hours ': shop_info[1],
            ' Address ': shop_info[2],
            ' latitude ': latitude,
            ' longitude ': longitude,
            ' Details page ': href,
        }
        # 4. Save the data
        csv_writer.writerow(dit)
        print(dit)
f.close()  # Close the CSV file once every page has been processed
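The paging rule used in the loop above (the offset grows by 32 for each page) can be checked without touching the network. The sketch below only builds URLs with the standard library. The helper names page_offsets and build_url are hypothetical, and the uuid, userid, and token parameters are left out for brevity, so these URLs are for illustration rather than for real requests.

```python
from urllib.parse import urlencode

BASE_URL = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/70'
PAGE_SIZE = 32  # matches the limit parameter used in the script


def page_offsets(pages, page_size=PAGE_SIZE):
    """Return the offset value for each of the first `pages` result pages."""
    return [page * page_size for page in range(pages)]


def build_url(offset, query):
    """Build a search URL for one page (uuid, userid and token omitted in this sketch)."""
    params = {'limit': PAGE_SIZE, 'offset': offset, 'q': query}
    return f'{BASE_URL}?{urlencode(params)}'


offsets = page_offsets(11)
print(offsets)
print(build_url(offsets[2], '会所'))
```

The third URL carries offset=64, the same value that appears in the request captured from the second... third page of the commented-out example URL earlier in the script.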
That's the end of this article!
If you have suggestions or questions, leave a comment or send me a private message. Let's keep working hard together (ง •_•)ง
If you enjoyed it, follow the blog, or like, bookmark, and comment on the article!