Hello everyone! This time I'd like to introduce a super simple demo of crawling a dynamic web page.
When it comes to dynamic web pages, how much do you know? Simply put, to get the data of a static web page you only need to request the page's URL. The data of a dynamic web page, however, is stored in a back-end database, so to get it we have to send our request to the URL of the data interface, not to the URL of the page itself.
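As a minimal sketch of that difference (using the site this post crawls; the interface address shown here is the one we will find later via the Network tab):

```python
import requests

headers = {"User-Agent": "Mozilla/5.0"}

# Requesting the page URL only returns the HTML skeleton of the dynamic page.
page_html = requests.get("https://www.amap.com/", headers=headers).text
print(len(page_html))  # plenty of HTML, but not the data we want

# The data itself is served by a separate interface URL (discovered later in this post).
api_response = requests.get(
    "https://www.amap.com/service/cityList?version=202092419",
    headers=headers,
)
print(api_response.headers.get("Content-Type"))  # expected to be JSON rather than HTML
```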
OK, let's get to the point.
This post takes Gaode Maps (Amap) as its example: https://www.amap.com/
After opening the page and looking at its source, we find a bunch of div tags but none of the data we need. At this point we can conclude that it is a dynamic web page, and that we need to find the data interface.
Open the browser's developer tools and click the Network tab. We can see that the page sends a lot of requests to the server, and there is far too much data to search through one by one. Clicking the XHR filter hides a lot of irrelevant files and saves a great deal of time.
XHR requests are sent with the XMLHttpRequest method, which can exchange data with the server in the background. That means part of a page can be updated without reloading the whole page. In other words, the data requested from the back end and returned in the response is of the XHR type.
Then we can start looking under the XHR filter, where we find the following request.
By looking at the Headers tab, we get the request URL. Opening that URL, we find that it returns the weather conditions for the last two days.
Opening it shows the content above: it is in JSON format. The information is kept in the form of a dictionary, and the data we need is stored under the "data" key.
OK, we have found the JSON data. Let's compare it with the page and see whether it is what we are looking for. By comparison, the data corresponds exactly, which means we have found it.
OK, we have the URL; what follows is the concrete code implementation. How do we do it?
We know that a JSON response can be turned into a dictionary with response.json(), and then we can work with that dictionary.
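A quick sketch of that pattern (a minimal example; the adcode value in the URL is only an illustration and is assumed here, not taken from the post):

```python
import requests

# The weather interface found above; 110000 is used as an example adcode (Beijing).
url = "https://www.amap.com/service/weather?adcode=110000"
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})

content = response.json()      # parse the JSON body into a Python dict
print(type(content))           # <class 'dict'>
print(list(content.keys()))    # according to the post, the payload sits under "data"
```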
Now that we know where the data lives, we can start writing the code.
First, request the data address, adding headers so the request looks like it comes from a browser; this helps prevent it from being identified and blocked.
url_city = "https://www.amap.com/service/cityList?version=202092419"
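A sketch of this step (the User-Agent string below is just an example; any common browser UA should do):

```python
import requests

url_city = "https://www.amap.com/service/cityList?version=202092419"

# Pretend to be a normal browser so the request is less likely to be rejected.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0 Safari/537.36"
    )
}

response = requests.get(url_city, headers=headers)
content = response.json()   # the city list comes back as JSON
```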
After we get the data we want, we can search for cityByLetter; the adcodes and names inside it are exactly what we need, and we can then extract them.
if "data" in content:
Now that we have the adcodes and the names, the next step is of course the weather query!
Let's look at the weather interface first. From the figure above we can identify the maximum temperature, minimum temperature, and so on, so we can crawl the data in the same way.
url_weather = "https://www.amap.com/service/weather?adcode={}"
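A sketch of one weather query (110000 is used as an example adcode; in practice you would loop over the adcodes collected above):

```python
import requests

url_weather = "https://www.amap.com/service/weather?adcode={}"
headers = {"User-Agent": "Mozilla/5.0"}

adcode = 110000  # example adcode; replace with one collected from cityByLetter
weather = requests.get(url_weather.format(adcode), headers=headers).json()

# The post's screenshot shows the forecast, including the maximum and minimum
# temperature, inside the "data" section; inspect the dict for the exact field names.
print(weather.get("data"))
```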
OK, we have achieved what we set out to do.
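To wrap up, here is a minimal end-to-end sketch that puts the steps above together. The interface URLs are the ones used in this post; the dictionary layout of cityByLetter and of the weather response are assumptions based on the screenshots, so check the real responses before relying on the exact key names.

```python
# encoding: utf-8
"""Crawl city adcodes from Gaode Maps (amap.com) and query each city's weather.

A sketch assembled from the steps in this post; the key names inside the JSON
responses are assumptions based on the post's screenshots.
"""
import requests

URL_CITY = "https://www.amap.com/service/cityList?version=202092419"
URL_WEATHER = "https://www.amap.com/service/weather?adcode={}"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0 Safari/537.36"
    )
}


def get_cities():
    """Return a dict mapping adcode -> city name from the cityList interface."""
    content = requests.get(URL_CITY, headers=HEADERS).json()
    cities = {}
    if "data" in content:
        for city_list in content["data"].get("cityByLetter", {}).values():
            for city in city_list:
                cities[city["adcode"]] = city["name"]
    return cities


def get_weather(adcode):
    """Return the raw weather JSON for one adcode."""
    return requests.get(URL_WEATHER.format(adcode), headers=HEADERS).json()


if __name__ == "__main__":
    cities = get_cities()
    print("collected", len(cities), "cities")
    # Query only a couple of cities in the demo to keep the request volume low.
    for adcode, name in list(cities.items())[:2]:
        weather = get_weather(adcode)
        print(name, weather.get("data"))
```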