Sometimes our crawler just leaves us confused.
Why can other people scrape site xx successfully, while my requests never return any data?
When this happens, it is usually because we haven't disguised ourselves well enough, and the site has identified us as a crawler~
It's like people: you wear clothes when you go out, right? If you didn't,
you'd be the most conspicuous person on the street. Who would they catch if not you?
Another case: the script ran fine before, so why won't it run now~
Instead, the site throws this message at me: "The system detected that you are visiting too frequently. Please come back later."
All right! Now let's take a serious look at how to deal with this situation~
To disguise yourself well, first think about how a person actually visits a website.
This time we're talking about disguising the Header. When you want to crawl a website's data,
think about it the other way around: if someone crawled your data, what would you do?
You wouldn't want others casually and madly hammering your server, would you?
You'd take some measures too,
especially when someone comes along trying to crawl you with Python…
Here is a simple example server that you can send requests to:
from flask import Flask

app = Flask(__name__)

@app.route('/getInfo')
def hello_world():
    return "Pretend there's a lot of data here"

if __name__ == "__main__":
    app.run(debug=True)
OK, suppose you have now analyzed my site,
and found that requesting /getInfo gives you the data.
Feeling great, you start sending requests:
import requests

url = 'http://127.0.0.1:5000/getInfo'
response = requests.get(url)
print(response.text)
You're right: this time you really do get the data.
However! I sense something is wrong, so I want to look at the headers of the incoming request:
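from flask import Flask, request

app = Flask(__name__)

@app.route('/getInfo')
def hello_world():
    print(request.headers)  # dump the incoming request headers to the console
    return "Pretend there's a lot of data here"

if __name__ == "__main__":
    app.run(debug=True)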
As a result, the headers I see look like this:
Host: 127.0.0.1:5000
User-Agent: python-requests/2.21.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Look at that line: User-Agent: python-requests/2.21.0
You're requesting with Python, so why wouldn't I block you?
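requests is not the only Python client that announces itself this way. As a small sketch using only the standard library, you can inspect the default header that urllib attaches to every request; `build_opener()` and its `addheaders` list are standard urllib APIs, while the variable names are just for illustration.

```python
import urllib.request

# build_opener() returns an OpenerDirector; its addheaders list holds the
# headers urllib adds to every request unless the request sets them itself.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# prints Python-urllib/<major>.<minor> for the running interpreter
print(default_headers["User-agent"])
```

Note the key spelling `User-agent`: urllib normalizes header names with `str.capitalize()`, but HTTP header names are case-insensitive, so the server sees an ordinary User-Agent header.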
So I add a check, and from now on you can't get the data:
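from flask import Flask, request

app = Flask(__name__)

@app.route('/getInfo')
def hello_world():
    # Refuse any client whose User-Agent starts with "python", e.g. python-requests
    if str(request.headers.get('User-Agent')).startswith('python'):
        return "The system detected that you are visiting too frequently. Please come back later."
    else:
        return "Pretend there's a lot of data here"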

if __name__ == "__main__":
    app.run(debug=True)
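One caveat about this check: `startswith('python')` is case-sensitive, and urllib's default User-Agent begins with a capital `P` (`Python-urllib/...`), so it would slip through. A minimal sketch of a case-insensitive version, written as a hypothetical helper `looks_like_script` that the route could call instead of the inline test:

```python
def looks_like_script(user_agent):
    """Return True if the User-Agent announces a Python HTTP client.

    Hypothetical helper: lowercases the header first, so it matches both
    'python-requests/2.21.0' and urllib's 'Python-urllib/3.x'.
    A missing header (None) is treated as an empty string.
    """
    ua = (user_agent or "").lower()
    return ua.startswith("python")
```

In the Flask route this would read `if looks_like_script(request.headers.get('User-Agent')):`. Either way, the check only catches clients that keep their default User-Agent, which is exactly the gap the next step exploits.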
Now send the same request again:
import requests

if __name__ == '__main__':
    url = 'http://127.0.0.1:5000/getInfo'
    response = requests.get(url)
    print(response.text)
The result is:
The system detected that you are visiting too frequently. Please come back later.
I've caught you. You want the data again, so what do you do?
Disguise yourself! Python isn't allowed in,
but a browser is, so modify your request headers to look like a browser's.
First visit the page in a browser, then capture that request (for example with a packet-capture tool) to get its Header data.
You can also read the request Header directly from the Network panel of Chrome DevTools.
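Copying headers by hand gets tedious when there are many of them. A minimal sketch, assuming you paste the raw `Name: value` lines from the browser's request-header view into a string; `parse_raw_headers` is a hypothetical helper, not part of requests:

```python
def parse_raw_headers(raw):
    """Turn header lines copied from the browser into a dict for requests.

    Hypothetical helper: expects one 'Name: value' pair per line. Blank lines,
    HTTP/2 pseudo-headers such as ':authority', and the request line
    (e.g. 'GET /getInfo HTTP/1.1') are skipped.
    """
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # blank line or HTTP/2 pseudo-header
        name, sep, value = line.partition(":")  # split on the FIRST colon only
        if not sep:
            continue  # not a header line, e.g. the request line
        headers[name.strip()] = value.strip()
    return headers
```

You could then call `requests.get(url, headers=parse_raw_headers(raw))`. For a check like the one above, `User-Agent` is often the only header that matters, so keeping just that entry also works.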
Once you have the Header information, the requests module makes the rest easy.
Okay, now you've learned how to pretend to be a browser:
import requests

if __name__ == '__main__':
    # Send the User-Agent string a real Chrome browser sends
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
    }
    url = 'http://127.0.0.1:5000/getInfo'
    response = requests.get(url, headers=headers)
    print(response.text)
Request it again and you'll find that it now returns:
Pretend there's a lot of data here
OK, you've got the data again.
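To see the whole exchange in one place without installing Flask or requests, here is a sketch using only the standard library: `http.server` stands in for the Flask app (same `startswith('python')` rule, lowercased) and `urllib.request` stands in for requests. The handler class, the `fetch` helper, and the OS-assigned port are assumptions made for this demo, not part of the original code.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCKED = "The system detected that you are visiting too frequently. Please come back later."
DATA = "Pretend there's a lot of data here"
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36")


class InfoHandler(BaseHTTPRequestHandler):
    """Stand-in for the Flask /getInfo route, with the same User-Agent check."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")  # header lookup is case-insensitive
        body = BLOCKED if ua.lower().startswith("python") else DATA
        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging


# An empty ProxyHandler disables proxies, so requests to 127.0.0.1 go direct.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, headers=None):
    """Send a GET request and return the response body as text."""
    req = urllib.request.Request(url, headers=headers or {})
    with opener.open(req) as resp:
        return resp.read().decode("utf-8")


server = HTTPServer(("127.0.0.1", 0), InfoHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/getInfo" % server.server_address[1]

try:
    plain = fetch(url)                                  # default Python-urllib User-Agent
    disguised = fetch(url, {"User-Agent": BROWSER_UA})  # pretend to be Chrome
finally:
    server.shutdown()
    server.server_close()

print(plain)
print(disguised)
```

The first request goes out with urllib's default `Python-urllib/...` User-Agent and gets the warning; the second carries the browser string and gets the data, the same outcome as the requests version above.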
All right, that's the end of this article~ If it helped you, give it a like and bookmark it!