About the author: a quality creator in the Python field, Huawei Cloud sharing expert, Alibaba Cloud expert blogger, and 2021 CSDN Blog Star Top 6
- This article is part of the Python full-stack series column "100 Days to Master Python: From Beginner to Employment"
- The column is a complete course written for beginners with no prior experience, progressing from 0 to 100 with every topic building on the previous one
This article covers requests, an HTTP module used mainly to send requests and receive responses. Several alternatives exist, such as urllib, but requests is the one used most in day-to-day work. Its code is concise and easy to understand: compared with the bulkier urllib, a crawler written with requests needs less code, and common features are simpler to implement. It is therefore well worth mastering.
1. On a Windows computer, press Win + R and enter: cmd
2. Install requests by entering the pip command: pip install requests
I had already installed it, so pip shows the existing version; either way, requests is now installed.
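To confirm the installation from Python rather than from pip's output, one option is a small check like the sketch below. It uses only the standard library, and the helper name is_installed is made up for this illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Prints True when pip installed requests into the interpreter running this script
print("requests installed:", is_installed("requests"))
```

If this prints False even though pip reported success, pip and python probably belong to different Python installations.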
- response = requests.get(url): sends a GET request and returns the response object (the most commonly used)
- response = requests.post(url): sends a POST request and returns the response object
- response.url: the URL of the response; it sometimes differs from the requested URL
- response.status_code: the response status code, such as 200 or 404
- response.request.headers: the request headers that produced this response
- response.headers: the response headers
- response.request._cookies: the cookies sent with the request; returns a CookieJar
- response.cookies: the cookies of the response (after Set-Cookie has been applied); returns a CookieJar
- response.json(): converts a JSON response body into a Python object (dict or list)
- response.text: the response body as str
- response.content: the response body as bytes
Simple example: use requests to send a request to the Baidu home page and get the page source
import requests
# Target website
url = "http://www.baidu.com/"
# Send request to get response
response = requests.get(url)
# View the type of response object
print(type(response))
# Check the response status code
print(response.status_code)
# View the type of response content
print(type(response.text))
# View the cookies
print(response.cookies)
# View the response content as str
print(response.text)
# View the response content as bytes
print(response.content)
Output results :
<class 'requests.models.Response'>
200
<class 'str'>
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>ç¾åº¦ä¸€ä¸‹ï¼Œä½ 就知é“</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=ç¾åº¦ä¸€ä¸‹ class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>æ–°é—»</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>è´´å§</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>ç»å½•</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">ç»å½•</a>');</script> <a href=https://www.baidu.com/more/ name=tj_briicon class=bri >更多产å“</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>å
³äºŽç¾åº¦</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>使用ç¾åº¦å‰å¿
读</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>æ„è§å馈</a> 京ICPè¯030173å· <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
b'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>\xe6\x96\xb0\xe9\x97\xbb</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>\xe5\x9c\xb0\xe5\x9b\xbe</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>\xe8\xa7\x86\xe9\xa2\x91</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>\xe8\xb4\xb4\xe5\x90\xa7</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>\xe7\x99\xbb\xe5\xbd\x95</a> </noscript> <script>document.write(\'<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=\'+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" 
: "&")+ "bdorz_come=1")+ \'" name="tj_login" class="lb">\xe7\x99\xbb\xe5\xbd\x95</a>\');</script> <a href=https://www.baidu.com/more/ name=tj_briicon class=bri >\xe6\x9b\xb4\xe5\xa4\x9a\xe4\xba\xa7\xe5\x93\x81</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>\xe5\x85\xb3\xe4\xba\x8e\xe7\x99\xbe\xe5\xba\xa6</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>\xe4\xbd\xbf\xe7\x94\xa8\xe7\x99\xbe\xe5\xba\xa6\xe5\x89\x8d\xe5\xbf\x85\xe8\xaf\xbb</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>\xe6\x84\x8f\xe8\xa7\x81\xe5\x8f\x8d\xe9\xa6\x88</a> \xe4\xba\xacICP\xe8\xaf\x81030173\xe5\x8f\xb7 <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>\r\n'
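response.text: str type. requests decodes the body using the encoding it infers from the response headers, which is why the Chinese text in the output above is garbled.
response.content: bytes type. This is the raw body with no decoding applied.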
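The attribute list earlier also mentions response.json(), which the Baidu example never exercises. The sketch below builds a Response object by hand so that it runs without network access. That hand-built object and its JSON payload are assumptions for illustration only; in real code the object comes from requests.get or requests.post.

```python
import io

import requests

# Hand-built response (illustration only): a UTF-8 encoded JSON body
response = requests.Response()
response.status_code = 200
response.encoding = "utf-8"
response.raw = io.BytesIO('{"site": "百度", "ok": true}'.encode("utf-8"))

print(type(response.content))  # raw body, bytes
print(type(response.text))     # body decoded with response.encoding, str

# json() parses the body into a Python object (a dict here)
data = response.json()
print(data["site"], data["ok"])
```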
Decode response.content to fix the garbled Chinese:
- response.content.decode(): uses utf-8 by default
- response.content.decode('GBK')
Common character encodings:
- utf-8
- gbk
- gb2312
- ascii (pronounced "ASS-kee")
- iso-8859-1
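The effect of picking the wrong character set can be reproduced without any network request. The sketch below takes the UTF-8 bytes of the Baidu page title and decodes them twice, once with iso-8859-1 and once with utf-8. The variable names are chosen for this illustration.

```python
# The page title as the server sends it: UTF-8 encoded bytes
raw = "百度一下，你就知道".encode("utf-8")

# Decoding with the wrong character set produces garbled text
garbled = raw.decode("iso-8859-1")

# Decoding with the right character set restores the Chinese
fixed = raw.decode("utf-8")

print(garbled)
print(fixed)

# Nothing is lost in the garbled version: re-encoding gives the original bytes
print(garbled.encode("iso-8859-1") == raw)
```

Because iso-8859-1 maps every byte to a character, the wrong decode never raises an error; it silently yields unreadable text, which is why the problem is easy to miss.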
Code demonstration :
import requests
# Target website
url = "http://www.baidu.com/"
# Send request to get response
response = requests.get(url)
# Set the encoding manually
response.encoding = 'utf8'
# Print the page source as str
print(response.text)
# response.content holds the response as bytes; decode it explicitly
print(response.content.decode('utf-8'))
Running results :
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title> use Baidu Search , You will know </title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value= use Baidu Search class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav> Journalism </a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav> Map </a> <a href=http://v.baidu.com name=tj_trvideo class=mnav> video </a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav> tieba </a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb> Sign in </a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" 
: "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb"> Sign in </a>');</script> <a href=https://www.baidu.com/more/ name=tj_briicon class=bri > More products </a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com> About Baidu </a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/> Read Before Using Baidu </a> <a href=http://jianyi.baidu.com/ class=cp-feedback> Feedback </a> Beijing ICP Prove 030173 Number <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title> use Baidu Search , You will know </title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value= use Baidu Search class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav> Journalism </a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav> Map </a> <a href=http://v.baidu.com name=tj_trvideo class=mnav> video </a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav> tieba </a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb> Sign in </a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" 
: "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb"> Sign in </a>');</script> <a href=https://www.baidu.com/more/ name=tj_briicon class=bri > More products </a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com> About Baidu </a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/> Read Before Using Baidu </a> <a href=http://jianyi.baidu.com/ class=cp-feedback> Feedback </a> Beijing ICP Prove 030173 Number <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
1) View the request headers in the browser
2) Usage:
requests.get(url, headers=headers)
3) Code implementation:
import requests
# Target website
url = "http://www.baidu.com/"
# Build the request header dictionary; the most important entry is User-Agent
# To send other request headers, add them to the headers dictionary
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
# Send request to get response
response = requests.get(url,headers=headers)
print(response.text)
Running result: the full page source.
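To see why User-Agent matters, it helps to compare what requests sends by default with what it sends once a headers dictionary is supplied. The sketch below only prepares the requests and never sends them, so it needs no network access. The shortened User-Agent string is a placeholder for this illustration.

```python
import requests

url = "http://www.baidu.com/"
# Shortened browser-style User-Agent (placeholder for illustration)
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

session = requests.Session()

# prepare_request merges the session's default headers with ours, without sending
default_request = session.prepare_request(requests.Request("GET", url))
custom_request = session.prepare_request(requests.Request("GET", url, headers=headers))

print(default_request.headers["User-Agent"])  # identifies itself as python-requests
print(custom_request.headers["User-Agent"])   # our browser-style value
```

A server that filters on User-Agent can tell the default value apart from a browser immediately, which is the reason for overriding it.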
How do you remove redundant parameters from a page address?
Method 1: put the parameters directly in the URL
import requests
# Target website
url = "https://www.baidu.com/s?wd=python"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
# Send request to get response
response = requests.get(url,headers=headers)
print(response.text)
Method 2: build a parameter dictionary and pass it via params
import requests
# Target website
url = "https://www.baidu.com/s?"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
# The request parameter is a dictionary
kw = {
'wd': 'python'}
# Pass the parameter dictionary when sending the request and get the response
response = requests.get(url, headers=headers, params=kw)
print(response.text)
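Both methods end up requesting the same address. One way to check this without sending anything is to prepare the request and inspect the URL that requests builds from the params dictionary, as in the sketch below.

```python
import requests

url = "https://www.baidu.com/s"
kw = {"wd": "python"}

# Prepare (but do not send) the request to inspect the final URL
prepared = requests.Request("GET", url, params=kw).prepare()
print(prepared.url)

# Non-ASCII values are percent-encoded automatically
prepared_cn = requests.Request("GET", url, params={"wd": "爬虫"}).prepare()
print(prepared_cn.url)
```

The automatic encoding is the main practical advantage of params over writing the query string into the URL by hand.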
Websites often use the Cookie field in the request header to keep track of a user's session, so we can add a Cookie to the headers parameter to imitate the request of an ordinary user. Cookies expire, so they need to be replaced after a while.
1. Open Google Chrome, right-click and choose Inspect, then refresh the page with the button in the top-left corner
2. Click Network, find the request for the page address, scroll down to Cookie, and copy its value
3. Add the Cookie entry to the headers dictionary
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
'Cookie': 'BAIDUID=157D064FDE25DE5DD0E68AF62CBC3627:FG=1; BAIDUID_BFESS=157D064FDE25DE5DD0E68AF62CBC3627:FG=1; BIDUPSID=157D064FDE25DE5DD0E68AF62CBC3627; PSTM=1655611179; BD_UPN=12314753; ZFY=Cs:BflL5Del98YBOjx2EyRPzQE3QCyolFKzgVTguBEHI:C; BD_HOME=1; H_PS_PSSID=36548_36626_36673_36454_31254_36452_36690_36165_36693_36696_36569_36657_26350_36469; BA_HECTOR=85850gag05ak0l040h1hbg5st14; delPer=0; BD_CK_SAM=1; PSINO=7; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_PS_645EC=0e08fXgvc5rDJVK1jRjlqmZ7pLp5r%2Fmn9jlENTs3CQ4%2FbhzUL09Y%2F%2FYtCGA; baikeVisitId=e10d7983-547d-4f34-a8d8-ec98dbcba8e4; COOKIE_SESSION=115_0_2_2_1_2_1_0_2_1_0_0_0_0_0_0_1655611189_0_1656233437%7C3%230_0_1656233437%7C1'
}
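requests.get also accepts a cookies argument that takes a dictionary, which can be easier to maintain than one long header string. The sketch below converts a copied cookie string into such a dictionary. The helper name cookie_string_to_dict and the shortened cookie string are assumptions for illustration, and the request is only prepared, never sent.

```python
import requests


def cookie_string_to_dict(cookie_string):
    """Split a 'name=value; name2=value2' string into a dictionary."""
    cookies = {}
    for pair in cookie_string.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first "=" only, since values may contain "=" themselves
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


# Hypothetical, shortened cookie string (not a real session)
cookie_string = "BD_HOME=1; delPer=0; PSINO=7"
cookies = cookie_string_to_dict(cookie_string)
print(cookies)

# Prepare (but do not send) a request to see the Cookie header requests builds
prepared = requests.Request("GET", "https://www.baidu.com/", cookies=cookies).prepare()
print(prepared.headers["Cookie"])
```

In a real crawler the same dictionary would be passed as requests.get(url, headers=headers, cookies=cookies).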
When browsing the web we often run into network fluctuations, and a request may wait a long time without getting any result. In a crawler, a request that hangs like this drags down the efficiency of the whole project, so we need to put a limit on it: it must return a result within a given time, or else raise an error.
1. How to use the timeout parameter
response = requests.get(url, timeout=3)
2. timeout=3 means that after the request is sent, the response must come back within 3 seconds; otherwise an exception is raised
3. Practical code:
import requests
# Target website
url = "https://www.baidu.com/"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
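response = None
try:
    # First attempt: wait at most 10 seconds
    response = requests.get(url, headers=headers, timeout=10)
except requests.exceptions.RequestException:
    # Retry up to 4 times with a longer timeout
    for i in range(4):
        try:
            response = requests.get(url, headers=headers, timeout=20)
        except requests.exceptions.RequestException:
            continue
        if response.status_code == 200:
            break
# Empty string if every attempt failed
html_str = response.text if response is not None else ''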
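The retry logic above can also be wrapped in a reusable function. The following is a minimal sketch rather than a definitive implementation: the name fetch_with_retries, its parameters, and the injectable get argument (which makes the function testable without a network) are all assumptions made for this illustration.

```python
import time

import requests


def fetch_with_retries(url, retries=3, timeout=3, backoff=0.0, get=requests.get):
    """Try the request up to `retries` times (at least 1) and return the response.

    Re-raises the last requests exception if every attempt fails.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            response = get(url, timeout=timeout)
            # Treat 4xx/5xx status codes as failures as well
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException as error:
            last_error = error
            # Optional pause before the next attempt
            time.sleep(backoff)
    raise last_error
```

A call such as fetch_with_retries("https://www.baidu.com/", retries=4, timeout=10) then replaces the try/except block, and the caller only has to handle one final exception.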
To make the server believe the requests do not all come from the same client, and to avoid having our IP blocked for requesting one domain too frequently, we use proxy IPs.
Syntax:
response = requests.get(url, proxies=proxies)
proxies takes the form of a dictionary:
proxies = {
"http": "http://12.34.56.79:9527",
"https": "https://12.34.56.79:9527",
}
Note: if the proxies dictionary contains several key-value pairs, the proxy IP is chosen according to the protocol of the URL being requested.
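The selection rule in the note can be observed directly with select_proxy, a helper in requests.utils that requests uses for this lookup. It is shown here only to illustrate the rule and is not something a crawler normally calls itself. The proxy addresses are placeholders, not working proxies.

```python
from requests.utils import select_proxy

# Placeholder proxy addresses (hypothetical, for illustration only)
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

# The proxy is picked by the scheme of the URL being requested
print(select_proxy("http://www.baidu.com/", proxies))
print(select_proxy("https://www.baidu.com/", proxies))
```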
Apart from data, the parameters for sending a POST request with requests are exactly the same as those for sending a GET request.
Syntax:
response = requests.post(url, data)  # the data parameter takes a dictionary
How do you find the form data?
Take Baidu Translate as an example: find the corresponding request, click Payload, and expand Form Data
Build the data dictionary in code
import requests
url = "https://fanyi.baidu.com/"
data = {
'query': ' Love '
}
# Pass the form dictionary via the data parameter
response = requests.post(url, data=data)
print(response.text)
This returns the full web page.
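To check what a data dictionary turns into on the wire, the request can be prepared without being sent. The sketch below reuses the URL from the example above; the form value is a stand-in chosen for this illustration.

```python
import requests

url = "https://fanyi.baidu.com/"
data = {"query": "love"}  # stand-in form value for illustration

# Prepare (but do not send) the POST request
prepared = requests.Request("POST", url, data=data).prepare()

print(prepared.method)                   # POST
print(prepared.headers["Content-Type"])  # form encoding chosen by requests
print(prepared.body)                     # the dictionary encoded as a form body
```

The dictionary is sent as a URL-encoded form body, which matches the Form Data view in the browser's Payload tab.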