Python3 Web crawler development practice
The urllib library contains the following four basic modules:

- `request`: the most basic HTTP request module, used to simulate sending a request.
- `error`: the exception handling module.
- `parse`: a utility module that provides functions for splitting, parsing, and joining URLs.
- `robotparser`: mainly used to parse a website's robots.txt file, in which crawler permissions are declared, i.e. which crawlers the server allows to crawl which pages (a brief example follows this list).
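As a brief illustration of the robotparser module, here is a minimal sketch of checking crawl permissions with `urllib.robotparser.RobotFileParser` (the URLs are only illustrative):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url('https://www.baidu.com/robots.txt')  # point the parser at the site's robots.txt
rp.read()                                       # download and parse the file
# can_fetch() reports whether the given user agent may crawl the URL
print(rp.can_fetch('*', 'https://www.baidu.com/s?wd=python'))
```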
This section records the basic usage of some API functions in the request module.
urllib.request.urlopen()

The API signature for sending a web page request:

```
urllib.request.urlopen(url, data=None, [timeout,]*, cafile=None, capath=None, cadefault=False, context=None)
```
Parameter explanation:

- `url`: the URL to request.
- `data`: data sent to the specified URL. When this parameter is given, the request method becomes POST; if it is not given, the method is GET. The data must be converted to byte-stream format with the `bytes()` method; an example is given below.
- `timeout`: sets the timeout. If no response is received within the set time, an exception is thrown.
- `cafile`, `capath`: the CA certificate and its path, respectively. `cadefault` and `context` are not covered here.

Example of use:
```python
import urllib.request

response = urllib.request.urlopen('https://www.baidu.com')
print(type(response))                   # print the data type of the response object
print(response.read().decode('utf-8'))  # print the HTML source of the page
```
After calling the `urlopen()` function, the object returned by the server is stored in `response`. Printing the data type of the `response` object shows that it is `http.client.HTTPResponse`.
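Besides `read()`, the `HTTPResponse` object also provides the status code and response headers; a minimal sketch using the standard `HTTPResponse` attributes:

```python
import urllib.request

response = urllib.request.urlopen('https://www.baidu.com')
print(response.status)               # HTTP status code, e.g. 200
print(response.getheaders())         # list of (name, value) header tuples
print(response.getheader('Server'))  # value of a single response header
```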
- If you want to attach data to the request, you can use the `data` parameter. Example of use:
```python
import urllib.request
import urllib.parse

dic = {'name': 'Tom'}
data = bytes(urllib.parse.urlencode(dic), encoding='utf-8')
response = urllib.request.urlopen('https://www.httpbin.org/post', data=data)
```
The dictionary passed via the `data` parameter must first be converted to a string with `urllib.parse.urlencode()` and then encoded to bytes with the `bytes()` method.
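To make the two conversion steps explicit, here is a small sketch of the intermediate values, using the same example dictionary:

```python
import urllib.parse

dic = {'name': 'Tom'}
encoded = urllib.parse.urlencode(dic)    # 'name=Tom' -- a URL-encoded query string
data = bytes(encoded, encoding='utf-8')  # b'name=Tom' -- the byte stream urlopen() expects
print(encoded)
print(data)
```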
- `timeout`: specifies the timeout period, in seconds. Example of use:
```python
response = urllib.request.urlopen('https://www.baidu.com', timeout=0.01)
```
If no response is received from the server within 0.01 seconds, an exception is thrown.
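To keep the program from stopping on a timeout, the exception can be caught with the `error` module mentioned earlier; a minimal sketch, assuming the timeout surfaces as a `URLError` wrapping `socket.timeout`:

```python
import socket
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen('https://www.baidu.com', timeout=0.01)
except urllib.error.URLError as e:
    # urlopen() wraps the low-level socket timeout in a URLError
    if isinstance(e.reason, socket.timeout):
        print('Request timed out')
```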
The `urlopen()` function accepts only a few parameters, which also means there are very few request headers we can set. To construct a more complete request, use a `urllib.request.Request` object. This object encapsulates the request, so the request headers can be set separately instead of passing only a URL as in the previous method.
- The constructor of `Request`:

```
class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
```
Example of use:

```python
from urllib import request, parse

url = 'https://www.httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Host': 'www.httpbin.org'
}
dic = {'name': 'Tom'}
data = bytes(parse.urlencode(dic), encoding='utf-8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
```
A `Request` object is constructed in advance and then passed as a parameter to the `urlopen()` method.
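Headers can also be set after the `Request` object has been built, via its `add_header()` method; a brief sketch of this variant, reusing the same example data:

```python
from urllib import request, parse

url = 'https://www.httpbin.org/post'
data = bytes(parse.urlencode({'name': 'Tom'}), encoding='utf-8')
req = request.Request(url=url, data=data, method='POST')
# add_header() sets a single request header on the already-built Request object
req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
response = request.urlopen(req)
print(response.status)
```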