
Using the Python 3 urllib.request.urlopen() API


Contents

Basic usage of the urllib library: the request module
- Requesting a web page: urllib.request.urlopen()
References

Basic usage of the urllib library: the request module

Most of the code in this article comes from the book Python3 網絡爬蟲開發實戰.

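The urllib library contains the following four basic modules:

request: the most basic HTTP request module; it simulates sending a request.
error: the exception-handling module; it defines the exceptions raised by request.
parse: a utility module that provides splitting, parsing, joining, and similar operations on URLs.
robotparser: mainly used to parse a site's robots.txt file, which sets crawler permissions, that is, which crawlers the server allows to fetch which pages.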
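As a rough illustration of the two helper modules in the list above, the sketch below splits and joins URLs with parse and checks permissions with robotparser. It runs offline: the robots.txt rules, the crawler name "MyCrawler", and the www.example.com URLs are made up for this example rather than taken from a real site.

```python
from urllib import parse, robotparser

# Split a URL into its components.
parts = parse.urlparse("https://www.httpbin.org/get?name=Tom#top")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# Resolve a relative link against a base URL.
joined = parse.urljoin("https://www.httpbin.org/a/b.html", "c.html")
print(joined)

# Feed robots.txt rules to the parser directly instead of downloading them.
# These rules are invented for the example.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = robotparser.RobotFileParser()
rp.parse(rules)

# Ask whether a crawler may fetch a given page under those rules.
allowed_public = rp.can_fetch("MyCrawler", "https://www.example.com/index.html")
allowed_private = rp.can_fetch("MyCrawler", "https://www.example.com/private/data.html")
print(allowed_public, allowed_private)
```

For a real site, RobotFileParser.set_url() followed by read() would download the rules instead of parse().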

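This article records how to use some of the basic API functions in the request module.

Requesting a web page: urllib.request.urlopen()

Send a web request directly with urllib.request.urlopen()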

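API specification:
urllib.request.urlopen(url, data=None, [timeout,]*, cafile=None, capath=None, cadefault=False, context=None)
Parameters:
url: the URL to request, given either as a string or as a Request object.
data: the data sent to the URL with the request. When this parameter is given, the request method becomes POST; when it is omitted, the method is GET. The value must be converted to bytes first, for example with bytes(); an example follows later in this section.
timeout: the timeout in seconds. If no response arrives within this time, an exception is raised.
cafile and capath point to a file of CA certificates and a directory of CA certificate files, respectively. cadefault is ignored, and context takes an ssl.SSLContext object describing the SSL options; these are not covered here. Note that cafile, capath, and cadefault were deprecated in Python 3.6 and removed in Python 3.13, where only context remains.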
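Usage example:
import urllib.request
response = urllib.request.urlopen('https://www.baidu.com')
print(type(response))  # print the type of the response object
print(response.read().decode('utf-8'))  # print the HTML source of the page
After urlopen() is called, the object returned for the server's reply is stored in response. Printing its type shows that it is an http.client.HTTPResponse.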
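Beyond read(), the HTTPResponse object also exposes the status code and the response headers. The sketch below shows one way to read them. So that it runs without internet access, it starts a throwaway local server; HelloHandler and the page it serves are invented for this example and are not part of urllib.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Made-up handler that answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = "<html><body>hello</body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the per-request log line


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

# Skip any proxy configured in the environment so the request goes
# straight to the local server.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({}))
)

# The response works as a context manager, so it is closed on exit.
with urllib.request.urlopen(url, timeout=5) as response:
    status = response.status                           # HTTP status code
    content_type = response.getheader("Content-Type")  # a single header value
    all_headers = response.getheaders()                # list of (name, value) pairs
    html = response.read().decode("utf-8")             # body bytes decoded to str

server.shutdown()
server.server_close()
print(status, content_type)
print(html)
```

Against a real site, only the with block is needed; the server setup exists purely to keep the example self-contained.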
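To send data with the request, use the data parameter.
Usage example:
import urllib.request
import urllib.parse
dic = {
    'name': 'Tom'
}
data = bytes(urllib.parse.urlencode(dic), encoding='utf-8')
response = urllib.request.urlopen('https://www.httpbin.org/post', data=data)
print(response.read().decode('utf-8'))  # httpbin echoes the submitted form back as JSON
The dictionary passed through the data parameter must first be converted to a string with urllib.parse.urlencode() and then encoded to bytes with bytes().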
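The two conversion steps, and the switch from GET to POST, can be checked without sending anything. The sketch below is a minimal offline illustration: the extra "city" field is a sample value, and it relies on urllib.request.Request, the class that urlopen() wraps a plain URL string in, to report which method would be used.

```python
import urllib.parse
import urllib.request

form = {"name": "Tom", "city": "New York"}  # sample values for illustration

# Step 1: urlencode() percent-encodes the dict into a query string (a str).
encoded = urllib.parse.urlencode(form)
# Step 2: the body must be bytes; str.encode() gives the same result as
# bytes(encoded, encoding="utf-8").
data = encoded.encode("utf-8")
print(encoded)
print(data)

# Building Request objects does not touch the network, so the method
# urlopen() would use can be inspected offline.
url = "https://www.httpbin.org/post"
method_without_data = urllib.request.Request(url).get_method()
method_with_data = urllib.request.Request(url, data=data).get_method()
print(method_without_data, method_with_data)
```

Passing a str instead of bytes as data makes urlopen() raise a TypeError, which is why the encoding step cannot be skipped.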
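timeout: specifies the timeout, in seconds.
response = urllib.request.urlopen('https://www.baidu.com', timeout=0.01)
If no response from the server is received within 0.01 seconds, an exception is raised, typically a urllib.error.URLError whose reason is a socket.timeout, or a socket.timeout itself.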
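In practice the timeout exception is usually caught rather than left to crash the program. The sketch below shows one possible pattern. The helper name fetch is invented for this example, and a local socket that never replies stands in for a slow server so that the timeout can be triggered without internet access.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout):
    """Return the decoded body, or None if the request times out.

    fetch is a helper name made up for this example.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as exc:
        # Failures while connecting or sending are wrapped in URLError;
        # the underlying cause is stored in exc.reason.
        if isinstance(exc.reason, socket.timeout):
            return None
        raise
    except socket.timeout:
        # A timeout while waiting for the response is raised directly.
        return None


# Skip any proxy configured in the environment so the request goes
# straight to the local socket.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({}))
)

# A listening socket that never answers plays the role of a slow server.
silent = socket.socket()
silent.bind(("127.0.0.1", 0))
silent.listen(1)
url = f"http://127.0.0.1:{silent.getsockname()[1]}/"

result = fetch(url, timeout=0.5)
silent.close()
print(result)
```

Returning None is only one choice; a crawler might instead retry, log the URL, or re-raise.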

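As this shows, urlopen() takes only a few parameters, which also means that very little of the request, the headers in particular, can be configured through it.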
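Even without any header arguments, a request made through urlopen() still carries default headers; the caller just does not choose them. The short sketch below inspects those defaults on a freshly built opener, on the assumption that it matches the default opener urlopen() creates when none has been installed.

```python
import urllib.request

# build_opener() with no arguments yields an opener with the standard handlers.
opener = urllib.request.build_opener()

# addheaders lists the headers attached to every request made through the opener.
default_headers = dict(opener.addheaders)
print(default_headers)
```

The User-agent value identifies the client as Python-urllib followed by the Python version, which is one reason a custom User-Agent is often set when crawling.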

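Building a more complete request: use a urllib.request.Request object. It wraps the request, including its headers, so that the headers can be set separately instead of passing only a URL as in the previous approach.
Constructor of Request:
class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)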
Usage example:
from urllib import request, parse
url = 'https://www.httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Host': 'www.httpbin.org'
}
form = {'name': 'Tom'}  # form fields to submit
data = bytes(parse.urlencode(form), encoding='utf-8')
# build the request with custom headers and an explicit method
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)  # send the prepared request
print(response.read().decode('utf-8'))
A Request object is constructed first and then passed as the argument to urlopen().
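Because a Request object is only a description of the request, it can be inspected before anything is sent. The sketch below builds a request like the one above and prints what urlopen() would receive. The added "Accept" header is an extra chosen for illustration, and nothing here touches the network.

```python
from urllib import parse, request

url = "https://www.httpbin.org/post"
headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
data = parse.urlencode({"name": "Tom"}).encode("utf-8")

req = request.Request(url=url, data=data, headers=headers, method="POST")
# Headers can also be added after construction.
req.add_header("Accept", "application/json")

print(req.full_url)      # target URL
print(req.get_method())  # HTTP method that will be used
print(req.host)          # host parsed from the URL
print(req.data)          # encoded request body

# Request stores header names with only the first letter capitalized,
# so the lookup key is "User-agent" rather than "User-Agent".
user_agent = req.get_header("User-agent")
has_accept = req.has_header("Accept")
print(user_agent, has_accept)
```

Checking a request this way can help when a server rejects it, since the method, body, and headers are visible before any traffic is generated.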