The code is as follows:
import urllib
import urllib2
import re

url = 'http://www.yingjiesheng.com/guangzhou-moreptjob-2.html'
req = urllib2.Request(url)
try:
    html = urllib2.urlopen(req).read()
    print html
except urllib2.HTTPError, e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code: ', e.code
except urllib2.URLError, e:
    print 'We failed to reach a server.'
    print 'Reason: ', e.reason
else:
    print 'No exception was raised.'
The output is as follows:
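Question: why does fetching the page source return an empty result, and how can I fix it (or at least, where should I look)? Any pointers from the experts would be much appreciated!
(PS: While working on this I typed the code straight into IDLE and ran it. Sometimes it returned the source and sometimes it didn't. Also, sometimes after running the program a few dozen times it would return the source, but as soon as I changed the 2 in the url to 3 (i.e. the next page), it stopped working again. Very strange.)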
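The intermittent behaviour described above (empty on some runs, fine on others) is not explained anywhere in this thread, so the following is only one possible direction, not a confirmed fix: send a browser-like User-Agent and retry when the body comes back empty. It is a Python 3 sketch using only the standard library; the helper names `fetch_once` and `fetch_until_nonempty`, the header value, and the retry count and delay are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_once(url, timeout=10):
    """Fetch url once and return the raw response body as bytes."""
    # Assumption: the header value is arbitrary; some servers answer
    # differently to the default Python client string.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def fetch_until_nonempty(url, fetch=fetch_once, attempts=5, delay=1.0):
    """Call fetch(url) until it returns a non-empty body or attempts run out.

    Returns b"" if every attempt produced an empty body, and re-raises
    the error from the final attempt if that attempt failed.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            body = fetch(url)
            last_error = None
            if body:
                return body
        except urllib.error.URLError as e:  # HTTPError is a subclass of URLError
            last_error = e
        if attempt < attempts - 1:
            time.sleep(delay)  # short pause before the next try
    if last_error is not None:
        raise last_error
    return b""


# Example call (performs a real network request, so it is left commented out):
# html = fetch_until_nonempty('http://www.yingjiesheng.com/guangzhou-moreptjob-2.html')
```

If the body is still empty after several attempts, printing the response status code and headers would be the next thing to look at.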
Code:
#!/usr/bin/env python3
#-*- coding=utf-8 -*-
import urllib3

if __name__ == '__main__':
    http = urllib3.PoolManager()
    r = http.request('GET', 'http://www.yingjiesheng.com/guangzhou-moreptjob-2.html')
    print(r.data.decode("gbk"))
This fetches the page without problems. You need to install urllib3; my Python version is 3.4.3.
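The `decode("gbk")` call above raises `UnicodeDecodeError` if the body contains bytes that are not valid GBK. A minimal sketch of a more forgiving decode step follows; the helper name `decode_body` and the list of encodings are assumptions for illustration, not part of urllib3. UTF-8 is tried first because it is strict: GBK will often accept UTF-8 bytes without error and silently produce garbled text.

```python
def decode_body(data, encodings=("utf-8", "gbk")):
    """Decode raw bytes using the first encoding that succeeds.

    If none of the encodings decode the data cleanly, fall back to the
    last one with undecodable bytes replaced, so a str is always returned.
    """
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    return data.decode(encodings[-1], errors="replace")


# Usage with the urllib3 response from the code above (r is assumed to exist):
# print(decode_body(r.data))
```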