Python Crawler (I): Getting to Know the requests Library
Python Crawler (Part 1)
Python crawlers
Understanding Python crawlers
Crawling page source with the requests library
1. Getting the response object
2. HTTP status codes
3. Encoding
4. Binary output and decoded text output
5. Saving the data
Python crawlers
Understanding Python crawlers
Web crawler: a program that fetches the front-end source of a web page, extracts the data you need, and saves it.
Crawling page source with the requests library
1. Getting the response object
import requests  # import the requests library
url = 'https://www.example.com'  # placeholder address; the scheme (http:// or https://) must not be omitted
res = requests.get(url)  # fetch the page source at url and return a Response object
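Because the URL must carry its scheme, it can help to normalize addresses before requesting them. The following is a minimal sketch, not part of the original post: `ensure_scheme` and `fetch` are hypothetical helper names, and the 10-second timeout and the default of `https` are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default="https"):
    """Return url unchanged if it starts with http/https, else prepend a scheme."""
    if urlsplit(url).scheme in ("http", "https"):
        return url
    return f"{default}://{url}"


def fetch(url, timeout=10):
    """Fetch a page and return the Response; raises on network errors and 4xx/5xx."""
    # Imported lazily so that ensure_scheme can be used without requests installed.
    import requests

    res = requests.get(ensure_scheme(url), timeout=timeout)
    res.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return res
```

Calling `fetch('www.example.com')` would then request `https://www.example.com` rather than failing because the scheme is missing.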
2. HTTP status codes
res.status_code  # 200 means success
An HTTP status code is a 3-digit code that indicates how the web server responded to the request.
Status code | Meaning
1XX series | Informational. The request has been received and processing needs to continue. Because HTTP/1.0 did not define any 1xx status codes, servers must not send a 1xx response to an HTTP/1.0 client except under experimental conditions.
2XX series | Success. The request has been successfully received, understood, and accepted by the server. The most common codes in this series are 200 and 201.
3XX series | Redirection. The client needs to take further action to complete the request. These codes are used for redirects, and the address of the follow-up request (the redirect target) is given in the Location header of the response. The most common codes in this series are 301 and 302.
4XX series | Client error. The request appears to contain an error that prevents the server from processing it. Common codes: 401 and 404.
5XX series | Server error. The server hit an error or abnormal state while processing the request, or recognized that it cannot complete the request with its current hardware and software resources. Common codes: 500 and 503.
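The series in the table can be derived from the first digit of the code. Below is a small sketch using only the standard library; `SERIES` and `describe` are names made up for this example, and the reason phrases come from Python's built-in `http.HTTPStatus`.

```python
from http import HTTPStatus

# First digit of the status code mapped to its series.
SERIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(status_code):
    """Return a readable summary such as '404 Not Found (client error)'."""
    series = SERIES.get(status_code // 100, "unknown")
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # The code is not one of the registered codes known to http.HTTPStatus.
        phrase = "unregistered"
    return f"{status_code} {phrase} ({series})"
```

With a response object at hand, `describe(res.status_code)` gives a quick hint about what happened; for instance, `describe(404)` returns `'404 Not Found (client error)'`.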
3. Encoding
res.encoding  # the encoding requests uses to decode res.text, taken from the response headers
res.apparent_encoding  # the encoding detected from the response content; generally very accurate
Common encodings for Chinese text are UTF-8, UTF-16, GBK, GB2312, and GB18030 (the names are case-insensitive). UTF-8 and GB2312 are the most widely used. Python 3 and Linux default to UTF-8, while Chinese-language Windows defaults to GBK (code page 936), a superset of GB2312.
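To see why the encoding matters, the sketch below decodes the same bytes under different encodings. It uses only the standard library; `decode_body` is a hypothetical helper, and the candidate order (UTF-8 first, then GB18030, which covers GBK and GB2312) is an assumption for illustration. With requests itself, a common approach is to set `res.encoding = res.apparent_encoding` before reading `res.text`.

```python
def decode_body(raw, candidates=("utf-8", "gb18030")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


sample = "中文"
gbk_bytes = sample.encode("gbk")      # 2 bytes per character
utf8_bytes = sample.encode("utf-8")   # 3 bytes per character

# GBK bytes are not valid UTF-8 here, so the helper falls back to gb18030.
text_from_gbk, used_for_gbk = decode_body(gbk_bytes)
text_from_utf8, used_for_utf8 = decode_body(utf8_bytes)
```

Trying strict UTF-8 first works as a cheap check because bytes in a GB-family encoding often fail to decode as UTF-8, though not always, which is why content-based detection such as `apparent_encoding` is still useful.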
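4. Binary output and decoded text output
res.text returns the decoded response body as a str, which is generally used to save text. res.content returns the raw bytes of the response body, which is what you use to save videos, pictures, and other binary data.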
5. Saving the data
f = open('myFirst.txt', 'w', encoding='utf-8')  # create a file object: the first argument is the file name, the second is the mode
f.write(res.text)  # write the page content to the file
f.close()  # close the file
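The same step can be written with a `with` block, which closes the file even if writing fails. This is a sketch rather than the method used above: `save_text` and `save_binary` are hypothetical helper names, and UTF-8 as the output encoding is an assumption.

```python
def save_text(path, text, encoding="utf-8"):
    """Write a str (for example res.text) to a text file."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def save_binary(path, data):
    """Write bytes (for example res.content) to a file in binary mode."""
    with open(path, "wb") as f:
        f.write(data)
```

For a page you would call `save_text('myFirst.txt', res.text)`; for an image or video, pass `res.content` to `save_binary` instead.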
That is the first step of a crawler: fetching the raw, unrendered page source. Later we will need to extract the content we want from it; see my next post on crawlers.
Mode | Description
r | Opens the file read-only. The file pointer is placed at the beginning of the file. This is the default mode.
rb | Opens the file read-only in binary format. The file pointer is placed at the beginning of the file.
r+ | Opens the file for reading and writing. The file pointer is placed at the beginning of the file.
rb+ | Opens the file in binary format for reading and writing. The file pointer is placed at the beginning of the file.
w | Opens the file for writing only. If the file already exists, its original content is deleted and writing starts from the beginning. If the file does not exist, a new file is created.
wb | Opens the file in binary format for writing only. If the file already exists, its original content is deleted and writing starts from the beginning. If the file does not exist, a new file is created.
w+ | Opens the file for reading and writing. If the file already exists, its original content is deleted and writing starts from the beginning. If the file does not exist, a new file is created.
wb+ | Opens the file in binary format for reading and writing. If the file already exists, its original content is deleted and writing starts from the beginning. If the file does not exist, a new file is created.
a | Opens the file for appending. If the file already exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created for writing.
ab | Opens the file in binary format for appending. If the file already exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created for writing.
a+ | Opens the file for reading and writing. If the file already exists, the file pointer is placed at the end of the file and the file is opened in append mode. If the file does not exist, a new file is created for reading and writing.
ab+ | Opens the file in binary format for reading and appending. If the file already exists, the file pointer is placed at the end of the file. If the file does not exist, a new file is created for reading and writing.
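The difference between the modes is easiest to see by running them against a scratch file. The sketch below works in a temporary directory; `demo_modes` and the file name `demo.txt` are made up for this example.

```python
import os
import tempfile


def demo_modes():
    """Show how 'w', 'a', and 'r+' treat an existing file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")

        with open(path, "w", encoding="utf-8") as f:   # creates the file
            f.write("first\n")
        with open(path, "a", encoding="utf-8") as f:   # writes after existing content
            f.write("second\n")
        with open(path, "r", encoding="utf-8") as f:
            after_append = f.read()

        with open(path, "w", encoding="utf-8") as f:   # deletes the old content
            f.write("third\n")
        with open(path, "r", encoding="utf-8") as f:
            after_truncate = f.read()

        with open(path, "r+", encoding="utf-8") as f:  # pointer starts at the beginning
            f.write("T")                               # overwrites in place, no truncation
        with open(path, "r", encoding="utf-8") as f:
            after_rplus = f.read()

    return after_append, after_truncate, after_rplus
```

Here `a` keeps the existing lines, `w` discards them, and `r+` overwrites only the characters it writes, leaving the rest of the file in place.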