Python crawler tutorial (notes from pure self-study, a step-by-step tutorial)

Editor: Python

Preface

This is a series of articles. It collects my notes and experience from teaching myself web crawling through books, online courses, blogs and other sources. On one hand, it serves as a basic tutorial for readers' reference. On the other, it is my own consolidation of notes and a record of the learning process. The series will be updated continuously. As of today (2021.05.10), it is updated every three days, and readers are welcome to follow me or the series.

List of articles

Preface
- 1.1 Disguising a Python crawler [free IP disguise, request-header disguise]
- One: Introduction to web crawlers
- Two: My first crawler code
- Three: “Where to hit”
- Four: Storing web page information, and using BeautifulSoup and find
- Five: Web page scraping
- Addendum: Notes from hands-on practice
- Six: How dynamic web pages work
- Seven: Simulating a browser with Selenium

1.1 Disguising a Python crawler [free IP disguise, request-header disguise]

One: What does a crawler take away?
Two: Forging the request header
1. Installing the my-fake-useragent library
Three: Using proxy IPs
1. Installing Redis on Windows 10
2. Using open-source projects
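Below is a minimal sketch of both disguises. It assumes the `requests` library is installed (`pip install requests`). It does not use my-fake-useragent or the Redis-backed proxy pool from the article. Instead, the User-Agent strings are a hand-picked, hypothetical list, and `http://127.0.0.1:8888` is a placeholder for whatever proxy your pool hands out. `make_disguised_session` is also a hypothetical helper name.

```python
import random

import requests

# Hypothetical pool of browser User-Agent strings; the article generates
# these with the my-fake-useragent library instead.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/14.1 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0",
]


def make_disguised_session(proxy=None):
    """Return a Session that sends a random browser User-Agent and,
    optionally, routes traffic through the given proxy URL."""
    session = requests.Session()
    # Replace the default "python-requests/x.y" UA, which is easy to spot and block.
    session.headers["User-Agent"] = random.choice(USER_AGENTS)
    if proxy:
        # Use the same proxy for both plain HTTP and HTTPS targets.
        session.proxies.update({"http": proxy, "https": proxy})
    return session


# Placeholder proxy address, e.g. one taken from a proxy pool.
session = make_disguised_session("http://127.0.0.1:8888")
print(session.headers["User-Agent"])
# session.get("https://example.com", timeout=10) would now send both disguises.
```

A fresh User-Agent is picked per session here. Picking one per request works the same way if you pass `headers=` to each `get` call instead.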

One: Introduction to web crawlers

1. What kinds of crawlers are there?
2. Are web crawlers legal?
3. Constraints on web crawlers (the Robots protocol)
4. The workflow of a Python web crawler
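The Robots protocol (robots.txt) can be checked in code before any request is sent. The sketch below uses Python's standard-library `urllib.robotparser`. The robots.txt rules and the `MyCrawler` agent name are made-up examples, and `allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. A real crawler would load the site's own
# file with rp.set_url("https://<site>/robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="MyCrawler"):
    """Return True if the parsed robots rules let this agent fetch the URL."""
    return rp.can_fetch(user_agent, url)


print(allowed("https://example.com/index.html"))    # True
print(allowed("https://example.com/private/data"))  # False
```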

Two: My first crawler code

Preface
One: How to install the required third-party packages (setting up the environment)
How to install third-party libraries
Two: How to find “where to hit”: the right way to use Inspect Element
Three: Writing a simple crawler to fetch the front-end code of the bilibili home page
Acknowledgments

Three: “Where to hit”

Preface
One: Parsing web pages
1. Using Inspect Element to locate the code
Code
Two: Parsing the code line by line
1. Installing BeautifulSoup
2. BeautifulSoup parsers
3. Using find
Acknowledgments
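Here is a short sketch of the parser and `find` steps. It assumes `beautifulsoup4` is installed (`pip install beautifulsoup4`). The HTML fragment is hypothetical and stands in for the code you would locate with Inspect Element.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, standing in for code located with Inspect Element.
html = """
<html><body>
  <div class="post">
    <h1 class="title">Hello, crawler</h1>
    <p class="summary">First paragraph.</p>
  </div>
</body></html>
"""

# "html.parser" ships with Python; "lxml" is a faster third-party alternative.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; class_ has a trailing underscore
# because "class" is a Python keyword.
title_tag = soup.find("h1", class_="title")
print(title_tag.get_text(strip=True))  # Hello, crawler

# find() returns None when nothing matches, so check before calling methods on it.
missing = soup.find("span", class_="author")
print(missing)  # None
```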

Four: Storing web page information, and using BeautifulSoup and find

Preface
One: Using BeautifulSoup and find
find
find_all
Concrete usage examples
Two: Storing web page information
1. The basics
2. Writing data
Acknowledgments
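One way to combine `find_all` with writing data is to collect every link on a list page and store the results with the standard-library `csv` module. The page fragment and the `articles.csv` file name are assumptions for illustration, and the article's own storage code may differ. `beautifulsoup4` must be installed.

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical list page; each <li> holds one item to store.
html = """
<ul id="articles">
  <li><a href="/a/1">Intro to crawlers</a></li>
  <li><a href="/a/2">My first crawler</a></li>
  <li><a href="/a/3">Where to hit</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# find() narrows the search to one container; find_all() returns a list of
# every matching tag inside it.
links = soup.find("ul", id="articles").find_all("a")

rows = [(a.get_text(strip=True), a["href"]) for a in links]

# "articles.csv" is an assumed output path. newline="" avoids blank lines on
# Windows, and utf-8 keeps non-ASCII titles intact.
with open("articles.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "href"])
    writer.writerows(rows)

print(len(rows), "rows written")  # 3 rows written
```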

Five: Static web page scraping

Preface
One: Using the Requests library
Two: Customizing requests
1. Passing URL parameters
2. Customizing request headers
3. Setting a timeout
4. Sending a POST request
Acknowledgments
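The customizations listed above (URL parameters, headers, timeout and POST) can be seen without touching the network. The trick is to build prepared requests with `requests` and inspect them. `httpbin.org` is used only as an illustrative echo service. `fetch` is a hypothetical helper that shows where `timeout` goes when a request is actually sent.

```python
import requests

# Illustrative target; httpbin.org simply echoes back what it receives.
BASE = "https://httpbin.org"

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
params = {"key1": "value1", "key2": "value2"}

# URL parameters: requests encodes the dict into the query string.
get_req = requests.Request("GET", f"{BASE}/get", params=params, headers=headers).prepare()
print(get_req.url)  # https://httpbin.org/get?key1=value1&key2=value2

# POST request: a dict passed as data= is form-encoded into the request body.
post_req = requests.Request("POST", f"{BASE}/post", data=params, headers=headers).prepare()
print(post_req.body)  # key1=value1&key2=value2


def fetch(prepared, timeout=5):
    """Send a prepared request. timeout= (in seconds) keeps the crawler from
    hanging on a slow server; requests raises requests.Timeout instead."""
    with requests.Session() as s:
        return s.send(prepared, timeout=timeout)
```

In everyday code the same options go straight into one call, e.g. `requests.get(url, params=params, headers=headers, timeout=5)` or `requests.post(url, data=params, headers=headers, timeout=5)`.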

Addendum: Notes from hands-on practice (recent problems, resolved)

A static page, but the crawler returns no results
Solution: crawl and extract the page's full front-end code
Acknowledgments
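Here is a sketch of that debugging step. It writes the exact HTML the crawler received to disk so you can open and search it, and it checks whether the target text is present at all. `dump_and_check` and `page_dump.html` are hypothetical names. If the text is missing from what the crawler received, one likely cause is that the content is filled in later by JavaScript, which the dynamic-page chapter covers.

```python
from pathlib import Path


def dump_and_check(html, keyword, path="page_dump.html"):
    """Save the full front-end code the crawler actually received and
    return whether the target text appears in it."""
    Path(path).write_text(html, encoding="utf-8")
    return keyword in html


# Hypothetical response text; in practice pass requests.get(url).text.
received = "<html><body><div id='app'></div></body></html>"
print(dump_and_check(received, "Hello, crawler"))  # False
```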

Six: How dynamic web pages work

Preface
One: What is a dynamic web page?
Two: The principles behind dynamic web pages
1. AJAX
2. A dynamic web page example
3. Scraping information from dynamic web pages
Acknowledgments
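For AJAX-driven pages, a common approach is to skip the HTML and request the JSON endpoint that the page itself calls. You can find that endpoint in the browser's developer tools under Network (XHR/Fetch filter). The sketch below makes several assumptions. The JSON layout `{"data": {"list": [...]}}` is invented, no real endpoint is named, and `fetch_ajax` and `parse_items` are hypothetical helpers. Only `parse_items` runs here, on a sample body, so the example works offline.

```python
import json

import requests


def parse_items(payload):
    """Pull (title, url) pairs out of a decoded JSON payload.
    The {"data": {"list": [...]}} shape is an assumed example; check the real
    response in the Network panel before relying on any field names."""
    return [(item["title"], item["url"]) for item in payload["data"]["list"]]


def fetch_ajax(api_url, params=None, timeout=10):
    """Request an AJAX endpoint directly instead of the rendered HTML page."""
    resp = requests.get(
        api_url,
        params=params,
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=timeout,
    )
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return resp.json()       # decode the JSON body into Python dicts/lists


# Offline demo with a sample body of the assumed shape.
sample = '{"data": {"list": [{"title": "Post A", "url": "/a"}, {"title": "Post B", "url": "/b"}]}}'
print(parse_items(json.loads(sample)))  # [('Post A', '/a'), ('Post B', '/b')]
```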

Seven: Simulating a browser with Selenium

Preface
One: Installing Selenium
Two: Selenium in detail
Three: How to download chromedriver
1. Find your browser version
2. Download from the designated website
3. Configure the environment
4. Verify the setup
Four: Selenium usage examples
Acknowledgments