
Python web crawler Prelude


Crawler Prelude

Practical examples of crawlers:

  1. Search engines (Baidu, Google, 360 Search, etc.).
  2. Bole Online.
  3. Huihui Shopping Assistant.
  4. Data analysis and research (e.g. the Data Iceberg column).
  5. Ticket-grabbing software, etc.

What is a web crawler:

  1. In plain terms: a crawler is a program that simulates human behavior when requesting websites. It can automatically request web pages, capture the data, and then extract valuable data according to certain rules.
  2. Formal definition: see the Baidu Encyclopedia entry.
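The "request, capture, extract" steps above can be sketched with nothing but the standard library. This is a minimal illustration, not a real crawler: the sample page is a hard-coded string standing in for HTML that would normally come from a call such as `urllib.request.urlopen(url).read()`, and the "valuable data" here is simply every link on the page.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag -- the 'valuable data' in this sketch."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Apply the extraction rule to one downloaded page."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# A static page stands in for the HTTP response a real crawler would fetch.
page = '<html><body><a href="/a.html">A</a> <a href="/b.html">B</a></body></html>'
print(extract_links(page))  # -> ['/a.html', '/b.html']
```

A real crawler would feed the extracted links back into a queue of pages to request next; that loop is the whole difference between this sketch and a working crawler.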

General-purpose crawlers and focused crawlers:

  1. General-purpose crawler: an important part of a search engine's crawling system (Baidu, Google, Sogou, etc.). It mainly downloads web pages from the Internet to local storage, forming a mirror backup of Internet content.
  2. Focused crawler: a web crawler built for a specific need. The difference from a general-purpose crawler is that a focused crawler filters and processes content while crawling, trying to ensure that only web page information relevant to the requirement is captured.
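The filtering that distinguishes a focused crawler can be as simple as a rule applied to each candidate URL before it is fetched. The pattern and URLs below are hypothetical examples, assuming the "specific need" is pages about Python or crawlers:

```python
import re

# Hypothetical topic rule for a focused crawler: only URLs matching this
# pattern are considered relevant and allowed into the crawl queue.
TOPIC_PATTERN = re.compile(r"/python/|/crawler/")

def is_relevant(url):
    """The focused crawler's filter: fetch only on-topic pages."""
    return bool(TOPIC_PATTERN.search(url))

frontier = [
    "https://example.com/python/tutorial.html",
    "https://example.com/sports/news.html",
    "https://example.com/crawler/scrapy-intro.html",
]
# A general-purpose crawler would fetch all three; the focused one keeps two.
to_crawl = [url for url in frontier if is_relevant(url)]
print(to_crawl)
```

In practice the rule may also inspect page content after download, but filtering the URL frontier first saves the bandwidth a general-purpose crawler would spend on off-topic pages.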

Why write crawler programs in Python:

  1. PHP: PHP may be "the best language in the world", but it was not born for this job. Its support for multithreading and asynchronous I/O is weak, so its concurrent-processing ability is limited. Crawlers are utility programs with high demands on speed and efficiency.
  2. Java: the ecosystem is mature, and Java is Python's biggest competitor for crawlers. However, the language itself is heavyweight and verbose; refactoring is costly, and any modification leads to many code changes, while crawler collection code has to be modified often.
  3. C/C++: runtime efficiency is unbeatable, but the learning and development costs are high. Even a small crawler program can take half a day or more to write.
  4. Python: elegant syntax, concise code, high development efficiency, and many supporting modules. The HTTP request and HTML parsing modules it relies on are very rich, and the Scrapy and Scrapy-redis frameworks make developing crawlers extremely easy.

Preparing the tools:

  1. A Python 3.6 development environment.
  2. PyCharm 2017 Professional Edition.
  3. A virtual environment: `virtualenv/virtualenvwrapper`.
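Setting up the isolated environment from the list above might look like the following. The article names `virtualenv`/`virtualenvwrapper`; as a minimal sketch this uses the standard-library `venv` module instead, which serves the same purpose without an extra install (the environment name `crawler-env` is an arbitrary choice):

```shell
# Create an isolated interpreter and package set for the crawler project,
# so its dependencies (e.g. Scrapy) do not pollute the system Python.
python3 -m venv crawler-env

# Activate it in the current shell; `pip install` now targets this env only.
source crawler-env/bin/activate

# Show the packages in the fresh environment to confirm isolation.
python -m pip list
```

With `virtualenvwrapper` installed, the equivalent would be `mkvirtualenv crawler-env` followed by `workon crawler-env`; the effect is the same.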



Copyright © 程式師世界 All Rights Reserved