
Crawling data with Python: the basics

Editor: Python

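Applications of crawlers

I. Prepare the environment

1. Install the PyCharm development tool
2. Install the required dependency, Scrapy

II. Create the project with scrapy startproject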

After the project is created, its structure looks as shown in the following figure

III. Create a crawler in the project's spiders directory
1. First, switch to that directory

2. Create the crawler

Once the crawler has been created successfully, the result is as follows
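
The two command-line steps above (creating the project, then generating the crawler from inside the spiders directory) can also be driven from Python. The sketch below only assembles the commands and the directory each one would run in. The project name "book" is a hypothetical placeholder, since the original does not state it; the spider name and domain are the ones used in bookTest.py.

```python
import os
import subprocess

PROJECT_NAME = "book"              # hypothetical project name, not given in the article
SPIDER_NAME = "bookTest"           # spider name used in bookTest.py
SPIDER_DOMAIN = "book.douban.com"  # domain listed in allowed_domains


def setup_commands(project=PROJECT_NAME, spider=SPIDER_NAME, domain=SPIDER_DOMAIN):
    """Return (command, working_directory) pairs for the two setup steps."""
    spiders_dir = os.path.join(project, project, "spiders")
    return [
        # Same as typing: scrapy startproject book
        (["scrapy", "startproject", project], "."),
        # Same as typing, inside the spiders directory:
        # scrapy genspider bookTest book.douban.com
        (["scrapy", "genspider", spider, domain], spiders_dir),
    ]


def run_setup():
    """Run both commands in order; requires Scrapy to be installed."""
    for command, cwd in setup_commands():
        subprocess.run(command, cwd=cwd, check=True)
```

Calling run_setup() once from an empty working directory should leave the same layout as running the two commands by hand in a terminal.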

IV. Configuration files
1. Configure the settings file
1) Change ROBOTSTXT_OBEY=True to ROBOTSTXT_OBEY=False
2) Uncomment the pipeline configuration (ITEM_PIPELINES)

3) Modify the default request headers
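
Taken together, the three settings changes above might look like the excerpt below. This is a sketch under assumptions: the project name "book", the pipeline class name "BookPipeline", and the User-Agent string are illustrative values that the original does not give, so substitute the names from your own generated project.

```python
# settings.py (excerpt)

# 1) Stop Scrapy from filtering requests according to robots.txt
ROBOTSTXT_OBEY = False

# 2) Pipeline configuration, uncommented.
#    "book.pipelines.BookPipeline" is an assumed path; use the one
#    that Scrapy generated for your project.
ITEM_PIPELINES = {
    "book.pipelines.BookPipeline": 300,
}

# 3) Default request headers. Adding a browser-like User-Agent is an
#    assumption about what "modify" means here; the string is only an example.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
}
```

The number 300 is the pipeline's order value; when several pipelines are enabled, lower values run first.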

2. In items.py, add the fields that need to be crawled

3. Write the crawler code in bookTest.py

import scrapy
from ..items import BookItem


class BooktestSpider(scrapy.Spider):
    name = 'bookTest'
    allowed_domains = ['book.douban.com']
    # Crawl the first 10 pages; the "start" offset advances by 20 per page
    start_urls = [
        'https://book.douban.com/tag/%E5%B0%8F%E8%AF%B4?start=' + str(page * 20) + '&type=T'
        for page in range(10)
    ]

    def parse(self, response):
        # Each <li> under the "subject-list" <ul> is one book entry
        lies = response.xpath('//ul[@class="subject-list"]/li')
        for li in lies:
            # extract_first() returns None when the XPath matches nothing
            bookname = li.xpath(".//div[@class='info']//a/@title").extract_first()
            author = li.xpath(".//div[@class='pub']/text()").extract_first()
            jj = li.xpath(".//p/text()").extract_first()
            item = BookItem()
            item['bookname'] = bookname
            item['author'] = author
            item['jj'] = jj
            yield item
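
The percent-encoded segment in the start URLs is the tag 小说 ("fiction") encoded as UTF-8. As a sketch of how the same ten URLs can be generated for any tag, the helper below uses only the standard library; the function name and its parameters are illustrative and are not part of the original spider.

```python
from urllib.parse import quote


def build_start_urls(tag="小说", pages=10, page_size=20):
    """Build the paginated tag URLs that the spider uses as start_urls."""
    encoded_tag = quote(tag)  # "小说" becomes "%E5%B0%8F%E8%AF%B4"
    return [
        f"https://book.douban.com/tag/{encoded_tag}?start={page * page_size}&type=T"
        for page in range(pages)
    ]
```

With the defaults this yields offsets 0, 20, ..., 180, matching the loop in the spider.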

4. Write the pipeline code to save the data
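
The pipeline code itself is not shown in the original. One possible version, offered as a sketch and not as the author's implementation, writes each item as one JSON line. The class name "BookPipeline" and the output file "books.jsonl" are assumptions; the class name has to match the entry enabled in ITEM_PIPELINES.

```python
import json


class BookPipeline:
    """Write every scraped item to a file, one JSON object per line."""

    def __init__(self, path="books.jsonl"):
        # Scrapy creates the pipeline without arguments, so the default path is used
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Strip surrounding whitespace from text values; leave None untouched
        record = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in dict(item).items()
        }
        # ensure_ascii=False keeps Chinese text readable in the output file
        self.file.write(json.dumps(record, ensure_ascii=False) + "\n")
        return item  # hand the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes
        if self.file is not None:
            self.file.close()
            self.file = None
```

Returning the item from process_item matters: a pipeline that returns nothing passes None to any pipeline that runs after it.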

V. Finally, run the crawler
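
The crawler is normally started from a terminal with scrapy crawl bookTest. Since the tutorial works in PyCharm, a small launcher script is a common alternative; the sketch below assumes a hypothetical file such as run.py placed in the project root (the directory containing scrapy.cfg).

```python
def crawl_args(spider_name="bookTest"):
    """Return the argument list equivalent to: scrapy crawl bookTest"""
    return ["scrapy", "crawl", spider_name]


def main():
    # Imported inside the function so that loading this file
    # does not require Scrapy until a crawl is actually started
    from scrapy import cmdline

    # Runs the command exactly as the scrapy command-line tool would
    cmdline.execute(crawl_args())
```

Calling main() from run.py starts the crawl, and each item that parse() yields then passes through the enabled pipeline.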
