In the previous examples, the spider crawled only one URL and its corresponding page. In practice, you usually need to crawl multiple URLs. To do this, add multiple URLs to the spider's start_urls variable; when the spider runs, it crawls every URL in start_urls. The following code adds two URLs to start_urls, so running the MultiUrlSpider spider crawls both of the corresponding pages.
import scrapy

class MultiUrlSpider(scrapy.Spider):
    name = 'MultiUrlSpider'
    start_urls = [
        'https://www.jd.com',
        'https://www.taobao.com'
    ]
    ... ...
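The "... ..." above stands for the rest of the spider, usually a parse method that Scrapy calls once for every crawled URL. The sketch below (the parse body is an assumption, not part of the original example) simply logs each response URL to confirm that both pages were fetched:

import scrapy

class MultiUrlSpider(scrapy.Spider):
    name = 'MultiUrlSpider'
    start_urls = [
        'https://www.jd.com',
        'https://www.taobao.com'
    ]

    def parse(self, response):
        # Scrapy calls parse once for each URL in start_urls.
        self.logger.info('Fetched %s', response.url)

Run it with "scrapy crawl MultiUrlSpider" from the project directory; the log should show one "Fetched ..." line per URL in start_urls.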
The following example uses a text file (urls.txt) to provide multiple URLs. The spider class reads the contents of urls.txt and stores the URLs in the start_urls variable. When run, the spider crawls the page for every URL listed in urls.txt and outputs the number of blog posts on each page. (The URLs provided in this example are geekori.com blog list pages; if you use other URLs, you need to modify the page-parsing logic accordingly.)
import scrapy

class MultiUrlSpider(scrapy.Spider):
    name = 'MultiUrlSpider'
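The excerpt is cut off here. A complete version of the spider might look like the following sketch. The file name urls.txt comes from the text above, but the XPath selector for counting blog posts is an assumed placeholder: the original only says the URLs are geekori.com blog list pages, so you must adapt the selector to that page's actual markup.

import scrapy

class MultiUrlSpider(scrapy.Spider):
    name = 'MultiUrlSpider'

    # Read one URL per line from urls.txt and store them in start_urls.
    # This block runs when the class is defined, so urls.txt must exist
    # in the directory from which the spider is started.
    with open('urls.txt') as f:
        start_urls = [line.strip() for line in f if line.strip()]

    def parse(self, response):
        # Count blog post entries on the page. The selector is an assumed
        # placeholder; inspect the geekori.com blog list page and adjust it.
        posts = response.xpath('//*[@class="blog-post"]')
        self.logger.info('%s: %d blog posts', response.url, len(posts))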