
Simple and fast Python crawler tool: SmartScraper

Editor: Python

Hello everyone.

Today I will introduce SmartScraper, a simple, automatic, and fast Python crawler tool. SmartScraper makes it easy to grab page data: there is no need to learn locator packages such as pyquery or beautifulsoup. We only need to provide a URL and some sample data, and it learns the page's locating rules for us.

1. Installation

pip install smartscraper

2. Quick start

2.1 Get similar results

For example, suppose we want to get the titles and publication information of the 20 books listed on each page of the "Novel" (小说) tag on Douban Books.

P1 https://book.douban.com/tag/小说?start=0&type=T
P2 https://book.douban.com/tag/小说?start=20&type=T
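
The two links differ only in the start query parameter, which moves forward by 20 per page. As a minimal sketch, the page links could be generated with the standard library; the helper name page_url and the PAGE_SIZE constant are illustrative assumptions, not part of SmartScraper:

```python
from urllib.parse import quote, urlencode

BASE = "https://book.douban.com/tag/"
PAGE_SIZE = 20  # assumption: 20 books per page, matching P1 (start=0) and P2 (start=20)


def page_url(tag, page):
    """Build the link for a 1-based page number of a tag (hypothetical helper)."""
    query = urlencode({"start": (page - 1) * PAGE_SIZE, "type": "T"})
    # quote() percent-encodes the non-ASCII tag name
    return f"{BASE}{quote(tag)}?{query}"


print(page_url("小说", 1))
print(page_url("小说", 2))
```

The printed links are the percent-encoded form of P1 and P2; browsers display the tag as readable text again.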

We use the P1 link to train two fields: title and publication information.

from smartscraper import SmartScraper
# Link to the web page to train on
url = 'https://book.douban.com/tag/小说?start=0&type=T'
# Desired fields: one sample value per field, copied from the page
# (活着 is the novel "To Live" by 余华, Yu Hua)
wanted_dict = {"title": ["活着"],
               "pub": ["余华 / 作家出版社 / 2012-8-1 / 20.00元"]
              }
# Training: learn from the page at url the rules that locate the wanted_dict values
scraper = SmartScraper()
results = scraper.build(url, wanted_dict=wanted_dict)
print(results)

Running the code collects the following results (titles and publication details are shown here in English translation):

{'title': ['To Live',
           "Fang Si-Qi's First Love Paradise",
           'Journey Under the Midnight Sun',
           'Solaris',
           'Contempt',
           ...],
 'pub': ['Yu Hua / Writers Publishing House / 2012-8-1 / 20.00 yuan',
         'Lin Yihan / Beijing United Publishing Company / 2018-2 / 45.00 yuan',
         '[Japan] Keigo Higashino / Liu Zijun / Nanhai Publishing Company / 2013-1-1 / CNY 39.50',
         '[Poland] Stanislaw Lem / Jing Zhenzhong / Yilin Press / 2021-8 / 49.00 yuan',
         '[Italy] Alberto Moravia / Shen Emei, Liu Xirong / Jiangsu Phoenix Literature and Art Publishing House / 2021-7 / 62.00',
         ...]
}
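
The result is a dictionary of parallel lists, so a common next step is to pair each title with its publication string. The sketch below is one way to do that. The helper name to_records is hypothetical, and it assumes both lists are in page order and that the publication string ends with publisher, date, and price separated by slashes, as in the samples above. Entries that do not fit this pattern keep only the raw string.

```python
def to_records(results):
    """Pair titles with publication strings and split the latter into fields."""
    records = []
    # Assumption: index i in both lists refers to the same book.
    for title, pub in zip(results["title"], results["pub"]):
        record = {"title": title.strip(), "pub": pub.strip()}
        # Split from the right so an optional translator stays with the author.
        parts = [part.strip() for part in pub.rsplit("/", 3)]
        if len(parts) == 4:
            record["people"], record["publisher"], record["date"], record["price"] = parts
        records.append(record)
    return records


# Two entries from the output above, in translated form
sample = {
    "title": ["To Live", "Journey Under the Midnight Sun"],
    "pub": [
        "Yu Hua / Writers Publishing House / 2012-8-1 / 20.00 yuan",
        "[Japan] Keigo Higashino / Liu Zijun / Nanhai Publishing Company / 2013-1-1 / CNY 39.50",
    ],
}
records = to_records(sample)
for record in records:
    print(record["title"], "|", record["publisher"], "|", record["price"])
```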

Now use the scraper we just trained to get the titles and publication information from the P2 link:

results_p2 = scraper.get_result_similar('https://book.douban.com/tag/小说?start=20&type=T')
print(results_p2)
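
The same trained scraper can be pointed at further pages in a loop. The following is only a sketch: collect_pages is a hypothetical helper, and it assumes get_result_similar returns a dictionary of lists with the same shape as the build result shown earlier, which this article does not confirm.

```python
def collect_pages(scraper, urls):
    """Run a trained scraper over several pages and merge the per-field lists."""
    merged = {}
    for url in urls:
        # Assumption: the return value looks like {"title": [...], "pub": [...]}.
        page = scraper.get_result_similar(url)
        for field, values in page.items():
            merged.setdefault(field, []).extend(values)
    return merged
```

It would be called as collect_pages(scraper, [p1_url, p2_url]), where the two names stand for the P1 and P2 links. When crawling many pages, pausing between requests is a reasonable courtesy to the site.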

2.2 Save the model

A trained SmartScraper model can be saved and then called directly later.

scraper.save('douban_Book.pkl')

To load the model again:

scraper.load('douban_Book.pkl')
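
Saving and loading combine naturally into a "train once, reuse afterwards" pattern. The sketch below uses only the build, save, and load calls shown in this article. The helper name load_or_train is hypothetical, and it assumes save writes a file at the given path.

```python
import os


def load_or_train(scraper, model_path, url, wanted_dict):
    """Reuse a saved model if one exists; otherwise train on url and save it."""
    if os.path.exists(model_path):
        scraper.load(model_path)
    else:
        scraper.build(url, wanted_dict=wanted_dict)
        scraper.save(model_path)
    return scraper
```

A call such as load_or_train(SmartScraper(), 'douban_Book.pkl', url, wanted_dict) would train on the first run and load the saved rules on later runs.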
