
Python crawler (III): crawling a JS-rendered dynamic page for Bilibili (station B) follower, view and like counts

Editor: Python

Preface

I originally taught myself Python so that I could program microcontrollers with MicroPython: first OpenMV, then the Raspberry Pi with its ARM Linux environment (the Raspberry Pi runs regular Python, not MicroPython), and later the ESP32 and ESP8266. That is how I joined the ranks of self-taught Python programmers. After watching Little Turtle's videos on Bilibili (station B) I had most of the basics, and I wanted to put them into practice, so I turned to the most classic Python project: the web crawler. I first learned crawling from teacher Song Tian's MOOC course, which covers requests and bs4 at length.

I improved slowly, but I hit my first wall when I tried to use a crawler to get Baidu Wenku documents for free: everything I crawled came back empty. That was strange. After pressing F12 I could clearly see the text in the HTML, so why did the crawler get nothing? After several Baidu searches I found out that the page is rendered by JavaScript, while the requests library only fetches the original HTML page, so I had to give up. Time passed. Later I wanted to use MicroPython on an ESP32 to build a small display showing an uploader's likes and followers, and the same problem came up again. I knew a quick search would turn up the API, but this time I wanted to work out the API myself, which also brings this stage of my crawler learning to a close.
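To make the "crawled content is empty" problem concrete, here is a minimal offline sketch using only the standard library. The HTML string, the "fans" id and the TextById class are made up for illustration (they are not the real markup of Bilibili or Baidu Wenku). The point is only that a parser working on the raw HTML never runs the script, so it sees the empty placeholder rather than the number a browser would show.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML as a plain HTTP client would receive it: the <span>
# is empty, and only a browser that runs the <script> would fill it in.
RAW_HTML = """
<html><body>
  <span id="fans"></span>
  <script>document.getElementById("fans").textContent = "12243";</script>
</body></html>
"""


class TextById(HTMLParser):
    """Collect the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.open_tag = None  # name of the matched tag while we are inside it
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if self.open_tag is None and dict(attrs).get("id") == self.target_id:
            self.open_tag = tag

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.open_tag = None

    def handle_data(self, data):
        if self.open_tag is not None:
            self.text += data


def text_of(html, element_id):
    """Return the stripped text of the element with id `element_id`."""
    parser = TextById(element_id)
    parser.feed(html)
    return parser.text.strip()


# The script is never executed, so the placeholder stays empty.
print(repr(text_of(RAW_HTML, "fans")))
```

The same helper does return text when the value is present in the markup itself, which is the situation requests and bs4 handle well.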

Method 1: the selenium library

selenium is a web automation library; as I understand it, it is a handy library for scripting a browser. Used as a crawler, it gives you the page as rendered. After installing it with pip you also have to install the matching web driver; I use the driver for Google Chrome. There is a blog post on installing selenium in a Windows environment that covers the details. The one thing I want to add is that you need to check your browser version and download the driver that matches it. In Chrome the version is shown under [Help] -> [About Google Chrome].

This is my version .
Then, in the F12 developer tools, right-click the element you want to crawl and copy its XPath. That is all.

The code is as follows:

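from selenium import webdriver
from selenium.webdriver.common.by import By

# Launch Chrome (needs a driver that matches the installed browser version)
browser = webdriver.Chrome()
browser.get("https://space.bilibili.com/442752399?from=search&seid=17757741653586589473")
# XPath copied from the developer tools; find_element_by_xpath was removed in newer Selenium 4 releases
content = browser.find_element(By.XPATH, '//*[@id="navigator"]/div/div[1]/div[3]/a[2]')
print("Number of fans:", content.get_attribute("title"), content.text)
browser.quit()  # close the browser and end the driver session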

Method 2: finding the API by analysis

MicroPython is not full Python and does not have such a rich set of libraries, but it does have the most basic one, requests (called urequests in MicroPython). With only a basic library like that, the only option is to call the API directly, so the page's network traffic has to be analyzed.
Press F12. I started from the follower count, 12243: if the data arrives in some response, that response must contain this number (provided it is not encrypted). So I first went to the XHR filter in the Network tab to look for it.

I looked through everything there and could not find it, so I then searched for the number in Sources.
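The search idea (look for the visible number inside every response body) can also be sketched offline. The captured dictionary below is invented sample data standing in for bodies copied out of the developer tools, and find_responses_containing is a hypothetical helper, not part of any library. As noted above, this only works when the number appears in the response as plain text.

```python
def find_responses_containing(captured, needle):
    """Return the URLs whose response body contains `needle` as plain text."""
    needle = str(needle)
    return [url for url, body in captured.items() if needle in body]


# Invented sample data: URL -> response body, as if copied from the dev tools.
captured = {
    "https://example.com/page.html": "<span id='fans'></span>",
    "https://example.com/stat?callback=__jp4": '__jp4({"data":{"follower":12243}})',
    "https://example.com/other.json": '{"data":{"likes":7}}',
}

hits = find_responses_containing(captured, 12243)
print(hits)
```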

I finally found it there. But when I right-clicked the request and chose "Open in new tab", access was denied. In the end, after studying the URL, I tried deleting &callback=__jp4, and the server then returned JSON successfully.
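Removing the callback parameter by hand is easy, but the same step can be done in code. This is a minimal sketch using the standard urllib.parse module. The full JSONP URL below is an assumption reconstructed from pieces that appear in this post (the endpoint, the uid from the space URL and the &callback=__jp4 suffix); the exact URL seen in the browser is not shown here. drop_query_param is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_param(url, name):
    """Return `url` with every occurrence of query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Assumed JSONP-style URL, rebuilt from fragments mentioned in the text.
jsonp_url = "https://api.bilibili.com/x/relation/stat?vmid=442752399&jsonp=jsonp&callback=__jp4"
plain_url = drop_query_param(jsonp_url, "callback")
print(plain_url)
```

Parsing the query string instead of cutting the text keeps the other parameters intact wherever callback happens to appear in the URL.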

That completes the analysis of the API. The source code is below:

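import requests

uid = input('please input your uid:')
# Follower-stat endpoint found above, with the &callback=... parameter removed
url = 'https://api.bilibili.com/x/relation/stat?vmid=%s&jsonp=jsonp' % uid
r = requests.get(url, timeout=10)
r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
print(r.json()["data"]["follower"])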
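On a small device such as an ESP32, an unexpected response should not crash the display loop, so it may help to keep the parsing step separate from the network call. The following is only a sketch under assumptions: the data.follower path comes from the code above, while the "code" and "message" fields in the sample payloads are guesses at what a response might contain, and follower_count is a hypothetical helper.

```python
def follower_count(payload):
    """Return data.follower from a parsed response, or None if it is missing."""
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, dict):
        return None
    return data.get("follower")


# Assumed sample payloads; only the data.follower path is confirmed above.
ok = {"code": 0, "data": {"follower": 12243}}
bad = {"code": -400, "message": "request error"}

print(follower_count(ok))
print(follower_count(bad))
```

The caller can then show the old value, or a placeholder, whenever None comes back.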

Summary

Those are my two methods. The first suits a PC, but it is comparatively limited, since it needs a browser and a matching driver. The second captures the URL the data actually comes from: find the request URL, then analyze it to get the real URL. It is all quite simple, but I have written down my thought process in the hope that it saves later readers some detours.