
How to solve the pagination problem when scraping Douyin videos with a Python crawler?


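Hi all, I need some help. I have recently been trying to write a Python crawler that fetches all the videos of a particular Douyin creator.
During scraping I ran into a problem with pagination.
When a Douyin creator's profile page is first opened, it automatically loads about 20 videos and their links. When the mouse scrolls to the bottom of the page, the following videos are loaded automatically. There is no pagination button at any point; the subsequent content loads on its own.
The crawler, however, only gets the links of those first 20 or so videos. How can a program perform this "pagination" automatically and load the links of all the creator's videos?
My code is attached below; any pointers would be appreciated.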

import requests
from bs4 import BeautifulSoup

# Request headers copied from a browser session; the cookie value is a placeholder.
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
    'cookie': 'douyin.com',
    'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
}


def get_share_url(url):
    # Resolve the short share link to the address it redirects to.
    try:
        r = requests.get(url=url, headers=headers, allow_redirects=False)
        return r.headers['location']
    except Exception as e:
        print("Parsing failed")
        print(e)


def get_video_url(url):
    # Collect the video links present in the initial HTML of the profile page.
    if not url:
        return
    try:
        # The profile page is hard-coded; the resolved share URL is only checked for being non-empty.
        url_new = 'https://www.douyin.com/user/MS4wLjABAAAAp3rtDfotN7-mHjDIr0XR2XJ5g0C1DIVAuJYgBHJYX-xJLZgoHvfN0r0yAWTLybn7'
        r = requests.get(url=url_new, headers=headers)
        html = BeautifulSoup(r.text, 'html.parser')
        result = html.find_all('a', 'B3AsdZT9 chmb2GX8 UwG3qaZV')
        for item in result:
            get_video(item['href'])
    except Exception as e:
        print("Parsing failed")
        print(e)


def get_video(video_url):
    # The last 19 characters of the link are taken as the video id.
    vid = video_url[-19:]
    print(vid)
    xhr_url = f'https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids={vid}'
    r = requests.get(url=xhr_url, headers=headers).json()
    video_url_a = r['item_list'][0]['video']['play_addr']['url_list'][0]
    # Return the play address instead of discarding it.
    return video_url_a


if __name__ == "__main__":
    # Short link shared from the Douyin app
    url = 'https://v.douyin.com/2Qj3mHB/'
    share_url = get_share_url(url)
    video_url = get_video_url(share_url)
    print("Finished")

The output is as follows:

6980593719770680583
6991623262421732645
6982747467158687012
7122118979715386639
7122012417067945216
7110354925598559488
7095343365453090089
7081217800802405632
7064352556000038178
7054563255129296168
7047288931666119951
7045083352319134976
7039272094689070376
7038262801579642112
7034134213125213474
7025548403216059663
7023583223586229504
7021062565593763087
7016236019557092616
7011905311443275015
7010997973106674952
6996280723158207774
Finished

The output contains the ids of the creator's first 22 videos. The ids of the remaining ten or so videos are not loaded.
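
For context: requests.get only returns the initial HTML and does not run the page's JavaScript, so whatever the page loads on scroll never reaches the script. Pages that load on scroll typically fetch each further batch from a background request that takes some kind of cursor and reports whether more items remain. The following is a minimal sketch of that loop under stated assumptions, not Douyin's actual API: fetch_page, the field names ids, next_cursor and has_more, and the fake data source are all hypothetical. The real request URL, parameters and response fields would have to be read from the browser's developer tools (Network tab) while scrolling the profile page.

```python
def make_fake_fetch_page(all_ids, page_size=20):
    """Build a stand-in for the real network request (hypothetical).

    The returned function hands out one page of ids per call, the way a
    scroll-triggered background request might.
    """
    def fetch_page(cursor):
        chunk = all_ids[cursor:cursor + page_size]
        next_cursor = cursor + len(chunk)
        return {
            "ids": chunk,                           # hypothetical field name
            "next_cursor": next_cursor,             # hypothetical field name
            "has_more": next_cursor < len(all_ids), # hypothetical field name
        }
    return fetch_page


def collect_all_ids(fetch_page, max_pages=100):
    """Request page after page until the source reports no more items."""
    ids = []
    cursor = 0
    # max_pages is a safety cap so a misbehaving source cannot loop forever.
    for _ in range(max_pages):
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        if not page["has_more"]:
            break
        cursor = page["next_cursor"]
    return ids


if __name__ == "__main__":
    # 35 made-up ids, served 20 per page, so two requests are needed.
    fake_ids = [str(7000000000000000000 + i) for i in range(35)]
    ids = collect_all_ids(make_fake_fetch_page(fake_ids))
    print(len(ids))
```

In a real crawler, fetch_page would wrap a requests.get call to whatever endpoint the page itself uses. If that endpoint needs signed parameters that are hard to reproduce, driving a real browser and scrolling it programmatically is the usual alternative.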

Accepted answer 1:

Pagination method: