《Desired Life》 is a heartwarming slice-of-life reality show on Hunan Satellite TV. Its third season is currently airing, with Zhang Zifeng joining as a regular guest, and it is well loved by viewers: its Douban score has reached 7.9. The show follows celebrities as they experience life in a village, blending food, manual labor, and humor, so that viewers feel immersed, as if they had really stepped into the "desired life" themselves.
Douban score of 《Desired Life》
While watching the show these past few days and seeing the lively discussion in the bullet comments (danmaku), I wondered on a whim whether I could scrape all of them for analysis. Partly I wanted to see whether capturing bullet-comment data involves anything special, and partly I wanted to gauge the show's reputation through the comments. Below, I use episode 5, released just last Friday, as the example and capture its bullet-comment data. The code mainly uses the requests library, and the scraped results are stored in a CSV file.
Open episode 5 in the web version of Mango TV, wait for the ad to load, and open the Network tab of Chrome Developer Tools. Because there are many requests and more keep arriving over time, I cleared the list first and then waited. Most of the early requests turned out to be images, which are obviously not our target. After a while a suspicious request appeared (see the figure below), and clicking into it showed that it really did contain bullet comments. Its interval field is 60, which I guessed means a new request is sent every 60 s. Filtering for requests whose names begin with "rdb" showed that they were all bullet-comment requests and that their next values were all multiples of 60000, presumably 60000 milliseconds, that is, 60 seconds.
Finding the bullet-comment request link
Filtering the bullet-comment requests
Next, we need to work out the paging logic, that is, the common pattern behind these request links. A handy tool for analyzing web requests is Postman. It can be used to inspect the parameters of a request, and it can also generate request code in different languages that works after a little modification. Paste the link we just found into Postman. As shown in the figure, you can see the request parameters, and clicking the Send button shows the response.
Since there are many parameters, I tried removing the ones that looked useless. In the end, keeping just three parameters, vid, cid, and time, is sufficient. My guess is that vid and cid are IDs identifying the program and the video, and that time is the moment of the request, expressed as a relative value. In the response, the time of every bullet comment is larger than the requested time. Putting this together, each request returns the bullet comments within the 60 s that follow the requested time, so to get all the comments we only need to vary time. The smallest time value should be 0, and the largest should be the multiple of 60000 milliseconds closest to the video duration, which is 89:49 for this episode. I verified this, and it held, so next we can implement it in code.
Testing the request parameters in Postman
Testing the time request parameter in Postman
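The paging rule described above can be sketched without touching the network. This is only an illustration: it reuses the endpoint and IDs that appear in the article's code, the helper names `window_offsets` and `build_url` are made up here, and it assumes the last window worth requesting is the one that starts just before the video ends (the article's own loop goes one step further and stops when the server returns no items).

```python
from urllib.parse import urlencode

# Endpoint taken from the article's code; everything else is a hypothetical helper.
BASE_URL = "https://galaxy.bz.mgtv.com/rdbarrage"


def window_offsets(duration_ms, step_ms=60_000):
    """Start offsets (ms) of every 60 s window that begins before the video ends."""
    return list(range(0, duration_ms, step_ms))


def build_url(vid, cid, time_ms):
    """Build a request URL that keeps only the three parameters found to matter."""
    return BASE_URL + "?" + urlencode({"vid": vid, "cid": cid, "time": time_ms})


duration_ms = (89 * 60 + 49) * 1000  # the episode runs 89:49
offsets = window_offsets(duration_ms)
print(len(offsets), "windows, last offset:", offsets[-1])
print(build_url("5683459", "328724", offsets[1]))
```

Generating the offsets up front makes it easy to check the window count against the video length before any request is sent.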
Use requests to build the network request, and use a loop to control paging so that all the bullet comments are scraped. Parse the returned JSON data and use pandas to store it in a CSV file. The full code is below, about 45 lines in total.
import datetime
import time

import pandas as pd
import requests
from fake_useragent import UserAgent

ua = UserAgent()
url = "https://galaxy.bz.mgtv.com/rdbarrage"
rdb_content = {'id': [], 'type': [], 'uid': [], 'content': [], 'add_time': [], 'ups': []}
count = 0
print("Crawl start time: {}".format(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')))

# One request per 60 s window; the time parameter is in milliseconds.
for i in range(0, 91):
    querystring = {"version": "2.0.0", "vid": "5683459", "cid": "328724", "time": i * 60000}
    headers = {'User-Agent': ua.random}  # random User-Agent for each request
    try:
        response = requests.request("GET", url, headers=headers, params=querystring).json()
        items = response['data']['items']
        if items is None:
            # No items means we are past the end of the video.
            print("Crawl finished! Number of bullet comments: {}".format(count))
            break
        else:
            for item in items:
                rdb_content['id'].append(item.get('id'))            # comment id
                rdb_content['type'].append(item.get('type'))        # comment type
                rdb_content['uid'].append(item.get('uid'))          # user id
                rdb_content['content'].append(item.get('content'))  # comment text
                rdb_content['add_time'].append(item.get('time'))    # comment time
                rdb_content['ups'].append(item.get('up', 0))        # number of likes
                count = count + 1
            print("Crawled minute {} of bullet comments, current total {}".format(i + 1, count))
        time.sleep(5)  # pause between requests
    except Exception:
        # Skip this window on any request or parsing error and carry on.
        print("Crawling minute {} failed! Current total {}".format(i + 1, count))
        continue

rdb_df = pd.DataFrame(rdb_content)
rdb_df.to_csv('rdb.csv', index=False)
Screenshot of a run:
Run output
As you can see, this crawl collected nearly 30,000 bullet comments, less than 2 days after the episode was released, which to some extent reflects the show's popularity. Next, let's analyze the bullet-comment data in more depth and look at the show from a data perspective.
Some fields are missing in the data scraped above, but the proportion is very small, so those rows are simply deleted, leaving 28,602 valid records.
Data preprocessing: deleting duplicate values
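A minimal sketch of the "delete rows with missing fields" step, in plain Python so it runs without extra dependencies. The column names follow the scraper's output; the function name `drop_incomplete`, the choice of required fields, and the sample rows are all made up for illustration and are not the author's actual preprocessing code.

```python
def drop_incomplete(rows, required=("id", "uid", "content", "add_time")):
    """Keep only rows in which every required field is present and non-empty."""
    return [
        row for row in rows
        if all(row.get(key) not in (None, "") for key in required)
    ]


# Made-up sample rows shaped like the scraper's output.
rows = [
    {"id": 1, "uid": 10, "content": "哈哈哈", "add_time": 1200, "ups": 3},
    {"id": 2, "uid": None, "content": "好看", "add_time": 5300, "ups": 0},  # missing uid
    {"id": 3, "uid": 11, "content": "", "add_time": 61000, "ups": 1},       # empty text
]
clean = drop_incomplete(rows)
print(len(rows), "->", len(clean))
```

A value of 0 (for example a comment posted at time 0) is treated as present, since only None and the empty string count as missing here.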
01 Number of bullet comments over time
The episode runs about 90 minutes. We bucket the comments by 1 minute and by 10 minutes and look at the counts. Although the number of bullet comments fluctuates over time, overall it never swings violently, which suggests that the show stays popular from start to finish: truly "every minute is wonderful".
Bar chart: number of bullet comments per minute
Bar chart: number of bullet comments per ten minutes
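The bucketing behind these two charts can be sketched as follows. It assumes the stored comment time is an offset in milliseconds from the start of the video, which matches the request parameter but is not confirmed by the source; `counts_per_bucket` and the sample times are hypothetical.

```python
from collections import Counter


def counts_per_bucket(times_ms, bucket_minutes=1):
    """Count comments per bucket; bucket 0 covers the first `bucket_minutes` minutes."""
    bucket_ms = bucket_minutes * 60_000
    return Counter(t // bucket_ms for t in times_ms)


# Made-up comment times in milliseconds.
times_ms = [500, 59_999, 60_000, 125_000, 610_000]
per_minute = counts_per_bucket(times_ms, bucket_minutes=1)
per_ten_minutes = counts_per_bucket(times_ms, bucket_minutes=10)
print(sorted(per_minute.items()))
print(sorted(per_ten_minutes.items()))
```

The same function serves both charts; only the bucket width changes.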
02 Distribution of bullet-comment lengths
Bar chart: number of bullet comments by length
Most bullet comments are around 10 characters long and tend to be colloquial. This matches our intuition: about 10 characters are enough to express a viewer's mood or opinion. Of course, some users do not mind the trouble and write comments of more than 30 characters, and a very small number of comments exceed 50 characters. Out of curiosity, we can look at what the comments longer than 50 characters say (see the figure below); you can more or less feel how attentively the audience is watching the show.
Bullet comments longer than 50 characters
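One way to reproduce the length distribution and pull out the long comments is sketched below. Length is measured in characters with `len`, which is an assumption about how the article counted; the helper names and sample comments are invented.

```python
from collections import Counter


def length_histogram(comments):
    """Map comment length (in characters) to the number of comments of that length."""
    return Counter(len(text) for text in comments)


def longer_than(comments, n):
    """Return the comments whose length exceeds n characters."""
    return [text for text in comments if len(text) > n]


# Made-up sample comments.
comments = ["哈哈哈", "好看", "这一期太好笑了", "哈哈哈"]
histogram = length_histogram(comments)
print(sorted(histogram.items()))
print(longer_than(comments, 5))
```

With the real data, `longer_than(comments, 50)` would list the comments shown in the figure above.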
03 Distribution of likes on bullet comments
Distribution of like counts by range
Nearly a quarter of the bullet comments received no likes at all. Nearly 60% have fewer than 20 likes, and fewer than 20% have more than 20 likes. We can also look at what the comments with more than 300 likes say; from them you can feel the overall happy atmosphere of the show.
Bullet comments with more than 300 likes
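The shares quoted above can be computed with a few counters. The bucket boundaries used here (0, 1 to 19, 20 and above) are an assumption, since the article does not say which side of 20 belongs to which bucket; `like_shares` and the sample like counts are made up.

```python
def like_shares(ups):
    """Return the share of comments with 0 likes, 1-19 likes, and 20 or more likes."""
    total = len(ups)
    zero = sum(1 for u in ups if u == 0)
    low = sum(1 for u in ups if 0 < u < 20)
    high = sum(1 for u in ups if u >= 20)
    return {"0": zero / total, "1-19": low / total, "20+": high / total}


# Made-up like counts.
ups = [0, 0, 3, 7, 19, 20, 350, 1]
shares = like_shares(ups)
print(shares)
print([u for u in ups if u > 300])  # like counts of the most-liked comments
```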
04 Comparing users by number of bullet comments, likes, and total characters
Our data contains 28,602 bullet comments posted by 17,268 users. We sort the users by total likes in descending order, take the top 10, and compare their number of comments, number of likes, and total number of characters. Users with many likes also tend to have posted many comments and written many characters.
Comparison of bullet-comment statistics per user
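The per-user comparison amounts to a group-by on the user id followed by a sort. The sketch below does this in plain Python; the field names follow the scraper's columns, while `user_stats`, `top_by_likes`, and the sample rows are hypothetical.

```python
from collections import defaultdict


def user_stats(rows):
    """Aggregate comment count, total likes, and total characters per user id."""
    stats = defaultdict(lambda: {"comments": 0, "likes": 0, "chars": 0})
    for row in rows:
        entry = stats[row["uid"]]
        entry["comments"] += 1
        entry["likes"] += row["ups"]
        entry["chars"] += len(row["content"])
    return dict(stats)


def top_by_likes(stats, n=10):
    """Return the n users with the most total likes, highest first."""
    return sorted(stats.items(), key=lambda kv: kv[1]["likes"], reverse=True)[:n]


# Made-up sample rows.
rows = [
    {"uid": "a", "content": "哈哈哈", "ups": 5},
    {"uid": "b", "content": "好看", "ups": 40},
    {"uid": "a", "content": "太好笑了", "ups": 2},
]
stats = user_stats(rows)
for uid, entry in top_by_likes(stats):
    print(uid, entry)
```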
05 Emoji used in bullet comments
Emoji usage in bullet comments
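The article does not show how the emoji were counted. A rough sketch, assuming the emoji appear in the comment text as ordinary Unicode characters, is to match a couple of common emoji code-point ranges. The ranges below are approximate rather than a complete definition of emoji, multi-code-point emoji are counted by their components, and `emoji_counts` and the sample comments are made up.

```python
import re
from collections import Counter

# Approximate ranges: pictographs and emoticons, plus miscellaneous symbols and dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def emoji_counts(comments):
    """Count how often each emoji character occurs across all comments."""
    return Counter(ch for text in comments for ch in EMOJI_RE.findall(text))


# Made-up sample comments.
comments = ["哈哈😂😂", "好看👍", "没有表情"]
counts = emoji_counts(comments)
print(counts.most_common())
```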
06 Word cloud
Segmenting the bullet comments into words gives the following word cloud.
Word cloud of the bullet comments
Looking at this word cloud, you instantly feel the joy spilling off the screen, and you can almost hear bursts of "hahahaha". The public has sharp eyes, and it is no surprise that a show that makes people this happy has caught fire.
With that, we have basically finished scraping the bullet comments of episode 5 of 《Desired Life》 and doing a simple visual analysis of them. More interesting angles are left for you to analyze and discover. I also called Baidu's sentiment analysis API to analyze the emotional tendency of the bullet comments, but the results did not seem very good, so I did not post them.