This article was first published on the CSDN Program Life WeChat official account (ID: coder_life).
As we move from the industrial age to the age of information services, the Internet industry keeps rising while traditional industries decline. Today many people live in the "Internet+" era, and the idea that everyone should learn to program is gradually being taken seriously.
As a veteran who has been in the industry for five years, I look back on those years and cannot help feeling a little sentimental.
The shift from the PC Internet era to the mobile Internet era happened only a few years ago. A fast pace has become the norm in the IT world: small steps, rapid product iteration, and a mobile Internet that has grown quickly.
Today's faster, more convenient way of life is driven by a group of people who dream of "changing the world", and built by a group of hard-working programmers.
In the public mind, "programmers" carry many stereotyped labels: homebodies, mechanical, always working overtime, high IQ but worrying EQ, geeks, plaid shirts all year round, and of course the famous line "lots of money, few words, dies young".
These stereotypes are partly right but not entirely; each captures only one of programmers' many traits.
These are people who demand a lot from their computers, love mechanical keyboards, and take headphones to the extreme. Beyond these familiar traits, what else do they care about and enjoy?
Using the articles published by the CSDN Program Life account over the past few years, I will look at this group of geeks from a data perspective and see what is on their minds.
The data source is the set of articles published by the CSDN Program Life official account, so the first step is to collect the account's article data from recent years. Official account content is published on the WeChat platform, so the data can only be obtained through WeChat.
Packet capture
Packet capture means intercepting, resending, and editing the data packets sent and received over the network. Here we need to install a packet capture tool on our own computer (Charles is recommended on Mac, Fiddler on Windows).
HTTPS
HTTPS is an HTTP channel designed for security. It adds an SSL layer beneath HTTP, and that SSL layer is the basis of HTTPS security, so the encryption details are handled by SSL. HTTPS makes packet capture harder, but not impossible.
To get around this, the packet capture tool acts as a man-in-the-middle proxy: the phone talks to the packet capture tool, and the packet capture tool talks to the server.
When the phone sets up its HTTPS connection with the packet capture tool, the public key it receives is the one sent by the tool rather than by the real server. For the communication to work normally, you therefore need to install the root certificate generated by the packet capture tool on the phone and trust it.
From the captured packets you can identify the request behind the official account's article list. Requesting the data page by page is then enough to retrieve the full list of articles.
The number of likes (the "good-looking" button) and the read count are harder to get. These two figures are only available in the WeChat client. Based on my analysis (in truth, a guess), the request is triggered by the WeChat client, which then updates the page. So the only option is a brute-force approach: have WeChat open each article's detail page so that it sends the request, store the data returned by that request, and match it to the article by title. After that the data is usable.
Automation frees up our hands, so here we use AnyProxy together with ADB shell.
AnyProxy is a Node.js-based HTTP/HTTPS proxy server that can be configured with plug-ins. It is similar to Charles and Fiddler mentioned above, but better suited to developers.
ADB is one of the tools in the Android SDK. It can simulate screen taps, text input, swipes, and other actions, which makes it possible to automate taps on the screen.
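As a rough illustration of driving the screen through ADB, the sketch below builds `adb shell input` commands from Python. It is not the author's script: the coordinates are placeholders that depend on the device, and the commands are only printed by default (`dry_run=True`) so the example can be tried without a phone attached.

```python
import subprocess


def tap_cmd(x, y):
    """Command that taps the screen at (x, y)."""
    return ["adb", "shell", "input", "tap", str(x), str(y)]


def swipe_cmd(x1, y1, x2, y2, duration_ms=300):
    """Command that swipes from (x1, y1) to (x2, y2)."""
    return ["adb", "shell", "input", "swipe",
            str(x1), str(y1), str(x2), str(y2), str(duration_ms)]


def back_cmd():
    """Command that presses the Android back key."""
    return ["adb", "shell", "input", "keyevent", "KEYCODE_BACK"]


def run(cmd, dry_run=True):
    """Print the command, or actually execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


def open_article_steps(x, y):
    """Commands for one round: tap an article, then go back to the list."""
    return [tap_cmd(x, y), back_cmd()]


# Placeholder coordinates; real values depend on the screen layout.
for cmd in open_article_steps(540, 800):
    run(cmd)
```

A real script would also need to pause between the tap and the back key so the article page has time to load and send its request, and would swipe the list to reach further articles.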
Start AnyProxy with the path to the plug-in JS file, then run the ADB script to open the pages automatically. The plug-in then stores the request and response data in the database file. The core code of the plug-in is as follows:
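var url = require("url")

module.exports = {
  // AnyProxy rule hook: called before a response is sent back to the client
  *beforeSendResponse(requestDetail, responseDetail) {
    try {
      var pathName = url.parse(requestDetail.url).pathname
      // Only handle the request that carries the read and like counts
      if (pathName === "/mp/getappmsgext") {
        // saveReadCount is the author's own helper; its definition is not shown here
        saveReadCount(requestDetail, responseDetail)
      }
    } catch (err) {
      // Log the actual error rather than a fixed string
      console.log(err)
    }
  }
};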
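The storage step itself is not shown in the article. The following Python sketch is one possible shape for it, not the author's `saveReadCount`: it assumes the counts have already been extracted from the response, and it reuses the `messages` table and the `title`, `readCount`, and `likeCount` column names that appear in the article's queries. The column types and the sample row are assumptions.

```python
import sqlite3


def save_counts(conn, title, read_count, like_count):
    """Attach read/like counts to the article row with the given title.

    Returns True if a matching article was found and updated.
    """
    cursor = conn.execute(
        "UPDATE messages SET readCount = ?, likeCount = ? WHERE title = ?",
        (read_count, like_count, title),
    )
    conn.commit()
    return cursor.rowcount > 0


# In-memory database with an assumed schema and one placeholder article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages "
    "(title TEXT, author TEXT, datetime TEXT, readCount TEXT, likeCount TEXT)"
)
conn.execute(
    "INSERT INTO messages (title, author, datetime) VALUES (?, ?, ?)",
    ("placeholder title", "placeholder author", "2018-01-02 00:00:00"),
)

updated = save_counts(conn, "placeholder title", 1234, 56)
missing = save_counts(conn, "no such title", 1, 1)
row = conn.execute(
    "SELECT readCount, likeCount FROM messages WHERE title = ?",
    ("placeholder title",),
).fetchone()
print(updated, missing, row)
```

Matching on the title alone is the simplest option and follows the article's description, but it would conflate two articles that share a title.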
After a period of capturing and matching up the data, 2,630 records were collected locally. Compared with the thousands of records typical of movie-review datasets that is not a lot, but it is enough to analyze some key information.
A like expresses approval of or fondness for something on the Internet (a post, an article, or a microblog, for example). So let's look at the top 10 most-liked articles from the Program Life official account:
Because the data is stored in a database, a simple SQL query is enough to get what we want. The SQL is as follows:
select title as title, author as author, CAST(likeCount as int) as likes from messages order by likes DESC limit 10
Running the SQL above gives the following results:
As the figure above shows, the "Changchun Changsheng" article drew the most attention, with far more likes than any other article. Clearly our programmer friends also follow major social events and care about the country and its people.
Of course, likes are only one indicator, not the whole picture. After all, many programmers can't be bothered to tap the like button.
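A note on the CAST in the top-10 query: it suggests the counts were stored as text, although the article does not say so. Under that assumption, the sketch below shows why the cast matters, using an in-memory table with made-up placeholder rows rather than the article's data.

```python
import sqlite3

# In-memory table with made-up placeholder rows; likeCount is stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (title TEXT, author TEXT, likeCount TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [("A", "x", "95"), ("B", "y", "1200"), ("C", "z", "310")],
)


def top_titles(order_expr, limit=10):
    """Return titles sorted descending by a hard-coded ordering expression."""
    sql = "SELECT title FROM messages ORDER BY {} DESC LIMIT ?".format(order_expr)
    return [row[0] for row in conn.execute(sql, (limit,))]


text_order = top_titles("likeCount")                  # compares strings
numeric_order = top_titles("CAST(likeCount AS int)")  # compares numbers
print(text_order)
print(numeric_order)
```

Sorted as text, "95" ranks above "1200" because the comparison goes character by character; after the cast, the rows are ordered by their numeric value.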
Besides likes, let's look at the ranking by read count. The data here is printed to the console, and PrettyTable is used to format it more neatly. The code is as follows:
import sqlite3

from prettytable import PrettyTable


def getArticInfos(min_count, max_count):
    """Print articles published since 2018 with min_count <= reads < max_count."""
    conn = sqlite3.connect('wechat.db')
    conn.text_factory = str
    cursor = conn.cursor()
    # Cast the counts to integers so that sorting and comparison are numeric
    cursor.execute("select title, author, datetime, CAST(readCount as int) as reads, CAST(likeCount as int) as likes from messages where datetime > '2018-01-01 00:00:00' order by reads desc")
    values = cursor.fetchall()
    table = PrettyTable(["Title", "Author", "Time", "Read Count", "Like Count"])
    table.align["Title"] = "l"
    table.align["Author"] = "l"
    table.padding_width = 1
    totalCount = 0
    for item in values:
        readCount = item[3]
        if min_count <= readCount < max_count:
            table.add_row([str(field) for field in item])
            totalCount += 1
    print(table)
    print("Total Count:" + str(totalCount))
    conn.close()
Here are the articles with read counts from 70,000+ to 100,000+:
As the figure shows, Longquan Temple, the place that inspired the requirements for the first version of Zhang Xiaolong's WeChat, attracted a lot of attention, and negative news about programmers is also something many programmers follow. Industry news and stories that relate to them directly are better at catching their attention.
Finally, let's look at the articles as a whole and see which high-frequency words appear in their titles. We segment the titles with Jieba and generate a word cloud with Matplotlib, as follows:
The figure shows that "learning" is a very high-frequency word. Of course, learning is an important topic and skill in every line of work.
This is especially true in the IT industry, where things change extremely fast; even knowledge picked up two days ago may be out of date two days later.
As a result, many programmers care a great deal about personal growth, and learning is naturally essential. "Architecture", "framework", and "guide" are also words that attract programmers well.
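The word-frequency step behind the word cloud can be sketched as follows. This is a minimal illustration rather than the author's code: `title_word_counts` and its `min_length` filter are hypothetical, the sample titles are made up, and the demo passes a whitespace tokenizer so it runs without Jieba installed. With no tokenizer given, it falls back to `jieba.lcut`, which is what Chinese titles would need.

```python
from collections import Counter


def title_word_counts(titles, tokenize=None, min_length=2):
    """Count how often each word appears across all titles."""
    if tokenize is None:
        import jieba  # third-party; only needed for the default tokenizer
        tokenize = jieba.lcut
    counts = Counter()
    for title in titles:
        for word in tokenize(title):
            word = word.strip()
            # Skip very short tokens such as single characters and punctuation
            if len(word) >= min_length:
                counts[word] += 1
    return counts


# Made-up placeholder titles, split on whitespace for the demo.
sample_titles = ["learning python framework", "a python learning guide"]
counts = title_word_counts(sample_titles, tokenize=str.split)
print(counts.most_common(3))
```

The resulting word-to-count mapping is the kind of input a word cloud renderer works from, with more frequent words drawn larger.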
And one very last thing that I really cannot understand: as a programmer, if you don't have a girlfriend, can't you just "new" an object? Why would you need to go on blind dates?