Contents
1. Code organization and execution order
2. Problem description
3. Feature extraction
(1) dat_risk
(2) dat_symbol
(3) dat_app
(4) dat_edge
4. Feature extraction from the user association graph
(1) Centrality features
(2) Louvain community clustering
5. Label propagation
(1) First-degree contact
(2) Second-degree contact
(3) Once routed
6. Feature propagation
(1) Feature overview
(2) Data description
(3) Feature extraction
(4) Feature evaluation
7. Data summary and feature comparison
8. Feature sorting and discretization
9. Parameter tuning
10. The mystery of the test-set AUC drop
(1) Difference in the missing rate of user app data
(2) How closely users' association data is connected
11. Room for improvement
1. Code organization and execution order
2. Problem description
This section establishes a shared understanding and fixes the notation used in the rest of the document. The user feature data comes in four parts, ordered from easiest to hardest to process:
•(1)dat_risk
•(2)dat_symbol
•(3)dat_app
•(4)dat_edge
The user labels and the training, validation, and test sets are given as:
•(1)sample_train (two columns: id, label)
•(2)valid_id (one column: id)
•(3)test_id (one column: id)
Concatenate the id columns of sample_train, valid_id, and test_id to obtain every id. The result is called all_id, a 28959×1 DataFrame.
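The concatenation that produces all_id can be sketched as follows. This is only a minimal illustration: the three small DataFrames and their ids are made-up stand-ins for the real files, so the resulting shape differs from the 28959×1 of the actual data.

```python
import pandas as pd

# Toy stand-ins for the real data; the ids and labels are made up.
sample_train = pd.DataFrame({"id": [1, 2, 3], "label": [0, 1, 0]})
valid_id = pd.DataFrame({"id": [4, 5]})
test_id = pd.DataFrame({"id": [6]})

# Stack the id columns of the three sets into one single-column DataFrame.
all_id = pd.concat(
    [sample_train[["id"]], valid_id[["id"]], test_id[["id"]]],
    ignore_index=True,  # renumber rows 0..n-1 instead of repeating each source index
)

print(all_id.shape)
```

Selecting with double brackets (`[["id"]]`) keeps each piece a DataFrame rather than a Series, so the result has one column named id.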
3. Feature extraction
(1) dat_risk
The output is all_id_dat_risk.
Inner-join dat_risk with all_id:
all_id_dat_risk = pd.merge(all_id, dat_risk, on='id', how='inner')
The goal is to look up the features of every id in all_id. Some ids may have no record in dat_risk; the inner join leaves them out of all_id_dat_risk, and they appear as missing values in the final data.
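To see how ids without a dat_risk record behave, the sketch below runs the same inner join on toy data and compares it with a left join. The column name risk_score and all values are hypothetical, and the left join is shown only as one plausible way such ids could end up as missing values later; the source does not say how the final data is assembled.

```python
import pandas as pd

# Toy data: ids 2 and 4 have no risk record. risk_score is a hypothetical column.
all_id = pd.DataFrame({"id": [1, 2, 3, 4]})
dat_risk = pd.DataFrame({"id": [1, 3], "risk_score": [0.2, 0.9]})

# Inner join: only ids present in both tables survive.
inner = pd.merge(all_id, dat_risk, on="id", how="inner")

# Left join (an assumption, for comparison): every id in all_id is kept,
# and ids without a risk record get NaN in the feature columns.
left = pd.merge(all_id, dat_risk, on="id", how="left")

print(inner)
print(left)
```

With the inner join the result holds only ids 1 and 3. The left join keeps all four ids, and risk_score is NaN for ids 2 and 4.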
Reprinted from: http://www.biyezuopin.vip/onews.asp?id=16293