
Identifying Twitter user gender with Python


Resource download: https://download.csdn.net/download/sheziqiong/85705774

This is an introductory project for learning:

Text feature engineering
Image feature engineering
The basic data cleaning process
The project modeling process

Basic dataset information:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20050 entries, 0 to 20049
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   gender         19953 non-null  object
 1   description    16306 non-null  object
 2   link_color     20050 non-null  object
 3   profileimage   20050 non-null  object
 4   sidebar_color  20050 non-null  object
 5   text           20050 non-null  object
dtypes: object(6)
memory usage: 940.0+ KB
None

The dataset has 20,050 rows and 6 columns.
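
A summary like the one above can be reproduced with `DataFrame.info()`. The sketch below is only an illustration: the three-row CSV text is a hypothetical stand-in for the real file, whose name and location are not given here. From the non-null counts in the summary, 97 rows lack a gender label (20050 - 19953) and 3,744 lack a description (20050 - 16306), which is what the per-column missing count reports.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the real CSV (same six columns).
csv_text = """gender,description,link_color,profileimage,sidebar_color,text
male,I love code,08C2C2,https://example.com/a.jpeg,FFFFFF,Hello world
,Writer,0084B4,https://example.com/b.png,C0DEED,Good morning
female,,0084B4,https://example.com/c.png,C0DEED,Nice day
"""

# dtype=str keeps every column as text, so a color such as "000000"
# is not parsed as the integer 0.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# info() prints the summary itself and returns None; wrapping it in
# print() is what produces a trailing "None" line.
df.info()

# Number of missing values per column.
missing = df.isna().sum()
print(missing)
```

Reading the color columns as strings is a deliberate choice in this sketch, since the later hexadecimal check works on the text form of the value.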

Feature descriptions:

gender: the user's gender, i.e. the prediction target
description: the user's self-description
link_color: the user's theme (link) color
profileimage: link to the user's Twitter avatar
sidebar_color: the user's sidebar color
text: the content of the user's tweet

Data preview:

Process overview:

Data cleaning

1.1 Filter rows by the 'gender' column
1.2 Filter out rows whose 'description' column is empty
1.3 Filter out rows whose 'link_color' or 'sidebar_color' column is not a valid hexadecimal value
1.4 Clean the text data
1.5 Use the profileimage link to determine whether the avatar image is valid
1.6 Replace male->0, female->1
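
The cleaning steps above can be sketched with pandas roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names (`clean_text`, `looks_like_image_url`, `clean`) are hypothetical, step 1.1 is assumed to keep only rows labeled male or female, a valid color is assumed to be exactly six hexadecimal digits, text cleaning is assumed to mean lowercasing and dropping URLs and non-letters, and the avatar check only inspects the URL instead of downloading the image.

```python
import re
from urllib.parse import urlparse

import pandas as pd

HEX_COLOR = r"[0-9A-Fa-f]{6}"                   # assumption: six hex digits
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif")  # assumption: accepted extensions


def clean_text(s):
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    s = re.sub(r"https?://\S+", " ", s.lower())
    s = re.sub(r"[^a-z\s]", " ", s)
    return re.sub(r"\s+", " ", s).strip()


def looks_like_image_url(url):
    """Offline stand-in for step 1.5: checks the URL only, not the image."""
    parts = urlparse(str(url))
    return parts.scheme in ("http", "https") and parts.path.lower().endswith(IMAGE_EXTS)


def clean(df):
    # 1.1 keep only rows with a usable label (assumed: male / female)
    df = df[df["gender"].isin(["male", "female"])]
    # 1.2 drop rows with an empty description
    df = df.dropna(subset=["description"])
    # 1.3 drop rows whose colors are not valid hex strings
    for col in ("link_color", "sidebar_color"):
        df = df[df[col].astype(str).str.fullmatch(HEX_COLOR)]
    df = df.copy()
    # 1.4 clean the free-text columns
    for col in ("description", "text"):
        df[col] = df[col].astype(str).map(clean_text)
    # 1.5 keep rows whose avatar link looks like an image URL
    df = df[df["profileimage"].map(looks_like_image_url).astype(bool)].copy()
    # 1.6 encode the label: male -> 0, female -> 1
    df["gender"] = df["gender"].map({"male": 0, "female": 1})
    return df


# Hypothetical sample: only the first two rows should survive every step.
sample = pd.DataFrame({
    "gender": ["male", "female", "brand", "male", "female", "male"],
    "description": ["I love code", "Writer", "News", None, "x", "y"],
    "link_color": ["08C2C2", "0084B4", "0084B4", "0084B4", "0", "0084B4"],
    "profileimage": [
        "https://example.com/a.jpeg", "https://example.com/b.png",
        "https://example.com/c.png", "https://example.com/d.png",
        "https://example.com/e.png", "not a url",
    ],
    "sidebar_color": ["FFFFFF", "C0DEED", "C0DEED", "C0DEED", "C0DEED", "C0DEED"],
    "text": ["Hello World! http://t.co/x 123", "Good   morning :)", "a", "b", "c", "d"],
})
cleaned = clean(sample).reset_index(drop=True)
print(cleaned[["gender", "text"]])
```

A real validity check for step 1.5 would probably need to fetch each avatar and confirm that it decodes as an image; the URL test here only filters out obviously broken links.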