After analyzing the quality of the data, the next step is to analyze and compute its characteristics. Charts can also be drawn to show them. Common ways to analyze the characteristics of data are: distribution analysis, comparative analysis, statistical analysis, periodic analysis, contribution analysis (Pareto analysis), correlation analysis, and normality testing.
Distribution analysis reveals the distribution characteristics and distribution type of the data.
Let's use a concrete example to demonstrate quantitative and qualitative distribution analysis. The test data is the sales order data of a digital camera for the whole of 1998. Here are the first 10 records:
For quantitative data, the most common way to show the distribution is a histogram. This kind of chart is also called a quality distribution chart. It is a statistical chart that represents the data distribution with a series of vertical bars or line segments of different heights. Generally, the horizontal axis represents the data type and the vertical axis shows the distribution.
The histogram can be drawn with the following steps: compute the range of the order amount, total the order amount by month, then plot the monthly totals as a bar chart:
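import pandas as pd
import matplotlib.pyplot as plt

# Load the order data. The column names below follow the translated text;
# adjust them to match the actual CSV header.
data = pd.read_csv("/root/data/Digital camera order data.csv")
# Range of the order amount: maximum minus minimum
dr = data["Order amount"].max() - data["Order amount"].min()
print("The order amount range is:", dr)
# Keep only the order time and the order amount
df = pd.DataFrame({"datetime": data["The order time"], "amount": data["Order amount"]})
df['datetime'] = pd.to_datetime(df['datetime'])
# Month of each order; rows with a missing date fall into month 0
df['month'] = df['datetime'].dt.month.fillna(0).astype("int")
# Total order amount per month (select 'amount' so the datetime column is not summed)
result = df.groupby('month')[['amount']].sum()
# Output the data distribution table
print(result)
result.plot(kind='bar')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.show()
The histogram drawn is shown below.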
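The chart above plots monthly totals. A histogram in the stricter sense bins the values themselves and counts how many fall into each bin. The following is a minimal sketch of that idea, not the article's own method. The order amounts are made up for illustration, and `frequency_table` and `plot_histogram` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

# Made-up order amounts for illustration only; not taken from the camera dataset.
amounts = pd.Series(
    [120, 340, 560, 610, 780, 820, 990, 1500, 1750, 1980], name="amount"
)


def frequency_table(values: pd.Series, bins: int = 4) -> pd.DataFrame:
    """Split the value range into equal-width bins and count the values in each."""
    counts, edges = np.histogram(values, bins=bins)
    return pd.DataFrame(
        {
            "lower": edges[:-1],              # left edge of each bin
            "upper": edges[1:],               # right edge of each bin
            "count": counts,                  # number of values in the bin
            "share": counts / counts.sum(),   # fraction of all values
        }
    )


def plot_histogram(values: pd.Series, bins: int = 4) -> None:
    """Draw the same distribution as a histogram (requires matplotlib)."""
    import matplotlib.pyplot as plt

    ax = values.plot(kind="hist", bins=bins, edgecolor="black")
    ax.set_xlabel("Order amount")
    ax.set_ylabel("Number of orders")
    plt.show()


table = frequency_table(amounts)
print(table)
```

Calling `plot_histogram(amounts)` draws the chart. The number of bins is a judgment call: too few hides the shape of the distribution, too many leaves most bins nearly empty.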
For qualitative data, values are usually grouped by the categories of the variable, and the distribution is most often shown with a pie chart or a bar chart. For example, a pie chart shows the size of each item in a data series relative to the sum of all items. Each data point in the pie chart is displayed as a percentage of the whole pie.
The following is a pie chart example. Simply plot the DataFrame (result) produced at the end of the histogram code above as a pie chart.
# Draw a pie chart of the data
result.plot.pie(subplots=True, figsize=(11, 11))
plt.show()
# Note: the pie chart here is categorized by month (January to December).
The pie chart drawn is shown below.
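Since each slice of a pie chart is a percentage of the whole, it can help to compute those percentages explicitly before plotting. The following is a minimal sketch under stated assumptions: the monthly totals are made up for illustration, and `percentage_shares` and `plot_pie` are hypothetical helper names.

```python
import pandas as pd

# Made-up monthly totals for illustration only; not taken from the camera dataset.
monthly = pd.Series({1: 200.0, 2: 300.0, 3: 500.0}, name="amount")


def percentage_shares(totals: pd.Series) -> pd.Series:
    """Return each category's share of the overall total, in percent."""
    return (totals / totals.sum() * 100).round(1)


def plot_pie(totals: pd.Series) -> None:
    """Draw a pie chart with the percentage printed on each slice (requires matplotlib)."""
    import matplotlib.pyplot as plt

    totals.plot.pie(autopct="%.1f%%", figsize=(6, 6))
    plt.ylabel("")  # hide the default axis label, which is the Series name
    plt.show()


shares = percentage_shares(monthly)
print(shares)
```

Calling `plot_pie(monthly)` draws the chart. The `autopct` argument is passed through to matplotlib's pie function, which labels each slice with its percentage.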