Python data analysis (I): importing data, descriptive statistics, cross analysis, correlation analysis, and linear regression

Contents

- 1 First import some packages
- 2 Import data
  - (1) Import from an Excel sheet
- 3 Create data manually
- 4 Data sorting
- 5 Simple calculation of data
- 6 0-1 standardization of data
- 7 Basic descriptive statistical indicators
- 8 Grouping statistics
- 9 Correlation analysis
- 10 Draw a scatter plot
- 11 Linear regression model

1 First import some packages

The analysis in this article was carried out in Spyder, the IDE that ships with Anaconda.

import pandas
from sklearn.linear_model import LinearRegression
import matplotlib
import matplotlib.pyplot as plt

2 Import data

(1) Import from an Excel sheet

The screenshot below shows the data in data.xlsx; the sheet is named data1.

Run the following code:

# Use the read_excel function from pandas.
# Two things need to be specified: first, the file path (including the file name);
# second, which sheet of the file to import.
data = pandas.read_excel(
    'D:/7_science_and_technology/Data analysis/data.xlsx',
    sheet_name='data1'
)

The result is as follows:
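
After an import it is usually worth checking that the sheet was read as expected. The sketch below is only an illustration: it uses a small stand-in DataFrame whose rows and values are made up (they are not taken from data.xlsx), with the column names used later in this article.

```python
import pandas

# Hypothetical stand-in for the imported sheet; the values are illustrative only.
data = pandas.DataFrame({
    'Gender': ['F', 'M', 'F', 'M'],
    'Math scores': [90, 75, 82, 75],
    'Chinese achievement': [85, 88, 79, 92],
})

print(data.shape)    # (number of rows, number of columns)
print(data.dtypes)   # data type of each column
print(data.head(3))  # first three rows
```

If a numeric column shows up with dtype object, the sheet probably contains text in that column and it should be cleaned before any calculation.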

3 Create data manually

# Create a DataFrame manually with pandas.
# 'variable name': [..., ..., ..., ...]
data_2 = pandas.DataFrame({
    'catalog': ['A', 'B', 'C', 'D', 'E'],
    'percent': [0.1, 0.15, 0.4, 0.6, 0.9]
})

The result is as follows:

Use the plot.bar function to draw a bar chart:

data_2.plot.bar(x='catalog', y='percent')

The result is as follows:

4 Data sorting

# True means ascending order, False means descending order.
# The column names must match the headers in the sheet.
sortData = data.sort_values(
    by=['Math scores', 'Chinese achievement'],
    ascending=[True, False]
)

The result is as follows:
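
To make the effect of the two sort keys concrete, here is a minimal sketch on made-up values (not the data from data.xlsx). Rows are ordered by math score ascending, and rows with equal math scores are ordered by Chinese score descending.

```python
import pandas

# Illustrative values only.
data = pandas.DataFrame({
    'Math scores': [75, 90, 75, 82],
    'Chinese achievement': [88, 85, 92, 79],
})

# First key ascending; ties on the first key are broken by the second key, descending.
sortData = data.sort_values(
    by=['Math scores', 'Chinese achievement'],
    ascending=[True, False]
)
print(sortData)
```

sort_values returns a new DataFrame and leaves data unchanged. The original row labels are kept, so sortData.reset_index(drop=True) can be used when a fresh 0..n-1 index is wanted.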

5 Simple calculation of data

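# Add a new column computed from existing columns.
# Column names that contain spaces must be accessed with brackets, not dot notation.
data['Total score'] = data['Math scores'] + data['Chinese achievement']

The result is as follows: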
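
Column arithmetic in pandas works element by element, row by row. The following sketch, on made-up values, shows the addition used above next to an equivalent row-wise sum over the selected columns; the name total_alt is introduced here only for illustration.

```python
import pandas

# Illustrative values only.
data = pandas.DataFrame({
    'Math scores': [90, 75, 82],
    'Chinese achievement': [85, 88, 79],
})

# Element-wise addition of two columns.
data['Total score'] = data['Math scores'] + data['Chinese achievement']

# Equivalent: sum across the selected columns for each row (axis=1).
total_alt = data[['Math scores', 'Chinese achievement']].sum(axis=1)

print(data)
print((data['Total score'] == total_alt).all())
```

The sum(axis=1) form scales better when many columns have to be added. Note that it skips missing values by default, whereas the + operator yields NaN for a row if either value is missing.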

6 0-1 standardization of data

# 0-1 (min-max) standardization: (x - min) / (max - min),
# rounded to two decimals (rounding to whole numbers would leave only 0 and 1).
data['Chinese achievement standardization'] = (
    (data['Chinese achievement'] - data['Chinese achievement'].min()) / (
        data['Chinese achievement'].max() - data['Chinese achievement'].min())
).round(2)

The result is as follows:
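
0-1 standardization maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. A minimal sketch on made-up scores, assuming the column is numeric and not constant (a constant column would give a division by zero):

```python
import pandas

# Illustrative scores only.
scores = pandas.Series([60, 70, 90, 100], name='Chinese achievement')

# Min-max scaling followed by rounding to two decimals.
scaled = ((scores - scores.min()) / (scores.max() - scores.min())).round(2)

print(scaled.tolist())  # [0.0, 0.25, 0.75, 1.0]
```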

7 Basic descriptive statistical indicators

# Basic descriptive statistics: count, mean, std, min, quartiles, max
print(data['Total score'].describe())

The result is as follows:
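
describe() returns a Series, so individual statistics can be picked out by label. A small sketch on made-up totals (the numbers are illustrative, not results from data.xlsx):

```python
import pandas

# Illustrative totals only.
total = pandas.Series([175, 163, 161, 167], name='Total score')

summary = total.describe()
print(summary)

# Individual statistics are available by label; '50%' is the median.
print(summary['mean'])  # 166.5
print(summary['50%'])   # 165.0
```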

8 Grouping statistics

# Count the cases in each gender group
ga = data.groupby(by=['Gender'])['Chinese achievement'].agg('count')
print(ga)
print(ga.sum())       # total number of cases
print(ga / ga.sum())  # proportion of each group

The result is as follows:
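
The group proportions can also be obtained in one call with value_counts(normalize=True). The sketch below, on made-up rows, puts both approaches side by side; the names share and share_alt are introduced here only for illustration.

```python
import pandas

# Illustrative rows only.
data = pandas.DataFrame({
    'Gender': ['F', 'M', 'F', 'M', 'F'],
    'Chinese achievement': [85, 88, 79, 92, 90],
})

# Count per group, then divide by the total to get proportions.
ga = data.groupby(by=['Gender'])['Chinese achievement'].agg('count')
share = ga / ga.sum()

# Same proportions in one call on the grouping column itself.
share_alt = data['Gender'].value_counts(normalize=True)

print(share)
print(share_alt)
```

The two agree only when the counted column has no missing values: 'count' ignores missing entries in 'Chinese achievement', while value_counts counts every row that has a gender.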

9 Correlation analysis

# Correlation analysis between Chinese and math scores
# (corr() computes the Pearson correlation by default)
corrMatrix = data[[
    'Math scores', 'Chinese achievement'
]].corr()
print(corrMatrix)

The result is as follows:
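
The off-diagonal entry of the matrix is Pearson's r, that is, the covariance of the two columns divided by the product of their standard deviations. A minimal sketch on made-up scores that checks this against corr(); r_manual is a name introduced here for illustration.

```python
import pandas

# Illustrative values only.
data = pandas.DataFrame({
    'Math scores': [60, 70, 80, 90],
    'Chinese achievement': [65, 70, 85, 80],
})

corrMatrix = data[['Math scores', 'Chinese achievement']].corr()

# Pearson's r from its definition: cov(x, y) / (std(x) * std(y)).
x = data['Math scores']
y = data['Chinese achievement']
r_manual = x.cov(y) / (x.std() * y.std())

print(corrMatrix)
print(r_manual)
```

The matrix is symmetric with ones on the diagonal, so for two columns only one number carries information.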

10 Draw a scatter plot

# Draw a scatter plot of math scores against Chinese scores.
# Alternative using pandas directly:
# data.plot('Math scores', 'Chinese achievement', kind='scatter')
plt.scatter(data['Math scores'], data['Chinese achievement'])

The result is as follows:
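
A bare scatter plot has no axis labels. The following is a small sketch, on made-up values, of the same plot drawn through an Axes object so that labels can be attached; it assumes matplotlib is installed as in the imports at the top of this article.

```python
import matplotlib.pyplot as plt
import pandas

# Illustrative values only.
data = pandas.DataFrame({
    'Math scores': [60, 70, 80, 90],
    'Chinese achievement': [65, 70, 85, 80],
})

fig, ax = plt.subplots()
ax.scatter(data['Math scores'], data['Chinese achievement'])
ax.set_xlabel('Math scores')
ax.set_ylabel('Chinese achievement')
```

In Spyder the figure is displayed automatically; in a plain script, call plt.show() afterwards.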

11 Linear regression model

# Fit a simple linear regression: Chinese score as a function of math score
x = data[['Math scores']]
y = data[['Chinese achievement']]
lrModel = LinearRegression()
lrModel.fit(x, y)
print(lrModel.coef_)       # slope
print(lrModel.intercept_)  # intercept
# Goodness of fit: score() returns the coefficient of determination R^2
print(lrModel.score(x, y))
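
Once fitted, the model can be used to predict and its fit can be cross-checked. The sketch below uses made-up scores and a hypothetical new math score of 75; unlike the code above it passes y as a 1-D Series, so coef_ is a 1-D array and intercept_ a single number.

```python
import pandas
from sklearn.linear_model import LinearRegression

# Illustrative values only.
data = pandas.DataFrame({
    'Math scores': [60, 70, 80, 90],
    'Chinese achievement': [65, 70, 85, 80],
})

x = data[['Math scores']]        # 2-D: one feature column
y = data['Chinese achievement']  # 1-D target

lrModel = LinearRegression()
lrModel.fit(x, y)

slope = lrModel.coef_[0]
intercept = lrModel.intercept_

# Predict the Chinese score for a new (hypothetical) math score of 75.
new_x = pandas.DataFrame({'Math scores': [75]})
predicted = lrModel.predict(new_x)[0]
print(slope, intercept, predicted)

# With a single predictor, R^2 equals the squared Pearson correlation.
r = data['Math scores'].corr(data['Chinese achievement'])
r2 = lrModel.score(x, y)
print(r2, r ** 2)
```

Here score() is evaluated on the same rows the model was fitted on, so it describes how well the line fits these data, not how well it would predict new ones.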