您现在的位置：程式師世界 >> 編程語言 > >> 更多編程語言 >> Python

[Python] GDP analysis and forecast

編輯：Python

Bowen author wangzirui32
Like can give the thumbs-up Collection Pay attention to ~~
My number 162 An original work
This article was first published in CSDN, Reprint is prohibited without permission
hello, Hello everyone , I am a wangzirui32, Today we are going to learn how to analyze and forecast the GDP , Start learning ！

1. Data sources

The data covers 2010-2021 Gross domestic product csv file , come from National bureau of statistics Official data ：

Select the time in turn （2010-2021）, download CSV file , Named as 2010.csv,2011.csv（ year +.csv）, give the result as follows ：

2. Cleaning data file

2010 The contents of the data file for the year are as follows ：

You can see , The real data is in the... Of the file 3 From the beginning of the line to the penultimate 6 That's ok , We need to extract the data , And carry out gbk Encoding conversion （ The source file is gbk code , Demand is UTF8 code ）, Put the data file into datafiles In the folder , Then create a directory on the upper level of this directory Python file collate_data.py, Write code ：

import os
import codecs
for i in os.listdir("datafiles"):
path = "datafiles/{}".format(i)
try:
# Put the data file gbk Code to utf-8
with codecs.open(path, "rb", "gb2312") as f:
content = f.read()
with codecs.open(path, "wb", "utf-8") as f:
f.write(content)
except: pass
# The introduction information of the data file is removed 
with codecs.open(path, "rb", "utf-8") as f:
new_content = f.readlines()[2:-5]
with codecs.open(path, "wb", "utf-8") as f:
f.writelines(new_content)

Execute this code , The data is cleaned .

3. analysis

With the help of pandas Reading data ,matplotlib Draw statistical chart for analysis , Install library command ：

pip install pandas matplotlib

3.1 Broken line chart

The code is as follows ：

import pandas
import matplotlib
import matplotlib.pyplot as plt
import os
# Extract the data 
gdp_Q1 = [] # first quarter gdp
gdp_Q2 = [] # The second quarter gdp
gdp_Q3 = [] # The third quarter gdp
gdp_Q4 = [] # In the fourth quarter gdp
gdp_all_year = [] # Throughout the year gdp
years = [] # All years 
# Solve the Chinese garbled code 
matplotlib.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
# Read the data in each file 
for filename in os.listdir("datafiles"):
year = filename.split(".")[0]
path = "datafiles/{}".format(filename)
df = pandas.read_csv(path)
df.columns = [' indicators ',' In the fourth quarter ', ' The third quarter ', ' The second quarter ', ' first quarter ']
gdp_Q1.append(df[' first quarter '][0])
gdp_Q2.append(df[' The second quarter '][0])
gdp_Q3.append(df[' The third quarter '][0])
gdp_Q4.append(df[' In the fourth quarter '][0])
gdp_all_year.append(df[' In the fourth quarter '][1])
years.append(year)
# Draw a statistical chart 
Q1_line, = plt.plot(years, gdp_Q1, color="blue")
Q2_line, = plt.plot(years, gdp_Q2, color="pink")
Q3_line, = plt.plot(years, gdp_Q3, color="green")
Q4_line, = plt.plot(years, gdp_Q4, color="orange")
all_year_line, = plt.plot(years, gdp_all_year, color="red")
plt.title("2010-2021 GDP analysis and forecast ")
plt.xlabel(" year ")
plt.ylabel(" GDP （ One hundred million yuan ）")
plt.xticks(years)
plt.legend([Q1_line, Q2_line, Q3_line, Q4_line, all_year_line],
[' first quarter ',' The second quarter ', ' The third quarter ', ' In the fourth quarter ', ' Annual total value '],
loc='upper right')
plt.show()

The effect is as follows ：

3.2 Column statistical chart

The code is as follows ：

import pandas
import matplotlib
import matplotlib.pyplot as plt
import os
# Extract the data 
gdp_all_year = [] # Throughout the year gdp
years = [] # All years 
# Solve the Chinese garbled code 
matplotlib.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
# Read the data in each file 
for filename in os.listdir("datafiles"):
year = filename.split(".")[0]
path = "datafiles/{}".format(filename)
df = pandas.read_csv(path)
df.columns = [' indicators ',' In the fourth quarter ', ' The third quarter ', ' The second quarter ', ' first quarter ']
gdp_all_year.append(df[' In the fourth quarter '][1])
years.append(year)
# Draw a statistical chart 
plt.bar(years, gdp_all_year, width=0.5, label="numbers")
all_year_line, = plt.plot(years, gdp_all_year, color="red")
plt.title("2010-2021 GDP analysis ", loc="center")
plt.xlabel(" year ", fontsize=14)
plt.ylabel(" GDP （ One hundred million yuan ）", fontsize=14)
plt.show()

The effect is as follows ：

You can see ,2020 Because of the epidemic , The gross domestic product has declined , But in recent years, the overall situation is still on the rise .

4. Fit the linear regression equation

We will use sklearn Machine learning library to fit linear regression equation , Its installation command is as follows ：

pip install scikit-learn

4.1 By the end of 1 Quarter as a parameter

Ideas as follows , We use sklearn Library fitting linear regression equation , By the end of 1 Quarter as a parameter , The equation that generates the prediction , The code is as follows ：

import pandas
import matplotlib
import matplotlib.pyplot as plt
import os
from sklearn import linear_model
# Extract the data 
gdp_Q1 = [] # first quarter gdp
gdp_Q2 = [] # The second quarter gdp
gdp_all_year = [] # Throughout the year gdp
years = [] # All years 
# Solve the Chinese garbled code 
matplotlib.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
# Read the data in each file 
for filename in os.listdir("datafiles"):
year = filename.split(".")[0]
path = "datafiles/{}".format(filename)
df = pandas.read_csv(path)
df.columns = [' indicators ',' In the fourth quarter ', ' The third quarter ', ' The second quarter ', ' first quarter ']
gdp_Q1.append(df[' first quarter '][0])
gdp_Q2.append(df[' The second quarter '][0])
gdp_all_year.append(df[' In the fourth quarter '][1])
years.append(year)
# Draw a statistical chart 
Q1_line, = plt.plot(years, gdp_Q1, color="blue")
Q2_line, = plt.plot(years, gdp_Q2, color="pink")
all_year_line, = plt.plot(years, gdp_all_year, color="red")
plt.title("2010-2021 GDP analysis and forecast ")
plt.xlabel(" year ")
plt.ylabel(" GDP （ One hundred million yuan ）")
plt.xticks(years)
# Formulate the equation 
# Create a linear regression model 
model = linear_model.LinearRegression()
model.fit(list(zip(gdp_Q1)), gdp_all_year)
# Acquisition coefficient 
coef = model.coef_
# Acquisition intercept 
intercept = model.intercept_
# equation 
equation = "y = x*{} + {}".format(coef[0], intercept)
print(" Linear regression equation :", equation)
# Calculate equation data 
forecast_value = [i*coef[0]+intercept for i in gdp_Q1]
# Draw equation polyline 
forecast_line, = plt.plot(years, forecast_value, color="green")
plt.legend([Q1_line, Q2_line, all_year_line, forecast_line],
[' first quarter ',' The second quarter ', ' Annual total value ', ' Equation simulation '],
loc='upper right')
plt.show()

The effect is as follows ：

You can see , Equation basic fitting curve , But because 2020 The GDP fell in the first quarter of the year , Furthermore, the predicted annual data also produced a large gap , So what do we do ？

4.2 By the end of 1,2 Quarter as a parameter

We can take the first 1,2 Quarter as a parameter , The code is as follows ：

import pandas
import matplotlib
import matplotlib.pyplot as plt
import os
from sklearn import linear_model
# Extract the data 
gdp_Q1 = [] # first quarter gdp
gdp_Q2 = [] # The second quarter gdp
gdp_all_year = [] # Throughout the year gdp
years = [] # All years 
# Solve the Chinese garbled code 
matplotlib.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
# Read the data in each file 
for filename in os.listdir("datafiles"):
year = filename.split(".")[0]
path = "datafiles/{}".format(filename)
df = pandas.read_csv(path)
df.columns = [' indicators ',' In the fourth quarter ', ' The third quarter ', ' The second quarter ', ' first quarter ']
gdp_Q1.append(df[' first quarter '][0])
gdp_Q2.append(df[' The second quarter '][0])
gdp_all_year.append(df[' In the fourth quarter '][1])
years.append(year)
# Draw a statistical chart 
Q1_line, = plt.plot(years, gdp_Q1, color="blue")
Q2_line, = plt.plot(years, gdp_Q2, color="pink")
all_year_line, = plt.plot(years, gdp_all_year, color="red")
plt.title("2010-2021 GDP analysis and forecast ")
plt.xlabel(" year ")
plt.ylabel(" GDP （ One hundred million yuan ）")
plt.xticks(years)
# Formulate the equation 
model = linear_model.LinearRegression()
model.fit(list(zip(gdp_Q1, gdp_Q2)), gdp_all_year)
coef = model.coef_
intercept = model.intercept_
equation = "y = x1*{} + x2*{} + {}".format(coef[0], coef[1], intercept)
print(" Linear regression equation :", equation)
# Calculate the predicted results 
forecast_value = [i[0]*coef[0]+i[1]*coef[1]+intercept for i in list(zip(gdp_Q1, gdp_Q2))]
forecast_line, = plt.plot(years, forecast_value, color="green")
plt.legend([Q1_line, Q2_line, all_year_line, forecast_line],
[' first quarter ',' The second quarter ', ' Annual total value ', ' Equation simulation '],
loc='upper right')
plt.show()

The effect is as follows ：

You can see , The fitting result of this equation is very good , It can be used as a prediction equation .
The linear regression equation is ：

y = x1*0.20405068090604006 + x2*3.8656156020304238 + 9671.424027125235

Equivalent to ：

 GDP = GDP in the first quarter *0.20405068090604006 + GDP in the second quarter *3.8656156020304238 + 9671.424027125235

This completes the whole process of analysis and prediction .

Okay , That's all for today's lesson , I am a wangzirui32, You can collect and pay attention to what you like , See you next time ！

上一篇文章： Python tkinter - 第四章標簽控件（Label）
下一篇文章： Python tkinter - 第7章框架控件(Frame 與 LabelFrame)

Python

Python——列表中元組依次取第一個元素（值）

參考文章：python 列表中元組依次取第一個值如何從嵌套的

一定要會的Python爬蟲技能，搜索名字即可下載全文小說？

前言

Python + Appium框架原生代碼實現App自動化測試

Step1：首先介紹下python+appium的框架結構如

Python

Preface &

[Python ticket booking system] this is the correct way to choose seats for movies. It turns out that we were all cheated ~ (choose seats and buy tickets with one click)

Introduction Hello, hello ！ I

Django應用啟動時如何執行一波初始化添加數據？只需要簡單的三步

需求：希望在運行項目前進行一些數據初始化，包括但不限於一些

Python calculates the area and perimeter of a circle. Analysis of the real problem of level 2 of the python programming level examination of the Electronic Society for youth programming march2021

Python data analysis - pandas data structure (dataframe)

Python data analysis science library pandas (statistical analysis and decision)

Actual combat analysis of Python project

[principle + code] Python implements TOPSIS analysis method (good and bad solution distance method)

Python data analysis 10 -- using Matplotlib to draw 3D graphs

Python data analysis 9 - Matplotlib configuration item

The most powerful Python data analysis tool! Jupyter notebook has this function!

10 useful Python code snippets for date time analysis in Python tutorial

Data analysis pandas series, dataframe knowledge points mind map

熱門圖文

Conversion binaire Python Repeater的FooterTemplate中控件內容設置方法使用Sphinx對索引進行搜索 Python計算平均值基於JXL的Excel數據導入工具 uva 11426 - GCD - Extreme (II)(數論) CodeIgniter表單驗證方法實例詳解怎麼安裝Altas(.net Ajax)

欄目導航