
Python data processing course design - housing price forecast - Code


The code is as follows:

#!/usr/bin/env python
# coding: utf-8
# ## Guide pack 
# In[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pandas_profiling as ppf # Exploratory data analysis (EDA)
import warnings  # Ignore warnings
warnings.filterwarnings('ignore')
get_ipython().run_line_magic('matplotlib', 'inline')
plt.style.use('ggplot')
# In[2]:
from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin, clone
from sklearn.preprocessing import LabelEncoder# Tag code 
from sklearn.preprocessing import RobustScaler, StandardScaler# Remove outliers and standardize data 
from sklearn.pipeline import Pipeline, make_pipeline# Build the pipeline 
from scipy.stats import skew# skewness 
from sklearn.impute import SimpleImputer
# ## Read and view the original data 
# In[3]:
train = pd.read_csv(r"G:\study class \ Junior year \ machine learning \ curriculum design \datas\train.csv") # Read the data in 
# In[4]:
test = pd.read_csv(r"G:\study class \ Junior year \ machine learning \ curriculum design \datas\test.csv") # Read the data in 
# In[5]:
train.head()# The first five lines are displayed by default 
# In[6]:
test.head()
Clearly there is still a lot of data that needs processing.
# ## Exploratory data analysis with pandas_profiling
# In[7]:
ppf.ProfileReport(train)
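A side note: pandas_profiling has since been renamed upstream to ydata-profiling. If the import at the top fails on a newer environment, a hedged fallback (assuming one of the two packages is installed) is:
try:
    from ydata_profiling import ProfileReport  # newer package name (assumption: ydata-profiling is installed)
except ImportError:
    from pandas_profiling import ProfileReport  # the older name this notebook was written against
ProfileReport(train)  # produces the same exploratory report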
# In[8]:
train.YearBuilt# Show the data in this column 
# In[9]:
train.SalePrice
# ## Check for outliers with a box plot
# In[10]:
plt.figure(figsize=(12,8))
sns.boxplot(x=train.YearBuilt, y=train.SalePrice)
# ## Observe the relationships between columns with scatter plots
# In[11]:
plt.figure(figsize=(12,6))
plt.scatter(x=train.TotalBsmtSF, y=train.SalePrice)
plt.xlabel("TotalBsmtSF", fontsize=13)
plt.ylabel("SalePrice", fontsize=13)
plt.ylim(0,800000)
# In[12]:
train.drop(train[(train["TotalBsmtSF"]>5000)].index,inplace=True)
# In[13]:
plt.figure(figsize=(12,6))
plt.scatter(x=train.TotalBsmtSF, y=train.SalePrice)
plt.xlabel("TotalBsmtSF", fontsize=13)
plt.ylabel("SalePrice", fontsize=13)
plt.ylim(0,800000)
# In[14]:
plt.figure(figsize=(12,6))
plt.scatter(x=train.GrLivArea, y=train.SalePrice)
plt.xlabel("GrLivArea", fontsize=13)
plt.ylabel("SalePrice", fontsize=13)
plt.ylim(0,800000)
# ## Remove the points that deviate strongly from the linear trend by dropping their indices
# In[15]:
train.drop(train[(train["GrLivArea"]>4000)&(train["SalePrice"]<300000)].index,inplace=True)
The plot after deleting these points:
# In[16]:
plt.figure(figsize=(12,6))
plt.scatter(x=train.GrLivArea, y=train.SalePrice)
plt.xlabel("GrLivArea", fontsize=13)
plt.ylabel("SalePrice", fontsize=13)
plt.ylim(0,800000)
# ### Concatenate train and test so the test data gets the same processing
# In[17]:
full = pd.concat([train,test],ignore_index=True)
# ### The Id column duplicates the index values, so drop it
# In[18]:
full.drop("Id",axis=1,inplace=True)
# In[19]:
full.head()# View the value after deleting the column 
# In[20]:
full.info()# View the deleted data information 
# # Data cleaning: filling and dropping null values
# #### View the missing values, with the missing counts sorted
# In[21]:
pd.set_option('display.max_rows', None)  # show every row of the output
miss = full.isnull().sum()  # count the null values per column
# In[22]:
miss[miss>0]
# In[23]:
miss[miss>0].sort_values(ascending=True) # Sort from low to high 
# In[24]:
full.info() # View data information 
# ## Filling and deleting null values 
# Fill in the character type 
# In[25]:
cols1 = ["PoolQC" , "MiscFeature", "Alley", "Fence", "FireplaceQu", "GarageQual", "GarageCond", "GarageFinish", "GarageYrBlt", "GarageType", "BsmtExposure", "BsmtCond", "BsmtQual", "BsmtFinType2", "BsmtFinType1", "MasVnrType"]
for col in cols1:
    full[col].fillna("None", inplace=True)
# In[26]:
full.head()
# Fill the numeric type 
# In[27]:
cols=["MasVnrArea", "BsmtUnfSF", "TotalBsmtSF", "GarageCars", "BsmtFinSF2", "BsmtFinSF1", "GarageArea"]
for col in cols:
    full[col].fillna(0, inplace=True)
# Fill the null values of LotFrontage with the column mean
# In[28]:
full["LotFrontage"].fillna(np.mean(full["LotFrontage"]),inplace=True)
# Fill the following columns with modes 
# In[29]:
cols2 = ["MSZoning", "BsmtFullBath", "BsmtHalfBath", "Utilities", "Functional", "Electrical", "KitchenQual", "SaleType","Exterior1st", "Exterior2nd"]
for col in cols2:
    full[col].fillna(full[col].mode()[0], inplace=True)
# Check if there is any unfilled data 
# In[30]:
full.isnull().sum()[full.isnull().sum()>0]
Only the SalePrice label (absent from the test set) remains null, so every missing value in the data has been handled.
# ## Data preprocessing: convert strings to numeric values
# In[31]:
full["MSZoning"].mode()[0]
# In[32]:
pd.set_option('display.max_rows', None)  # Show all rows; otherwise some are elided with "..." and cannot be seen
full.MSZoning
The output above shows values such as "C (all)" in row 31. These columns need converting to strings, and some numeric-looking features are really categories; LabelEncoder handles the conversion.
# In[33]:
for col in cols2:
    full[col] = full[col].astype(str)  # astype converts the values to strings
# In[34]:
lab = LabelEncoder()  # Encodes text or non-contiguous numbers as integer labels
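As a quick illustration of what LabelEncoder does (a throwaway toy example, not part of the original notebook): each distinct value is mapped to an integer code, following sorted order.
demo = LabelEncoder()
print(demo.fit_transform(["RL", "RM", "C (all)", "RL"]))  # [1 2 0 1]
print(list(demo.classes_))  # ['C (all)', 'RL', 'RM'] -- codes follow the sorted class order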
# #### Convert the following contents from character type to number type 
# In[35]:
full["Alley"] = lab.fit_transform(full.Alley)
full["PoolQC"] = lab.fit_transform(full.PoolQC)
full["MiscFeature"] = lab.fit_transform(full.MiscFeature)
full["Fence"] = lab.fit_transform(full.Fence)
full["FireplaceQu"] = lab.fit_transform(full.FireplaceQu)
full["GarageQual"] = lab.fit_transform(full.GarageQual)
full["GarageCond"] = lab.fit_transform(full.GarageCond)
full["GarageFinish"] = lab.fit_transform(full.GarageFinish)
full["GarageYrBlt"] = full["GarageYrBlt"].astype(str)
full["GarageYrBlt"] = lab.fit_transform(full.GarageYrBlt)
full["GarageType"] = lab.fit_transform(full.GarageType)
full["BsmtExposure"] = lab.fit_transform(full.BsmtExposure)
full["BsmtCond"] = lab.fit_transform(full.BsmtCond)
full["BsmtQual"] = lab.fit_transform(full.BsmtQual)
full["BsmtFinType2"] = lab.fit_transform(full.BsmtFinType2)
full["BsmtFinType1"] = lab.fit_transform(full.BsmtFinType1)
full["MasVnrType"] = lab.fit_transform(full.MasVnrType)
full["BsmtFinType1"] = lab.fit_transform(full.BsmtFinType1)
# In[36]:
full.head()
# Continue to convert some unconverted columns to numeric 
# In[37]:
full["MSZoning"] = lab.fit_transform(full.MSZoning)
full["BsmtFullBath"] = lab.fit_transform(full.BsmtFullBath)
full["BsmtHalfBath"] = lab.fit_transform(full.BsmtHalfBath)
full["Utilities"] = lab.fit_transform(full.Utilities)
full["Functional"] = lab.fit_transform(full.Functional)
full["Electrical"] = lab.fit_transform(full.Electrical)
full["KitchenQual"] = lab.fit_transform(full.KitchenQual)
full["SaleType"] = lab.fit_transform(full.SaleType)
full["Exterior1st"] = lab.fit_transform(full.Exterior1st)
full["Exterior2nd"] = lab.fit_transform(full.Exterior2nd)
# In[38]:
full.head()
# #### Some columns are still strings and were not fully converted to numeric
# In[39]:
full.drop("SalePrice",axis=1,inplace=True)## Delete this column , For later operation 
# #### The results show that the number of columns has grown considerably
# #### All of the data is now displayed as numbers
# In[40]:
# A hand-written transformer that label-encodes the year and building-type columns
class labelenc(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        lab = LabelEncoder()
        X["YearBuilt"] = lab.fit_transform(X["YearBuilt"])
        X["YearRemodAdd"] = lab.fit_transform(X["YearRemodAdd"])
        X["GarageYrBlt"] = lab.fit_transform(X["GarageYrBlt"])
        X["BldgType"] = lab.fit_transform(X["BldgType"])
        return X
# In[41]:
# A transformer that log-transforms skewed numeric features and one-hot encodes the rest
class skew_dummies(BaseEstimator, TransformerMixin):
    def __init__(self, skew=0.5):  # skewness threshold
        self.skew = skew

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X_numeric = X.select_dtypes(exclude=["object"])  # keep only the numeric columns
        skewness = X_numeric.apply(lambda x: skew(x))  # per-column skewness, as a Series
        skewness_features = skewness[abs(skewness) >= self.skew].index  # columns whose |skew| exceeds the threshold
        X[skewness_features] = np.log1p(X[skewness_features])  # log(1+x) to bring them closer to a normal distribution
        X = pd.get_dummies(X)  # one-hot encode the remaining categorical columns
        return X
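To see why log1p is applied, here is a small self-contained check on a synthetic right-skewed sample (purely illustrative; rng and sample are toy names, not from the original notebook):
rng = np.random.RandomState(0)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # a heavily right-skewed sample
print(skew(sample))            # large positive skewness
print(skew(np.log1p(sample)))  # much closer to 0 after log(1+x)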
# In[42]:
from scipy.stats import norm
from scipy import stats
def get_dist_data(series):
    sns.distplot(series, fit=norm)
    fig = plt.figure()
    res = stats.probplot(series, plot=plt)
    # Print the distribution statistics
    print("Skewness: %f" % series.skew())
    print("Kurtosis: %f" % series.kurt())

# The price distribution
get_dist_data(train['SalePrice'])
# In[43]:
# Take the logarithm of the price
log_SalePrice = np.log(train['SalePrice'] + 1)
get_dist_data(log_SalePrice)
# In[44]:
# Logarithmic processing: compare the skewed features before and after np.log1p
# (the original cell referenced undefined names; they are reconstructed here from context)
numeric = full.select_dtypes(exclude=["object"])
skewed = numeric.apply(lambda x: skew(x.dropna()))
skewed = skewed[abs(skewed) >= 0.5][:24]  # at most 24 features, to fit the 12x4 grid
full_log = full.copy()
full_log[skewed.index] = np.log1p(full_log[skewed.index])
plot_no = 0
plt.figure(figsize=(18, 60))
for feature in skewed.index:
    plt.subplot(12, 4, plot_no + 1)
    sns.distplot(full[feature], kde=True, fit=norm, color="purple")
    plt.title("Before", fontsize=20)
    plt.subplot(12, 4, plot_no + 2)
    sns.distplot(full_log[feature], kde=True, fit=norm, color="green")
    plt.title("After", fontsize=20)
    plot_no += 2
plt.tight_layout()
# In[45]:
# Build the pipeline 
pipe = Pipeline([# Build the pipeline 
('labenc', labelenc()),
('skew_dummies', skew_dummies(skew=2)),
])
# In[46]:
# Keep a copy of the original data for later use, in case something goes wrong
full2 = full.copy()
# In[47]:
pipeline_data = pipe.fit_transform(full2)
# In[48]:
pipeline_data.shape
# In[49]:
pipeline_data.head()
# In[50]:
from sklearn.linear_model import Lasso  # Train a model to extract feature importances
# Note: X_scaled and y_log are only defined in the train/test split cell (In[57]) further down;
# that cell must be run before this one
lasso = Lasso(alpha=0.001)
lasso.fit(X_scaled, y_log)
# In[51]:
FI_lasso = pd.DataFrame({"Feature Importance": lasso.coef_},
                        index=pipeline_data.columns)  # importances indexed by feature name
# In[52]:
FI_lasso.sort_values("Feature Importance",ascending=False)# Sort from high to low 
# In[53]:
# Visualization
FI_lasso[FI_lasso["Feature Importance"] != 0].sort_values("Feature Importance").plot(kind="barh", figsize=(15, 25))  # barh swaps the x and y axes
plt.xticks(rotation=90)
plt.show()
# ## With the feature-importance plot we can select features and engineer new ones
# In[54]:
class add_feature(BaseEstimator, TransformerMixin):  # A transformer that adds combination features
    def __init__(self, additional=1):
        self.additional = additional

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if self.additional == 1:
            X["TotalHouse"] = X["TotalBsmtSF"] + X["1stFlrSF"] + X["2ndFlrSF"]
            X["TotalArea"] = X["TotalBsmtSF"] + X["1stFlrSF"] + X["2ndFlrSF"] + X["GarageArea"]
        else:
            X["TotalHouse"] = X["TotalBsmtSF"] + X["1stFlrSF"] + X["2ndFlrSF"]
            X["TotalArea"] = X["TotalBsmtSF"] + X["1stFlrSF"] + X["2ndFlrSF"] + X["GarageArea"]
            # Note: the "o"-prefixed columns (oMSZoning, oNeighborhood, oFunctional, oCondition1)
            # come from an ordinal-mapping step in the kernel this notebook is based on, which is
            # not included here; with this notebook's label-encoded data you would substitute
            # MSZoning, Neighborhood, Functional, and Condition1 (encoded) instead
            X["+_TotalHouse_OverallQual"] = X["TotalHouse"] * X["OverallQual"]
            X["+_GrLivArea_OverallQual"] = X["GrLivArea"] * X["OverallQual"]
            X["+_oMSZoning_TotalHouse"] = X["oMSZoning"] * X["TotalHouse"]
            X["+_oMSZoning_OverallQual"] = X["oMSZoning"] + X["OverallQual"]
            X["+_oMSZoning_YearBuilt"] = X["oMSZoning"] + X["YearBuilt"]
            X["+_oNeighborhood_TotalHouse"] = X["oNeighborhood"] * X["TotalHouse"]
            X["+_oNeighborhood_OverallQual"] = X["oNeighborhood"] + X["OverallQual"]
            X["+_oNeighborhood_YearBuilt"] = X["oNeighborhood"] + X["YearBuilt"]
            X["+_BsmtFinSF1_OverallQual"] = X["BsmtFinSF1"] * X["OverallQual"]
            X["-_oFunctional_TotalHouse"] = X["oFunctional"] * X["TotalHouse"]
            X["-_oFunctional_OverallQual"] = X["oFunctional"] + X["OverallQual"]
            X["-_LotArea_OverallQual"] = X["LotArea"] * X["OverallQual"]
            X["-_TotalHouse_LotArea"] = X["TotalHouse"] + X["LotArea"]
            X["-_oCondition1_TotalHouse"] = X["oCondition1"] * X["TotalHouse"]
            X["-_oCondition1_OverallQual"] = X["oCondition1"] + X["OverallQual"]
            X["Bsmt"] = X["BsmtFinSF1"] + X["BsmtFinSF2"] + X["BsmtUnfSF"]
            X["Rooms"] = X["FullBath"] + X["TotRmsAbvGrd"]
            X["PorchArea"] = X["OpenPorchSF"] + X["EnclosedPorch"] + X["3SsnPorch"] + X["ScreenPorch"]
            X["TotalPlace"] = (X["TotalBsmtSF"] + X["1stFlrSF"] + X["2ndFlrSF"] + X["GarageArea"]
                               + X["OpenPorchSF"] + X["EnclosedPorch"] + X["3SsnPorch"] + X["ScreenPorch"])
        return X
# In[55]:
pipe = Pipeline([  # Rebuild the pipeline with the feature-engineering step added
('labenc', labelenc()),
('add_feature', add_feature(additional=2)),
('skew_dummies', skew_dummies(skew=4)),
])
# In[56]:
pipe
# In[57]:
n_train = train.shape[0]  # number of rows in the training set
X = pipeline_data[:n_train]  # the processed training rows
test_X = pipeline_data[n_train:]  # the rows from n_train onward form the test set
y = train.SalePrice
X_scaled = StandardScaler().fit(X).transform(X)  # standardize the features
y_log = np.log(train.SalePrice)  # log-transform the target so it better fits a normal distribution
# Get the test set
test_X_scaled = StandardScaler().fit_transform(test_X)
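One caveat: the test set above is standardized with a second scaler fitted on the test rows themselves, so its columns end up on a slightly different scale than the training transform. A safer pattern (a sketch, not what the original run used) reuses the scaler fitted on the training rows:
scaler = StandardScaler().fit(X)          # fit on the training rows only
X_scaled = scaler.transform(X)
test_X_scaled = scaler.transform(test_X)  # reuse the same mean and variance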
# ## Model construction 
# #### Decision tree regression (a simple baseline)
# In[58]:
from sklearn.tree import DecisionTreeRegressor# Import model 
# In[59]:
model = DecisionTreeRegressor()
# In[60]:
model1 =model.fit(X_scaled,y_log)
# ## A baseline result from the simple preprocessing above, without any model stacking
# In[61]:
# predict = model1.predict(test_X_scaled)
# In[62]:
# result=pd.DataFrame({'Id':test.Id, 'SalePrice':predict})
# result.to_csv("submission1.csv",index=False)
# In[63]:
# predict = np.exp(model1.predict(test_X_scaled))#np.exp Is the inverse transformation of the above logarithmic transformation 
# In[64]:
# result=pd.DataFrame({'Id':test.Id, 'SalePrice':predict})
# result.to_csv("submission.csv",index=False)
# ## Stack and ensemble the models; choose the optimal parameters, models, and evaluation methods
# In[65]:
from sklearn.model_selection import cross_val_score, GridSearchCV, KFold  # cross-validation, grid search, k-fold splitting
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, ExtraTreesRegressor
from sklearn.svm import SVR, LinearSVR
from sklearn.linear_model import ElasticNet, SGDRegressor, BayesianRidge
from sklearn.kernel_ridge import KernelRidge
from xgboost import XGBRegressor
# In[66]:
# Define the cross-validation strategy and the evaluation function
def rmse_cv(model, X, y):
    rmse = np.sqrt(-cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5))  # 5-fold cross-validation
    return rmse
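A quick sanity check of rmse_cv on synthetic data (purely illustrative; make_regression, X_demo, and y_demo are stand-ins, not part of the original flow):
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
print(rmse_cv(Ridge(), X_demo, y_demo))         # the five per-fold RMSE values
print(rmse_cv(Ridge(), X_demo, y_demo).mean())  # their mean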
# In[67]:
models = [LinearRegression(), Ridge(), Lasso(alpha=0.01, max_iter=10000), RandomForestRegressor(),
          GradientBoostingRegressor(), SVR(), LinearSVR(),
          ElasticNet(alpha=0.001, max_iter=10000), SGDRegressor(max_iter=1000, tol=1e-3), BayesianRidge(),
          KernelRidge(alpha=0.6, kernel='polynomial', degree=2, coef0=2.5),
          ExtraTreesRegressor(), XGBRegressor()]  # the list of candidate models
# In[68]:
names = ["LR", "Ridge", "Lasso", "RF", "GBR", "SVR", "LinSVR", "Ela","SGD","Bay","Ker","Extra","Xgb"]# list 
for name, model in zip(names, models):
score = rmse_cv(model, X_scaled, y_log)
print("{}: {:.6f}, {:.4f}".format(name,score.mean(),score.std()))
# In[69]:
# A grid-search wrapper: specify the model first and the parameters second, so several models are easy to test
class grid():
    def __init__(self, model):
        self.model = model  # the model to tune

    # Every candidate is evaluated with 5-fold cross-validation
    def grid_get(self, X, y, param_grid):  # param_grid is usually given as a dict
        grid_search = GridSearchCV(self.model, param_grid, cv=5, scoring="neg_mean_squared_error")
        grid_search.fit(X, y)
        print(grid_search.best_params_, np.sqrt(-grid_search.best_score_))
        grid_search.cv_results_['mean_test_score'] = np.sqrt(-grid_search.cv_results_['mean_test_score'])
        print(pd.DataFrame(grid_search.cv_results_)[['params', 'mean_test_score', 'std_test_score']])
# In[70]:
grid(Lasso()).grid_get(X_scaled,y_log,{
'alpha': [0.0004,0.0005,0.0007,0.0006,0.0009,0.0008],'max_iter':[10000]})
# In[71]:
grid(Ridge()).grid_get(X_scaled,y_log,{
'alpha':[35,40,45,50,55,60,65,70,80,90]})
# In[72]:
grid(SVR()).grid_get(X_scaled,y_log,{
'C':[11,12,13,14,15],'kernel':["rbf"],"gamma":[0.0003,0.0004],"epsilon":[0.008,0.009]})# Support vector machine regression 
# In[73]:
param_grid={
'alpha':[0.2,0.3,0.4,0.5], 'kernel':["polynomial"], 'degree':[3],'coef0':[0.8,1,1.2]}# Defined parameters , Use a dictionary to express 
grid(KernelRidge()).grid_get(X_scaled,y_log,param_grid)
# In[74]:
grid(ElasticNet()).grid_get(X_scaled,y_log,{
'alpha':[0.0005,0.0008,0.004,0.005],'l1_ratio':[0.08,0.1,0.3,0.5,0.7],'max_iter':[10000]})
# In[75]:
# A weighted average of several models' predictions, implementing the scikit-learn fit/predict interface
class AverageWeight(BaseEstimator, RegressorMixin):
    def __init__(self, mod, weight):
        self.mod = mod  # the list of models
        self.weight = weight  # their weights

    def fit(self, X, y):
        self.models_ = [clone(x) for x in self.mod]
        for model in self.models_:
            model.fit(X, y)
        return self

    def predict(self, X):
        w = list()
        pred = np.array([model.predict(X) for model in self.models_])
        # For each data point, multiply every single model's prediction by its weight, then sum
        for data in range(pred.shape[1]):  # iterate over the samples
            single = [pred[model, data] * weight for model, weight in zip(range(pred.shape[0]), self.weight)]
            w.append(np.sum(single))
        return w
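The per-sample loop in predict is equivalent to a single weighted dot product; a tiny check (toy numbers, for illustration only):
toy_pred = np.array([[1.0, 2.0],    # model 0's predictions for 2 samples
                     [3.0, 4.0]])   # model 1's predictions
toy_weight = [0.6, 0.4]
print(np.dot(toy_weight, toy_pred))  # [1.8 2.8], identical to the loop above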
# In[76]:
# Specify the parameters of each algorithm 
lasso = Lasso(alpha=0.0005,max_iter=10000)
ridge = Ridge(alpha=60)
svr = SVR(gamma= 0.0004,kernel='rbf',C=13,epsilon=0.009)
ker = KernelRidge(alpha=0.2 ,kernel='polynomial',degree=3 , coef0=0.8)
ela = ElasticNet(alpha=0.005,l1_ratio=0.08,max_iter=10000)
bay = BayesianRidge()
# In[77]:
# The six weights
w1 = 0.02
w2 = 0.2
w3 = 0.25
w4 = 0.3
w5 = 0.03
w6 = 0.2
# In[78]:
weight_avg = AverageWeight(mod = [lasso,ridge,svr,ker,ela,bay],weight=[w1,w2,w3,w4,w5,w6])
# In[79]:
rmse_cv(weight_avg,X_scaled,y_log), rmse_cv(weight_avg,X_scaled,y_log).mean()# Calculate the mean value of cross validation 
# ## Stacking of models 
# In[80]:
class stacking(BaseEstimator, RegressorMixin, TransformerMixin):
    def __init__(self, mod, meta_model):
        self.mod = mod  # the base models
        self.meta_model = meta_model  # the meta-model
        self.kf = KFold(n_splits=5, random_state=42, shuffle=True)  # 5-fold split

    # The training set is split into 5 folds
    def fit(self, X, y):
        self.saved_model = [list() for i in self.mod]  # one list of fitted copies per base model
        oof_train = np.zeros((X.shape[0], len(self.mod)))
        for i, model in enumerate(self.mod):  # the index and the model itself
            for train_index, val_index in self.kf.split(X, y):  # train/validation indices per fold
                renew_model = clone(model)  # a fresh copy of the model
                renew_model.fit(X[train_index], y[train_index])  # fit on the training fold
                self.saved_model[i].append(renew_model)  # keep the fitted copy
                oof_train[val_index, i] = renew_model.predict(X[val_index])  # predict the validation fold
        self.meta_model.fit(oof_train, y)  # fit the meta-model on the out-of-fold predictions
        return self

    def predict(self, X):
        whole_test = np.column_stack([np.column_stack([model.predict(X) for model in single_model]).mean(axis=1)
                                      for single_model in self.saved_model])  # averaged base predictions for the whole test set
        return self.meta_model.predict(whole_test)  # the meta-model predicts from the base predictions

    # Produce out-of-fold features for the training set and fold-averaged predictions for the test set
    def get_oof(self, X, y, test_X):
        oof = np.zeros((X.shape[0], len(self.mod)))  # initialized to 0
        test_single = np.zeros((test_X.shape[0], 5))  # initialized to 0
        test_mean = np.zeros((test_X.shape[0], len(self.mod)))
        for i, model in enumerate(self.mod):  # i indexes the model
            for j, (train_index, val_index) in enumerate(self.kf.split(X, y)):  # j indexes the fold
                clone_model = clone(model)  # copy the model
                clone_model.fit(X[train_index], y[train_index])  # fit on the fold's training split
                oof[val_index, i] = clone_model.predict(X[val_index])  # predict the validation fold
                test_single[:, j] = clone_model.predict(test_X)  # predict the test set
            test_mean[:, i] = test_single.mean(axis=1)  # average the per-fold test predictions
        return oof, test_mean
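A minimal smoke test of the stacking class on synthetic data (X_toy, y_toy, and the two linear models are toy stand-ins; the real run below uses the tuned models):
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
X_toy, y_toy = make_regression(n_samples=100, n_features=5, noise=3.0, random_state=1)
toy_stack = stacking(mod=[LinearRegression(), Ridge()], meta_model=Ridge())
toy_stack.fit(X_toy, y_toy)
print(toy_stack.predict(X_toy)[:3])  # the first three stacked predictions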
# In[81]:
# After preprocessing, the data can go into the stacking model
a = SimpleImputer().fit_transform(X_scaled)  # the feature matrix (x)
b = SimpleImputer().fit_transform(y_log.values.reshape(-1, 1)).ravel()  # the target (y)
# In[82]:
stack_model = stacking(mod=[lasso, ridge, svr, ker, ela, bay], meta_model=ker)  # the base models (first layer) and the meta-model (second layer)
# In[83]:
print(rmse_cv(stack_model,a,b))# The evaluation function is used 
print(rmse_cv(stack_model,a,b).mean())
# In[84]:
X_train_stack, X_test_stack = stack_model.get_oof(a,b,test_X_scaled)# Transform the data 
# In[85]:
X_train_stack.shape, a.shape
# In[86]:
X_train_add = np.hstack((a,X_train_stack))
X_test_add = np.hstack((test_X_scaled,X_test_stack))
X_train_add.shape, X_test_add.shape
# In[87]:
print(rmse_cv(stack_model,X_train_add,b))
print(rmse_cv(stack_model,X_train_add,b).mean())
# In[88]:
stack_model = stacking(mod=[lasso,ridge,svr,ker,ela,bay],meta_model=ker)
# In[89]:
stack_model.fit(a,b)# Model training 
# In[90]:
pred = np.exp(stack_model.predict(test_X_scaled))  # predict, then invert the earlier log transform with exp
# In[91]:
result=pd.DataFrame({
'Id':test.Id, 'SalePrice':pred})
result.to_csv("submission3.csv",index=False)
