
Python 3.7 Machine Learning - Day 3

Editor: Python

Code source

(https://github.com/MLEveryday/100-Days-Of-ML-Code.git)

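Note: most of the Python code in this article comes from GitHub (a small part was added for testing while studying); the accompanying notes were written during study.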

Day 3: Multiple Linear Regression

Basic steps:

Data preprocessing -> train the model on the training set -> predict the results
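
The three steps can be sketched end to end on a small example. This is only an illustration under stated assumptions: the data below is hypothetical and noise-free (generated from y = 1 + 2*x1 + 3*x2), not the 50_Startups data, and the names `X_demo`, `y_demo`, and `model` are made up for this sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical, noise-free data: y = 1 + 2*x1 + 3*x2 (not the 50_Startups data).
rng = np.random.RandomState(0)
X_demo = rng.rand(50, 2)
y_demo = 1.0 + 2.0 * X_demo[:, 0] + 3.0 * X_demo[:, 1]

# Step 1: preprocessing (for this clean data only the train/test split is needed).
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Step 2: train the model on the training set.
model = LinearRegression().fit(X_tr, y_tr)

# Step 3: predict the results for the test set.
y_hat = model.predict(X_te)

print(model.intercept_, model.coef_)  # close to 1.0 and [2.0, 3.0]
print(np.allclose(y_hat, y_te))       # predictions match the noise-free targets
```

Because the data contains no noise, the fitted coefficients recover the generating equation; with real data such as the startup dataset, the fit is only approximate.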

Study notes (including test code)

# coding=utf-8
# Day 3: Multiple Linear Regression
# 2019.2.15
import warnings
warnings.filterwarnings("ignore")
# Importing the libraries
import pandas as pd
import numpy as np
# Importing the dataset
dataset = pd.read_csv('C:/Users/Ymy/Desktop/100-Days-Of-ML-Code/datasets/50_Startups.csv')
# X is columns 0-3 of dataset (:-1 selects every column except the last); Y is column 4
X = dataset.iloc[ : , :-1].values
Y = dataset.iloc[ : , 4 ].values
print("Imported data")
print("X")
print(X)
print("Y")
print(Y)
print("----------------")
# Check for missing data; this dataset has no missing values, so the step is skipped
# Encoding categorical data (see Day 1 for a detailed explanation)
# 1. Import the LabelEncoder and OneHotEncoder classes from sklearn.preprocessing
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# 2. Create a LabelEncoder object, labelencoder, and call its fit_transform method
labelencoder = LabelEncoder()
# Apply fit_transform to column 3 of X
X[: , 3] = labelencoder.fit_transform(X[ : , 3])
print("labelEncoder")
print("X")
print(X)
# 3. Create a OneHotEncoder object, onehotencoder, and call its fit_transform method
# The encoder one-hot encodes column 3 of the samples
# Note: categorical_features is deprecated since scikit-learn 0.20 and removed in 0.22 (see the warning in the output)
onehotencoder = OneHotEncoder(categorical_features = [3])
# One-hot encode column 3 of X, convert the result to an array, and assign it back to X
X = onehotencoder.fit_transform(X).toarray()
print("OneHotEncoder")
print("X")
print(X)
print("----------------")
# Avoiding the Dummy Variable Trap (see Note 1 at the bottom)
# Drop column 0 of X, so that two dummy variables represent the three label-encoded categories 0, 1, 2
X = X[: , 1:]
print("Avoiding the dummy variable trap")
print("X")
print(X)
print("----------------")
# Splitting the dataset into the Training set and Test set
# (the test set takes 20% of the data)
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
print("X_train")
print(X_train)
print("X_test")
print(X_test)
print("Y_train")
print(Y_train)
print("Y_test")
print(Y_test)
print("----------------")
# Fitting Multiple Linear Regression to the Training set
# Same approach as the simple linear regression in Day 2
# 1. Use the LinearRegression class from sklearn.linear_model
from sklearn.linear_model import LinearRegression
# 2. Create a LinearRegression object, regressor, and call its fit() method
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
# Predicting the Test set results
# Use the regressor fitted in the previous step to predict the test set results
y_pred = regressor.predict(X_test)
# Regression evaluation
# Use r2_score from sklearn.metrics
from sklearn.metrics import r2_score
# Notes on r2_score, taken from its scikit-learn docstring:
#
# def r2_score(y_true, y_pred, sample_weight=None, multioutput="uniform_average")
#     R^2 (coefficient of determination) regression score function.
#     The best possible score is 1.0, and it can be negative (because the
#     model can be arbitrarily worse). A constant model that always predicts
#     the expected value of y, disregarding the input features, would get an
#     R^2 score of 0.0.
#
# Parameters
# ----------
# y_true : array-like of shape (n_samples) or (n_samples, n_outputs)
#     Ground truth (correct) target values.
# y_pred : array-like of shape (n_samples) or (n_samples, n_outputs)
#     Estimated target values.
# sample_weight : array-like of shape (n_samples), optional
#     Sample weights.
# multioutput : 'raw_values', 'uniform_average', 'variance_weighted',
#     None, or array-like of shape (n_outputs)
#     Defines how multiple output scores are aggregated. An array-like value
#     defines the weights used to average the scores.
#     Default is "uniform_average".
#     'raw_values'        : returns a full set of scores for multioutput input.
#     'uniform_average'   : scores of all outputs are averaged with uniform weight.
#     'variance_weighted' : scores of all outputs are averaged, weighted by the
#                           variance of each individual output.
#
# Returns
# -------
# z : float or ndarray of floats
# Compute the R^2 score of the regression and print it
print("R^2 score:")
print(r2_score(Y_test,y_pred))
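
The `categorical_features` argument used in the script was deprecated in scikit-learn 0.20 and removed in 0.22, so the encoding step fails on newer versions. A possible replacement, sketched here with `ColumnTransformer`, is shown on the first three rows of X as they appear in the printed output. The names `X_raw`, `ct`, and `X_encoded` are introduced only for this sketch, and `sparse_threshold=0` is an assumption made here to force a dense result. Since `OneHotEncoder` accepts string categories directly, the separate `LabelEncoder` step should not be needed in this variant.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# First three rows of X from the printed output, before any encoding.
X_raw = np.array(
    [
        [165349.2, 136897.8, 471784.1, "New York"],
        [162597.7, 151377.59, 443898.53, "California"],
        [153441.51, 101145.55, 407934.54, "Florida"],
    ],
    dtype=object,
)

# One-hot encode column 3 and pass the other columns through unchanged.
# The encoded columns come first, followed by the passed-through columns.
ct = ColumnTransformer(
    transformers=[("state", OneHotEncoder(), [3])],
    remainder="passthrough",
    sparse_threshold=0,  # assumption for this sketch: always return a dense array
)
X_encoded = np.asarray(ct.fit_transform(X_raw), dtype=float)
print(X_encoded)

# Dropping the first dummy column, as the script does with X = X[:, 1:].
X_reduced = X_encoded[:, 1:]
print(X_reduced.shape)
```

The column layout is the same as in the original script (three dummy columns, then the three numeric columns), so the later `X[:, 1:]` step can stay unchanged.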

Output

Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>
RESTART: C:\Users\Ymy\Desktop\100-Days-Of-ML-Code\study_code\Day3\Day 3.py
Imported data
X
[[165349.2 136897.8 471784.1 'New York']
[162597.7 151377.59 443898.53 'California']
[153441.51 101145.55 407934.54 'Florida']
[144372.41 118671.85 383199.62 'New York']
[142107.34 91391.77 366168.42 'Florida']
[131876.9 99814.71 362861.36 'New York']
[134615.46 147198.87 127716.82 'California']
[130298.13 145530.06 323876.68 'Florida']
[120542.52 148718.95 311613.29 'New York']
[123334.88 108679.17 304981.62 'California']
[101913.08 110594.11 229160.95 'Florida']
[100671.96 91790.61 249744.55 'California']
[93863.75 127320.38 249839.44 'Florida']
[91992.39 135495.07 252664.93 'California']
[119943.24 156547.42 256512.92 'Florida']
[114523.61 122616.84 261776.23 'New York']
[78013.11 121597.55 264346.06 'California']
[94657.16 145077.58 282574.31 'New York']
[91749.16 114175.79 294919.57 'Florida']
[86419.7 153514.11 0.0 'New York']
[76253.86 113867.3 298664.47 'California']
[78389.47 153773.43 299737.29 'New York']
[73994.56 122782.75 303319.26 'Florida']
[67532.53 105751.03 304768.73 'Florida']
[77044.01 99281.34 140574.81 'New York']
[64664.71 139553.16 137962.62 'California']
[75328.87 144135.98 134050.07 'Florida']
[72107.6 127864.55 353183.81 'New York']
[66051.52 182645.56 118148.2 'Florida']
[65605.48 153032.06 107138.38 'New York']
[61994.48 115641.28 91131.24 'Florida']
[61136.38 152701.92 88218.23 'New York']
[63408.86 129219.61 46085.25 'California']
[55493.95 103057.49 214634.81 'Florida']
[46426.07 157693.92 210797.67 'California']
[46014.02 85047.44 205517.64 'New York']
[28663.76 127056.21 201126.82 'Florida']
[44069.95 51283.14 197029.42 'California']
[20229.59 65947.93 185265.1 'New York']
[38558.51 82982.09 174999.3 'California']
[28754.33 118546.05 172795.67 'California']
[27892.92 84710.77 164470.71 'Florida']
[23640.93 96189.63 148001.11 'California']
[15505.73 127382.3 35534.17 'New York']
[22177.74 154806.14 28334.72 'California']
[1000.23 124153.04 1903.93 'New York']
[1315.46 115816.21 297114.46 'Florida']
[0.0 135426.92 0.0 'California']
[542.05 51743.15 0.0 'New York']
[0.0 116983.8 45173.06 'California']]
Y
[192261.83 191792.06 191050.39 182901.99 166187.94 156991.12 156122.51
155752.6 152211.77 149759.96 146121.95 144259.4 141585.52 134307.35
132602.65 129917.04 126992.93 125370.37 124266.9 122776.86 118474.03
111313.02 110352.25 108733.99 108552.04 107404.34 105733.54 105008.31
103282.38 101004.64 99937.59 97483.56 97427.84 96778.92 96712.8
96479.51 90708.19 89949.14 81229.06 81005.76 78239.91 77798.83
71498.49 69758.98 65200.33 64926.08 49490.75 42559.73 35673.41
14681.4 ]
----------------
labelEncoder
X
[[165349.2 136897.8 471784.1 2]
[162597.7 151377.59 443898.53 0]
[153441.51 101145.55 407934.54 1]
[144372.41 118671.85 383199.62 2]
[142107.34 91391.77 366168.42 1]
[131876.9 99814.71 362861.36 2]
[134615.46 147198.87 127716.82 0]
[130298.13 145530.06 323876.68 1]
[120542.52 148718.95 311613.29 2]
[123334.88 108679.17 304981.62 0]
[101913.08 110594.11 229160.95 1]
[100671.96 91790.61 249744.55 0]
[93863.75 127320.38 249839.44 1]
[91992.39 135495.07 252664.93 0]
[119943.24 156547.42 256512.92 1]
[114523.61 122616.84 261776.23 2]
[78013.11 121597.55 264346.06 0]
[94657.16 145077.58 282574.31 2]
[91749.16 114175.79 294919.57 1]
[86419.7 153514.11 0.0 2]
[76253.86 113867.3 298664.47 0]
[78389.47 153773.43 299737.29 2]
[73994.56 122782.75 303319.26 1]
[67532.53 105751.03 304768.73 1]
[77044.01 99281.34 140574.81 2]
[64664.71 139553.16 137962.62 0]
[75328.87 144135.98 134050.07 1]
[72107.6 127864.55 353183.81 2]
[66051.52 182645.56 118148.2 1]
[65605.48 153032.06 107138.38 2]
[61994.48 115641.28 91131.24 1]
[61136.38 152701.92 88218.23 2]
[63408.86 129219.61 46085.25 0]
[55493.95 103057.49 214634.81 1]
[46426.07 157693.92 210797.67 0]
[46014.02 85047.44 205517.64 2]
[28663.76 127056.21 201126.82 1]
[44069.95 51283.14 197029.42 0]
[20229.59 65947.93 185265.1 2]
[38558.51 82982.09 174999.3 0]
[28754.33 118546.05 172795.67 0]
[27892.92 84710.77 164470.71 1]
[23640.93 96189.63 148001.11 0]
[15505.73 127382.3 35534.17 2]
[22177.74 154806.14 28334.72 0]
[1000.23 124153.04 1903.93 2]
[1315.46 115816.21 297114.46 1]
[0.0 135426.92 0.0 0]
[542.05 51743.15 0.0 2]
[0.0 116983.8 45173.06 0]]
Warning (from warnings module):
File "D:\python\lib\site-packages\sklearn\preprocessing\_encoders.py", line 390
"use the ColumnTransformer instead.", DeprecationWarning)
DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
OneHotEncoder
X
[[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.6534920e+05 1.3689780e+05
4.7178410e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 1.6259770e+05 1.5137759e+05
4.4389853e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.5344151e+05 1.0114555e+05
4.0793454e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.4437241e+05 1.1867185e+05
3.8319962e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.4210734e+05 9.1391770e+04
3.6616842e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.3187690e+05 9.9814710e+04
3.6286136e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 1.3461546e+05 1.4719887e+05
1.2771682e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.3029813e+05 1.4553006e+05
3.2387668e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.2054252e+05 1.4871895e+05
3.1161329e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 1.2333488e+05 1.0867917e+05
3.0498162e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.0191308e+05 1.1059411e+05
2.2916095e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 1.0067196e+05 9.1790610e+04
2.4974455e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 9.3863750e+04 1.2732038e+05
2.4983944e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 9.1992390e+04 1.3549507e+05
2.5266493e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.1994324e+05 1.5654742e+05
2.5651292e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.1452361e+05 1.2261684e+05
2.6177623e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 7.8013110e+04 1.2159755e+05
2.6434606e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 9.4657160e+04 1.4507758e+05
2.8257431e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 9.1749160e+04 1.1417579e+05
2.9491957e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 8.6419700e+04 1.5351411e+05
0.0000000e+00]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 7.6253860e+04 1.1386730e+05
2.9866447e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 7.8389470e+04 1.5377343e+05
2.9973729e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 7.3994560e+04 1.2278275e+05
3.0331926e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 6.7532530e+04 1.0575103e+05
3.0476873e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 7.7044010e+04 9.9281340e+04
1.4057481e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 6.4664710e+04 1.3955316e+05
1.3796262e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 7.5328870e+04 1.4413598e+05
1.3405007e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 7.2107600e+04 1.2786455e+05
3.5318381e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 6.6051520e+04 1.8264556e+05
1.1814820e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 6.5605480e+04 1.5303206e+05
1.0713838e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 6.1994480e+04 1.1564128e+05
9.1131240e+04]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 6.1136380e+04 1.5270192e+05
8.8218230e+04]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 6.3408860e+04 1.2921961e+05
4.6085250e+04]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 5.5493950e+04 1.0305749e+05
2.1463481e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 4.6426070e+04 1.5769392e+05
2.1079767e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 4.6014020e+04 8.5047440e+04
2.0551764e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 2.8663760e+04 1.2705621e+05
2.0112682e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 4.4069950e+04 5.1283140e+04
1.9702942e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 2.0229590e+04 6.5947930e+04
1.8526510e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 3.8558510e+04 8.2982090e+04
1.7499930e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 2.8754330e+04 1.1854605e+05
1.7279567e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 2.7892920e+04 8.4710770e+04
1.6447071e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 2.3640930e+04 9.6189630e+04
1.4800111e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.5505730e+04 1.2738230e+05
3.5534170e+04]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 2.2177740e+04 1.5480614e+05
2.8334720e+04]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.0002300e+03 1.2415304e+05
1.9039300e+03]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.3154600e+03 1.1581621e+05
2.9711446e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3542692e+05
0.0000000e+00]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 5.4205000e+02 5.1743150e+04
0.0000000e+00]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.1698380e+05
4.5173060e+04]]
----------------
Avoiding the dummy variable trap
X
[[0.0000000e+00 1.0000000e+00 1.6534920e+05 1.3689780e+05 4.7178410e+05]
[0.0000000e+00 0.0000000e+00 1.6259770e+05 1.5137759e+05 4.4389853e+05]
[1.0000000e+00 0.0000000e+00 1.5344151e+05 1.0114555e+05 4.0793454e+05]
[0.0000000e+00 1.0000000e+00 1.4437241e+05 1.1867185e+05 3.8319962e+05]
[1.0000000e+00 0.0000000e+00 1.4210734e+05 9.1391770e+04 3.6616842e+05]
[0.0000000e+00 1.0000000e+00 1.3187690e+05 9.9814710e+04 3.6286136e+05]
[0.0000000e+00 0.0000000e+00 1.3461546e+05 1.4719887e+05 1.2771682e+05]
[1.0000000e+00 0.0000000e+00 1.3029813e+05 1.4553006e+05 3.2387668e+05]
[0.0000000e+00 1.0000000e+00 1.2054252e+05 1.4871895e+05 3.1161329e+05]
[0.0000000e+00 0.0000000e+00 1.2333488e+05 1.0867917e+05 3.0498162e+05]
[1.0000000e+00 0.0000000e+00 1.0191308e+05 1.1059411e+05 2.2916095e+05]
[0.0000000e+00 0.0000000e+00 1.0067196e+05 9.1790610e+04 2.4974455e+05]
[1.0000000e+00 0.0000000e+00 9.3863750e+04 1.2732038e+05 2.4983944e+05]
[0.0000000e+00 0.0000000e+00 9.1992390e+04 1.3549507e+05 2.5266493e+05]
[1.0000000e+00 0.0000000e+00 1.1994324e+05 1.5654742e+05 2.5651292e+05]
[0.0000000e+00 1.0000000e+00 1.1452361e+05 1.2261684e+05 2.6177623e+05]
[0.0000000e+00 0.0000000e+00 7.8013110e+04 1.2159755e+05 2.6434606e+05]
[0.0000000e+00 1.0000000e+00 9.4657160e+04 1.4507758e+05 2.8257431e+05]
[1.0000000e+00 0.0000000e+00 9.1749160e+04 1.1417579e+05 2.9491957e+05]
[0.0000000e+00 1.0000000e+00 8.6419700e+04 1.5351411e+05 0.0000000e+00]
[0.0000000e+00 0.0000000e+00 7.6253860e+04 1.1386730e+05 2.9866447e+05]
[0.0000000e+00 1.0000000e+00 7.8389470e+04 1.5377343e+05 2.9973729e+05]
[1.0000000e+00 0.0000000e+00 7.3994560e+04 1.2278275e+05 3.0331926e+05]
[1.0000000e+00 0.0000000e+00 6.7532530e+04 1.0575103e+05 3.0476873e+05]
[0.0000000e+00 1.0000000e+00 7.7044010e+04 9.9281340e+04 1.4057481e+05]
[0.0000000e+00 0.0000000e+00 6.4664710e+04 1.3955316e+05 1.3796262e+05]
[1.0000000e+00 0.0000000e+00 7.5328870e+04 1.4413598e+05 1.3405007e+05]
[0.0000000e+00 1.0000000e+00 7.2107600e+04 1.2786455e+05 3.5318381e+05]
[1.0000000e+00 0.0000000e+00 6.6051520e+04 1.8264556e+05 1.1814820e+05]
[0.0000000e+00 1.0000000e+00 6.5605480e+04 1.5303206e+05 1.0713838e+05]
[1.0000000e+00 0.0000000e+00 6.1994480e+04 1.1564128e+05 9.1131240e+04]
[0.0000000e+00 1.0000000e+00 6.1136380e+04 1.5270192e+05 8.8218230e+04]
[0.0000000e+00 0.0000000e+00 6.3408860e+04 1.2921961e+05 4.6085250e+04]
[1.0000000e+00 0.0000000e+00 5.5493950e+04 1.0305749e+05 2.1463481e+05]
[0.0000000e+00 0.0000000e+00 4.6426070e+04 1.5769392e+05 2.1079767e+05]
[0.0000000e+00 1.0000000e+00 4.6014020e+04 8.5047440e+04 2.0551764e+05]
[1.0000000e+00 0.0000000e+00 2.8663760e+04 1.2705621e+05 2.0112682e+05]
[0.0000000e+00 0.0000000e+00 4.4069950e+04 5.1283140e+04 1.9702942e+05]
[0.0000000e+00 1.0000000e+00 2.0229590e+04 6.5947930e+04 1.8526510e+05]
[0.0000000e+00 0.0000000e+00 3.8558510e+04 8.2982090e+04 1.7499930e+05]
[0.0000000e+00 0.0000000e+00 2.8754330e+04 1.1854605e+05 1.7279567e+05]
[1.0000000e+00 0.0000000e+00 2.7892920e+04 8.4710770e+04 1.6447071e+05]
[0.0000000e+00 0.0000000e+00 2.3640930e+04 9.6189630e+04 1.4800111e+05]
[0.0000000e+00 1.0000000e+00 1.5505730e+04 1.2738230e+05 3.5534170e+04]
[0.0000000e+00 0.0000000e+00 2.2177740e+04 1.5480614e+05 2.8334720e+04]
[0.0000000e+00 1.0000000e+00 1.0002300e+03 1.2415304e+05 1.9039300e+03]
[1.0000000e+00 0.0000000e+00 1.3154600e+03 1.1581621e+05 2.9711446e+05]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3542692e+05 0.0000000e+00]
[0.0000000e+00 1.0000000e+00 5.4205000e+02 5.1743150e+04 0.0000000e+00]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 1.1698380e+05 4.5173060e+04]]
----------------
X_train
[[1.0000000e+00 0.0000000e+00 5.5493950e+04 1.0305749e+05 2.1463481e+05]
[0.0000000e+00 1.0000000e+00 4.6014020e+04 8.5047440e+04 2.0551764e+05]
[1.0000000e+00 0.0000000e+00 7.5328870e+04 1.4413598e+05 1.3405007e+05]
[0.0000000e+00 0.0000000e+00 4.6426070e+04 1.5769392e+05 2.1079767e+05]
[1.0000000e+00 0.0000000e+00 9.1749160e+04 1.1417579e+05 2.9491957e+05]
[1.0000000e+00 0.0000000e+00 1.3029813e+05 1.4553006e+05 3.2387668e+05]
[1.0000000e+00 0.0000000e+00 1.1994324e+05 1.5654742e+05 2.5651292e+05]
[0.0000000e+00 1.0000000e+00 1.0002300e+03 1.2415304e+05 1.9039300e+03]
[0.0000000e+00 1.0000000e+00 5.4205000e+02 5.1743150e+04 0.0000000e+00]
[0.0000000e+00 1.0000000e+00 6.5605480e+04 1.5303206e+05 1.0713838e+05]
[0.0000000e+00 1.0000000e+00 1.1452361e+05 1.2261684e+05 2.6177623e+05]
[1.0000000e+00 0.0000000e+00 6.1994480e+04 1.1564128e+05 9.1131240e+04]
[0.0000000e+00 0.0000000e+00 6.3408860e+04 1.2921961e+05 4.6085250e+04]
[0.0000000e+00 0.0000000e+00 7.8013110e+04 1.2159755e+05 2.6434606e+05]
[0.0000000e+00 0.0000000e+00 2.3640930e+04 9.6189630e+04 1.4800111e+05]
[0.0000000e+00 0.0000000e+00 7.6253860e+04 1.1386730e+05 2.9866447e+05]
[0.0000000e+00 1.0000000e+00 1.5505730e+04 1.2738230e+05 3.5534170e+04]
[0.0000000e+00 1.0000000e+00 1.2054252e+05 1.4871895e+05 3.1161329e+05]
[0.0000000e+00 0.0000000e+00 9.1992390e+04 1.3549507e+05 2.5266493e+05]
[0.0000000e+00 0.0000000e+00 6.4664710e+04 1.3955316e+05 1.3796262e+05]
[0.0000000e+00 1.0000000e+00 1.3187690e+05 9.9814710e+04 3.6286136e+05]
[0.0000000e+00 1.0000000e+00 9.4657160e+04 1.4507758e+05 2.8257431e+05]
[0.0000000e+00 0.0000000e+00 2.8754330e+04 1.1854605e+05 1.7279567e+05]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 1.1698380e+05 4.5173060e+04]
[0.0000000e+00 0.0000000e+00 1.6259770e+05 1.5137759e+05 4.4389853e+05]
[1.0000000e+00 0.0000000e+00 9.3863750e+04 1.2732038e+05 2.4983944e+05]
[0.0000000e+00 0.0000000e+00 4.4069950e+04 5.1283140e+04 1.9702942e+05]
[0.0000000e+00 1.0000000e+00 7.7044010e+04 9.9281340e+04 1.4057481e+05]
[0.0000000e+00 0.0000000e+00 1.3461546e+05 1.4719887e+05 1.2771682e+05]
[1.0000000e+00 0.0000000e+00 6.7532530e+04 1.0575103e+05 3.0476873e+05]
[1.0000000e+00 0.0000000e+00 2.8663760e+04 1.2705621e+05 2.0112682e+05]
[0.0000000e+00 1.0000000e+00 7.8389470e+04 1.5377343e+05 2.9973729e+05]
[0.0000000e+00 1.0000000e+00 8.6419700e+04 1.5351411e+05 0.0000000e+00]
[0.0000000e+00 0.0000000e+00 1.2333488e+05 1.0867917e+05 3.0498162e+05]
[0.0000000e+00 0.0000000e+00 3.8558510e+04 8.2982090e+04 1.7499930e+05]
[1.0000000e+00 0.0000000e+00 1.3154600e+03 1.1581621e+05 2.9711446e+05]
[0.0000000e+00 1.0000000e+00 1.4437241e+05 1.1867185e+05 3.8319962e+05]
[0.0000000e+00 1.0000000e+00 1.6534920e+05 1.3689780e+05 4.7178410e+05]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3542692e+05 0.0000000e+00]
[0.0000000e+00 0.0000000e+00 2.2177740e+04 1.5480614e+05 2.8334720e+04]]
X_test
[[1.0000000e+00 0.0000000e+00 6.6051520e+04 1.8264556e+05 1.1814820e+05]
[0.0000000e+00 0.0000000e+00 1.0067196e+05 9.1790610e+04 2.4974455e+05]
[1.0000000e+00 0.0000000e+00 1.0191308e+05 1.1059411e+05 2.2916095e+05]
[1.0000000e+00 0.0000000e+00 2.7892920e+04 8.4710770e+04 1.6447071e+05]
[1.0000000e+00 0.0000000e+00 1.5344151e+05 1.0114555e+05 4.0793454e+05]
[0.0000000e+00 1.0000000e+00 7.2107600e+04 1.2786455e+05 3.5318381e+05]
[0.0000000e+00 1.0000000e+00 2.0229590e+04 6.5947930e+04 1.8526510e+05]
[0.0000000e+00 1.0000000e+00 6.1136380e+04 1.5270192e+05 8.8218230e+04]
[1.0000000e+00 0.0000000e+00 7.3994560e+04 1.2278275e+05 3.0331926e+05]
[1.0000000e+00 0.0000000e+00 1.4210734e+05 9.1391770e+04 3.6616842e+05]]
Y_train
[ 96778.92 96479.51 105733.54 96712.8 124266.9 155752.6 132602.65
64926.08 35673.41 101004.64 129917.04 99937.59 97427.84 126992.93
71498.49 118474.03 69758.98 152211.77 134307.35 107404.34 156991.12
125370.37 78239.91 14681.4 191792.06 141585.52 89949.14 108552.04
156122.51 108733.99 90708.19 111313.02 122776.86 149759.96 81005.76
49490.75 182901.99 192261.83 42559.73 65200.33]
Y_test
[103282.38 144259.4 146121.95 77798.83 191050.39 105008.31 81229.06
97483.56 110352.25 166187.94]
----------------
R^2 score:
0.9347068473282989
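
The score above follows the usual definition R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. A minimal sketch of that formula, checked against `r2_score`, is given below. The helper `r2_manual` and the small arrays are hypothetical and serve only as an illustration; they are not the predictions from the run above.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_manual(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot for a single output."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical values, for illustration only.
y_true_demo = [3.0, -0.5, 2.0, 7.0]
y_pred_demo = [2.5, 0.0, 2.0, 8.0]

print(r2_manual(y_true_demo, y_pred_demo))
print(r2_score(y_true_demo, y_pred_demo))

# A constant model that always predicts the mean of y scores 0.0.
mean_pred = [np.mean(y_true_demo)] * len(y_true_demo)
print(r2_manual(y_true_demo, mean_pred))
```

The two printed scores should agree, and the last line illustrates the remark in the docstring notes that a constant model predicting the mean gets a score of 0.0.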

Notes and references

Note 1:

Dummy variables and the dummy variable trap

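A dummy variable (also called an indicator variable or nominal variable) is an artificial variable that represents a qualitative attribute. It is a quantified qualitative variable and usually takes the value 0 or 1.
Introducing dummy variables makes the linear regression model more complex, but it describes the problem more concisely: one equation can do the work of two, and the model is closer to reality.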

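For example, a dummy variable for education level could be defined as: 1 = bachelor's degree; 0 = no bachelor's degree.

In general, when defining a dummy variable, the base or affirmative category takes the value 1, and the comparison or negative category takes the value 0.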

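The dummy variable trap concerns how many dummy variables to introduce. A qualitative variable with m categories should generally be represented by m-1 dummy variables. If m dummy variables are introduced instead (in a model that also has an intercept), the explanatory variables become perfectly collinear.

The resulting problem, where the model cannot be estimated because the number of dummy variables equals the number of categories, is called the "dummy variable trap".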
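
The collinearity can be checked numerically. The sketch below uses hypothetical category codes for six samples, not the startup data: it builds a design matrix with an intercept column plus m = 3 dummy columns, and another with only m - 1 = 2 dummy columns, then compares each matrix's rank with its number of columns.

```python
import numpy as np

# Hypothetical category codes for six samples (three categories: 0, 1, 2).
codes = np.array([0, 1, 2, 0, 1, 2])
dummies = np.eye(3)[codes]            # one dummy column per category (m = 3)
intercept = np.ones((len(codes), 1))  # constant column for the intercept

X_trap = np.hstack([intercept, dummies])       # intercept + m dummies
X_ok = np.hstack([intercept, dummies[:, 1:]])  # intercept + (m - 1) dummies

# The three dummy columns sum to the intercept column, so X_trap is rank deficient.
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # rank 3, but 4 columns
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])      # rank 3, and 3 columns
```

When the rank is smaller than the number of columns, the coefficients are not uniquely determined, which is the estimation problem described above.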

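Applying these definitions to the test above: column 3 of X (State) was first label-encoded and then one-hot encoded. Label encoding maps the three categories New York, California, and Florida to 2, 0, and 1 respectively. One-hot encoding then maps 2, 0, and 1 to 001, 100, and 010. At this point the three categories are represented by three dummy variables, which is exactly the dummy variable trap. The "Avoiding Dummy Variable Trap" step therefore drops the first one-hot column, so that 01, 00, and 10 represent New York, California, and Florida respectively, and the trap is avoided.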
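
The mapping described in this note can be reproduced in a few lines. This is a small sketch rather than the script's own code: it uses `LabelEncoder` for the integer codes and builds the one-hot rows with `np.eye` for brevity, and the variable names are chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

states = np.array(["New York", "California", "Florida"])

# LabelEncoder sorts the class names, so California=0, Florida=1, New York=2.
codes = LabelEncoder().fit_transform(states)

# One-hot rows for the codes, then drop the first column to avoid the trap.
one_hot = np.eye(3, dtype=int)[codes]
reduced = one_hot[:, 1:]

for state, code, full, short in zip(states, codes, one_hot, reduced):
    print(state, code, full, short)
```

The rows printed for New York, California, and Florida correspond to the codes 2, 0, 1, the one-hot patterns 001, 100, 010, and the reduced patterns 01, 00, 10 from the note.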

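Note 2: Because I am just starting to learn, the tests and the corresponding output are fairly detailed (data with many rows has been printed in full).

References: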
sklearn-metrics-r2-score
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn-metrics-r2-score