
The relationship and differences between StratifiedKFold and KFold (k-fold cross-validation), with a Python example

Editor: Python

KFold:

Divide the entire training set into k disjoint subsets. If the training set contains m samples, each subset holds about m/k of them. For example, splitting [1,2,3,4,5,6] into two parts might give [1,3,5] as the first part and [2,4,6] as the second.
In each round, take one of the subsets as the test set and use the other k-1 subsets as the training set.
Train a model on the k-1 training subsets and validate it on the test subset. After every subset has served as the test set once, take the average of the k classification rates as the estimate of the true classification rate of the model or hypothesis.
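
The three steps above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the helper names (k_fold_indices, fit_centroids, predict, cross_validate), the nearest-centroid toy learner, and the data are all made up for this example. The folds here are contiguous, as in an unshuffled split, rather than interleaved as in the [1,3,5] / [2,4,6] example.

```python
def k_fold_indices(m, k):
    """Yield (train, test) index lists for k contiguous, disjoint folds of range(m)."""
    # When m is not divisible by k, the first m % k folds get one extra sample.
    sizes = [m // k + (1 if i < m % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, m))
        yield train, test
        start = stop


def fit_centroids(xs, ys):
    """Toy learner (hypothetical): store the mean feature value of each class."""
    centroids = {}
    for label in set(ys):
        values = [x for x, t in zip(xs, ys) if t == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def cross_validate(xs, ys, k):
    """Return the per-fold accuracies and their mean."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(predict(model, xs[i]) == ys[i] for i in test)
        scores.append(hits / len(test))
    return scores, sum(scores) / len(scores)


# Made-up one-dimensional data: small values are class 0, large values are class 1.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

print(list(k_fold_indices(6, 2)))
scores, mean_score = cross_validate(xs, ys, k=4)
print(scores, mean_score)
```

Each sample lands in exactly one test fold, and the final score is the plain average of the per-fold accuracies.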

StratifiedKFold

StratifiedKFold is used in the same way as KFold, but it performs stratified sampling: the class proportions in each training and test fold match those of the original dataset as closely as possible. For binary labels (0, 1), this means every fold contains both positive and negative examples, provided each class has enough samples to go around.
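
To make the idea of stratified sampling concrete, here is a simplified sketch in plain Python. It deals the samples of each class round-robin across the folds. This is an assumption chosen for clarity and is not how scikit-learn allocates samples internally; the function name stratified_k_fold_indices is hypothetical.

```python
from collections import defaultdict


def stratified_k_fold_indices(y, k):
    """Yield (train, test) index lists so that each class is spread across k folds."""
    # Group sample indices by class label, preserving their original order.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    # Deal the samples of each class round-robin into the k test folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        yield train, sorted(test)


y = [1, 1, 0, 0, 1, 1, 0, 0]
splits = list(stratified_k_fold_indices(y, 4))
for train, test in splits:
    print("train:", train, "| test:", test, "| test labels:", [y[i] for i in test])
```

With these labels every test fold receives one sample of each class. Because every class starts dealing at the first fold, fold sizes can become uneven when class counts are not multiples of k; a production splitter balances this more carefully.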

Example:

import numpy as np
from sklearn.model_selection import StratifiedKFold, KFold

# Eight samples with four features each
X = np.array([
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
    [61, 62, 63, 64],
    [71, 72, 73, 74]
])
# Binary labels: four positives and four negatives
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

# The old n_folds parameter is gone: this example imports from
# sklearn.model_selection, which uses n_splits instead.
# random_state is omitted because it has no effect when shuffle=False
# (recent scikit-learn versions raise an error if it is set).
folder = KFold(n_splits=4, shuffle=False)
sfolder = StratifiedKFold(n_splits=4, shuffle=False)

# StratifiedKFold needs y to keep the class ratio in every fold
for train, test in sfolder.split(X, y):
    print('StratifiedKFold Train: %s | test: %s' % (train, test))
    print('X[train]:', X[train])
    print('y[train]:', y[train])
    print('X[test]:', X[test])
    print('y[test]:', y[test])
    print(" ")

# KFold ignores y and simply cuts the data into consecutive blocks
for train, test in folder.split(X, y):
    print('KFold Train: %s | test: %s' % (train, test))
    print('X[train]:', X[train])
    print('y[train]:', y[train])
    print('X[test]:', X[test])
    print('y[test]:', y[test])
    print(" ")

Result:

StratifiedKFold Train: [1 3 4 5 6 7] | test: [0 2]
X[train]: [[11 12 13 14]
 [31 32 33 34]
 [41 42 43 44]
 [51 52 53 54]
 [61 62 63 64]
 [71 72 73 74]]
y[train]: [1 0 1 1 0 0]
X[test]: [[ 1  2  3  4]
 [21 22 23 24]]
y[test]: [1 0]

StratifiedKFold Train: [0 2 4 5 6 7] | test: [1 3]
X[train]: [[ 1  2  3  4]
 [21 22 23 24]
 [41 42 43 44]
 [51 52 53 54]
 [61 62 63 64]
 [71 72 73 74]]
y[train]: [1 0 1 1 0 0]
X[test]: [[11 12 13 14]
 [31 32 33 34]]
y[test]: [1 0]

StratifiedKFold Train: [0 1 2 3 5 7] | test: [4 6]
X[train]: [[ 1  2  3  4]
 [11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]
 [51 52 53 54]
 [71 72 73 74]]
y[train]: [1 1 0 0 1 0]
X[test]: [[41 42 43 44]
 [61 62 63 64]]
y[test]: [1 0]

StratifiedKFold Train: [0 1 2 3 4 6] | test: [5 7]
X[train]: [[ 1  2  3  4]
 [11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]
 [41 42 43 44]
 [61 62 63 64]]
y[train]: [1 1 0 0 1 0]
X[test]: [[51 52 53 54]
 [71 72 73 74]]
y[test]: [1 0]

KFold Train: [2 3 4 5 6 7] | test: [0 1]
X[train]: [[21 22 23 24]
 [31 32 33 34]
 [41 42 43 44]
 [51 52 53 54]
 [61 62 63 64]
 [71 72 73 74]]
y[train]: [0 0 1 1 0 0]
X[test]: [[ 1  2  3  4]
 [11 12 13 14]]
y[test]: [1 1]

KFold Train: [0 1 4 5 6 7] | test: [2 3]
X[train]: [[ 1  2  3  4]
 [11 12 13 14]
 [41 42 43 44]
 [51 52 53 54]
 [61 62 63 64]
 [71 72 73 74]]
y[train]: [1 1 1 1 0 0]
X[test]: [[21 22 23 24]
 [31 32 33 34]]
y[test]: [0 0]

KFold Train: [0 1 2 3 6 7] | test: [4 5]
X[train]: [[ 1  2  3  4]
 [11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]
 [61 62 63 64]
 [71 72 73 74]]
y[train]: [1 1 0 0 0 0]
X[test]: [[41 42 43 44]
 [51 52 53 54]]
y[test]: [1 1]

KFold Train: [0 1 2 3 4 5] | test: [6 7]
X[train]: [[ 1  2  3  4]
 [11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]
 [41 42 43 44]
 [51 52 53 54]]
y[train]: [1 1 0 0 1 1]
X[test]: [[61 62 63 64]
 [71 72 73 74]]
y[test]: [0 0]
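
In the result above, every StratifiedKFold test fold holds one positive and one negative label, while every KFold test fold holds a single class. The gap matters more when the labels are imbalanced. The following sketch counts the positives in each test fold; the label vector (16 negatives followed by 4 positives), the placeholder features, and the helper positives_per_fold are assumptions made up for illustration. It also shows that random_state is meant to be passed together with shuffle=True.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 16 negatives followed by 4 positives (20% positive).
y = np.array([0] * 16 + [1] * 4)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features, one column


def positives_per_fold(splitter):
    """Count the positive samples in each test fold produced by the splitter."""
    return [int(y[test].sum()) for _, test in splitter.split(X, y)]


kf_counts = positives_per_fold(KFold(n_splits=4))
skf_counts = positives_per_fold(StratifiedKFold(n_splits=4))
# random_state only takes effect when shuffle=True.
skf_shuffled_counts = positives_per_fold(
    StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
)

print("KFold:", kf_counts)
print("StratifiedKFold:", skf_counts)
print("StratifiedKFold (shuffled):", skf_shuffled_counts)
```

On this data the unshuffled KFold places all four positives in the last test fold, so the first three folds test on negatives only, whereas StratifiedKFold places one positive in each test fold with or without shuffling.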
