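K-fold cross-validation in sklearn

K-fold cross-validation: sklearn.model_selection.KFold(n_splits=3,shuffle=False,random_state=None)

(This is the signature from older releases; since scikit-learn 0.22 the default is n_splits=5.)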

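Idea: split the data into n_splits mutually exclusive subsets (folds). In each round, one fold serves as the validation set and the remaining n_splits-1 folds form the training set. Training and testing are repeated n_splits times, which yields n_splits results.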
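The loop described above can be sketched as follows. This is only a minimal illustration: the toy data, the choice of LogisticRegression, and accuracy as the score are assumptions made for the example, not part of the original post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Hypothetical toy data: 60 samples, 2 features; the label depends on the first feature.
rng = np.random.RandomState(0)
X = rng.randn(60, 2)
y = (X[:, 0] > 0).astype(int)

kf = KFold(n_splits=5)
scores = []
for train_index, test_index in kf.split(X):
    model = LogisticRegression()                  # fresh model for every fold
    model.fit(X[train_index], y[train_index])     # train on n_splits-1 folds
    scores.append(model.score(X[test_index], y[test_index]))  # validate on the held-out fold

print(len(scores))       # one result per fold
print(np.mean(scores))   # mean accuracy over the folds
```

Averaging the per-fold scores is the usual way to summarize the n_splits results as a single number.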

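Note: when the dataset cannot be divided evenly, the first n_samples % n_splits folds contain n_samples // n_splits + 1 samples each, and the remaining folds contain n_samples // n_splits samples each.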
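The fold-size rule can be checked with a little arithmetic. The sketch below assumes 12 samples and 5 folds (the same sizes used in the examples later in this post) and compares the sizes predicted by the rule with the ones KFold actually produces.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples, n_splits = 12, 5

# Sizes predicted by the rule: the first (n_samples % n_splits) folds get one extra sample.
n_large = n_samples % n_splits
base = n_samples // n_splits
expected_sizes = [base + 1] * n_large + [base] * (n_splits - n_large)

# Sizes actually produced by KFold.
X = np.arange(n_samples).reshape(-1, 1)
actual_sizes = [len(test_index) for _, test_index in KFold(n_splits=n_splits).split(X)]

print(expected_sizes)  # [3, 3, 2, 2, 2]
print(actual_sizes)    # [3, 3, 2, 2, 2]
```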

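Parameters:

n_splits: the number of folds to split the data into

shuffle: whether to shuffle the samples before splitting them into folds

1) If False, no shuffling is done. The folds are consecutive blocks of indices in the original order, so every run gives the same split and random_state is not used.

2) If True, the samples are shuffled before being divided into folds, so the split differs from run to run unless random_state is fixed.

random_state: the random seed. It only matters when shuffle=True; passing an integer makes the shuffled split reproducible.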

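Methods:

①get_n_splits(X=None, y=None, groups=None): returns the number of splitting iterations, i.e. the value of n_splits

②split(X, y=None, groups=None): splits the dataset into training and test sets; returns a generator that yields a (train_index, test_index) pair of index arrays for each fold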
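A short sketch of both methods, assuming the same 12-row array used in the examples in this post. Note that split returns positions, not data, so the index arrays are used to slice X.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(12, 2)
kf = KFold(n_splits=5)

print(kf.get_n_splits())   # 5

splits = kf.split(X)       # a generator; folds are produced as we iterate
train_index, test_index = next(splits)   # take the first fold only
print(test_index)          # [0 1 2]
print(X[test_index])       # the rows of X selected for testing in this fold
```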

A: With shuffle=False, splitting twice gives the same result both times

from sklearn.model_selection import KFold
import numpy as np

# shuffle=False: split twice; both passes give the same folds
X = np.arange(24).reshape(12, 2)
# 12 labels drawn from {1, 2} with P(1)=0.4 and P(2)=0.6 (KFold.split does not use y)
y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
kf = KFold(n_splits=5, shuffle=False)
for train_index, test_index in kf.split(X):
    print('train_index %s, test_index %s' % (train_index, test_index))

# second pass with a freshly built splitter
X = np.arange(24).reshape(12, 2)
y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
kf = KFold(n_splits=5, shuffle=False)
for train_index, test_index in kf.split(X):
    print('train_index %s, test_index %s' % (train_index, test_index))
'''
train_index [ 3  4  5  6  7  8  9 10 11], test_index [0 1 2]
train_index [ 0  1  2  6  7  8  9 10 11], test_index [3 4 5]
train_index [ 0  1  2  3  4  5  8  9 10 11], test_index [6 7]
train_index [ 0  1  2  3  4  5  6  7 10 11], test_index [8 9]
train_index [0 1 2 3 4 5 6 7 8 9], test_index [10 11]
train_index [ 3  4  5  6  7  8  9 10 11], test_index [0 1 2]
train_index [ 0  1  2  6  7  8  9 10 11], test_index [3 4 5]
train_index [ 0  1  2  3  4  5  8  9 10 11], test_index [6 7]
train_index [ 0  1  2  3  4  5  6  7 10 11], test_index [8 9]
train_index [0 1 2 3 4 5 6 7 8 9], test_index [10 11]
'''

B: With shuffle=True, splitting twice gives different results

from sklearn.model_selection import KFold
import numpy as np

# shuffle=True without random_state: split twice; the two passes give different folds
X = np.arange(24).reshape(12, 2)
# 12 labels drawn from {1, 2} with P(1)=0.4 and P(2)=0.6 (KFold.split does not use y)
y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
kf = KFold(n_splits=5, shuffle=True)
for train_index, test_index in kf.split(X):
    print('train_index %s, test_index %s' % (train_index, test_index))
print('-----------------------------------------------------------')
# second pass with a freshly built splitter
X = np.arange(24).reshape(12, 2)
y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
kf = KFold(n_splits=5, shuffle=True)
for train_index, test_index in kf.split(X):
    print('train_index %s, test_index %s' % (train_index, test_index))
'''
train_index [ 0  2  3  4  5  6  8 10 11], test_index [1 7 9]
train_index [ 1  2  3  4  5  7  9 10 11], test_index [0 6 8]
train_index [ 0  1  2  3  5  6  7  8  9 10], test_index [ 4 11]
train_index [ 0  1  2  3  4  6  7  8  9 11], test_index [ 5 10]
train_index [ 0  1  4  5  6  7  8  9 10 11], test_index [2 3]
-----------------------------------------------------------
train_index [ 0  1  2  3  4  6  7  8 11], test_index [ 5  9 10]
train_index [ 0  3  4  5  6  8  9 10 11], test_index [1 2 7]
train_index [ 0  1  2  4  5  6  7  8  9 10], test_index [ 3 11]
train_index [ 0  1  2  3  5  7  8  9 10 11], test_index [4 6]
train_index [ 1  2  3  4  5  6  7  9 10 11], test_index [0 8]
'''

C: With shuffle=True and random_state set to an integer, every run gives the same result

from sklearn.model_selection import KFold
import numpy as np

# shuffle=True with a fixed random_state: split twice; both passes give the same folds
X = np.arange(24).reshape(12, 2)
# 12 labels drawn from {1, 2} with P(1)=0.4 and P(2)=0.6 (KFold.split does not use y)
y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
kf = KFold(n_splits=5, shuffle=True, random_state=10)
for train_index, test_index in kf.split(X):
    print('train_index %s, test_index %s' % (train_index, test_index))
print('-----------------------------------------------------------')
# second pass with a freshly built splitter and the same seed
X = np.arange(24).reshape(12, 2)
y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
kf = KFold(n_splits=5, shuffle=True, random_state=10)
for train_index, test_index in kf.split(X):
    print('train_index %s, test_index %s' % (train_index, test_index))
'''
train_index [ 0  1  3  4  5  8  9 10 11], test_index [2 6 7]
train_index [ 0  1  2  3  4  6  7  9 10], test_index [ 5  8 11]
train_index [ 0  1  2  4  5  6  7  8  9 11], test_index [ 3 10]
train_index [ 2  3  4  5  6  7  8  9 10 11], test_index [0 1]
train_index [ 0  1  2  3  5  6  7  8 10 11], test_index [4 9]
-----------------------------------------------------------
train_index [ 0  1  3  4  5  8  9 10 11], test_index [2 6 7]
train_index [ 0  1  2  3  4  6  7  9 10], test_index [ 5  8 11]
train_index [ 0  1  2  4  5  6  7  8  9 11], test_index [ 3 10]
train_index [ 2  3  4  5  6  7  8  9 10 11], test_index [0 1]
train_index [ 0  1  2  3  5  6  7  8 10 11], test_index [4 9]
'''
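
The three cases A, B and C can also be compared programmatically instead of by reading the printed output. The following is a small sketch; the helper name fold_test_indices is hypothetical and introduced only for this example.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(12, 2)

def fold_test_indices(kf):
    """Return the test indices of every fold as a list of lists."""
    return [test_index.tolist() for _, test_index in kf.split(X)]

# A: no shuffling; folds are consecutive blocks and always identical.
a1 = fold_test_indices(KFold(n_splits=5, shuffle=False))
a2 = fold_test_indices(KFold(n_splits=5, shuffle=False))
print(a1 == a2)   # True

# B: shuffling without a seed; the two splits will usually differ.
b1 = fold_test_indices(KFold(n_splits=5, shuffle=True))
b2 = fold_test_indices(KFold(n_splits=5, shuffle=True))
print(b1 == b2)

# C: shuffling with a fixed seed; identical every time.
c1 = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=10))
c2 = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=10))
print(c1 == c2)   # True
```

In all three cases every sample index appears in exactly one test fold; shuffling only changes which fold it lands in.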
