COMP7404 Machine Learning: Hyperparameter Grid Search & Nested Cross-Validation
Views: 2,136
Published: 2019-04-30

This article is about 4,437 characters; estimated reading time: 14 minutes.

Hyperparameter Grid Search

The name GridSearchCV can be split into two parts, GridSearch and CV, i.e., grid search and cross-validation.

 

Common parameters

estimator: the classifier to tune, e.g. estimator=RandomForestClassifier(min_samples_split=100, min_samples_leaf=20, max_depth=8, max_features='sqrt', random_state=10), passing in all parameters other than the ones being searched over. Every estimator needs either a scoring parameter or a score method.

param_grid: a dict or a list of dicts giving the candidate values for the parameters to optimize, e.g. param_grid = param_test1 with param_test1 = {'n_estimators': range(10, 71, 10)}.

scoring: the evaluation metric. Defaults to None, in which case the estimator's own score method is used. It can also be a string naming a metric, e.g. scoring='roc_auc' (which metric is appropriate depends on the model), or a callable with the signature scorer(estimator, X, y).
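To illustrate the scorer(estimator, X, y) signature mentioned above, here is a sketch of a custom callable passed as scoring. The balanced-accuracy metric, the synthetic dataset, and the small decision-tree grid are illustrative choices, not part of the original example:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

def balanced_acc_scorer(estimator, X, y):
    # A callable with the scorer(estimator, X, y) signature GridSearchCV expects:
    # it fits nothing itself; it just scores an already-fitted estimator.
    return balanced_accuracy_score(y, estimator.predict(X))

X, y = make_classification(n_samples=200, random_state=0)
gs = GridSearchCV(DecisionTreeClassifier(random_state=0),
                  param_grid={'max_depth': [1, 2, 3]},
                  scoring=balanced_acc_scorer, cv=3)
gs.fit(X, y)
print(gs.best_params_)
```

Any callable with this signature works; sklearn's make_scorer helper builds one from a plain metric function.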

 

GridSearchCV tries every parameter combination in each dict ({}) of param_grid.
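To see how the dicts expand into candidates, sklearn's ParameterGrid enumerates the same combinations GridSearchCV would try. With the 8-value param_range used in the example below, the linear dict contributes 8 candidates and the rbf dict 8 × 8 = 64, for 72 in total:

```python
from sklearn.model_selection import ParameterGrid

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
param_grid = [{'svc__C': param_range, 'svc__kernel': ['linear']},
              {'svc__C': param_range, 'svc__gamma': param_range, 'svc__kernel': ['rbf']}]

# ParameterGrid takes the Cartesian product within each dict,
# then concatenates the dicts: 8 linear + 64 rbf = 72 candidates.
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 72
```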

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)
X = df.loc[:, 2:].values
y = df.loc[:, 1].values
le = LabelEncoder()
y = le.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, stratify=y, random_state=1)

pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))
param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
param_grid = [{'svc__C': param_range, 'svc__kernel': ['linear']},
              {'svc__C': param_range, 'svc__gamma': param_range, 'svc__kernel': ['rbf']}]
gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid,
                  scoring='accuracy', cv=10, n_jobs=1)
gs.fit(X_train, y_train)
print(gs.best_score_)
print(gs.best_params_)

clf = gs.best_estimator_
clf.fit(X_train, y_train)
print('Test accuracy: %.3f' % clf.score(X_test, y_test))

 

Nested Cross-Validation

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)
X = df.loc[:, 2:].values
y = df.loc[:, 1].values
le = LabelEncoder()
y = le.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, stratify=y, random_state=1)

pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))
param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
param_grid = [{'svc__C': param_range, 'svc__kernel': ['linear']},
              {'svc__C': param_range, 'svc__gamma': param_range, 'svc__kernel': ['rbf']}]

# Inner loop: grid search with 2-fold CV; outer loop: 5-fold CV
gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid, scoring='accuracy', cv=2)
scores = cross_val_score(gs, X_train, y_train, scoring='accuracy', cv=5)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
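The nesting above can be seen more cheaply on synthetic data: the inner GridSearchCV (cv=2) selects hyperparameters within each of the 5 outer folds, so cross_val_score returns 5 scores, each from a model tuned without ever seeing its outer test fold. A minimal sketch, where the dataset and the smaller C grid are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

pipe = make_pipeline(StandardScaler(), SVC(random_state=1))
inner = GridSearchCV(pipe,
                     param_grid={'svc__C': [0.1, 1.0, 10.0]},
                     scoring='accuracy', cv=2)            # inner loop: model selection
scores = cross_val_score(inner, X, y, scoring='accuracy', cv=5)  # outer loop: evaluation
print('Nested CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
```

Because a fresh grid search runs inside every outer fold, the 5 returned scores are an almost unbiased estimate of how the whole tuning procedure generalizes, not just one chosen model.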

 

 

We can use the nested cross-validation approach to compare an SVM model with a simple decision tree classifier; for simplicity, we tune only the tree's depth parameter.

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)
X = df.loc[:, 2:].values
y = df.loc[:, 1].values
le = LabelEncoder()
y = le.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, stratify=y, random_state=1)

# Inner loop tunes only max_depth; outer loop estimates generalization
gs = GridSearchCV(estimator=DecisionTreeClassifier(random_state=0),
                  param_grid=[{'max_depth': [1, 2, 3, 4, 5, 6, 7, None]}],
                  scoring='accuracy', cv=2)
scores = cross_val_score(gs, X_train, y_train, scoring='accuracy', cv=5)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))

 

 

 

 

