[Overview] BTB (Bayesian Tuning and Bandits) is an open-source library from MIT: a simple, extensible backend for auto-tuning systems.
GitHub repository: https://github.com/HDI-Project/BTB
Documentation: https://hdi-project.github.io/BTB/
System Requirements
BTB runs on Python 3.5, 3.6, and 3.7.
Installation
BTB can be installed either with pip or from source:
pip:
pip install baytune
source:
git clone git@github.com:HDI-Project/BTB.git
cd BTB
git checkout stable
make install
Usage
Tuners
Tuners quickly select the best hyperparameters for a given machine learning algorithm.
The tuner classes are defined in btb.tuning.tuners. Each iteration of the tuning loop works as follows:
The tuner proposes a set of hyperparameters.
The hyperparameters are used to fit a model, which is then scored.
The score is passed back to the tuner.
In each iteration, the tuner uses all of the information collected so far to propose the hyperparameters most likely to yield a high score.
To instantiate a Tuner, we need a Tunable instance built from a set of hyperparameters:
from btb.tuning import Tunable
from btb.tuning.tuners import GPTuner
from btb.tuning.hyperparams import IntHyperParam

hyperparams = {
    'n_estimators': IntHyperParam(min=10, max=500),
    'max_depth': IntHyperParam(min=10, max=500),
}
tunable = Tunable(hyperparams)
tuner = GPTuner(tunable)
Then, inside a loop, repeat the following three steps:
Let the tuner propose a set of hyperparameters:
>>> parameters = tuner.propose()
>>> parameters
{'n_estimators': 297, 'max_depth': 3}
Fit a model with these hyperparameters and score it:
>>> model = RandomForestClassifier(**parameters)
>>> model.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=3, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=297, n_jobs=1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
>>> score = model.score(X_test, y_test)
>>> score
0.77
Pass the score back to the tuner:
>>> tuner.record(parameters, score)
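Putting the three steps together gives a complete tuning loop. Since BTB and scikit-learn may not be available in every environment, here is a self-contained sketch that follows the same propose → score → record contract, with a stand-in random-search tuner and a toy scoring function in place of BTB's GPTuner and a real model (both stand-ins are my own illustration, not BTB code):

```python
import random

class RandomTuner:
    """Stand-in for a BTB tuner: proposes random integer hyperparameters
    and records scores. Illustrates the propose/record contract only."""
    def __init__(self, ranges):
        self.ranges = ranges          # {name: (min, max)} integer ranges
        self.trials = []              # list of (params, score) pairs

    def propose(self):
        return {name: random.randint(lo, hi)
                for name, (lo, hi) in self.ranges.items()}

    def record(self, params, score):
        self.trials.append((params, score))

def score_model(params):
    # Toy scorer standing in for model.fit / model.score:
    # highest when n_estimators is near 300 and max_depth near 10.
    return 1.0 - (abs(params['n_estimators'] - 300) / 500
                  + abs(params['max_depth'] - 10) / 500)

tuner = RandomTuner({'n_estimators': (10, 500), 'max_depth': (1, 20)})
for _ in range(30):
    parameters = tuner.propose()      # step 1: propose hyperparameters
    score = score_model(parameters)   # step 2: fit and score a model
    tuner.record(parameters, score)   # step 3: pass the score back

best_params, best_score = max(tuner.trials, key=lambda t: t[1])
print(best_params, round(best_score, 3))
```

With the real GPTuner, the loop body is identical; only the proposal strategy changes, since the Gaussian-process tuner uses the recorded scores to steer later proposals.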
Selectors
A Selector works on top of a collection of tuners: it uses a multi-armed bandit strategy to decide which model is most promising to tune next. To use a selector, we create a Tuner instance for each model, plus a Selector instance over the model names:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

from btb.selection import UCB1
from btb.tuning.hyperparams import FloatHyperParam

models = {
    'RF': RandomForestClassifier,
    'SVC': SVC,
}
selector = UCB1(['RF', 'SVC'])

rf_hyperparams = {
    'n_estimators': IntHyperParam(min=10, max=500),
    'max_depth': IntHyperParam(min=3, max=20),
}
rf_tunable = Tunable(rf_hyperparams)

svc_hyperparams = {
    'C': FloatHyperParam(min=0.01, max=10.0),
    'gamma': FloatHyperParam(0.000000001, 0.0000001),
}
svc_tunable = Tunable(svc_hyperparams)

tuners = {
    'RF': GPTuner(rf_tunable),
    'SVC': GPTuner(svc_tunable),
}
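UCB1 is the classic upper-confidence-bound bandit policy: it picks the arm (here, the model) that maximizes the average score plus an exploration bonus that shrinks as an arm accumulates trials. A minimal sketch of the selection rule (my own illustration of the standard formula, not BTB's internal implementation):

```python
import math

def ucb1_select(scores_by_arm):
    """Pick the arm maximizing mean(scores) + sqrt(2 * ln(N) / n),
    where N is the total number of trials and n the arm's own trials.
    Arms that have never been tried are selected first."""
    total = sum(len(s) for s in scores_by_arm.values())

    def ucb(scores):
        if not scores:                 # untried arm: explore it first
            return float('inf')
        mean = sum(scores) / len(scores)
        bonus = math.sqrt(2 * math.log(total) / len(scores))
        return mean + bonus

    return max(scores_by_arm, key=lambda arm: ucb(scores_by_arm[arm]))

# 'RF' has the higher average, but 'SVC' has been tried only once,
# so its larger exploration bonus wins here.
print(ucb1_select({'RF': [0.81, 0.84, 0.80], 'SVC': [0.78]}))  # -> SVC
```

This is why the selector does not simply greedily pick the best-scoring model so far: under-explored models keep getting occasional trials until the evidence against them is strong enough.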
Then iterate the following steps inside a loop:
Pass all the scores to the selector and let it decide which model to test next:
>>> next_choice = selector.select({
...     'RF': tuners['RF'].scores,
...     'SVC': tuners['SVC'].scores,
... })
>>> next_choice
'RF'
Get a new set of parameters from the chosen tuner and create a new model instance:
>>> parameters = tuners[next_choice].propose()
>>> parameters
{'n_estimators': 289, 'max_depth': 18}
>>> model = models[next_choice](**parameters)
Evaluate the model and pass its score back to the tuner:
>>> model.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=18, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=289, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
>>> score = model.score(X_test, y_test)
>>> score
0.89
>>> tuners[next_choice].record(parameters, score)