[Overview] BTB (Bayesian Tuning and Bandits) is an open-source library from MIT: a simple, extensible backend for auto-tuning systems.
GitHub repository: https://github.com/HDI-Project/BTB
Documentation: https://hdi-project.github.io/BTB/
System Requirements
BTB runs on Python 3.5, 3.6, and 3.7.
Installation
BTB can be installed either with pip or from source:
pip:
pip install baytune
source:
git clone git@github.com:HDI-Project/BTB.git
cd BTB
git checkout stable
make install
Usage
Tuners
Tuners quickly select the best hyperparameters for a given machine learning algorithm.
The tuner classes are defined in btb.tuning.tuners. Each iteration of the tuning loop works as follows:
The tuner proposes a set of hyperparameters.
The hyperparameters are used to fit a model, which is then scored.
The score is passed back to the tuner.
In each iteration, the tuner uses all of the information collected so far to propose the hyperparameters most likely to yield a high score.
To instantiate a Tuner, we need a Tunable instance built from a set of hyperparameters:
from btb.tuning import Tunable
from btb.tuning.tuners import GPTuner
from btb.tuning.hyperparams import IntHyperParam

hyperparams = {
    'n_estimators': IntHyperParam(min=10, max=500),
    'max_depth': IntHyperParam(min=10, max=500),
}
tunable = Tunable(hyperparams)
tuner = GPTuner(tunable)
Then, inside a loop, repeat the following three steps:
Let the tuner propose a set of hyperparameters:
>>> parameters = tuner.propose()
>>> parameters
{'n_estimators': 297, 'max_depth': 3}
Fit a model with these hyperparameters and score it:
>>> model = RandomForestClassifier(**parameters)
>>> model.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=3, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=297, n_jobs=1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
>>> score = model.score(X_test, y_test)
>>> score
0.77
Pass the score back to the tuner:
>>> tuner.record(parameters, score)
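Putting the three steps together gives a complete tuning loop. Since BTB and scikit-learn may not be available in every environment, here is a self-contained sketch that follows the same propose → score → record contract, with a stand-in random-search tuner and a toy scoring function in place of BTB's GPTuner and a real model (both stand-ins are my own illustration, not BTB code):

```python
import random

class RandomTuner:
    """Stand-in for a BTB tuner: proposes random integer hyperparameters
    and records scores. Illustrates the propose/record contract only."""
    def __init__(self, ranges):
        self.ranges = ranges          # {name: (min, max)} integer ranges
        self.trials = []              # list of (params, score) pairs

    def propose(self):
        return {name: random.randint(lo, hi)
                for name, (lo, hi) in self.ranges.items()}

    def record(self, params, score):
        self.trials.append((params, score))

def score_model(params):
    # Toy scorer standing in for model.fit / model.score:
    # highest when n_estimators is near 300 and max_depth near 10.
    return 1.0 - (abs(params['n_estimators'] - 300) / 500
                  + abs(params['max_depth'] - 10) / 500)

tuner = RandomTuner({'n_estimators': (10, 500), 'max_depth': (1, 20)})
for _ in range(30):
    parameters = tuner.propose()      # step 1: propose hyperparameters
    score = score_model(parameters)   # step 2: fit and score a model
    tuner.record(parameters, score)   # step 3: pass the score back

best_params, best_score = max(tuner.trials, key=lambda t: t[1])
print(best_params, round(best_score, 3))
```

With the real GPTuner, the loop body is identical; only the proposal strategy changes, since the Gaussian-process tuner uses the recorded scores to steer later proposals.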
Selectors
A Selector works on top of a collection of tuners: it uses a multi-armed bandit strategy to decide which model is most promising to tune next. To use a selector, we create a Tuner instance for each model, plus a Selector instance over the model names:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

from btb.selection import UCB1
from btb.tuning.hyperparams import FloatHyperParam

models = {
    'RF': RandomForestClassifier,
    'SVC': SVC,
}
selector = UCB1(['RF', 'SVC'])

rf_hyperparams = {
    'n_estimators': IntHyperParam(min=10, max=500),
    'max_depth': IntHyperParam(min=3, max=20),
}
rf_tunable = Tunable(rf_hyperparams)

svc_hyperparams = {
    'C': FloatHyperParam(min=0.01, max=10.0),
    'gamma': FloatHyperParam(0.000000001, 0.0000001),
}
svc_tunable = Tunable(svc_hyperparams)

tuners = {
    'RF': GPTuner(rf_tunable),
    'SVC': GPTuner(svc_tunable),
}
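UCB1 is the classic upper-confidence-bound bandit policy: it picks the arm (here, the model) that maximizes the average score plus an exploration bonus that shrinks as an arm accumulates trials. A minimal sketch of the selection rule (my own illustration of the standard formula, not BTB's internal implementation):

```python
import math

def ucb1_select(scores_by_arm):
    """Pick the arm maximizing mean(scores) + sqrt(2 * ln(N) / n),
    where N is the total number of trials and n the arm's own trials.
    Arms that have never been tried are selected first."""
    total = sum(len(s) for s in scores_by_arm.values())

    def ucb(scores):
        if not scores:                 # untried arm: explore it first
            return float('inf')
        mean = sum(scores) / len(scores)
        bonus = math.sqrt(2 * math.log(total) / len(scores))
        return mean + bonus

    return max(scores_by_arm, key=lambda arm: ucb(scores_by_arm[arm]))

# 'RF' has the higher average, but 'SVC' has been tried only once,
# so its larger exploration bonus wins here.
print(ucb1_select({'RF': [0.81, 0.84, 0.80], 'SVC': [0.78]}))  # -> SVC
```

This is why the selector does not simply greedily pick the best-scoring model so far: under-explored models keep getting occasional trials until the evidence against them is strong enough.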
Then iterate the following steps inside a loop:
Pass all the scores to the selector and let it decide which model to test next:
>>> next_choice = selector.select({
...     'RF': tuners['RF'].scores,
...     'SVC': tuners['SVC'].scores,
... })
>>> next_choice
'RF'
Get a new set of parameters from the chosen tuner and create a new model instance:
>>> parameters = tuners[next_choice].propose()
>>> parameters
{'n_estimators': 289, 'max_depth': 18}
>>> model = models[next_choice](**parameters)
Evaluate the model and pass its score back to the tuner:
>>> model.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=18, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=289, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
>>> score = model.score(X_test, y_test)
>>> score
0.89
>>> tuners[next_choice].record(parameters, score)