Automated machine learning modeling with TPOT in Python

Introduction

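Let me start by pointing to the official documentation. I've always felt that if you really want to learn something you have to read the docs, and they can also resolve the points that blog posts fail to answer.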

pip install tpot

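I honestly don't know why this package isn't more popular. It has a genetic-algorithm flavour: it combines feature selection with automatic model selection, the code is not complicated, and it covers model building, fitting, and prediction. It can even generate a complete script for you.

That said, I have to admit I don't really understand the strengths and weaknesses of this approach. I just find the results impressive, while the search takes a fairly long time, so my feeling is that the dataset can't be too large.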
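To give a feel for the genetic-algorithm flavour mentioned above, here is a minimal toy sketch of an evolutionary search loop. It is not TPOT's actual implementation: TPOT evolves whole scikit-learn pipelines, whereas this example evolves a single number, and the names `evolve` and the fitness function are hypothetical.

```python
import random

def evolve(fitness, population_size=20, generations=5, seed=0):
    """Toy evolutionary search over a single number in [0, 1]."""
    rng = random.Random(seed)
    # Start from a random population of candidate solutions.
    population = [rng.random() for _ in range(population_size)]
    for _ in range(generations):
        # Mutate every individual slightly to create offspring, clipped to [0, 1].
        offspring = [min(1.0, max(0.0, p + rng.gauss(0, 0.1))) for p in population]
        # Keep the best `population_size` candidates among parents and offspring.
        ranked = sorted(population + offspring, key=fitness, reverse=True)
        population = ranked[:population_size]
    return max(population, key=fitness)

# Hypothetical fitness function whose maximum is at x = 0.7.
def toy_fitness(x):
    return -(x - 0.7) ** 2

best = evolve(toy_fitness)
print(round(best, 3))
```

Because parents compete with their offspring, the best candidate never gets worse from one generation to the next, which is also why more generations cost more time but tend to help.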

Importing packages and data

import numpy as np
import pandas as pd
import warnings
from tpot import TPOTClassifier
warnings.filterwarnings('ignore')
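# The data layout is simple: the last column is Y and all the other columns are X.
# So I don't think I need to upload the data; use your own and adjust the column slicing in the split step below.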
df_train = pd.read_csv("train.csv")
df_test = pd.read_csv("test.csv")

Splitting the training and test sets

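# Column 88 is the label; take it as a 1-D Series, which is the shape scikit-learn expects for y.
y_train = df_train.iloc[:, 88]
y_test = df_test.iloc[:, 88]
# Columns 1 to 87 are the features (column 0 is skipped).
x_train = df_train.iloc[:, 1:88]
x_test = df_test.iloc[:, 1:88]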

Getting TPOT ready

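Here it comes: TPOT.
generations=5: the number of iterations of the search. In principle, a larger value gives better results but takes longer.
population_size=20: the number of individuals (candidate pipelines) in each generation. In principle, the same trade-off applies.
verbosity=2: how much is printed. 0 prints nothing and 3 is the highest level.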
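To make the time cost of these two parameters concrete, here is a rough sketch of how many pipelines a run evaluates. It assumes the classic TPOT API, whose documentation gives the total as population_size + generations × offspring_size, with offspring_size defaulting to population_size and 5-fold cross-validation by default. The helper name `estimated_search_cost` is hypothetical, and the real number of fits can be lower because duplicate pipelines are not re-evaluated.

```python
def estimated_search_cost(generations, population_size, offspring_size=None, cv=5):
    """Estimate the pipelines evaluated and model fits in one TPOT search."""
    # Assumption: offspring_size defaults to population_size, as in classic TPOT.
    if offspring_size is None:
        offspring_size = population_size
    pipelines = population_size + generations * offspring_size
    # Each pipeline is scored with cv-fold cross-validation, so it is fitted cv times.
    fits = pipelines * cv
    return pipelines, fits

pipelines, fits = estimated_search_cost(generations=5, population_size=20)
print(pipelines, fits)  # 120 600
```

So even the small settings used here mean on the order of a hundred pipelines, each fitted several times, which matches the impression that the search is slow on larger datasets.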

#------------------------------------------------------------------------------------------------#

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(x_train,y_train)
print(tpot.score(x_test,y_test))
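# This corresponds to "once TPOT has finished searching, it also provides the Python code"
tpot.export('tpot_pipeline2.py') # writes a file to the current directory

##
# Run tpot_pipeline2.py and use its model for prediction; an example follows.
##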

If I remember correctly, the code below is the file it generated automatically.

Fully automated modeling

#------------------------------------------------------------------------------------------------#

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFwe, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union
from tpot.builtins import StackingEstimator
from xgboost import XGBClassifier
from sklearn.preprocessing import FunctionTransformer
from copy import copy

exported_pipeline = make_pipeline(
    make_union(
        SelectFwe(score_func=f_classif, alpha=0.008),
        FunctionTransformer(copy)
    ),
    PCA(iterated_power=6, svd_solver="randomized"),
    StackingEstimator(estimator=GradientBoostingClassifier(learning_rate=1.0, max_depth=7, max_features=0.15000000000000002, min_samples_leaf=10, min_samples_split=13, n_estimators=100, subsample=0.2)),
    XGBClassifier(learning_rate=0.01, max_depth=8, min_child_weight=2, n_estimators=100, nthread=1, subsample=0.45)
)  # Look at this. The first time I saw this model I was stunned.

exported_pipeline.fit(x_train, y_train)
pred = exported_pipeline.predict(x_test)

Model evaluation

#------------------------------------------------------------------------------------------------#

from sklearn import metrics
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import classification_report
from sklearn.metrics import roc_curve,auc
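# AUC is computed from the predicted probability of the positive class, not from hard labels.
proba = exported_pipeline.predict_proba(x_test)[:, 1]
print('F1-score:%.4f' % metrics.f1_score(y_test,pred))
print('AUC:%.4f' % metrics.roc_auc_score(y_test,proba))
print('ACC:%.4f' % metrics.accuracy_score(y_test,pred))
print('Recall:%.4f' % metrics.recall_score(y_test,pred))
print('Precision:%.4f' % metrics.precision_score(y_test,pred))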
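For readers who want to see what these scores measure, here is a small pure-Python sketch that derives accuracy, precision, recall, and F1 from the confusion-matrix counts of a binary problem. The helper `binary_metrics` and the toy labels are hypothetical and only for illustration; the scikit-learn functions above remain the ones to use in practice.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels, made up for this example.
toy_true = [1, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 1, 1]
scores = binary_metrics(toy_true, toy_pred)
for name, value in scores.items():
    print('%s:%.4f' % (name, value))
```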

Plotting the ROC curve

#------------------------------------------------------------------------------------------------#

fpr,tpr,thresholds = roc_curve(y_test, exported_pipeline.predict_proba(x_test)[:, 1])
roc_auc = auc(fpr, tpr)
plt.figure(figsize = (8, 5))
plt.plot(fpr, tpr, color = 'darkorange',
         lw = 2, label = 'ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color = 'navy', lw = 2, linestyle = '--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('ROC')
plt.legend(loc=2)
plt.savefig('ga1.jpg')
plt.show()
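The ROC curve above is built from predicted probabilities rather than hard 0/1 predictions. As a minimal sketch of why that matters, the AUC can be read as the chance that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as half). The helper `auc_from_scores` and the toy data are hypothetical; the sketch only illustrates the idea and is not a replacement for scikit-learn's functions.

```python
def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted probabilities, made up for this example.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_proba = [0.9, 0.4, 0.45, 0.8, 0.6, 0.2]
toy_hard = [1 if p >= 0.5 else 0 for p in toy_proba]  # thresholded at 0.5

auc_proba = auc_from_scores(toy_labels, toy_proba)
auc_hard = auc_from_scores(toy_labels, toy_hard)
print('AUC from probabilities:%.4f' % auc_proba)
print('AUC from hard labels:%.4f' % auc_hard)
```

Thresholding throws away the ranking among samples on the same side of 0.5, so the AUC computed from hard labels generally differs from the one computed from probabilities.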

[Figure: ROC curve]
Don't ask why the score is so low. The answer is that this is stock data.

Study hard, peace.