Python 2.7: Comparing metrics of Keras with metrics of sklearn.classification_report

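I am struggling with different metrics when evaluating a neural network.
My investigation showed that Keras (version 1.2.2) computes different values for specific metrics (using the function evaluate) than sklearn.classification_report does.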

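Specifically, the values of the metrics 'precision' (i.e. Keras' 'precision' != sklearn's 'precision') and 'recall' (i.e. Keras' 'recall' != sklearn's 'recall') differ.
For the working example below the differences seem to be random, but evaluating a larger network showed that Keras' 'precision' is (almost) equal to sklearn's 'recall', whereas the two 'recall' metrics are clearly different.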

Thanks for your help!

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils # numpy utils for to_categorical()
from keras import backend as K  # abstract backend API (in order to generate compatible code for Theano and Tf)
from sklearn.metrics import classification_report

batch_size = 128
nb_classes = 10
nb_epoch = 30

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # range [0,1]
X_test /= 255 # range [0,1]
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes) # necessary for use of categorical_crossentropy
Y_test = np_utils.to_categorical(y_test, nb_classes) # necessary for use of categorical_crossentropy

# create model
model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

# configure model
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy', 'precision', 'recall'])

# train model
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))

# evaluate model with keras
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
print('Test precision:', score[2])
print('Test recall:', score[3])

# evaluate model with sklearn
predictions_last_epoch = model.predict(X_test, batch_size=batch_size, verbose=1)
target_names = ['class 0', 'class 1', 'class 2', 'class 3', 'class 4',
                    'class 5', 'class 6', 'class 7', 'class 8', 'class 9']

predicted_classes = np.argmax(predictions_last_epoch, axis=1)
print('\n')
print(classification_report(y_test, predicted_classes,
        target_names=target_names, digits = 6))

EDIT

Output of the script given above:

Test score: 0.0271549037314
Test accuracy: 0.9916
Test precision: 0.992290322304
Test recall: 0.9908


9728/10000 [============================>.] - ETA: 0s

         precision    recall  f1-score   support

class 0   0.987867  0.996939  0.992382       980
class 1   0.993860  0.998238  0.996044      1135
class 2   0.990329  0.992248  0.991288      1032
class 3   0.991115  0.994059  0.992585      1010
class 4   0.994882  0.989817  0.992343       982
class 5   0.991041  0.992152  0.991597       892
class 6   0.993678  0.984342  0.988988       958
class 7   0.992180  0.987354  0.989761      1028
class 8   0.989754  0.991786  0.990769       974
class 9   0.991054  0.988107  0.989578      1009

avg / total   0.991607  0.991600  0.991597     10000

For a different model:

val/test loss: 0.231304548573
val/test categorical_accuracy: **0.978500002956**
val/test precision: *0.995103668976*
val/test recall: 0.941900001907
val/test fbeta_score: 0.967675107574
val/test mean_squared_error: 0.0064611148566
10000/10000 [==============================] - 0s    


         precision    recall  f1-score   support

class 0   0.989605  0.971429  0.980433       980
class 1   0.985153  0.993833  0.989474      1135
class 2   0.988154  0.969961  0.978973      1032
class 3   0.981373  0.991089  0.986207      1010
class 4   0.968907  0.983707  0.976251       982
class 5   0.997633  0.945067  0.970639       892
class 6   0.995690  0.964509  0.979852       958
class 7   0.987230  0.977626  0.982405      1028
class 8   0.945205  0.991786  0.967936       974
class 9   0.951429  0.990089  0.970374      1009

avg / total   *0.978964*  **0.978500**  0.978522     10000
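As a sanity check on the highlighted values, the following is a minimal sketch (not part of the original script; the variable names are illustrative) that recomputes the support-weighted average of the per-class recall from the report above. Because recall times support is the number of correctly classified samples in a class, the weighted average should land on the overall accuracy, which would explain why the 'avg / total' recall matches categorical_accuracy rather than Keras' 'recall'.

```python
from __future__ import division, print_function

# (recall, support) per class, copied from the classification report above
report = [
    (0.971429, 980),
    (0.993833, 1135),
    (0.969961, 1032),
    (0.991089, 1010),
    (0.983707, 982),
    (0.945067, 892),
    (0.964509, 958),
    (0.977626, 1028),
    (0.991786, 974),
    (0.990089, 1009),
]

total_support = sum(s for _, s in report)

# support-weighted mean of the per-class recall values
weighted_recall = sum(r * s for r, s in report) / total_support

# recall * support recovers the count of correctly classified samples per class
correct = [int(round(r * s)) for r, s in report]

print('weighted recall:', round(weighted_recall, 6))
print('correct / total:', sum(correct), '/', total_support)
```

Both printed values correspond to 9785 correct predictions out of 10000, i.e. the 0.9785 reported as categorical_accuracy.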

Definition of the desired metrics (for model.compile):

metrics=['categorical_accuracy', 'precision', 'recall', 'fbeta_score', 'mean_squared_error']

model.compile(loss='categorical_crossentropy',
            optimizer='sgd',
            metrics=metrics)

Output of model.metrics_names:

['loss', 'categorical_accuracy', 'precision', 'recall', 'fbeta_score', 'mean_squared_error']


Yes, the values differ, because sklearn's classification report gives you a weighted average based on the support of each class.

Try it:

from sklearn.metrics import classification_report
y_true = [0, 1,2,1]
y_pred = [0, 0,2,0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))

This gives you:

             precision    recall  f1-score   support

    class 0       0.33      1.00      0.50         1
    class 1       0.00      0.00      0.00         2
    class 2       1.00      1.00      1.00         1

avg / total       0.33      0.50      0.38         **4**

The plain mean of the per-class precision would be (1 + 0 + 0.33) / 3 = 0.44(3), but judging from the support column, sklearn returns (1*1 + 0*2 + 0.33*1) / 4 = 0.3325.
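The arithmetic can be reproduced without sklearn. The sketch below is only an illustration under simple assumptions: the helper per_class_precision is hypothetical (not an sklearn function), and a class that is never predicted gets precision 0, mirroring the 0.00 shown for class 1 in the report.

```python
from __future__ import division, print_function
from collections import Counter

y_true = [0, 1, 2, 1]
y_pred = [0, 0, 2, 0]
classes = sorted(set(y_true) | set(y_pred))


def per_class_precision(y_true, y_pred, cls):
    """Hypothetical helper: fraction of predictions of `cls` that are correct."""
    predicted = sum(1 for p in y_pred if p == cls)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    # Assumption: a class that is never predicted gets precision 0
    return correct / predicted if predicted else 0.0


# support = number of true samples per class
support = Counter(y_true)
precisions = {c: per_class_precision(y_true, y_pred, c) for c in classes}

# unweighted (macro) mean over classes
macro = sum(precisions.values()) / len(classes)

# mean weighted by the support of each class
weighted = sum(precisions[c] * support[c] for c in classes) / len(y_true)

print('per-class precision:', precisions)
print('macro average:   ', round(macro, 4))
print('weighted average:', round(weighted, 4))
```

The macro average comes out near 0.44 and the weighted average near 0.33, the value in the 'avg / total' row. In sklearn itself, precision_score(y_true, y_pred, average='macro') and average='weighted' should give the same two numbers.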