The predict from a nnet is a character and not a factor
My concern is that when I train an nnet the class is of type factor, but when I predict, a chr is returned.
I took this example from another post.
```r
library(nnet)
library(C50)
library(caret)
attach(iris)

set.seed(3456)
trainIndex <- createDataPartition(iris$Species, p = .8,
                                  list = FALSE,
                                  times = 1)
irisTrain <- iris[ trainIndex,]
irisTest  <- iris[-trainIndex,]

irispred <- nnet(Species ~ ., data=irisTrain, size=10)
predicted <- predict(irispred,irisTest,type="class")
```
and
```
> str(irisTrain)
'data.frame':   120 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.6 5 5.4 5 4.4 4.9 5.4 4.8 ...
 $ Sepal.Width : num  3.5 3 3.1 3.6 3.9 3.4 2.9 3.1 3.7 3 ...
 $ Petal.Length: num  1.4 1.4 1.5 1.4 1.7 1.5 1.4 1.5 1.5 1.4 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.4 0.2 0.2 0.1 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> str(irisTest)
'data.frame':   30 obs. of  5 variables:
 $ Sepal.Length: num  4.7 4.6 4.8 4.3 5.4 4.6 5 5 4.6 5.3 ...
 $ Sepal.Width : num  3.2 3.4 3.4 3 3.4 3.6 3.5 3.5 3.2 3.7 ...
 $ Petal.Length: num  1.3 1.4 1.6 1.1 1.7 1 1.3 1.6 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.3 0.2 0.1 0.2 0.2 0.3 0.6 0.2 0.2 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
```
So Species is a factor in both the training and test data sets, but
```
> str(predicted)
 chr [1:30] "setosa" "setosa" "setosa" "setosa" "setosa" ...
```
the prediction result is a character vector. I am also using other data mining packages, such as C50, which return a factor from predict:
```
> irispred <- C5.0(Species ~ ., data=irisTrain)
> predicted <- predict(irispred,irisTest,type="class")
> str(predicted)
 Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 2 1 1 ...
```
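I would like a consistent, factor-based format for the prediction output. In the case of nnet, simply converting the predicted character output to a factor does not work, because I cannot guarantee that every level will appear in the character vector. For example, among my 650 cases there is one case with a unique level. Sometimes it ends up in the test data set and sometimes it does not, but I want the prediction output to know about that level even when it is not in the test data.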
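The problem described above can be reproduced with base R alone. The following is a minimal sketch with made-up values (`all_levels` and `predicted_chr` are hypothetical stand-ins for the training levels and for the character output of `predict`), showing that `factor()` without a `levels` argument only keeps the values it happens to see:

```r
# Hypothetical data: the full set of class levels, and a character vector of
# predictions in which the level "virginica" happens not to occur.
all_levels    <- c("setosa", "versicolor", "virginica")
predicted_chr <- c("setosa", "versicolor", "setosa")

# Levels are inferred from the values present, so "virginica" is lost.
naive <- factor(predicted_chr)

# Levels are supplied explicitly, so every level is retained.
explicit <- factor(predicted_chr, levels = all_levels)

levels(naive)
levels(explicit)
table(explicit)  # the unseen level is still listed, with a count of 0
```

So the conversion itself is fine as long as the complete set of levels is available from somewhere other than the predictions.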
Thanks.
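With the levels stored in the fitted nnet object (`iris_pred$lev` in the code below), the character predictions can be converted to a factor that keeps every level, including levels that never occur in the data: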
```r
iris2 <- iris
iris2$Species <- factor(iris2$Species,
                        levels = c("versicolor", "banana", "setosa", "cherry", "virginica"))

iris_pred <- nnet(Species ~ ., data = iris2[trainIndex, ], size = 10)
#Warning message:
#In nnet.formula(Species ~ ., data = iris2[trainIndex, ], size = 10) :
#  groups ‘banana’ ‘cherry’ are empty

identical(iris_pred$lev, levels(iris2$Species))
#[1] TRUE

predicted <- predict(iris_pred, iris2[-trainIndex, ], type="class")
predicted_fac <- factor(predicted, levels = iris_pred$lev)

table(iris2[-trainIndex,"Species"], predicted_fac)
#            predicted_fac
#             versicolor banana setosa cherry virginica
#  versicolor         10      0      0      0         0
#  banana              0      0      0      0         0
#  setosa              0      0     10      0         0
#  cherry              0      0      0      0         0
#  virginica           0      0      0      0        10
```
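If the conversion is needed in several places, it could be wrapped in a small helper. This is only a sketch: `predict_class_factor` is a hypothetical name, not part of nnet, and it assumes the model was fitted through the formula interface with a factor response so that the `lev` component is present. The split below uses `sample()` instead of caret purely to keep the example self-contained.

```r
library(nnet)

# Hypothetical helper (not part of nnet): returns class predictions as a
# factor carrying all levels stored in the fitted model.
predict_class_factor <- function(fit, newdata) {
  # `lev` is only stored when the response was a factor
  stopifnot(!is.null(fit$lev))
  factor(predict(fit, newdata, type = "class"), levels = fit$lev)
}

set.seed(3456)
idx  <- sample(nrow(iris), 120)   # simple random 120/30 split
fit  <- nnet(Species ~ ., data = iris[idx, ], size = 10, trace = FALSE)
pred <- predict_class_factor(fit, iris[-idx, ])

str(pred)
table(iris[-idx, "Species"], pred)
```

The result then has the same type and levels as the factor returned by `predict` for a C5.0 model, so downstream code such as confusion tables can treat both the same way.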