R: How to identify and label cluster groups in a dendrogram (created by hclust)?
I have used hclust to identify clusters in my data and to work out what characterizes those clusters. Here is a much-simplified version:
gg <- c(1,2,4,3,3,15,16)
hh <- c(1,10,3,10,10,18,16)
z <- data.frame(gg,hh)
means <- apply(z,2,mean)
sds <- apply(z,2,sd)
nor <- scale(z,center=means,scale=sds)
d <- dist(nor, method = "euclidean")
fit <- hclust(d, method="ward.D2")
plot(fit)
rect.hclust(fit, k=3, border="red")
groups <- cutree(fit, k=3)
aggregate(nor,list(groups),mean)
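Using aggregate, I can see that the three clusters are: one with low values of both gg and hh, one with low gg and average hh, and one with high values of both gg and hh.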
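One way to connect the aggregate() output to the dendrogram is to list which rows fall in each cluster and to read the cluster ids in the order plot(fit) draws them (plot.hclust draws the leaves in fit$order). A minimal base-R sketch, reusing the question's data; the variable names members and left_to_right are illustrative only:

```r
# Reuses the data and clustering from the question
gg <- c(1, 2, 4, 3, 3, 15, 16)
hh <- c(1, 10, 3, 10, 10, 18, 16)
z <- data.frame(gg, hh)
nor <- scale(z)  # same as centring/scaling with the column means and sds
fit <- hclust(dist(nor, method = "euclidean"), method = "ward.D2")
groups <- cutree(fit, k = 3)

# Row numbers of the observations in each cluster
members <- split(seq_len(nrow(z)), groups)
print(members)

# Cluster means in the original units, often easier to read than z-scores
print(aggregate(z, list(cluster = groups), mean))

# Cluster ids in the left-to-right order in which plot(fit) draws them
left_to_right <- unique(groups[fit$order])
print(left_to_right)
```

For this data the clusters are rows {1, 3}, {2, 4, 5} and {6, 7}, with means (gg, hh) of about (2.5, 2), (2.67, 10) and (15.5, 17), and the row numbers in members are the leaf labels printed under the dendrogram.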
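How can I see where these clusters are on the dendrogram? (So far I can only tell by checking the group sizes and comparing them with the sizes on the dendrogram.) And how can I label these cluster groups on the dendrogram, for example by adding a name such as "low", "medium" or "high" to each cluster? I would prefer answers in base R.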
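Base R has no built-in labelling option for this, but because plot.hclust() draws leaf i of fit$order at x = i, each cluster occupies a contiguous block of x positions, and text() can write a name at the centre of each block. A minimal sketch, assuming the names are assigned by hand after reading the aggregate() means; the cluster_names mapping is hypothetical and specific to this data:

```r
gg <- c(1, 2, 4, 3, 3, 15, 16)
hh <- c(1, 10, 3, 10, 10, 18, 16)
z <- data.frame(gg, hh)
k <- 3
fit <- hclust(dist(scale(z)), method = "ward.D2")
groups <- cutree(fit, k = k)

# Hypothetical names, picked by reading the aggregate() means for this data:
# cluster 1 = low gg/low hh, 2 = low gg/medium hh, 3 = high gg/high hh
cluster_names <- c("1" = "low", "2" = "medium", "3" = "high")

# Leaves are drawn at x = 1..n in fit$order, so find each cluster's block
ord <- unique(groups[fit$order])              # cluster ids, left to right
sizes <- as.vector(table(groups)[as.character(ord)])
right <- cumsum(sizes)                        # last leaf position of each block
left <- right - sizes + 1                     # first leaf position of each block
mid <- (left + right) / 2                     # centre of each block

# Same height rect.hclust() uses for the top of its rectangles
cut_h <- mean(rev(fit$height)[(k - 1):k])

plot(fit, main = "Clusters labelled on the dendrogram")
rect.hclust(fit, k = k, border = "red")
text(mid, cut_h, labels = cluster_names[as.character(ord)],
     pos = 3, col = "red", font = 2)
```

Indexing cluster_names by the left-to-right ids in ord, rather than by position, keeps each name attached to the right cluster however the leaves happen to be ordered.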
Unfortunately, short of using the dendextend package, there is no simple labelling option. The closest alternative is to make use of
In this case, since there are only two columns, I'd suggest simply plotting the data itself:
# your data
gg <- c(1,2,4,3,3,15,16)
hh <- c(1,10,3,10,10,18,16)
z <- data.frame(gg,hh)

# a fun visualization function
visualize_clusters <- function(z, nclusters = 3,
                               groupcolors = c("blue","black","red"),
                               groupshapes = c(16,17,18),
                               scaled_axes = TRUE){
  nor <- scale(z) # scale() already defaults to the dataset's column means and sds
  d <- dist(nor, method = "euclidean")
  fit <<- hclust(d, method = "ward.D2") # also saves fit to the global environment
  groups <- cutree(fit, k = nclusters)
  if(scaled_axes) z <- nor
  n <- nrow(z)
  plot(z, main = "Visualize Clusters",
       xlim = range(z[,1]), ylim = range(z[,2]),
       pch = groupshapes[groups], col = groupcolors[groups])
  grid(3,3, col = "darkgray") # divide the plot into a grid of low, medium and high
  text(z[,1], z[,2], 1:n, pos = 4) # label each point with its row number
  centroids <- aggregate(z, list(groups), mean)[,-1]
  points(centroids, cex = 1, pch = 8, col = groupcolors)
  # connect every point to the centroid of its cluster
  for(i in 1:nclusters){
    segments(centroids[i,1], centroids[i,2],
             z[groups==i,1], z[groups==i,2],
             col = groupcolors[i])
  }
  legend("topleft", bty = "n", legend = paste("Cluster", 1:nclusters),
         text.col = groupcolors, cex = .8)
}
Now we can plot the two together:
par(mfrow = c(2,1))
visualize_clusters(z, nclusters = 3, groupcolors = c("blue","black","red"))
plot(fit); rect.hclust(fit, 3, border = rev(c("blue","black","red")))
par(mfrow = c(1,1))
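rect.hclust() recycles border over the clusters in the left-to-right order in which they appear on the dendrogram, not in cutree() numbering, so rev() only lines the colours up when the dendrogram happens to be drawn in that order. A sketch that reorders the palette from fit$order instead, so the rectangle colours match the scatterplot colours whatever the leaf order (border_cols is an illustrative name):

```r
gg <- c(1, 2, 4, 3, 3, 15, 16)
hh <- c(1, 10, 3, 10, 10, 18, 16)
z <- data.frame(gg, hh)
groupcolors <- c("blue", "black", "red")
fit <- hclust(dist(scale(z)), method = "ward.D2")
groups <- cutree(fit, k = 3)

# rect.hclust() applies border colours to clusters in left-to-right
# dendrogram order, so reorder the palette (indexed by cluster id) to match
border_cols <- groupcolors[unique(groups[fit$order])]

plot(fit)
rect.hclust(fit, k = 3, border = border_cols)
```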
The grid makes it easy to eyeball which cluster is low-low, low-medium and high-high.
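The eyeball check against the grid can also be done in code. A sketch with a hypothetical helper, range_third(), that splits each variable's range into three equal parts (roughly what grid(3, 3) draws, although grid() follows the slightly padded plot region) and names each cluster mean "low", "medium" or "high". Since scale() is a positive linear transformation of each column, the names come out the same on raw or scaled axes:

```r
# Hypothetical helper: name each value by the third of `lims` it falls in
range_third <- function(x, lims) {
  breaks <- seq(lims[1], lims[2], length.out = 4)
  as.character(cut(x, breaks = breaks, labels = c("low", "medium", "high"),
                   include.lowest = TRUE))
}

gg <- c(1, 2, 4, 3, 3, 15, 16)
hh <- c(1, 10, 3, 10, 10, 18, 16)
z <- data.frame(gg, hh)
groups <- cutree(hclust(dist(scale(z)), method = "ward.D2"), k = 3)
centroids <- aggregate(z, list(cluster = groups), mean)

# Compare each cluster mean against the full range of its variable
centroids$gg_level <- range_third(centroids$gg, range(z$gg))
centroids$hh_level <- range_third(centroids$hh, range(z$hh))
centroids$name <- paste(centroids$gg_level, centroids$hh_level, sep = "-")
print(centroids)
```

For the question's data this gives low-low, low-medium and high-high for clusters 1, 2 and 3, matching the grid reading above.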
I like the segments. Try it on bigger data, for example:
gg <- runif(30,1,20)
hh <- c(runif(10,5,10),runif(10,10,20),runif(10,1,5))
z <- data.frame(gg,hh)
visualize_clusters(z, nclusters = 3, groupcolors = c("blue","black","red"))
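With random data the clusters change from run to run. A sketch that fixes the seed (an assumption, not in the original) and cross-tabulates the clusters against the hh band each row was drawn from, to see how closely the clusters follow those bands; the band labels are illustrative:

```r
set.seed(1)  # assumption: a fixed seed so the random example is reproducible
gg <- runif(30, 1, 20)
hh <- c(runif(10, 5, 10), runif(10, 10, 20), runif(10, 1, 5))
z <- data.frame(gg, hh)

# hh band each row was generated from, in the order used to build hh above
band <- rep(c("hh 5-10", "hh 10-20", "hh 1-5"), each = 10)

groups <- cutree(hclust(dist(scale(z)), method = "ward.D2"), k = 3)
tab <- table(cluster = groups, band = band)
print(tab)
```

Because gg is drawn uniformly for every row, some clusters may split along gg rather than along the hh bands, which the table makes visible.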
Hope this helps.