geom_abline does not seem to respect groups in facet_grid [ggplot2]
I am just trying to understand how geom_abline works with facets in ggplot.
I have a dataset of student test scores. These are in a data table dt with 4 columns:
    student: unique student ID
    cohort:  grouping factor for students (A, B, … H)
    subject: subject of the test (English, Math, Science)
    score:   the test score for that student in that subject
The goal is to compare cohorts. The following snippet creates a sample dataset.
    library(data.table)
    ## cohorts: list of cohorts with number of students in each
    cohorts <- data.table(name=toupper(letters[1:8]),size=as.numeric(c(8,25,16,30,10,27,13,32)))
    ## base: assign students to cohorts
    base <- data.table(student=c(1:sum(cohorts$size)),cohort=rep(cohorts$name,cohorts$size))
    ## scores for each subject
    english <- data.table(base,subject="English", score=rnorm(nrow(base), mean=45, sd=50))
    math <- data.table(base,subject="Math", score=rnorm(nrow(base), mean=55, sd=25))
    science <- data.table(base,subject="Science", score=rnorm(nrow(base), mean=70, sd=25))
    ## combine
    dt <- rbind(english,math,science)
    ## clip scores to (0,100)
    dt$score<- (dt$score>=0) * dt$score
    dt$score<- (dt$score<=100)*dt$score + (dt$score>100)*100
The following plots the mean score for each cohort with 95% confidence limits, faceted by subject, and includes a (blue, dashed) reference line (using geom_abline).
    library(ggplot2)
    library(Hmisc)
    ggp <- ggplot(dt,aes(x=cohort, y=score)) + ylim(0,100)
    ggp <- ggp + stat_summary(fun.data="mean_cl_normal")
    ggp <- ggp + geom_abline(aes(slope=0,intercept=mean(score)),color="blue",linetype="dashed")
    ggp <- ggp + facet_grid(subject~.)
    ggp
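The problem is that the reference line (from geom_abline) is the same in every facet (= the grand mean score across all students and all subjects). So stat_summary seems to respect the grouping implied by facet_grid (i.e., by subject), but geom_abline does not. Can anyone explain why?
NB: I realize this can be solved by creating a separate table of group means and using it as the data source in geom_abline (below), but why is this necessary?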
    means <- dt[,list(mean.score=mean(score)),by="subject"]
    ggp <- ggplot(dt,aes(x=cohort, y=score)) + ylim(0,100)
    ggp <- ggp + stat_summary(fun.data="mean_cl_normal")
    ggp <- ggp + geom_abline(data=means, aes(slope=0,intercept=mean.score),color="blue",linetype="dashed")
    ggp <- ggp + facet_grid(subject~.)
    ggp
This should do what you want.
    ggplot(dt,aes(x=cohort, y=score)) +
      stat_summary(fun.data="mean_cl_normal") +
      stat_smooth(formula=y~1,aes(group=1),method="lm",se=FALSE) +
      facet_grid(subject~.) + ylim(0,100)
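The stat_smooth trick works because an intercept-only linear model, y ~ 1, fits a single constant, and the least-squares constant is the mean of y. Stats are computed separately for each facet panel, and aes(group=1) pools all cohorts within a panel into one group, so each panel ends up with a horizontal line at its own mean. A minimal base R sketch of the first half of that claim; the `scores` vector is made-up illustration data, not taken from the question:

```r
# Hypothetical scores for a single subject (illustration only).
scores <- c(40, 55, 70, 85)

# Intercept-only model: the one fitted coefficient is the least-squares constant.
fit <- lm(scores ~ 1)
intercept <- unname(coef(fit)[1])

# The fitted intercept agrees with the arithmetic mean (up to floating-point error).
same_as_mean <- isTRUE(all.equal(intercept, mean(scores)))
print(same_as_mean)
```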
Or, as golbasche mentioned, I would probably do something more like this:
    dt <- dt[,avg_score := mean(score),by = subject]
    ggplot(dt,aes(x=cohort, y=score)) +
      facet_grid(subject~.) +
      stat_summary(fun.data="mean_cl_normal") +
      geom_hline(aes(yintercept = avg_score),color ="blue",linetype ="dashed") +
      ylim(0,100)
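As for why the original geom_abline call draws the same line in every panel: as far as I understand ggplot2, an expression inside aes() is evaluated once against the layer's whole data rather than once per facet, so mean(score) there is the grand mean, whereas stat_summary does its computation per panel. The sketch below reproduces the two quantities in base R on a small made-up data frame (the toy values and the names `toy`, `grand_mean` and `means` are assumptions for illustration), with ave() playing the role of the by = subject step:

```r
# Toy data (hypothetical values, not the question's dataset).
toy <- data.frame(
  subject = c("English", "English", "Math", "Math"),
  score   = c(40, 60, 70, 90)
)

# What an expression like mean(score) yields on the whole data: one grand mean.
grand_mean <- mean(toy$score)

# What each facet needs: the mean of its own subject, repeated on every row of that subject.
toy$avg_score <- ave(toy$score, toy$subject, FUN = mean)

# One row per subject, usable as the data= argument of a reference-line layer.
means <- unique(toy[, c("subject", "avg_score")])
print(grand_mean)
print(means)
```

Because the subject column travels with each per-group mean, facet_grid can route every value to its own panel, which is what both the separate means table and the avg_score column achieve.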