Creating last observation flags for grouped data with dplyr
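I've searched and found plenty of solutions that come close, but none that quite answers my question.
I'd like a function that adds a 0/1 flag to the data indicating the last observation for each unit. The data are grouped by unit and by the test performed.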
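A minimal sketch of the flag logic itself, in base R rather than the dplyr approach being asked about. It assumes a long-format data frame with hypothetical columns `id`, `test` and `day` (toy values, not real pbcseq data). Once the rows are sorted by time, `ave()` can mark the final row of each unit/test group:

```r
# Toy long-format data; the column names id, test and day are assumptions
# mirroring the reshaped pbcseq data used later in the question.
df <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2),
  test = c("bili", "bili", "chol", "chol", "bili", "bili", "bili"),
  day  = c(0, 192, 0, 192, 0, 182, 365),
  stringsAsFactors = FALSE
)

# Sort so that, within each id/test group, the latest observation comes last.
df <- df[order(df$id, df$test, df$day), ]

# ave() applies FUN separately within each id/test group and returns a vector
# aligned with the rows of df: 1 for the final row of a group, 0 otherwise.
df$lastObsFlag <- ave(
  df$day, df$id, df$test,
  FUN = function(t) as.numeric(seq_along(t) == length(t))
)

print(df)
```

Because the flag depends only on row position after sorting, the data must be ordered by time within each group before `ave()` is called.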
I'd like to use dplyr and have made the following attempt, but the second

```r
getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
  data <- arrange_(data, id, test, time) %>%
    mutate_(lastObsFlag = 0) %>%
    group_by_(id, test) %>%
    mutate_(lastObsFlag = replace(time, n(), 1))
  as.data.frame(data)
}

# Restructure pbcseq from the survival package
junk <- gather(pbcseq, test, value, 12:18)
# That just loaded reshape2 and plyr, so unload them
unloadNamespace("reshape2")
unloadNamespace("plyr")

getLastObsFlag(junk, id="id", time="day", test="test")
```
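For
I've read that this is a problem with having both plyr and dplyr installed (I was hoping that using
I'd be grateful for any pointers. I'm not tied to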
```
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
 [6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] dmhelp_0.5 brglm_0.5-9 profileModel_0.5-9 dplyr_0.4.3 tidyr_0.2.0 gbm_2.1.1 lattice_0.20-33
[8] survival_2.38-3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0 assertthat_0.1 MASS_7.3-44 grid_3.2.2 R6_2.1.1 DBI_0.3.1 magrittr_1.5 stringi_0.5-5
 [9] lazyeval_0.1.10 tools_3.2.2 stringr_1.0.0
```
We can do this in
Testing the result...
```
   id futime status trt      age sex day ascites hepato spiders edema stage     test   value flag
 1  1    400      2   1 58.76523   f   0       1      1       1     1     4     bili   14.50    0
 2  1    400      2   1 58.76523   f   0       1      1       1     1     4     chol  261.00    0
 3  1    400      2   1 58.76523   f   0       1      1       1     1     4  albumin    2.60    0
 4  1    400      2   1 58.76523   f   0       1      1       1     1     4 alk.phos 1718.00    0
 5  1    400      2   1 58.76523   f   0       1      1       1     1     4      ast  138.00    0
 6  1    400      2   1 58.76523   f   0       1      1       1     1     4 platelet  190.00    0
 7  1    400      2   1 58.76523   f   0       1      1       1     1     4  protime   12.20    0
 8  1    400      2   1 58.76523   f 192       1      1       1     1     4     bili   21.30    1
 9  1    400      2   1 58.76523   f 192       1      1       1     1     4     chol      NA    1
10  1    400      2   1 58.76523   f 192       1      1       1     1     4  albumin    2.94    1
```
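We can use

```r
library(dplyr)     # arrange_(), mutate_(), group_by_() (dplyr 0.4.x)
library(lazyeval)  # interp()

getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
  data <- arrange_(data, id, test, time) %>%
    # initialise every row's flag to 0
    mutate_(lastObsFlag = 0) %>%
    group_by_(id, test) %>%
    # within each id/test group, set the last row (the latest time) to 1
    mutate_(.dots=list(lastObsFlag = interp(~replace(lastObsFlag, n(), 1))))
  as.data.frame(data)
}
```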
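The underscore verbs used above (`arrange_()`, `mutate_()`, `group_by_()`) were deprecated in later dplyr releases. Below is a sketch of the same function for dplyr 1.0 or later. It assumes the column names are still passed as strings and uses the `.data` pronoun plus `across(all_of())`. The small `toy` data frame is hypothetical:

```r
library(dplyr)

# Flag the last observation (by time) within each id/test group.
# id, time and test are column names supplied as character strings.
getLastObsFlag <- function(data, id = "subject", time = "studyday", test = "test") {
  data %>%
    arrange(.data[[id]], .data[[test]], .data[[time]]) %>%
    group_by(across(all_of(c(id, test)))) %>%
    # 1 for the final (latest-time) row of each group, 0 otherwise
    mutate(lastObsFlag = as.integer(row_number() == n())) %>%
    ungroup() %>%
    as.data.frame()
}

# Small example in the same long format as the reshaped pbcseq data.
toy <- data.frame(
  id   = c(2, 1, 1, 2, 1),
  test = c("bili", "bili", "bili", "bili", "chol"),
  day  = c(365, 192, 0, 0, 0),
  stringsAsFactors = FALSE
)
getLastObsFlag(toy, id = "id", time = "day", test = "test")
```

Computing the flag directly as `row_number() == n()` avoids the separate initialisation step and needs no `lazyeval`.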
Testing it:

```r
head(getLastObsFlag(junk, id="id", time="day", test="test"), 25)[c('id', 'test', 'lastObsFlag')]
#   id     test lastObsFlag
#1   1     bili           0
#2   1     bili           1
#3   1     chol           0
#4   1     chol           1
#5   1  albumin           0
#6   1  albumin           1
#7   1 alk.phos           0
#8   1 alk.phos           1
#9   1      ast           0
#10  1      ast           1
#11  1 platelet           0
#12  1 platelet           1
#13  1  protime           0
#14  1  protime           1
#15  2     bili           0
#16  2     bili           0
#17  2     bili           0
#18  2     bili           0
#19  2     bili           0
#20  2     bili           0
#21  2     bili           0
#22  2     bili           0
#23  2     bili           1
#24  2     chol           0
#25  2     chol           0
```