R - Random sample of groups in a data.table
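How can I randomly sample three groups from a data.table, so that the result contains exactly three groups, each with all of its rows from the original data.table?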
```r
library(data.table)
dat <- data.table(ids=1:20, groups=sample(x=c("A","B","C","D","E","F"), 20, replace=TRUE))
```
I know how to randomly select 10 rows from a data.table:
```r
# Sample 10 row indices so each row stays intact (sapply(dat, sample, 10) would shuffle each column independently)
dat.sampl1 <- dat[sample(.N, 10)]
```
and how to sample within each group (at most 3 rows per group):
```r
dat[, .SD[sample(.N, min(.N, 3))], by = groups]
```
But how do I randomly sample whole groups? The result should look something like this:
```
ids groups
  1      F
 11      F
  3      F
 18      F
  8      A
  9      A
 10      A
 17      A
 19      A
 12      E
 14      E
 16      E
```
Do you mean something like this?
```r
set.seed(123)
dat <- data.table(ids=1:20, groups=sample(x=c("A","B","C","D","E","F"), 20, replace=TRUE))
dat[groups %in% sample(unique(dat[, groups]), size = 3)][order(groups)]
#     ids groups
#  1:   3      C
#  2:  10      C
#  3:  12      C
#  4:   7      D
#  5:   9      D
#  6:  14      D
#  7:   4      F
#  8:   5      F
#  9:   8      F
# 10:  11      F
# 11:  16      F
# 12:  20      F
```
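The one-liner above does two things at once: it draws three distinct group labels, then keeps every row whose label was drawn. Splitting it into named steps makes that easier to see. This is a sketch of the same approach; the seed and the intermediate names `all_groups`, `picked` and `res` are illustrative, and it assumes at least three distinct groups are present.

```r
library(data.table)

set.seed(1)  # any seed; which groups get drawn depends on the RNG
dat <- data.table(ids = 1:20,
                  groups = sample(c("A", "B", "C", "D", "E", "F"), 20, replace = TRUE))

# Step 1: the distinct group labels present in the data
all_groups <- unique(dat$groups)

# Step 2: draw 3 distinct labels (without replacement, the default)
picked <- sample(all_groups, size = 3)

# Step 3: keep every row whose group was drawn, sorted by group
res <- dat[groups %in% picked][order(groups)]
print(res)
```

Because the filter is `%in%` on the group column rather than a row sample, every row of each selected group ends up in `res`, whatever the group sizes are.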
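If you want to sample the groups with replacement, you can do the following. Note that the same group can then be drawn more than once, as group A is below: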
```r
dat[unique(dat[, list(groups)])[sample(.N, 3, replace = TRUE)], on = "groups"]
#    ids groups
# 1:   3      C
# 2:  10      C
# 3:  12      C
# 4:   6      A
# 5:  15      A
# 6:  18      A
# 7:   6      A
# 8:  15      A
# 9:  18      A
```
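With replacement, a group drawn twice shows up as two identical blocks of rows, so the copies cannot be told apart afterwards. A minimal sketch of one way to keep them distinguishable, assuming the same join-based approach: number each draw before joining. The column name `draw` is a hypothetical choice, not part of the original answer.

```r
library(data.table)

set.seed(1)
dat <- data.table(ids = 1:20,
                  groups = sample(c("A", "B", "C", "D", "E", "F"), 20, replace = TRUE))

# One row per distinct group label
grp <- unique(dat[, .(groups)])

# Draw 3 groups with replacement and number the draws 1..3
# ('draw' is a hypothetical column name used only for illustration)
picked <- grp[sample(.N, 3, replace = TRUE)][, draw := .I]

# Join back on the group label: each draw brings all rows of its group,
# tagged with the draw number, so a repeated group stays distinguishable
res <- dat[picked, on = "groups"]
print(res)
```

The join keeps the order of `picked`, so rows come out grouped by draw rather than alphabetically; add `[order(draw, ids)]` if a stable order matters.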
This also works, as a single line of base R code (using a data.frame):
```r
df1[df1[,'groups'] %in% sample(unique(df1[,'groups']), size = 3, replace = FALSE), ]
```
For example:
```r
> df1 <- data.frame("ids" = 1:20, "groups" = sample(LETTERS[1:4], size = 20, replace = TRUE))
> df2 <- df1[df1[,'groups'] %in% sample(unique(df1[,'groups']), size = 3, replace = FALSE), ]
> df2[order(df2[,'groups']),]
   ids groups
4    4      B
6    6      B
18  18      B
20  20      B
1    1      C
2    2      C
3    3      C
9    9      C
12  12      C
16  16      C
19  19      C
7    7      D
11  11      D
```
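Another base R formulation, not from the answer above, is to split the data frame into one piece per group, sample three of those pieces, and stack them back together. This sketch assumes at least three distinct groups are present; `by_group` and `picked` are illustrative names.

```r
set.seed(1)
df1 <- data.frame(ids = 1:20,
                  groups = sample(LETTERS[1:4], size = 20, replace = TRUE))

# split() returns a list with one data frame per group, named by group label
by_group <- split(df1, df1$groups)

# Pick 3 group names without replacement, then stack those groups' data frames
picked <- sample(names(by_group), size = 3)
df2 <- do.call(rbind, by_group[picked])

print(df2)
```

The rows come out in the order the groups were drawn, and `rbind` prefixes the row names with the group label (for example `B.4`); sort with `df2[order(df2$groups), ]` if you want them grouped alphabetically as in the output above.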