Loop %in% over the columns of a data.frame: code not working
I have two data.frames (only a subset of each is shown here, because they are too large):
DF1:
    "G1"     "G2"
    IL3RA    ABCC1
    SRSF9    ADAM19
    IL22RA2  BIK
    UROD     ALG3
    SLC35C2  GGH
    OR12D3   SEC31A
    OSBPL3   HIST1H2BK
DF2:
    "S1"       "S2"  "S3"
    IL3RA      0     0
    SRSF9      1     1
    A1CF       0     0
    A1CF1      1     1
    GGH        2     0
    HIST1H2BK  0     0
    AAK1       0     0
I would like the following output:
    "G1"   "S2"  "S3"  "G2"       "S2"  "S3"
    IL3RA  0     0     GGH        2     0
    SRSF9  1     1     HIST1H2BK  0     0
I applied a function that was suggested to me for another, similar situation. The function is:
    lapply(DF1, function(x) DF2[na.omit(match(DF2[[1]], x)), ])
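Surprisingly, it does not work in this case, and I really do not know why. I replicated exactly what was in the post titled "%in% over the columns of a data.frame" on the new data, but with no success. Since DF1 and DF2 are very large, I assumed the problem was available memory, so I tried running it on the cluster to have more memory, but again with no success. The output it gives is as follows: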
    "S1"   "S2"  "S3"
    IL3RA  0     0
    SRSF9  1     1
    "S1"   "S2"  "S3"
    GGH    2     0
    AAK1   0     0
Can anyone help me?
Best,
B.
This should do it.
    # Sample data from the question
    df1 <- structure(list(G1 = c("IL3RA", "SRSF9", "IL22RA2", "UROD", "SLC35C2",
        "OR12D3", "OSBPL3"), G2 = c("ABCC1", "ADAM19", "BIK", "ALG3",
        "GGH", "SEC31A", "HIST1H2BK")), .Names = c("G1", "G2"),
        class = "data.frame", row.names = c(NA, -7L))
    df2 <- structure(list(S1 = c("IL3RA", "SRSF9", "A1CF", "A1CF1", "GGH",
        "HIST1H2BK", "AAK1"), S2 = c(0L, 1L, 0L, 1L, 2L, 0L, 0L),
        S3 = c(0L, 1L, 0L, 1L, 0L, 0L, 0L)), .Names = c("S1", "S2", "S3"),
        class = "data.frame", row.names = c(NA, -7L))

    # Rows of df2 whose S1 value appears in each column of df1
    idx1 <- match(df1$G1, df2$S1)
    idx1 <- idx1[!is.na(idx1)]
    idx2 <- match(df1$G2, df2$S1)
    idx2 <- idx2[!is.na(idx2)]
    out <- cbind(df2[idx1, ], df2[idx2, ])

    > out
         S1 S2 S3        S1 S2 S3
    1 IL3RA  0  0       GGH  2  0
    2 SRSF9  1  1 HIST1H2BK  0  0
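As a side note on why the call in the question picks the wrong rows: it appears to come down to the argument order in match. match(DF2[[1]], x) returns positions within x (the DF1 column), but those positions are then used to index the rows of DF2. A minimal sketch on the sample data, using plain vectors named g2 and s1 (names chosen here for illustration only) in place of DF1$G2 and DF2$S1:

```r
g2 <- c("ABCC1", "ADAM19", "BIK", "ALG3", "GGH", "SEC31A", "HIST1H2BK")
s1 <- c("IL3RA", "SRSF9", "A1CF", "A1CF1", "GGH", "HIST1H2BK", "AAK1")

# Order used in the question: positions of s1's elements inside g2
wrong <- as.integer(na.omit(match(s1, g2)))
# Order used in this answer: positions of g2's elements inside s1,
# which are valid row numbers of DF2
right <- as.integer(na.omit(match(g2, s1)))

s1[wrong]  # "GGH" "AAK1"       (what the question's call returned)
s1[right]  # "GGH" "HIST1H2BK"  (what was wanted)
```

For G1 both orders happen to give rows 1 and 2, which would explain why only the second block of the reported output looks wrong.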
Edit: using lapply:
    out <- lapply(df1, function(x) {
        idx <- match(x, df2$S1)
        idx <- idx[!is.na(idx)]
        df2[idx, ]
    })
    # now `out` is a list of data.frames
    out.f <- do.call(cbind, out)
    # they'll be combined by columns

      G1.S1 G1.S2 G1.S3     G2.S1 G2.S2 G2.S3
    1 IL3RA     0     0       GGH     2     0
    2 SRSF9     1     1 HIST1H2BK     0     0
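One caveat, offered as a sketch rather than as part of the original answer: the do.call(cbind, out) step relies on every column of df1 matching the same number of rows in df2, as happens in the sample data. On the full data that may not hold. One possible workaround, assuming NA padding is acceptable, is to pad each piece to the largest row count before binding. The data below is a trimmed-down, hypothetical subset chosen so that the two columns match a different number of rows:

```r
# Hypothetical subset: G1 matches two rows of df2, G2 matches only one
df1 <- data.frame(G1 = c("IL3RA", "SRSF9", "UROD"),
                  G2 = c("ABCC1", "GGH", "BIK"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(S1 = c("IL3RA", "SRSF9", "GGH"),
                  S2 = c(0L, 1L, 2L),
                  S3 = c(0L, 1L, 0L),
                  stringsAsFactors = FALSE)

out <- lapply(df1, function(x) {
    idx <- match(x, df2$S1)
    df2[idx[!is.na(idx)], ]
})

# Pad every piece to the largest row count; indexing a data.frame past
# its last row yields rows filled with NA
n <- max(vapply(out, nrow, integer(1)))
padded <- lapply(out, function(d) {
    d <- d[seq_len(n), ]
    rownames(d) <- NULL  # drop the "NA"-style row names left by padding
    d
})
out.f <- do.call(cbind, padded)
```

If the pieces do not need to sit side by side, keeping out as a list of data.frames avoids the padding altogether.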