Compare matrices to find the differences
I have two matrices and I want to compare them (row.name-wise) to find the differences.
    > head(N1)
                  Total_Degree Transitivity Betweenness Closeness_All
    2410016O06RIK            1          NaN     0.00000  0.0003124024
    AGO1                     4    0.1666667    37.00000  0.0003133814
    APEX1                    4    0.6666667     4.00000  0.0003144654
    ATR                      4    0.1666667    19.50000  0.0003128911
    CASP3                   24    0.0000000   806.00000  0.0002980626
    CCND2                    4    0.3333333    97.33333  0.0003132832

    > head(N2)
                  Total_Degree Transitivity Betweenness Closeness_All
    2410016O06RIK            1          NaN         0.0  2.279982e-04
    ADI1                     1          NaN         0.0  1.728877e-05
    AGO1                     3    0.0000000        40.0  2.284670e-04
    AIRN                     1          NaN         0.0  1.721733e-05
    APEX1                    3    0.6666667         2.0  2.288330e-04
    ATR                      3    0.3333333        19.5  2.281542e-04
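Many of the row.names in N1 also exist in N2. I want to compare those rows and write the differences to a new matrix. Rows that are unique to N1 or N2 should be flagged as belonging to N1 or N2.
I am not sure what the best criterion for computing the difference is. What I can think of is simply summing all the values in a row of N1 and subtracting that from the summed values of the corresponding row in N2.
For example, the output should be: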
    > head(Compared)
                  Comparison Unique
    2410016O06RIK     0.0002 Common
    AGO1               -1.83 Common
    APEX1               2.24 Common
    ATR               0.0034 Common
    CASP3          830.00029     N1
    ADI1           1.0007288     N2
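As a quick check of the row-sum idea, here is a minimal sketch for a single gene. The AGO1 values are copied by hand from the head(N1) and head(N2) output above; the names n1_ago1, n2_ago1 and diff_ago1 are only illustrative. It takes the N1 row sum minus the N2 row sum, which is the direction that reproduces the sign of the AGO1 entry in the example output.

```r
# AGO1 values copied from the head(N1) and head(N2) output in the question
n1_ago1 <- c(Total_Degree = 4, Transitivity = 0.1666667,
             Betweenness = 37, Closeness_All = 0.0003133814)
n2_ago1 <- c(Total_Degree = 3, Transitivity = 0,
             Betweenness = 40, Closeness_All = 2.284670e-04)

# Sum each row (ignoring NA/NaN), then take N1 minus N2
diff_ago1 <- sum(n1_ago1, na.rm = TRUE) - sum(n2_ago1, na.rm = TRUE)
round(diff_ago1, 2)  # -1.83
```

Because Betweenness is orders of magnitude larger than Closeness_All, a plain row sum is dominated by the large columns, so this criterion may or may not be what you want.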
Here, for row.name =
With
If N1 and N2 are data frames:
    # compute the row sums and merge N1 and N2
    N1$rs <- rowSums(N1, na.rm=TRUE)
    N2$rs <- rowSums(N2, na.rm=TRUE)
    comp <- merge(N1[,"rs", drop=FALSE], N2[,"rs", drop=FALSE], by="row.names", all=TRUE)

    # then compare the row sums and the variable "locations"
    comp$Unique <- with(comp, c("N1","N2","common")[(!is.na(rs.x)) + 2*(!is.na(rs.y))])
    comp$Comparison <- with(comp, rs.x-rs.y)

    # keep only the variable you need:
    comp <- comp[, c(1, 5, 4)]
If N1 and N2 are matrices:
    # compute the row sums and merge N1 and N2
    rs1 <- rowSums(N1, na.rm=TRUE)
    rs2 <- rowSums(N2, na.rm=TRUE)
    comp <- merge(N1, N2, by="row.names", all=TRUE)

    # then compare the row sums and the variable "locations"
    comp$Unique <- with(comp, c("N1","N2","common")[as.numeric(!is.na(Total_Degree.x)) + 2*as.numeric(!is.na(Total_Degree.y))])
    comp$Comparison <- with(merge(as.data.frame(rs1), as.data.frame(rs2), all=TRUE, by="row.names"), rs1-rs2)

    # keep only the variable you need:
    comp <- comp[, c("Row.names","Comparison","Unique")]
Output of both methods:
    comp
    #      Row.names    Comparison Unique
    #1 2410016O06RIK  0.0000844042 common
    #2          ADI1            NA     N2
    #3          AGO1 -1.8332483856 common
    #4          AIRN            NA     N2
    #5         APEX1  3.0000856324 common
    #6           ATR  0.8334181369 common
    #7         CASP3            NA     N1
    #8         CCND2            NA     N1
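In the output above, rows that exist in only one network get NA in Comparison, whereas the example in the question shows a number for them (CASP3 appears with its own N1 row sum). One possible way to get that, sketched below, is to treat a missing row sum as 0 before subtracting. That is an assumption about what the asker wants, not something stated in the question. The data frames are rebuilt by hand from the head() output, and rs and compared are illustrative names.

```r
# Rebuild the first rows of N1 and N2 from the head() output in the question
genes1 <- c("2410016O06RIK", "AGO1", "APEX1", "ATR", "CASP3", "CCND2")
N1 <- data.frame(
  Total_Degree  = c(1, 4, 4, 4, 24, 4),
  Transitivity  = c(NaN, 0.1666667, 0.6666667, 0.1666667, 0, 0.3333333),
  Betweenness   = c(0, 37, 4, 19.5, 806, 97.33333),
  Closeness_All = c(0.0003124024, 0.0003133814, 0.0003144654,
                    0.0003128911, 0.0002980626, 0.0003132832),
  row.names = genes1
)
genes2 <- c("2410016O06RIK", "ADI1", "AGO1", "AIRN", "APEX1", "ATR")
N2 <- data.frame(
  Total_Degree  = c(1, 1, 3, 1, 3, 3),
  Transitivity  = c(NaN, NaN, 0, NaN, 0.6666667, 0.3333333),
  Betweenness   = c(0, 0, 40, 0, 2, 19.5),
  Closeness_All = c(2.279982e-04, 1.728877e-05, 2.284670e-04,
                    1.721733e-05, 2.288330e-04, 2.281542e-04),
  row.names = genes2
)

# Row sums per network, merged on row names (all = TRUE keeps unique rows)
rs <- merge(
  data.frame(rs1 = rowSums(N1, na.rm = TRUE), row.names = rownames(N1)),
  data.frame(rs2 = rowSums(N2, na.rm = TRUE), row.names = rownames(N2)),
  by = "row.names", all = TRUE
)

# Label each row before the NAs are overwritten
rs$Unique <- ifelse(is.na(rs$rs2), "N1",
                    ifelse(is.na(rs$rs1), "N2", "common"))

# Assumption: a gene absent from one network has a row sum of 0 there
rs$rs1[is.na(rs$rs1)] <- 0
rs$rs2[is.na(rs$rs2)] <- 0
rs$Comparison <- rs$rs1 - rs$rs2

compared <- rs[, c("Row.names", "Comparison", "Unique")]
compared
```

With N1 minus N2, rows found only in N2 come out negative. If you would rather report their row sum as a positive number, as the ADI1 line in the question suggests, wrap the unique rows in abs().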
Here is part of a solution:
    require(data.table)
    require(dplyr)
    set.seed(2016)
    dt1 <- data.table(V1 = c("a","b","c","d"), V2 = rnorm(4))
    dt2 <- data.table(V1 = c("c","d","e","f"), V2 = rnorm(4))

    # common <- merge(dt1, dt2, by = "V1")[, Unique := "Common"]
    # unique1 <- dt1[V1 %nin% dt2[, V1], ][, Unique := "N1"]
    # unique2 <- dt2[V1 %nin% dt1[, V1], ][, Unique := "N2"]
    # res <- rbind(common, unique1, unique2, fill = TRUE)
A small update after @Cath's answer, for clarity.
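    # full outer join on V1, then sum the V2.x / V2.y columns per row
    # Unique is a numeric code: 0 = in both, 1 = only in dt2, 2 = only in dt1
    allMerged <- merge(dt1, dt2, by = "V1", all = TRUE) %>%
      .[, RowSum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("V2", names(.))] %>%
      .[, Unique := ((is.na(V2.x) + 2*is.na(V2.y)))]

    print(allMerged)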
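The Unique column produced above is a number rather than a label. The sketch below shows one way to turn that code into the Common / N1 / N2 labels asked for in the question. It uses plain data frames (df1 and df2, hypothetical stand-ins for dt1 and dt2) so that it runs without data.table, and the label vector is my own choice.

```r
# Base-R stand-ins for dt1 and dt2 (the V2 values are arbitrary)
set.seed(2016)
df1 <- data.frame(V1 = c("a", "b", "c", "d"), V2 = rnorm(4))
df2 <- data.frame(V1 = c("c", "d", "e", "f"), V2 = rnorm(4))

# Full outer join on V1; the value columns become V2.x and V2.y
merged <- merge(df1, df2, by = "V1", all = TRUE)

# Same numeric code as in the snippet above:
# 0 = present in both, 1 = V2.x missing (only in df2), 2 = V2.y missing (only in df1)
code <- is.na(merged$V2.x) + 2 * is.na(merged$V2.y)

# Map the code to labels; + 1 because R vectors are 1-indexed
merged$Unique <- c("Common", "N2", "N1")[code + 1]
merged
```

The same indexing trick should carry over to the data.table version by wrapping the right-hand side of the Unique assignment in the label lookup.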