R: Compare matrices to find the differences

I have 2 matrices and I want to compare them (by row.name) to find the differences.

> head(N1)
              Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK            1          NaN     0.00000  0.0003124024
AGO1                     4    0.1666667    37.00000  0.0003133814
APEX1                    4    0.6666667     4.00000  0.0003144654
ATR                      4    0.1666667    19.50000  0.0003128911
CASP3                   24    0.0000000   806.00000  0.0002980626
CCND2                    4    0.3333333    97.33333  0.0003132832

> head(N2)
              Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK            1          NaN         0.0  2.279982e-04
ADI1                     1          NaN         0.0  1.728877e-05
AGO1                     3    0.0000000        40.0  2.284670e-04
AIRN                     1          NaN         0.0  1.721733e-05
APEX1                    3    0.6666667         2.0  2.288330e-04
ATR                      3    0.3333333        19.5  2.281542e-04

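Many of the row.names in N1 also exist in N2. I would like to compare those rows and write the differences to a new matrix. Rows that are unique to N1 or N2 should be marked as belonging to N1 or N2.

I am not sure which criterion is best for computing the difference. What I could think of is simply adding up all the values of a row in N1 and subtracting that from the summed values of the corresponding row in N2.

For example, the output should be: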

> head(Compared)
                       Comparison Unique
    2410016O06RIK        0.0002     Common
    AGO1                 -1.83      Common
    APEX1                 2.24      Common
    ATR                  0.0034     Common
    CASP3               830.00029   N1
    ADI1                1.0007288   N2

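Here, for row.name = 2410016O06RIK, all values in N1 and all values in N2 were summed, and N1-N2 was written to the Comparison column. Since this row is present in both matrices, Common was written to the Unique column.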


One way in base R, using rowSums and merge:

If N1 and N2 are data.frames:

# compute the row sums and merge N1 and N2
N1$rs <- rowSums(N1, na.rm=TRUE)
N2$rs <- rowSums(N2, na.rm=TRUE)
comp <- merge(N1[,"rs", drop=FALSE], N2[,"rs", drop=FALSE], by="row.names", all=TRUE)

# then compare the row sums and the variable "locations"
comp$Unique <- with(comp, c("N1","N2","common")[(!is.na(rs.x)) + 2*(!is.na(rs.y))])
comp$Comparison <- with(comp, rs.x-rs.y)

# keep only the variable you need:
comp <- comp[, c(1, 5, 4)]
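
The `Unique` line relies on an indexing trick: each logical test is coerced to 0 or 1, so the sum is 1 when a row is only in N1, 2 when it is only in N2, and 3 when it is in both. A minimal sketch of that step on made-up flags (`in_x`, `in_y`, `idx` and `labels` are illustrative names, not part of the answer):

```r
# Hypothetical presence flags for three rows
in_x <- c(TRUE, FALSE, TRUE)    # row has a row sum from N1
in_y <- c(FALSE, TRUE, TRUE)    # row has a row sum from N2

# TRUE/FALSE are coerced to 1/0: 1 = N1 only, 2 = N2 only, 3 = both
idx <- in_x + 2 * in_y

# Use the code as a position in the label vector
labels <- c("N1", "N2", "common")[idx]
print(labels)
```

Because a merged row always comes from at least one input, the index should never be 0 here.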

If N1 and N2 are matrices:

# compute the row sums and merge N1 and N2
rs1 <- rowSums(N1, na.rm=TRUE)
rs2 <- rowSums(N2, na.rm=TRUE)
comp <- merge(N1, N2, by="row.names", all=TRUE)

# then compare the row sums and the variable "locations"
comp$Unique <- with(comp, c("N1","N2","common")[as.numeric(!is.na(Total_Degree.x)) + 2*as.numeric(!is.na(Total_Degree.y))])
comp$Comparison <- with(merge(as.data.frame(rs1), as.data.frame(rs2), all=TRUE, by="row.names"), rs1-rs2)

# keep only the variable you need:
comp <- comp[, c("Row.names","Comparison","Unique")]

Output of both methods:

comp
#      Row.names    Comparison Unique
#1 2410016O06RIK  0.0000844042 common
#2          ADI1            NA     N2
#3          AGO1 -1.8332483856 common
#4          AIRN            NA     N2
#5         APEX1  3.0000856324 common
#6           ATR  0.8334181369 common
#7         CASP3            NA     N1
#8         CCND2            NA     N1
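
In the output above, rows found in only one input get `NA` in `Comparison`, whereas the question's example shows a value for them (apparently the row's own sum). A self-contained sketch of one way to get that, assuming this is the intended behaviour. The matrices are rebuilt from the six-row previews in the question, and `compare_row_sums` is a hypothetical helper name, not part of the answer:

```r
# Rebuild the six-row previews from the question as numeric matrices
cols <- c("Total_Degree", "Transitivity", "Betweenness", "Closeness_All")
N1 <- matrix(c(
   1, NaN,         0.00000, 0.0003124024,
   4, 0.1666667,  37.00000, 0.0003133814,
   4, 0.6666667,   4.00000, 0.0003144654,
   4, 0.1666667,  19.50000, 0.0003128911,
  24, 0.0000000, 806.00000, 0.0002980626,
   4, 0.3333333,  97.33333, 0.0003132832),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("2410016O06RIK", "AGO1", "APEX1", "ATR", "CASP3", "CCND2"), cols))
N2 <- matrix(c(
  1, NaN,        0.0, 2.279982e-04,
  1, NaN,        0.0, 1.728877e-05,
  3, 0.0000000, 40.0, 2.284670e-04,
  1, NaN,        0.0, 1.721733e-05,
  3, 0.6666667,  2.0, 2.288330e-04,
  3, 0.3333333, 19.5, 2.281542e-04),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("2410016O06RIK", "ADI1", "AGO1", "AIRN", "APEX1", "ATR"), cols))

# Hypothetical helper: compare row sums by row name
compare_row_sums <- function(a, b) {
  # Full outer merge of the two row-sum vectors on their row names
  rs <- merge(data.frame(rs1 = rowSums(a, na.rm = TRUE)),
              data.frame(rs2 = rowSums(b, na.rm = TRUE)),
              by = "row.names", all = TRUE)
  in1 <- !is.na(rs$rs1)
  in2 <- !is.na(rs$rs2)
  data.frame(
    Row.names  = as.character(rs$Row.names),
    # Difference for shared rows, otherwise the one row sum that exists
    Comparison = ifelse(in1 & in2, rs$rs1 - rs$rs2, ifelse(in1, rs$rs1, rs$rs2)),
    Unique     = c("N1", "N2", "common")[in1 + 2 * in2],
    stringsAsFactors = FALSE
  )
}

Compared <- compare_row_sums(N1, N2)
print(Compared)
```

For rows present in both inputs this gives the same differences as the output above. Whether a lone row sum belongs in the same column as a difference is a design choice; keeping `NA` there, as the answer does, avoids mixing the two quantities.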


Here is part of a solution. In res, you can handle the difference part with data.table:

require(data.table)
require(dplyr)

set.seed(2016)
dt1 <- data.table(V1 = c("a","b","c","d"), V2 = rnorm(4))
dt2 <- data.table(V1 = c("c","d","e","f"), V2 = rnorm(4))

# common <- merge(dt1, dt2, by = "V1")[, Unique := "Common"]
# unique1 <- dt1[!(V1 %in% dt2$V1)][, Unique := "N1"]
# unique2 <- dt2[!(V1 %in% dt1$V1)][, Unique := "N2"]
# res <- rbind(common, unique1, unique2, fill = TRUE)

A small update after @Cath's answer, for clarity.

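# Full outer merge on V1; RowSum adds V2.x and V2.y, ignoring NAs.
# Unique is a numeric code: 0 = in both, 1 = only in dt2 (V2.x is NA), 2 = only in dt1 (V2.y is NA).
allMerged <- merge(dt1, dt2, by = "V1", all = TRUE) %>%
  .[, RowSum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("V2", names(.))] %>%
  .[, Unique := ((is.na(V2.x) + 2*is.na(V2.y)))]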

print(allMerged)
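
The merged table above stores `Unique` as a numeric code and a sum rather than a difference. A possible follow-up sketch, assuming data.table 1.12.4 or later (for `fifelse` and `fcoalesce`), that produces text labels and a `Comparison` column shaped like the question's example. The name `merged` and the choice to report the single available value for unique rows are assumptions, not part of the answer:

```r
library(data.table)

set.seed(2016)
dt1 <- data.table(V1 = c("a", "b", "c", "d"), V2 = rnorm(4))
dt2 <- data.table(V1 = c("c", "d", "e", "f"), V2 = rnorm(4))

# Full outer merge; the result is sorted by V1
merged <- merge(dt1, dt2, by = "V1", all = TRUE)

# Label each row by where its key occurs
merged[, Unique := fifelse(is.na(V2.y), "N1",
                   fifelse(is.na(V2.x), "N2", "Common"))]

# Difference for shared keys, otherwise the one value that exists
merged[, Comparison := fifelse(Unique == "Common",
                               V2.x - V2.y,
                               fcoalesce(V2.x, V2.y))]

print(merged[, .(V1, Comparison, Unique)])
```

This avoids the pipe, so dplyr is not needed for this variant.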