R: Compare matrices to find the differences

I have 2 matrices and I want to compare them (row.name wise) to find the differences.

> head(N1)
              Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK            1          NaN     0.00000  0.0003124024
AGO1                     4    0.1666667    37.00000  0.0003133814
APEX1                    4    0.6666667     4.00000  0.0003144654
ATR                      4    0.1666667    19.50000  0.0003128911
CASP3                   24    0.0000000   806.00000  0.0002980626
CCND2                    4    0.3333333    97.33333  0.0003132832

> head(N2)
              Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK            1          NaN         0.0  2.279982e-04
ADI1                     1          NaN         0.0  1.728877e-05
AGO1                     3    0.0000000        40.0  2.284670e-04
AIRN                     1          NaN         0.0  1.721733e-05
APEX1                    3    0.6666667         2.0  2.288330e-04
ATR                      3    0.3333333        19.5  2.281542e-04

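Many of the row.names in N1 also exist in N2. I want to compare those rows and write the differences to a new matrix. Rows that are unique to N1 or N2 should be flagged as belonging to N1 or N2.

I am not sure what the best criterion for calculating the difference is. The only thing I can think of is to simply add up all the values of a row in N1 and subtract the summed values of the corresponding row in N2.

For example, the output should be: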

> head(Compared)
              Comparison Unique
2410016O06RIK     0.0002 Common
AGO1               -1.83 Common
APEX1               2.24 Common
ATR               0.0034 Common
CASP3          830.00029     N1
ADI1           1.0007288     N2

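Here, for row.name = 2410016O06RIK, all the values in N1 and all the values in N2 were summed, and N1 - N2 was written to the Comparison column. Because this row is present in both matrices, "Common" was written to the Unique column.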
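The arithmetic described above can be checked by hand for a single row. The sketch below copies the values for 2410016O06RIK from the head(N1) and head(N2) listings; the variable names (n1_row, n2_row, comparison) are made up for illustration. It assumes NaN entries should be skipped, since otherwise every sum containing a NaN would itself be NaN.

```r
# Values for row 2410016O06RIK, copied from head(N1) and head(N2)
n1_row <- c(Total_Degree = 1, Transitivity = NaN,
            Betweenness = 0, Closeness_All = 0.0003124024)
n2_row <- c(Total_Degree = 1, Transitivity = NaN,
            Betweenness = 0, Closeness_All = 2.279982e-04)

# Assumption: NaN is ignored when summing (na.rm = TRUE also drops NaN)
sum_n1 <- sum(n1_row, na.rm = TRUE)
sum_n2 <- sum(n2_row, na.rm = TRUE)

# N1 minus N2, as in the Comparison column
comparison <- sum_n1 - sum_n2
print(comparison)
```

The difference comes out at roughly 0.0000844, so the 0.0002 in the example output above looks like a rough illustrative figure rather than an exact value.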

Related discussion

One way to do it in base R, using rowSums and merge:

If N1 and N2 are data.frames:

# compute the row sums (ignoring NA/NaN) and merge N1 and N2 on their row names
N1$rs <- rowSums(N1, na.rm = TRUE)
N2$rs <- rowSums(N2, na.rm = TRUE)
comp <- merge(N1[, "rs", drop = FALSE], N2[, "rs", drop = FALSE], by = "row.names", all = TRUE)

# label where each row occurs: index 1 = only in N1, 2 = only in N2, 3 = in both
comp$Unique <- with(comp, c("N1", "N2", "common")[(!is.na(rs.x)) + 2 * (!is.na(rs.y))])
# difference of the row sums (NA when the row is missing from either input)
comp$Comparison <- with(comp, rs.x - rs.y)

# keep only the columns you need
comp <- comp[, c("Row.names", "Comparison", "Unique")]

If N1 and N2 are matrices:

# compute the row sums (ignoring NA/NaN) and merge N1 and N2 on their row names
rs1 <- rowSums(N1, na.rm = TRUE)
rs2 <- rowSums(N2, na.rm = TRUE)
comp <- merge(N1, N2, by = "row.names", all = TRUE)

# label where each row occurs: index 1 = only in N1, 2 = only in N2, 3 = in both
# (relies on Total_Degree never being NA in the original matrices)
comp$Unique <- with(comp, c("N1", "N2", "common")[as.numeric(!is.na(Total_Degree.x)) + 2 * as.numeric(!is.na(Total_Degree.y))])
# difference of the row sums; both merges sort by row name, so the rows line up
comp$Comparison <- with(merge(as.data.frame(rs1), as.data.frame(rs2), all = TRUE, by = "row.names"), rs1 - rs2)

# keep only the columns you need
comp <- comp[, c("Row.names", "Comparison", "Unique")]

Output of both approaches:

comp
#      Row.names    Comparison Unique
#1 2410016O06RIK  0.0000844042 common
#2          ADI1            NA     N2
#3          AGO1 -1.8332483856 common
#4          AIRN            NA     N2
#5         APEX1  3.0000856324 common
#6           ATR  0.8334181369 common
#7         CASP3            NA     N1
#8         CCND2            NA     N1
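In the output above, rows found in only one input get NA in Comparison, whereas the example in the question shows such rows carrying their own row sum (e.g. CASP3 with about 830.0003). If that behaviour is wanted, one possible variant is sketched below. It is not part of the original answer: the small N1 and N2 matrices are rebuilt from a few rows of the head() listings, and the helper names (all_rows, in1, in2, v1, v2, compared) are hypothetical.

```r
# Small matrices rebuilt from a few rows of head(N1) and head(N2)
cols <- c("Total_Degree", "Transitivity", "Betweenness", "Closeness_All")
N1 <- matrix(c( 1, NaN,         0, 0.0003124024,
                4, 0.1666667,  37, 0.0003133814,
               24, 0,         806, 0.0002980626),
             ncol = 4, byrow = TRUE,
             dimnames = list(c("2410016O06RIK", "AGO1", "CASP3"), cols))
N2 <- matrix(c(1, NaN,  0, 2.279982e-04,
               1, NaN,  0, 1.728877e-05,
               3, 0,   40, 2.284670e-04),
             ncol = 4, byrow = TRUE,
             dimnames = list(c("2410016O06RIK", "ADI1", "AGO1"), cols))

# Row sums, skipping NA/NaN entries
rs1 <- rowSums(N1, na.rm = TRUE)
rs2 <- rowSums(N2, na.rm = TRUE)

# Every row name that occurs in either matrix, and where it occurs
all_rows <- union(names(rs1), names(rs2))
in1 <- all_rows %in% names(rs1)
in2 <- all_rows %in% names(rs2)

# Indexing by name yields NA for rows that are absent from that matrix
v1 <- unname(rs1[all_rows])
v2 <- unname(rs2[all_rows])

# Assumption: common rows get N1 - N2, unique rows keep their own row sum
compared <- data.frame(
  Comparison = ifelse(in1 & in2, v1 - v2, ifelse(in1, v1, v2)),
  Unique     = ifelse(in1 & in2, "Common", ifelse(in1, "N1", "N2")),
  row.names  = all_rows
)
print(compared)
```

Whether a unique row should report its own sum, a signed difference against zero, or NA is a design choice; NA, as in the answer above, makes it hardest to mistake a one-sided value for a real difference.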