R: Summing a column based on other columns' values


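I want to sum the values of one column based on the values of other columns, as efficiently as possible. I'm not sure whether I can use the summary command for this. Here is an example dataset: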


Cancer1 Cancer2 Cancer3 Disease1
1 0 1 1
0 1 0 0
1 0 0 1

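In this case, I want to sum Disease1 depending on whether the patient has a given cancer. I'm looking for output showing that the total number of people with both Cancer1 and Disease1 is 2, with both Cancer2 and Disease1 is 0, and with both Cancer3 and Disease1 is 1.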

We can create a variable with rowSums on the 'Cancer' columns and then multiply it by the binary 'Disease1' column:


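df1 <- data.frame(Cancer1 = c(1, 0, 1), Cancer2 = c(0, 1, 0),
                  Cancer3 = c(1, 0, 0), Disease1 = c(1, 0, 1))  # example data from the question
# number of cancers per patient, set to 0 when the patient does not have Disease1
df1$newCol <- rowSums(df1[1:3] > 0) * df1$Disease1
df1$newCol
#[1] 2 0 1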
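Note that rowSums() works per row, so `newCol` holds one value per patient; for this particular example it happens to equal 2 0 1, the per-cancer totals the question asks for, but that will not hold for data in general. A column-wise sketch that counts, for each Cancer column, the patients who also have Disease1 could use colSums() instead. It assumes (as in the example) that the cancer columns are the first three and that every value is a 0/1 indicator:

```r
# Example data from the question (0/1 indicators assumed)
df1 <- data.frame(Cancer1 = c(1, 0, 1),
                  Cancer2 = c(0, 1, 0),
                  Cancer3 = c(1, 0, 0),
                  Disease1 = c(1, 0, 1))

# Multiplying a data frame by a vector of length nrow() multiplies each
# column element-wise by Disease1; summing each column then counts the
# patients who have that cancer AND Disease1.
totals <- colSums(df1[1:3] * df1$Disease1)
totals
# Cancer1 Cancer2 Cancer3
#       2       0       1
```

Because the result is named by column, it maps directly onto "Cancer1 and Disease1: 2, Cancer2 and Disease1: 0, Cancer3 and Disease1: 1".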

Before giving a code answer, I'd like to offer some (unsolicited) advice on the data format:

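In my opinion, you could benefit a lot from a long table instead of a wide one (you probably have more cancer types, e.g. "cancer_n", and more diseases, e.g. "disease_n"). With a long table, you will likely need to define some kind of ID for each record. For completeness, here is a data.table solution: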


library(data.table) # loads the package

# create a data.table with an additional id column
a <- data.table(id = 1:3,
                Cancer1 = c(1, 0, 1),
                Cancer2 = c(0, 1, 0),
                Cancer3 = c(1, 0, 0),
                Disease1 = c(1, 0, 1))

# melt the data.table (make it long-form) and count the matching rows per cancer type:
melt(a, id.vars = c("Disease1", "id"))[Disease1 == 1 & value == 1, .N, by = variable]

variable N
1: Cancer1 2
2: Cancer3 1
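The filter-then-`.N` approach drops Cancer2 from the output entirely, because no row survives the filter for it. If the zero count should appear as well, one option (a sketch using the same example data) is to sum the logical condition per group instead of filtering first:

```r
library(data.table)

a <- data.table(id = 1:3,
                Cancer1 = c(1, 0, 1),
                Cancer2 = c(0, 1, 0),
                Cancer3 = c(1, 0, 0),
                Disease1 = c(1, 0, 1))

# Long form: one row per (id, cancer type), with the 0/1 flag in `value`
long <- melt(a, id.vars = c("id", "Disease1"))

# sum() over a logical counts the TRUE rows, so a cancer type with no
# matching patients still gets a row with N = 0
res <- long[, .(N = sum(Disease1 == 1 & value == 1)), by = variable]
res
```

Groups appear in the order of the original columns, so this yields Cancer1 = 2, Cancer2 = 0, Cancer3 = 1.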

You might want to look at dplyr::count():


library(dplyr)

# ds: the example data frame from the question
# count the combinations of Cancer1 and Disease1:
foo <- ds %>% count(Cancer1, Disease1)

# extract the integer result you are looking for (2 for the example data):
foo %>% filter(Cancer1 == 1, Disease1 == 1) %>% pull(n)
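The count() call handles one cancer column at a time. To get all three totals at once with a summarise-style pipeline, one option is to keep only the patients with Disease1 and sum every Cancer column. This is a sketch assuming dplyr 1.0 or later (for across()), 0/1 indicator columns, and that the data frame is called `ds`:

```r
library(dplyr)

# Example data from the question
ds <- data.frame(Cancer1 = c(1, 0, 1),
                 Cancer2 = c(0, 1, 0),
                 Cancer3 = c(1, 0, 0),
                 Disease1 = c(1, 0, 1))

# Keep only patients with Disease1, then sum each Cancer column:
# each sum is the number of those patients who also have that cancer
totals <- ds %>%
  filter(Disease1 == 1) %>%
  summarise(across(starts_with("Cancer"), sum))
totals
```

The result is a one-row data frame with Cancer1 = 2, Cancer2 = 0 and Cancer3 = 1, and it picks up any additional `Cancer*` columns without changing the code.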