R: Assigning variable in data.table using logical statement inside “()” in ifelse function
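In an earlier question, about defining a variable by logical subsetting of time intervals in a data.table, I asked for help with [...] based on the time codes between events (i.e. [...]). The solution there made use of [...]. The problem is that if you want to [...], which generates this [...]. I want to [...] for the start event ([...]. My first attempt was the following code: [...]

The code below builds the example data. The two printouts after it show the data as generated and the result I am after.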
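library(data.table)

# Three ids with five observations each
id <- rep(LETTERS[1:3], each = 5)
set.seed(123)
# Per id: one start event (1) in the first two rows, one end event (coded 2 for now) in the last three
event <- c(sample(c(0, 1), 2, FALSE), sample(c(0, 0, 2), 3, FALSE),
           sample(c(0, 1), 2, FALSE), sample(c(0, 0, 2), 3, FALSE),
           sample(c(0, 1), 2, FALSE), sample(c(0, 0, 2), 3, FALSE))
# Randomly recode each end event as 2 or 3
event[event == 2] <- sample(c(2, 3), 3, TRUE)
state <- "NULL"
# Cumulative random times within each id
time <- c(apply(matrix(runif(3 * 5), 5, 3), 2, cumsum))
DT <- data.table(id, event, state, time)
# Copy row 13 into row 14 and mark it as event 3, so id C has both end events at the same time
DT[14, ] <- DT[13, ]
DT[14, event := 3]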
    id event state      time
 1:  A     0  NULL 0.3279207
 2:  A     1  NULL 1.2824244
 3:  A     0  NULL 2.1719637
 4:  A     3  NULL 2.8647671 <- Event 2 or 3 marks the end point
 5:  A     0  NULL 3.5052739
 6:  B     0  NULL 0.9942698
 7:  B     1  NULL 1.6499756
 8:  B     2  NULL 2.3585060 <- Event 2 or 3 marks the end point
 9:  B     0  NULL 2.9025721
10:  B     0  NULL 3.4967141
11:  C     1  NULL 0.2891597
12:  C     0  NULL 0.4362734
13:  C     2  NULL 1.3992976 <- Here both 2 and 3 appear at the same endpoint
14:  C     3  NULL 1.3992976 <- Here both 2 and 3 appear at the same endpoint
15:  C     0  NULL 2.9923019
    id event state      time
 1:  A     0  NULL 0.3279207
 2:  A     1     1 1.2824244
 3:  A     0     1 2.1719637
 4:  A     3     1 2.8647671
 5:  A     0  NULL 3.5052739
 6:  B     0  NULL 0.9942698
 7:  B     1     1 1.6499756
 8:  B     2     1 2.3585060
 9:  B     0  NULL 2.9025721
10:  B     0  NULL 3.4967141
11:  C     1     1 0.2891597
12:  C     0     1 0.4362734
13:  C     2     1 1.3992976
14:  C     3     1 1.3992976
15:  C     0  NULL 2.9923019
The following error message is shown:

Error in `[.data.table`(DT, , `:=`(state, ifelse(time >= time[event == :
  Type of RHS ('logical') must match LHS ('character'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)
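This line of code produces the correct result,

DT[,state:=ifelse(time>=time[event==1] & time<=time[event==2 | event==3],1,state),by=id]

but when the logical statement [...]

How can I assign the value 1 to the state variable when the time lies between the start point and the end point, with the end point defined by an OR statement as in my first attempt?

Many thanks.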
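Your first attempt failed because, when only one of the two end events actually occurs for an id, the subset for the other one is empty:

DT[id=='A', time[event==2]]
## numeric(0)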
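To see how an empty subset leads to the "RHS ('logical')" error, here is a minimal base R sketch. The values are made up for illustration, and it assumes the failing expression compared time against time[event == 2] for a group that has no event 2.

```r
# Made-up times and events for one group with an event 3 but no event 2
time  <- c(0.33, 1.28, 2.17, 2.86, 3.51)
event <- c(0, 1, 0, 3, 0)
state <- rep("NULL", 5)

# No row has event == 2, so the subset is a zero-length numeric vector
end_time <- time[event == 2]

# Comparing against a zero-length vector gives a zero-length logical,
# and combining it with & keeps the result at length zero
test <- time >= time[event == 1] & time <= end_time

# ifelse() returns a result shaped like its test, here logical(0)
result <- ifelse(test, 1, state)

print(length(end_time))
print(typeof(result))
print(length(result))
```

Because the result is a zero-length logical rather than a character vector, := has nothing of the right type to write into the character column state.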
The simplest way to fix this is to take, for example, the maximum of the two times:

DT[, state := ifelse(time >= time[event==1] & time <= max(time[event %in% 2:3]), 1, state), by=id]
DT
##     id event state      time
##  1:  A     0  NULL 0.3279207
##  2:  A     1     1 1.2824244
##  3:  A     0     1 2.1719637
##  4:  A     3     1 2.8647671
##  5:  A     0  NULL 3.5052739
##  6:  B     0  NULL 0.9942698
##  7:  B     1     1 1.6499756
##  8:  B     2     1 2.3585060
##  9:  B     0  NULL 2.9025721
## 10:  B     0  NULL 3.4967141
## 11:  C     1     1 0.2891597
## 12:  C     0     1 0.4362734
## 13:  C     2     1 1.3992976
## 14:  C     3     1 1.3992976
## 15:  C     0  NULL 2.9923019
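The same window test can also be written without ifelse(), as a rough sketch: compute a logical flag per id, then assign only to the flagged rows. The small table and the helper column in_window are hypothetical, and the sketch assumes each id has exactly one start event (1) and at least one end event (2 or 3).

```r
library(data.table)

# Small made-up table with the same columns as the example data
DT <- data.table(
  id    = rep(c("A", "B"), each = 4),
  event = c(0, 1, 3, 0, 1, 2, 0, 0),
  state = "NULL",
  time  = c(0.3, 1.2, 2.8, 3.5, 0.9, 1.6, 2.3, 2.9)
)

# Flag rows between the start event and the latest end event, per id
DT[, in_window := time >= time[event == 1] & time <= max(time[event %in% 2:3]),
   by = id]

# state is a character column, so assign the string "1"
DT[in_window == TRUE, state := "1"]

# Drop the hypothetical helper column again
DT[, in_window := NULL]
print(DT)
```

Assigning the string "1" keeps the right-hand side the same type as the character column, so the outcome does not depend on how := handles a numeric value.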
I'm not very familiar with [...]

DT[, rows:=1:.N , by=id][
  , state:=ifelse(rows >= which(event==1) & rows <= max(which(event==2), which(event==3)), 1, state), by=id]
DT
    id event state      time rows
 1:  A     0  NULL 0.3279207    1
 2:  A     1     1 1.2824244    2
 3:  A     0     1 2.1719637    3
 4:  A     3     1 2.8647671    4
 5:  A     0  NULL 3.5052739    5
 6:  B     0  NULL 0.9942698    1
 7:  B     1     1 1.6499756    2
 8:  B     2     1 2.3585060    3
 9:  B     0  NULL 2.9025721    4
10:  B     0  NULL 3.4967141    5
11:  C     1     1 0.2891597    1
12:  C     0     1 0.4362734    2
13:  C     2     1 1.3992976    3
14:  C     3     1 1.3992976    4
15:  C     0  NULL 2.9923019    5
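The row-index version tolerates a missing end event because max() is applied to both which() results together. A small base R sketch with a made-up event vector for a single group:

```r
# Made-up events for one group: a start (1), an end coded 3, no event 2
event <- c(0, 1, 0, 3, 0)

first_row <- which(event == 1)   # row of the start event
no_match  <- which(event == 2)   # zero-length: no such event here

# max() over both which() results ignores the empty one
last_row <- max(which(event == 2), which(event == 3))

# Rows from the start event through the end event
rows <- seq_along(event)
in_window <- rows >= first_row & rows <= last_row

print(first_row)
print(length(no_match))
print(last_row)
print(in_window)
```

If a group had neither event 2 nor event 3, max() would be called on an empty set and return -Inf with a warning, so this sketch assumes at least one end event per group.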
You can solve it by defining two new columns.
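# Number the segments: a new one starts at every start event
DT[, segment := cumsum(event == 1)]
# Within a segment, keep rows up to and including the first end event (2 or 3)
DT[, keep := cumsum(c(1, event[-.N]) %in% c(2, 3)) < 1, by = segment]
# Rows before the first start event belong to no interval
DT[segment == 0, keep := FALSE]
# state is a character column, so assign "1" rather than the number 1
DT[keep == TRUE, state := "1"]
# Drop the helper columns
DT[, segment := NULL]
DT[, keep := NULL]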
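The lag-and-cumsum idea behind segment and keep can be traced on a single vector. This is a simplified base R sketch for one group with one start event. It skips the by = segment grouping of the code above, and the name lagged is only illustrative.

```r
# Made-up events for a single group
event <- c(0, 1, 0, 3, 0)

# A segment begins at the start event; rows before it get segment 0
segment <- cumsum(event == 1)

# Shift events down by one row, so each row sees the previous row's event;
# the leading 1 is a filler that is neither 2 nor 3
lagged <- c(1, event[-length(event)])

# Counts how many end events have occurred strictly before each row
ended <- cumsum(lagged %in% c(2, 3))

# Keep rows inside a segment that come no later than the first end event
keep <- segment > 0 & ended < 1

print(segment)
print(lagged)
print(ended)
print(keep)
```

Lagging by one row is what makes the end-event row itself count as inside the interval: the counter only increases on the row after it.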