R: Split observation values and aggregate them into time intervals
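There are bird observations in several areas (`name`) from various observation points (`obs`). Start and end times were recorded, and the time difference was recalculated with a correction factor (`diff_corr`), so it is not simply the length of the start-end interval.
I now need to "split" these values into "nice" intervals (15 minutes, e.g. 10:15:00, 10:30:00, etc.) and then aggregate them by area (`name`), so that I can plot the birds' presence in these areas in 15-minute intervals.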
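To make this a bit clearer: an observation might start at 10:14 and last until 10:25. It therefore spans both the 10:00-10:15 and the 10:15-10:30 intervals, so its value has to be split between them. Each interval should receive the share that corresponds to the part of the observation falling into it.
In more complex cases an observation may span 3 or 4 intervals, and its value has to be split across all of them in the same way.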
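The splitting rule described above can be sketched in base R. Each interval gets the share of the value that matches the share of the observation's time falling into it. This is a minimal sketch, not code from the question. `split_to_bins()` and `at()` are hypothetical helper names, and the value 11 for the 10:14-10:25 example is made up (one unit per minute). Bins are assumed to be aligned to whole 15-minute clock marks, and times are assumed to be in UTC.

```r
# Hypothetical helper: split one observation's value across clock-aligned
# bins of `width` seconds, in proportion to the seconds of [start, end)
# that fall into each bin.
split_to_bins <- function(start, end, value, width = 15 * 60, tz = "UTC") {
  s <- as.numeric(start)                       # seconds since the epoch
  e <- as.numeric(end)
  stopifnot(e > s)
  first <- floor(s / width) * width            # start of the first bin touched
  last  <- ceiling(e / width) * width - width  # start of the last bin touched
  bins  <- seq(first, last, by = width)
  overlap <- pmin(e, bins + width) - pmax(s, bins)  # seconds inside each bin
  data.frame(
    bin   = as.POSIXct(bins, origin = "1970-01-01", tz = tz),
    share = value * overlap / (e - s)
  )
}

# Hypothetical helper for the examples: a clock time on one fixed day, in UTC
at <- function(hms) as.POSIXct(paste("2017-05-22", hms), tz = "UTC")

# 10:14-10:25 with value 11: share 1 goes to the 10:00 bin, 10 to the 10:15 bin
a <- split_to_bins(at("10:14:00"), at("10:25:00"), 11)

# 10:00-10:35 with value 40 spans three bins: 40/35*15, 40/35*15, 40/35*5
b <- split_to_bins(at("10:00:00"), at("10:35:00"), 40)
```

Because the shares are proportional to the overlap, they always add back up to the original value, however many bins an observation spans.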
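The last step is to sum all observation parts for each time interval and plot them.
I have been searching for a solution for several days, but have only found very simple examples in which …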
Sample data:
```r
structure(list(
  obs = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                  .Label = c("b", "C2", "Dürnberg2"), class = "factor"),
  name = c("C2", "C2", "C2", "C2", "C2", "C2", "C2", "C2", "C2",
           "b", "981", "1627", "b", "b", "981", "1627", "b", "b", "b", "b"),
  start = structure(c(1495441500, 1495441590, 1495441650, 1495441680,
                      1495447380, 1495447410, 1495447530, 1495447560,
                      1495447580, 1496996580, 1496996580, 1496996580,
                      1496996760, 1496996820, 1496996820, 1496996820,
                      1496997180, 1496997300, 1496997420, 1496998260),
                    class = c("POSIXct", "POSIXt"), tzone = ""),
  end = structure(c(1495441590, 1495441650, 1495441680, 1495441800,
                    1495447410, 1495447530, 1495447560, 1495447580,
                    1495447620, 1496996760, 1496996760, 1496996760,
                    1496996820, 1496997180, 1496997180, 1496997180,
                    1496997300, 1496997420, 1496997540, 1496998320),
                  class = c("POSIXct", "POSIXt"), tzone = ""),
  diff_corr = c(1.46739130434783, 0.978260869565217, 0.489130434782609,
                1.95652173913043, 0.489130434782609, 1.95652173913043,
                0.489130434782609, 0.326086956521739, 0.652173913043478,
                2.96703296703297, 2.96703296703297, 2.96703296703297,
                0.989010989010989, 5.93406593406593, 5.93406593406593,
                5.93406593406593, 1.97802197802198, 1.97802197802198,
                1.97802197802198, 0.989010989010989)),
  .Names = c("obs", "name", "start", "end", "diff_corr"),
  row.names = c("1", "9", "7", "8", "3", "2", "4", "5", "6", "13", "13.1",
                "13.2", "22", "11", "11.1", "11.2", "12", "23", "15", "16"),
  class = "data.frame")
```
P.S. I had a really hard time naming my question properly, so any tips (not only about this) are highly appreciated.
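A new attempt with a small example:
Distribute each value across the intervals in proportion to the time it spends in each one (and afterwards sum per interval):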
```
start     end       value      new values in new 15-min intervals
10:03:00  10:14:00  11   --->  10:00:00 = 11
10:14:00  10:16:00   2   --->  10:00:00 = 1 ; 10:15:00 = 1
10:00:00  10:35:00  40   --->  10:00:00 = 40/35*15 ; 10:15:00 = 40/35*15 ; 10:30:00 = 40/35*5
10:15:00  10:30:00  12   --->  10:15:00 = 12
```
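The allocation in the small example can be written as a formula. This is the rule as read from the table above, where $\Delta$ is 15 minutes and $[t_k, t_k + \Delta)$ is the $k$-th interval. Observation $i$ runs from $s_i$ to $e_i$ with value $v_i$, and the formula applies to every interval it overlaps:

$$
v_{i,k} = v_i \cdot \frac{\min(e_i,\; t_k + \Delta) - \max(s_i,\; t_k)}{e_i - s_i},
\qquad
V_k = \sum_i v_{i,k}
$$

Take the third row ($s$ = 10:00, $e$ = 10:35, $v = 40$). Its overlaps are 15, 15 and 5 minutes, which gives $40 \cdot 15/35 = 120/7 \approx 17.14$ twice and $40 \cdot 5/35 = 40/7 \approx 5.71$. These add back up to 40. Summing all four rows per interval gives:

- $V_{\text{10:00}} = 11 + 1 + 120/7 = 204/7 \approx 29.14$
- $V_{\text{10:15}} = 1 + 120/7 + 12 = 211/7 \approx 30.14$
- $V_{\text{10:30}} = 40/7 \approx 5.71$

Together that is 65, the sum of the original values.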
Here is one approach using `data.table` and `lubridate`.
Data
```
> p
    obs name               start                 end diff_corr
 1:  C2   C2 2017-05-22 04:25:00 2017-05-22 04:26:30 1.4673913
 2:  C2   C2 2017-05-22 04:26:30 2017-05-22 04:27:30 0.9782609
 3:  C2   C2 2017-05-22 04:27:30 2017-05-22 04:28:00 0.4891304
 4:  C2   C2 2017-05-22 04:28:00 2017-05-22 04:30:00 1.9565217
 5:  C2   C2 2017-05-22 06:03:00 2017-05-22 06:03:30 0.4891304
 6:  C2   C2 2017-05-22 06:03:30 2017-05-22 06:05:30 1.9565217
 7:  C2   C2 2017-05-22 06:05:30 2017-05-22 06:06:00 0.4891304
 8:  C2   C2 2017-05-22 06:06:00 2017-05-22 06:06:20 0.3260870
 9:  C2   C2 2017-05-22 06:06:20 2017-05-22 06:07:00 0.6521739
10:   b    b 2017-06-09 04:23:00 2017-06-09 04:26:00 2.9670330
11:   b  981 2017-06-09 04:23:00 2017-06-09 04:26:00 2.9670330
12:   b 1627 2017-06-09 04:23:00 2017-06-09 04:26:00 2.9670330
13:   b    b 2017-06-09 04:26:00 2017-06-09 04:27:00 0.9890110
14:   b    b 2017-06-09 04:27:00 2017-06-09 04:33:00 5.9340659
15:   b  981 2017-06-09 04:27:00 2017-06-09 04:33:00 5.9340659
16:   b 1627 2017-06-09 04:27:00 2017-06-09 04:33:00 5.9340659
17:   b    b 2017-06-09 04:33:00 2017-06-09 04:35:00 1.9780220
18:   b    b 2017-06-09 04:35:00 2017-06-09 04:37:00 1.9780220
19:   b    b 2017-06-09 04:37:00 2017-06-09 04:39:00 1.9780220
20:   b    b 2017-06-09 04:51:00 2017-06-09 04:52:00 0.9890110
```
Code
```r
library(data.table)
library(lubridate)

p <- as.data.table(p)
# Mean of diff_corr per group, where the groups are `start` rounded to the nearest 15 minutes
p[, .(new_diff = mean(diff_corr)), .(tme_start = round_date(start, unit = "15min"))]
```
Output
```
> p[, .(new_diff = mean(diff_corr)), .(tme_start = round_date(start, unit = "15min"))]
             tme_start  new_diff
1: 2017-05-22 04:30:00 1.2228261
2: 2017-05-22 06:00:00 0.7826087
3: 2017-06-09 04:30:00 3.3626374
4: 2017-06-09 04:45:00 0.9890110
```
What is data.table doing?
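Since you are not familiar with `data.table`, here is its general form:

```
DT[select rows, perform operations, group by]
```

In the call above, the row selection is left empty, so all rows are used. The operation computes `new_diff = mean(diff_corr)`, and the grouping is by `tme_start`, which is `start` rounded to the nearest 15 minutes with `round_date()`.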
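`round_date()` rounds to the nearest 15-minute mark. That is why the May observations starting between 04:25 and 04:28 end up grouped under 04:30:00. If the groups should instead be the intervals that start at 10:00, 10:15 and so on, as in the question, the times have to be floored; lubridate also provides `floor_date()` for that. The small comparison below uses plain arithmetic on seconds since the epoch rather than lubridate, and assumes UTC times. The rounding branch only roughly mirrors `round_date()`, because tie handling may differ.

```r
# Compare rounding to the nearest 15-minute mark with flooring to the start
# of the 15-minute interval, using arithmetic on epoch seconds (UTC assumed).
width <- 15 * 60
x <- as.POSIXct(c("2017-05-22 04:25:00", "2017-05-22 06:03:00",
                  "2017-06-09 04:51:00"), tz = "UTC")
secs <- as.numeric(x)
nearest <- as.POSIXct(round(secs / width) * width, origin = "1970-01-01", tz = "UTC")
floored <- as.POSIXct(floor(secs / width) * width, origin = "1970-01-01", tz = "UTC")

format(nearest, "%H:%M")  # "04:30" "06:00" "04:45"
format(floored, "%H:%M")  # "04:15" "06:00" "04:45"
```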
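Back to the answer: other variations
It is not clear from the question or the comments whether you want to group by `obs` or by `name`, so here are both: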
```
> p[, .(new_diff = mean(diff_corr)), .(tme_start = round_date(start, unit = "15min"), obs)]
             tme_start obs  new_diff
1: 2017-05-22 04:30:00  C2 1.2228261
2: 2017-05-22 06:00:00  C2 0.7826087
3: 2017-06-09 04:30:00   b 3.3626374
4: 2017-06-09 04:45:00   b 0.9890110
> p[, .(new_diff = mean(diff_corr)), .(tme_start = round_date(start, unit = "15min"), name)]
             tme_start name  new_diff
1: 2017-05-22 04:30:00   C2 1.2228261
2: 2017-05-22 06:00:00   C2 0.7826087
3: 2017-06-09 04:30:00    b 2.6373626
4: 2017-06-09 04:30:00  981 4.4505495
5: 2017-06-09 04:30:00 1627 4.4505495
6: 2017-06-09 04:45:00    b 0.9890110
```
This is slow and clumsy, but maybe it helps. It computes the count and the weighted `diff_corr` sum per name and 15-minute interval:
```r
library(dplyr)

# Bin boundaries every 15 minutes, padded by one bin on each side of the data
range <- seq.POSIXt(min(df$start) - (15 * 60), max(df$end) + (15 * 60), by = "15 min")
# Total length of every observation, in seconds
df$totalDuration <- as.numeric(df$end - df$start, units = "secs")

out <- NULL
for (r in seq_along(range)) {
  lower <- range[r] - (15 * 60)  # the bin is [lower, range[r]) and is labelled by its end
  subset <- df %>%
    # observations that start in, end in, or span the bin; bins are half-open, so an
    # observation ending exactly at `lower` is skipped and one ending exactly at range[r] is kept
    filter((start >= lower & start < range[r]) |
             (end > lower & end < range[r]) |
             (end >= range[r] & start < range[r])) %>%
    # seconds of each observation that fall inside the bin
    mutate(bin = range[r],
           duration = ifelse(start >= lower & end < range[r], totalDuration,
                      ifelse(start >= lower, as.numeric(range[r] - start, units = "secs"),
                      ifelse(end < range[r], as.numeric(end - lower, units = "secs"),
                             15 * 60)))) %>%
    # weight diff_corr by the fraction of the observation inside the bin
    mutate(diff_corr_W = diff_corr * (duration / totalDuration)) %>%
    group_by(bin, name) %>%
    summarise(count = n(), diff_corr_sum = sum(diff_corr_W)) %>%
    ungroup()
  out <- if (is.null(out)) subset else rbind(out, subset)
}
```

```
> out
# A tibble: 9 x 4
                  bin  name count diff_corr_sum
*              <dttm> <chr> <int>         <dbl>
1 2017-05-22 04:40:00    C2     4      4.891304
2 2017-05-22 06:10:00    C2     5      3.913043
3 2017-06-09 04:25:00  1627     1      1.978022
4 2017-06-09 04:25:00   981     1      1.978022
5 2017-06-09 04:25:00     b     1      1.978022
6 2017-06-09 04:40:00  1627     2      6.923077
7 2017-06-09 04:40:00   981     2      6.923077
8 2017-06-09 04:40:00     b     6     13.846154
9 2017-06-09 04:55:00     b     1      0.989011
```
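The loop above anchors its bins at `min(df$start) - 15 min` and labels each bin by its end. Its bin labels (04:40:00, 06:10:00 and so on) are therefore not the clock-aligned 15-minute marks the question asks for. The sketch below is an alternative in base R only, not the original answer's code. It expands every observation into one piece per clock-aligned bin it overlaps, then computes the count and weighted sum without a loop. It assumes the columns `start`, `end`, `name` and `diff_corr` from the question, and times whose UTC offset is a multiple of 15 minutes. The function `bin_and_aggregate()` and the toy data (the small example's rows as name "x", plus one made-up row "y") are hypothetical.

```r
# Hypothetical vectorized alternative: one row per (observation, overlapped bin),
# diff_corr weighted by the overlapping fraction, then summed by bin and name.
bin_and_aggregate <- function(df, width = 15 * 60, tz = "UTC") {
  s <- as.numeric(df$start)
  e <- as.numeric(df$end)
  stopifnot(all(e > s))
  first  <- floor(s / width) * width             # first bin start touched per row
  last   <- ceiling(e / width) * width - width   # last bin start touched per row
  n_bins <- as.integer(round((last - first) / width)) + 1L
  row <- rep(seq_len(nrow(df)), times = n_bins)  # source row of every piece
  bin <- first[row] + (sequence(n_bins) - 1) * width
  overlap <- pmin(e[row], bin + width) - pmax(s[row], bin)  # seconds in the bin
  pieces <- data.frame(
    bin = bin,
    name = as.character(df$name[row]),
    count = 1,
    diff_corr_sum = df$diff_corr[row] * overlap / (e[row] - s[row]),
    stringsAsFactors = FALSE
  )
  out <- aggregate(pieces[c("count", "diff_corr_sum")],
                   by = pieces[c("bin", "name")], FUN = sum)
  out$bin <- as.POSIXct(out$bin, origin = "1970-01-01", tz = tz)
  out <- out[order(out$bin, out$name), ]
  rownames(out) <- NULL
  out
}

# Toy data: the four rows of the small example as name "x", plus one extra row "y"
at <- function(hms) as.POSIXct(paste("2017-05-22", hms), tz = "UTC")
toy <- data.frame(
  name  = c("x", "x", "x", "x", "y"),
  start = at(c("10:03:00", "10:14:00", "10:00:00", "10:15:00", "10:20:00")),
  end   = at(c("10:14:00", "10:16:00", "10:35:00", "10:30:00", "10:50:00")),
  diff_corr = c(11, 2, 40, 12, 6),
  stringsAsFactors = FALSE
)
res <- bin_and_aggregate(toy)
res
```

Here `count` is the number of observation pieces overlapping each bin, and bins are labelled by their start time. For name "x" the sums are 204/7, 211/7 and 40/7 for the 10:00, 10:15 and 10:30 bins. Those are the per-interval totals of the small example.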