How to generate a sequence of monthly dates from a data frame in R?
Consider the following data frame (df):
    id  date_start  date_end
    a   2012-03-11  2012-03-27
    a   2012-05-17  2012-07-21
    a   2012-06-09  2012-08-18
    b   2015-06-21  2015-07-12
    b   2015-06-27  2015-08-04
    b   2015-07-02  2015-08-01
    c   2017-10-11  2017-11-08
    c   2017-11-27  2017-12-15
    c   2018-01-02  2018-02-03
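I am trying to create a new data frame containing a sequence of monthly dates for each group in `id`. Each sequence should start one month before the minimum `date_start` of the group, contain only dates that fall on the first day of a month, and end at the maximum `date_end` of the group.
Here is a reproducible example of my data frame: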
    library(lubridate)

    id <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
    df <- data.frame(id)
    df$date_start <- as.Date(c("2012-03-11", "2012-05-17", "2012-06-09",
                               "2015-06-21", "2015-06-27", "2015-07-02",
                               "2017-10-11", "2017-11-27", "2018-01-02"))
    df$date_end <- as.Date(c("2012-03-27", "2012-07-21", "2012-08-18",
                             "2015-07-12", "2015-08-04", "2015-08-01",
                             "2017-11-08", "2017-12-15", "2018-02-03"))
What I tried:
    library(dplyr)
    library(DescTools)
    library(timeDate)

    df2 <- df %>%
      group_by(id) %>%
      summarize(start = floor_date(AddMonths(min(date_start), -1), "month"),
                end = max(date_end)) %>%
      do(data.frame(id = .$id, date = seq(.$start, .$end, by = "1 month")))
This code works fine for an ungrouped data frame. Grouping by `id`, however, throws an error message:
    Error in seq.default(.$date_start, .$date_end, by ="1 month") :
      'from' must be of length 1
This is what the desired output for the data frame above looks like:
    id  date
    a   2012-02-01
    a   2012-03-01
    a   2012-04-01
    a   2012-05-01
    a   2012-06-01
    a   2012-07-01
    a   2012-08-01
    b   2015-05-01
    b   2015-06-01
    b   2015-07-01
    b   2015-08-01
    c   2017-09-01
    c   2017-10-01
    c   2017-11-01
    c   2017-12-01
    c   2018-01-01
    c   2018-02-01
Is there a way to change the code so that it works with a grouped data frame? Or is there a completely different approach to this?
One approach uses `dplyr`, `lubridate`, and `tidyr`:
    library(dplyr)
    library(lubridate)

    df %>%
      group_by(id) %>%
      # Build one monthly sequence per id and store it in a list column
      summarise(date = list(seq(floor_date(min(date_start), unit = "month") - months(1),
                                floor_date(max(date_end), unit = "month"),
                                by = "month"))) %>%
      # Expand the list column into one row per date
      tidyr::unnest(date)

    #   id    date
    #   <fct> <date>
    # 1 a     2012-02-01
    # 2 a     2012-03-01
    # 3 a     2012-04-01
    # 4 a     2012-05-01
    # 5 a     2012-06-01
    # 6 a     2012-07-01
    # 7 a     2012-08-01
    # 8 b     2015-05-01
    # 9 b     2015-06-01
    #10 b     2015-07-01
    #11 b     2015-08-01
    #12 c     2017-09-01
    #13 c     2017-10-01
    #14 c     2017-11-01
    #15 c     2017-12-01
    #16 c     2018-01-01
    #17 c     2018-02-01
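One detail in the pipeline above may be worth spelling out: it floors to the first of the month before subtracting `months(1)`. Below is a minimal sketch of why that order is the safer one. It relies on lubridate's documented behaviour that subtracting a `months()` period yields `NA` when the resulting day does not exist. The date `d` is a made-up illustration and does not come from the question's data.

```r
library(lubridate)

# Hypothetical start date at the end of a month (not from the question's data)
d <- as.Date("2012-03-31")

# Floor first, then step back: the first of a month always has a valid
# counterpart in the previous month
safe <- floor_date(d, unit = "month") - months(1)

# Step back first: "2012-02-31" does not exist, so the result is NA
unsafe <- d - months(1)

print(safe)
print(unsafe)
```

The question's own attempt uses `AddMonths()` from `DescTools` instead of `months()`, so this caveat applies only to the lubridate-based version.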
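In your code, `summarize` drops the grouping by `id`, so `do` receives all rows at once and `.$start` has length greater than 1, which `seq` rejects. Regrouping by row number before `do` fixes this: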
    df %>%
      group_by(id) %>%
      summarize(start = floor_date(AddMonths(min(date_start), -1), "month"),
                end = max(date_end)) %>%
      group_by(rn = row_number()) %>%
      do(data.frame(id = .$id, date = seq(.$start, .$end, by = "1 month"))) %>%
      ungroup() %>%
      select(-rn)

    # A tibble: 17 x 2
       id    date
       <fct> <date>
     1 a     2012-02-01
     2 a     2012-03-01
     3 a     2012-04-01
     4 a     2012-05-01
     5 a     2012-06-01
     6 a     2012-07-01
     7 a     2012-08-01
     8 b     2015-05-01
     9 b     2015-06-01
    10 b     2015-07-01
    11 b     2015-08-01
    12 c     2017-09-01
    13 c     2017-10-01
    14 c     2017-11-01
    15 c     2017-12-01
    16 c     2018-01-01
    17 c     2018-02-01
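The error from the question can be reproduced in base R without any grouping machinery, which may make the role of the row-wise regrouping clearer. This is only a sketch: the vectors `starts` and `ends` are typed in by hand to stand in for the summarised columns, and the per-row loop illustrates the idea rather than reproducing what `do` does internally.

```r
# One summarised row per id, so each column holds three dates
starts <- as.Date(c("2012-02-01", "2015-05-01", "2017-09-01"))
ends   <- as.Date(c("2012-08-18", "2015-08-04", "2018-02-03"))

# Passing the whole columns to seq() fails because 'from' must be a single date
res <- tryCatch(seq(starts, ends, by = "1 month"), error = function(e) e)
print(inherits(res, "error"))

# Calling seq() once per row works; this is what grouping by row number achieves
per_row <- lapply(seq_along(starts),
                  function(i) seq(starts[i], ends[i], by = "1 month"))
print(lengths(per_row))
```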
Another approach uses the `yearmon` class from `zoo`:
    library(dplyr)
    library(zoo)

    df %>%
      group_by(id) %>%
      do(data.frame(month = as.Date(seq(as.yearmon(min(.$date_start)) - 1/12,
                                        as.yearmon(max(.$date_end)),
                                        1/12)))) %>%
      ungroup()
giving:
    # A tibble: 17 x 2
       id    month
       <fct> <date>
     1 a     2012-02-01
     2 a     2012-03-01
     3 a     2012-04-01
     4 a     2012-05-01
     5 a     2012-06-01
     6 a     2012-07-01
     7 a     2012-08-01
     8 b     2015-05-01
     9 b     2015-06-01
    10 b     2015-07-01
    11 b     2015-08-01
    12 c     2017-09-01
    13 c     2017-10-01
    14 c     2017-11-01
    15 c     2017-12-01
    16 c     2018-01-01
    17 c     2018-02-01
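The `- 1/12` and the step of `1/12` work because a `yearmon` value is stored as a year plus a fractional month, so one twelfth is exactly one month, and `as.Date()` on a `yearmon` returns the first day of that month. Here is a small sketch of that arithmetic on two dates taken from the question. The variable names are made up for illustration.

```r
library(zoo)

# Earliest start dates of groups "a" and "c" in the question
start_a <- as.Date("2012-03-11")
start_c <- as.Date("2017-10-11")

# Subtracting 1/12 from a yearmon moves back one month;
# as.Date() then gives the first day of that month
prev_a <- as.Date(as.yearmon(start_a) - 1/12)
prev_c <- as.Date(as.yearmon(start_c) - 1/12)

# The same arithmetic crosses a year boundary without special handling
prev_jan <- as.Date(as.yearmon(as.Date("2018-01-02")) - 1/12)

print(prev_a)
print(prev_c)
print(prev_jan)
```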
The same thing can also be written with a small helper function, using the same packages as above:
    # Monthly sequence from one month before st up to the month of en
    Seq <- function(st, en) as.Date(seq(as.yearmon(st) - 1/12, as.yearmon(en), 1/12))

    df %>%
      group_by(id) %>%
      do(data.frame(month = Seq(min(.$date_start), max(.$date_end)))) %>%
      ungroup()
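Since the question also asks for a completely different approach, here is a sketch that uses only base R. It is not taken from any of the answers above. The helper names `first_of_month` and `monthly_seq` are hypothetical, and the input is the question's data with the `"2015-08-012"` typo corrected.

```r
# Reproducible input from the question (date typo corrected)
df <- data.frame(
  id = rep(c("a", "b", "c"), each = 3),
  date_start = as.Date(c("2012-03-11", "2012-05-17", "2012-06-09",
                         "2015-06-21", "2015-06-27", "2015-07-02",
                         "2017-10-11", "2017-11-27", "2018-01-02")),
  date_end = as.Date(c("2012-03-27", "2012-07-21", "2012-08-18",
                       "2015-07-12", "2015-08-04", "2015-08-01",
                       "2017-11-08", "2017-12-15", "2018-02-03"))
)

# Hypothetical helper: first day of the month containing d
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))

# Hypothetical helper: monthly sequence for the rows of one id
monthly_seq <- function(g) {
  first <- first_of_month(min(g$date_start))
  # Step back one month from the first day of the earliest start month
  from <- seq(first, by = "-1 month", length.out = 2)[2]
  data.frame(id = g$id[1], date = seq(from, max(g$date_end), by = "month"))
}

# Split by id, build each sequence, and stack the pieces
result <- do.call(rbind, lapply(split(df, df$id), monthly_seq))
rownames(result) <- NULL
print(result)
```

Because `seq()` is called once per group inside `lapply()`, its `from` argument is always a single date, so the length error from the question does not arise.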