How do I pass a variable name to conditionally sum in dplyr pipe?
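The crux of the problem is how to pass a column variable into a grouped data frame so that the data can be summed conditionally. The data for the example is as follows: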

    library(dplyr)
    library(rlang)
    set.seed(1)

    # dummy dates
    date_vars <- purrr::map(c('2018-01-31', '2018-02-28', '2018-03-31',
                              '2018-04-30', '2018-05-31', '2018-06-30',
                              '2018-07-31', '2018-08-31', '2018-09-30',
                              '2018-10-31', '2018-11-30', '2018-12-31'), as.Date) %>%
      purrr::reduce(c)

    dummy_df <- tibble(
      id = rep(c("a","b","c"), each = 12),
      date = rep(date_vars, 3),
      value = runif(36, 1, 10)
    )

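The function below takes a data frame, groups it by a variable (using rlang's `syms` function), and then creates a new summary column by summing all of the values whose date is greater than or equal to some date period. Here I summarise 3 months of 'value'.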

    agg_by_period <- function(df, date_period, period, grouping, new_col_prefix){

      grouping_vars <- syms(grouping)
      new_sum_column <- quo_name(paste0(new_col_prefix,"sum_", period, 'm'))

      df %>%
        group_by(!!!grouping_vars) %>%
        summarize(!!new_sum_column := sum(value[date >= date_period], na.rm = T)) %>%
        select(!!!grouping_vars, !!sym(new_sum_column))
    }

    agg_by_period(df = dummy_df,
                  date_period = as.Date('2018-10-31'),
                  grouping = 'id',
                  period = 3,
                  new_col_prefix = 'new_'
    )

    # A tibble: 3 x 2
      id    new_sum_3m
      <chr>      <dbl>
    1 a           7.00
    2 b          11.9
    3 c          18.1

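Great! My question is specifically about making 'value' inside the function dynamic, for when this column's name is not "value". My naive attempt passed the column in using `sym()`, and it errors as follows: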

    agg_by_period2 <- function(df, date_period, period, grouping, new_col_prefix, value_var){

      grouping_vars <- syms(grouping)
      new_sum_column = quo_name(paste0(new_col_prefix,"sum_", period, 'm'))
      value_var_col <- sym(value_var)

      df %>%
        group_by(!!!grouping_vars) %>%
        summarize(!!new_sum_column := sum(!!value_var_col[date >= date_period], na.rm = T)) %>%
        select(!!!grouping_vars, !!sym(new_sum_column))
    }

    agg_by_period2(df = dummy_df,
                   date_period = as.Date('2018-10-31'),
                   grouping = 'id',
                   period = 3,
                   new_col_prefix = 'new_',
                   value_var = 'value'
    )

    Error in `>=.default`(date, date_period) :
      comparison (5) is possible only for atomic and list types

The function above works when the date condition (`[date >= date_period]`) is removed. Any help would be greatly appreciated.
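This seems to be an operator-precedence issue: in `!!value_var_col[date >= date_period]`, the `[` is applied before `!!`, so the whole subsetting expression is unquoted instead of just the column symbol. Wrapping the unquoted symbol in parentheses resolves it: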

    df %>%
      group_by(!!!grouping_vars) %>%
      summarize(!!new_sum_column := sum((!!value_var_col)[date >= date_period], na.rm = T)) %>%
      select(!!!grouping_vars, !!sym(new_sum_column))

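To make the role of the parentheses concrete, here is a minimal sketch that first inspects how R parses the two forms and then runs the parenthesized version end to end. It assumes dplyr 1.0 or later and rlang are installed. The names `agg_by_period3`, `agg_by_period_data` and `toy_df` are hypothetical and introduced only for this illustration, and the `.data[[value_var]]` variant is an alternative that is not part of the snippet above.

```r
library(dplyr)
library(rlang)

# `[` binds more tightly than `!`, so without parentheses the operand of `!!`
# is the whole subsetting call, not just the symbol.
bad  <- quote(!!value_var_col[date >= date_period])
good <- quote((!!value_var_col)[date >= date_period])
inner_bad  <- bad[[2]][[2]]  # the call value_var_col[date >= date_period]
outer_good <- good[[1]]      # the outermost function is `[`

# Hypothetical helper: same logic as agg_by_period2, with the parenthesized unquote.
agg_by_period3 <- function(df, date_period, period, grouping,
                           new_col_prefix, value_var) {
  grouping_vars  <- syms(grouping)
  new_sum_column <- paste0(new_col_prefix, "sum_", period, "m")
  value_var_col  <- sym(value_var)

  df %>%
    group_by(!!!grouping_vars) %>%
    summarize(!!new_sum_column :=
                sum((!!value_var_col)[date >= date_period], na.rm = TRUE))
}

# Hypothetical alternative: the .data pronoun takes the column name as a
# string, so the value column needs no unquoting at all.
agg_by_period_data <- function(df, date_period, period, grouping,
                               new_col_prefix, value_var) {
  new_sum_column <- paste0(new_col_prefix, "sum_", period, "m")

  df %>%
    group_by(across(all_of(grouping))) %>%
    summarize(!!new_sum_column :=
                sum(.data[[value_var]][date >= date_period], na.rm = TRUE))
}

# Small deterministic data set whose value column is not called "value".
toy_df <- tibble(
  id     = rep(c("a", "b"), each = 3),
  date   = rep(as.Date(c("2018-09-30", "2018-10-31", "2018-11-30")), 2),
  amount = c(1, 2, 3, 10, 20, 30)
)

res  <- agg_by_period3(toy_df, as.Date("2018-10-31"), 2, "id", "new_", "amount")
res2 <- agg_by_period_data(toy_df, as.Date("2018-10-31"), 2, "id", "new_", "amount")
# In both results new_sum_2m is 5 for "a" (2 + 3) and 50 for "b" (20 + 30).
```

The parse check is plain base R, so it does not depend on how rlang processes `!!`. It only shows that, without parentheses, the subsetting call is what gets unquoted. In that case `date` is looked up outside the data frame, where it is the base function `date()`, which is consistent with the comparison error reported in the question.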
Note