R: How to pass dynamic column names in dplyr into a custom function?

I have a dataset with the following structure:

Classes ‘tbl_df’ and 'data.frame':  10 obs. of  7 variables:
 $ GdeName  : chr "Aeugst am Albis" "Aeugst am Albis" "Aeugst am Albis" "Aeugst am Albis" ...
 $ Partei   : chr "BDP" "CSP" "CVP" "EDU" ...
 $ Stand1971: num  NA NA 4.91 NA 3.21 ...
 $ Stand1975: num  NA NA 5.389 0.438 4.536 ...
 $ Stand1979: num  NA NA 6.2774 0.0195 3.4355 ...
 $ Stand1983: num  NA NA 4.66 1.41 3.76 ...
 $ Stand1987: num  NA NA 3.48 1.65 5.75 ...

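I want to provide a function that computes the difference between any two of these value columns, and I would like to do it with dplyr's mutate function, like this (assuming from and to are passed in as arguments):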

from <- "Stand1971"
to <- "Stand1987"

data %>%
  mutate(diff = from - to)

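Of course, this doesn't work, because dplyr uses non-standard evaluation: from and to are plain character strings here, not the columns they name. I know there is now an elegant solution to this problem using mutate_, and I have read this vignette, but I still can't get my head around it.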

What should I do?

Here are the first few rows of the dataset, for a reproducible example:

structure(list(GdeName = c("Aeugst am Albis","Aeugst am Albis",
"Aeugst am Albis","Aeugst am Albis","Aeugst am Albis","Aeugst am Albis",
"Aeugst am Albis","Aeugst am Albis","Aeugst am Albis","Aeugst am Albis"
), Partei = c("BDP","CSP","CVP","EDU","EVP","FDP","FGA",
"FPS","GLP","GPS"), Stand1971 = c(NA, NA, 4.907306434, NA,
3.2109535926, 18.272143463, NA, NA, NA, NA), Stand1975 = c(NA,
NA, 5.389079711, 0.4382328556, 4.5363022622, 18.749259742, NA,
NA, NA, NA), Stand1979 = c(NA, NA, 6.2773722628, 0.0194647202,
3.4355231144, 25.294403893, NA, NA, NA, 2.7055961071), Stand1983 = c(NA,
NA, 4.6609804428, 1.412940467, 3.7563539244, 26.277246489, 0.8529335746,
NA, NA, 2.601878177), Stand1987 = c(NA, NA, 3.4767860929, 1.6535933856,
5.7451770193, 22.146844746, NA, 3.7453183521, NA, 13.702211858
)), .Names = c("GdeName","Partei","Stand1971","Stand1975",
"Stand1979","Stand1983","Stand1987"), class = c("tbl_df","data.frame"
), row.names = c(NA, -10L))


With recent versions of dplyr (>= 0.7), you can use the rlang !! (bang-bang) operator.

library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"

data %>%
  mutate(diff = (!!as.name(from)) - (!!as.name(to)))

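You just need to convert the strings to names with as.name and then unquote them into the expression. Unfortunately I seem to have to use a few more parentheses than I would like, because the !! operator seems to sit at an odd place in the order of operations.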
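To see what the unquoting does, it can help to build the expression outside of mutate. This is only a minimal sketch, assuming the rlang package is installed; the names diff_expr and col_values and the numbers in col_values are made up for illustration and do not come from the dataset.

```r
library(rlang)

from <- "Stand1971"
to <- "Stand1987"

# expr() captures the expression without evaluating it; each !! replaces
# the as.name() call with the symbol it returns, so the captured
# expression refers to the two columns by name.
diff_expr <- expr((!!as.name(from)) - (!!as.name(to)))
print(diff_expr)

# Evaluate the captured expression against made-up values for the two
# columns (hypothetical numbers, not taken from the dataset).
col_values <- list(Stand1971 = 5, Stand1987 = 2)
result <- eval(diff_expr, col_values)
print(result)
```

Inside mutate the same substitution happens, except that the symbols are looked up in the data frame's columns rather than in a list.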

Original answer, for dplyr 0.3 to < 0.7:

Following that vignette (vignette("nse","dplyr")), use the interp() function from lazyeval:

library(lazyeval)

from <- "Stand1971"
to <- "Stand1987"

data %>%
  mutate_(diff = interp(~from - to, from = as.name(from), to = as.name(to)))


You can now use the .data pronoun in a dplyr chain:

library(dplyr)
from <- "Stand1971"
to <- "Stand1987"

data %>% mutate(diff = .data[[from]] - .data[[to]])
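Since the question asks for a custom function, the .data approach could be wrapped roughly as follows. This is a sketch rather than a definitive implementation: the function name add_diff and the small toy data frame are hypothetical, and from and to are assumed to be column names given as strings.

```r
library(dplyr)

# Hypothetical helper: `from` and `to` are column names passed as strings.
add_diff <- function(df, from, to) {
  # .data[[name]] looks the column up by its string name inside mutate()
  df %>% mutate(diff = .data[[from]] - .data[[to]])
}

# Small made-up data frame using the same column naming scheme.
toy <- tibble(
  Partei = c("CVP", "EVP"),
  Stand1971 = c(4.9, 3.2),
  Stand1987 = c(3.5, 5.7)
)

out <- add_diff(toy, "Stand1971", "Stand1987")
print(out)
```

Because the column names are ordinary strings, they can come from function arguments, a configuration file, or user input without any extra quoting.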

Another option is to use sym together with bang-bang (!!):

data %>% mutate(diff = !!sym(from) - !!sym(to))
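If the caller would rather pass bare (unquoted) column names than strings, a related technique not used in the answers above is the embrace operator {{ }}. The sketch below assumes rlang >= 0.4.0 with a matching dplyr version; the function name add_diff_bare and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper taking bare column names instead of strings.
add_diff_bare <- function(df, from, to) {
  # {{ }} forwards the caller's column expression into mutate()
  df %>% mutate(diff = {{ from }} - {{ to }})
}

# Made-up values for illustration only.
toy <- tibble(Stand1971 = c(4.9, 3.2), Stand1987 = c(3.5, 5.7))

out_bare <- add_diff_bare(toy, Stand1971, Stand1987)
print(out_bare)
```

This form suits interactive use. When the names arrive as strings, as in the question, .data[[ ]] or sym() remain the more direct fit.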

In base R, we can use:

data$diff <- data[[from]] - data[[to]]