R: Replace values in a dataframe based on lookup table

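I'm having some trouble replacing values in a dataframe. I would like to replace values based on a separate table. Below is an example of what I am trying to do.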

I have a table where every row is a customer and each column is an animal they purchased. Let's call this data frame table.

> table
# P1 P2 P3
# 1 cat lizard parrot
# 2 lizard parrot cat
# 3 parrot cat lizard

I also have a table that I will reference, called lookUp.

> lookUp
# pet class
# 1 cat mammal
# 2 lizard reptile
# 3 parrot bird

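What I want to do is create a new table called new, using a function that replaces all values in table with the matching class column from lookUp. I tried this myself using an lapply function, but I got the following warnings.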

new <- as.data.frame(lapply(table, function(x) {
gsub('.*', lookUp[match(x, lookUp$pet) ,2], x)}), stringsAsFactors = FALSE)

Warning messages:
1: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
2: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
3: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used

Any ideas on how to make this work?
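
The warning arises because gsub accepts only a single replacement string: when it is handed the whole vector returned by the match lookup, it uses the first element for every cell in the column. A minimal sketch of one way around this, assuming the pet columns are plain character vectors (the data below is retyped from the printed tables above), is to drop gsub and index lookUp$class directly with the positions that match returns:

```r
# Data retyped from the question's printed output (assumed to be character columns)
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet   = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

# match() gives, for each pet in a column, its row number in lookUp;
# indexing lookUp$class with those row numbers yields one class per cell
new <- as.data.frame(lapply(table, function(x) lookUp$class[match(x, lookUp$pet)]),
                     stringsAsFactors = FALSE)
print(new)
```

Pets that do not appear in lookUp$pet come back as NA, because match returns NA for them.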

Related discussion

I tried other approaches, but they took a really long time with my large dataset. I used the following instead:

# make table "new" using ifelse. See data below to avoid re-typing it
new <- ifelse(table1 == "cat", "mammal",
       ifelse(table1 == "lizard", "reptile",
       ifelse(table1 == "parrot", "bird", NA)))

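This method requires you to write more text for your code, but the vectorization of ifelse makes it run faster. You have to decide, based on your data, whether you would rather spend more time writing code or waiting for your computer to run. To make sure it worked (that there are no typos in the ifelse calls), you can use apply(new, 2, function(x) mean(is.na(x))), which returns the share of NA values in each column and should be 0 everywhere if every value was matched.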
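
To illustrate how that check catches a mistake, here is a small sketch with a deliberately misspelled pet name. The misspelling "lizrd" and the names bad and naShare are made up for this example; the data is the table1 from the Data section, retyped as character columns.

```r
# Same values as table1 in the Data section, typed in directly
table1 <- data.frame(P1 = c("cat", "lizard", "parrot"),
                     P2 = c("lizard", "parrot", "cat"),
                     P3 = c("parrot", "cat", "lizard"),
                     stringsAsFactors = FALSE)

# "lizrd" is an intentional typo, so no lizard cell is ever matched
bad <- ifelse(table1 == "cat", "mammal",
       ifelse(table1 == "lizrd", "reptile",
       ifelse(table1 == "parrot", "bird", NA)))

# Share of NA values per column; anything above 0 points to an unmatched value
naShare <- apply(bad, 2, function(x) mean(is.na(x)))
print(naShare)
```

Each column holds one lizard, so every column reports a non-zero NA share, which flags the typo.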

Data

# create the data table
table1 <- read.table(text = "
P1 P2 P3
1 cat lizard parrot
2 lizard parrot cat
3 parrot cat lizard", header = TRUE)

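The answer above showing how to do this in dplyr does not answer the question; the table is filled with NAs. This works, and I would appreciate any comments showing a better way: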

library(dplyr)
library(tidyr)

# Add a customer column so that we can put things back in the right order
table$customer <- seq_len(nrow(table))
classTable <- table %>%
  # put in long format, naming the column filled with P1, P2, P3 "petCount"
  gather(key = "petCount", value = "pet", -customer) %>%
  # add a new column based on the pet's class in data frame "lookUp"
  left_join(lookUp, by = "pet") %>%
  # since you wanted to replace the values in "table" with their
  # "class", remove the pet column
  select(-pet) %>%
  # put data back into wide format
  spread(key = "petCount", value = "class")
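
In current tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. The following is a sketch of the same reshape, join, reshape pipeline written with those functions. It assumes dplyr and tidyr (version 1.0.0 or later) are installed, and it retypes the question's data as character columns rather than reusing the objects above.

```r
library(dplyr)
library(tidyr)

# Data retyped from the question's printed output
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet   = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

classTable <- table %>%
  # row index so each customer's pets can be regrouped after the join
  mutate(customer = row_number()) %>%
  # long format: one row per customer and pet slot (P1, P2, P3)
  pivot_longer(cols = -customer, names_to = "petCount", values_to = "pet") %>%
  # attach the class of each pet from the lookup table
  left_join(lookUp, by = "pet") %>%
  # keep only the class, then return to one row per customer
  select(-pet) %>%
  pivot_wider(names_from = petCount, values_from = class)

print(classTable)
```

The result is a tibble with the customer column followed by P1, P2 and P3, each holding the class instead of the pet.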

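Note that it may be useful to keep the long table containing the customer, the pet, the pet's species(?) and their class. This example simply adds an intermediate save to a variable: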

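table$customer <- seq_len(nrow(table))
# long table: one row per customer and pet, with the pet's class attached
petClasses <- table %>%
  gather(key = "petCount", value = "pet", -customer) %>%
  left_join(lookUp, by = "pet")

# wide table of classes, one row per customer
custPetClasses <- petClasses %>%
  select(-pet) %>%
  spread(key = "petCount", value = "class")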
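
As one illustration of why the long table can be handy, here is a sketch that counts how many pets of each class were sold. It rebuilds petClasses from retyped data so that it runs on its own; the classCounts name is made up for this example, and dplyr and tidyr are assumed to be installed.

```r
library(dplyr)
library(tidyr)

# Data retyped from the question's printed output
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet   = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

# Long table as in the answer: customer, pet slot, pet, and class
table$customer <- seq_len(nrow(table))
petClasses <- table %>%
  gather(key = "petCount", value = "pet", -customer) %>%
  left_join(lookUp, by = "pet")

# One row per class with the number of pets sold in that class
classCounts <- count(petClasses, class)
print(classCounts)
```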