Replace values in a dataframe based on lookup table
我在替换数据框中的值时遇到了一些麻烦。 我想替换基于单独表的值。 以下是我正在尝试做的一个例子。
我有一张桌子,每一行都是客户,每一列都是他们购买的动物。 让我们将此数据帧称为
1 2 3 4 5 | > table # P1 P2 P3 # 1 cat lizard parrot # 2 lizard parrot cat # 3 parrot cat lizard |
我还有一个要引用的表,称为
1 2 3 4 5 | > lookUp # pet class # 1 cat mammal # 2 lizard reptile # 3 parrot bird |
我想做的是用一个函数创建一个名为
1 2 3 4 5 6 7 8 9 10 | new <- as.data.frame(lapply(table, function(x) { gsub('.*', lookUp[match(x, lookUp$pet) ,2], x)}), stringsAsFactors = FALSE) Warning messages: 1: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) : argument 'replacement' has length > 1 and only the first element will be used 2: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) : argument 'replacement' has length > 1 and only the first element will be used 3: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) : argument 'replacement' has length > 1 and only the first element will be used |
关于如何进行这项工作的任何想法?
您在问题中发布了一种不错的方法。这是一个简单的方法:
1 2 3 | new <- df # create a copy of df # using lapply, loop over columns and match values to the look up table. store in"new". new[] <- lapply(df, function(x) look$class[match(x, look$pet)]) |
一种更快的替代方法是:
1 2 | new <- df new[] <- look$class[match(unlist(df), look$pet)] |
请注意,在两种情况下,我都使用空括号(
(我在回答中使用的是
另一个选项是
1 2 3 4 5 6 | library(dplyr) library(tidyr) table %>% gather(key ="pet") %>% left_join(lookup, by ="pet") %>% spread(key = pet, value = class) |
每当您有两个单独的
每个人在R中都有自己喜欢的合并方法。我的是
另外,由于您想对许多列执行此操作,因此
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | library(data.table) #the row names will be our ID variable for melting setDT(table, keep.rownames = TRUE) setDT(lookUp) #now melt, merge, recast # melting (reshape wide to long) table[ , melt(.SD, id.vars = 'rn') # merging ][lookup, new_value := i.class, on = c(value = 'pet') #reform back to original shape ][ , dcast(.SD, rn ~ variable, value.var = 'new_value')] # rn P1 P2 P3 # 1: 1 mammal reptile bird # 2: 2 reptile bird mammal # 3: 3 bird mammal reptile |
如果您发现
1 2 3 4 5 6 7 | setDT(table) #don't need row names this time setDT(lookUp) sapply(names(table), #(or to whichever are the relevant columns) function(cc) table[lookUp, (cc) := #merge, replace #need to pass a _named_ vector to 'on', so use setNames i.class, on = setNames("pet", cc)]) |
创建一个命名向量,并遍历每列并进行匹配,请参见:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | # make lookup vector with names lookUp1 <- setNames(as.character(lookUp$class), lookUp$pet) lookUp1 # cat lizard parrot #"mammal""reptile" "bird" # match on names get values from lookup vector res <- data.frame(lapply(df1, function(i) lookUp1[i])) # reset rownames rownames(res) <- NULL # res # P1 P2 P3 # 1 mammal reptile bird # 2 reptile bird mammal # 3 bird mammal reptile |
数据
1 2 3 4 5 6 7 8 9 10 11 | df1 <- read.table(text =" P1 P2 P3 1 cat lizard parrot 2 lizard parrot cat 3 parrot cat lizard", header = TRUE) lookUp <- read.table(text =" pet class 1 cat mammal 2 lizard reptile 3 parrot bird", header = TRUE) |
我尝试了其他方法,但是使用我的大型数据集花费了很长时间。我改用以下内容:
1 2 3 4 | # make table"new" using ifelse. See data below to avoid re-typing it new <- ifelse(table1 =="cat","mammal", ifelse(table1 =="lizard","reptile", ifelse(table1 =="parrot","bird", NA))) |
此方法要求您为代码编写更多文本,但是
数据
1 2 3 4 5 6 | # create the data table table1 <- read.table(text =" P1 P2 P3 1 cat lizard parrot 2 lizard parrot cat 3 parrot cat lizard", header = TRUE) |
上面显示了如何在dplyr中执行此操作的答案未回答问题,该表充满了NA。这行得通,我希望任何评论显示出一种更好的方式:
1 2 3 4 5 6 7 8 9 10 11 12 | # Add a customer column so that we can put things back in the right order table$customer = seq(nrow(table)) classTable <- table %>% # put in long format, naming column filled with P1, P2, P3"petCount" gather(key="petCount", value="pet", -customer) %>% # add a new column based on the pet's class in data frame"lookup" left_join(lookup, by="pet") %>% # since you wanted to replace the values in"table" with their #"class", remove the pet column select(-pet) %>% # put data back into wide format spread(key="petCount", value="class") |
请注意,保留包含客户,宠物,宠物的种类(?)及其类别的长桌可能会很有用。此示例仅将中间保存添加到变量:
1 2 3 4 5 6 7 8 | table$customer = seq(nrow(table)) petClasses <- table %>% gather(key="petCount", value="pet", -customer) %>% left_join(lookup, by="pet") custPetClasses <- petClasses %>% select(-pet) %>% spread(key="petCount", value="class") |