How to subtract a median calculated only from non-zero integer values
I have this dataset:
```r
df <- structure(list(
  Dt = structure(1:39, .Label = c(
    "2018-02-20 00:00:00.000", "2018-02-21 00:00:00.000", "2018-02-22 00:00:00.000",
    "2018-02-23 00:00:00.000", "2018-02-24 00:00:00.000", "2018-02-25 00:00:00.000",
    "2018-02-26 00:00:00.000", "2018-02-27 00:00:00.000", "2018-02-28 00:00:00.000",
    "2018-03-01 00:00:00.000", "2018-03-02 00:00:00.000", "2018-03-03 00:00:00.000",
    "2018-03-04 00:00:00.000", "2018-03-05 00:00:00.000", "2018-03-06 00:00:00.000",
    "2018-03-07 00:00:00.000", "2018-03-08 00:00:00.000", "2018-03-09 00:00:00.000",
    "2018-03-10 00:00:00.000", "2018-03-11 00:00:00.000", "2018-03-12 00:00:00.000",
    "2018-03-13 00:00:00.000", "2018-03-14 00:00:00.000", "2018-03-15 00:00:00.000",
    "2018-03-16 00:00:00.000", "2018-03-17 00:00:00.000", "2018-03-18 00:00:00.000",
    "2018-03-19 00:00:00.000", "2018-03-20 00:00:00.000", "2018-03-21 00:00:00.000",
    "2018-03-22 00:00:00.000", "2018-03-23 00:00:00.000", "2018-03-24 00:00:00.000",
    "2018-03-25 00:00:00.000", "2018-03-26 00:00:00.000", "2018-03-27 00:00:00.000",
    "2018-03-28 00:00:00.000", "2018-03-29 00:00:00.000", "2018-03-30 00:00:00.000"),
    class = "factor"),
  ItemRelation = c(
    158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L,
    158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L,
    158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L),
  stuff = c(
    200L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 3600L, 0L, 0L,
    0L, 0L, 700L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 1000L, 2600L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 700L),
  num = c(
    1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L,
    1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L,
    1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L),
  year = c(
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L),
  action = c(
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)),
  .Names = c("Dt", "ItemRelation", "stuff", "num", "year", "action"),
  class = "data.frame", row.names = c(NA, -39L))
```
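The action column has only two values, 0 and 1.
I need to calculate the median of stuff for action category 1,
then calculate the median of stuff for action category 0, using the last five integer (non-zero) values that come before category 1.
Only the last 5 observations are used:
they must be the last 5 observations of action category 0, counting only the integer (non-zero) values.
In our case these are: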
```
200
3600
700
1000
2600
```
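Then the median of category 0 is subtracted from the median of category 1.
The number of stuff observations in action category 0 can vary from 0 to 10.
If there are 10 integer values in category 0, take the last five.
If there are only 1, 2, 3, 4 or 5 integer values,
subtract the median of however many integer values there actually are.
If there are only zeros and no integer values, just subtract 0.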
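The category 0 rules above can be sketched in base R roughly as follows. This is only a minimal sketch: `last_nonzero_median` and `stuff0` are hypothetical names introduced here for illustration, and "integer value" is assumed to mean a non-zero value of `stuff`.

```r
# Hypothetical helper (not from the original post): median of the last `n`
# non-zero values of `x`, or 0 when `x` contains no non-zero values.
last_nonzero_median <- function(x, n = 5) {
  nz <- x[!is.na(x) & x != 0]      # keep only the non-zero observations
  if (length(nz) == 0) return(0)   # only zeros: subtract 0
  median(tail(nz, n))              # tail() copes with fewer than n values
}

# `stuff` for the 35 rows with action == 0, in date order (copied from df)
stuff0 <- c(200, rep(0, 9), 3600, rep(0, 4), 700, rep(0, 11), 1000, 2600, rep(0, 6))

last_nonzero_median(stuff0)            # median of 200, 3600, 700, 1000, 2600
last_nonzero_median(c(0, 0, 0))        # no non-zero values, so 0
last_nonzero_median(c(0, 400, 0, 600)) # fewer than five values: median of 400, 600
```

Because `tail()` returns whatever is available when there are fewer than `n` elements, the cases of 1 to 5 values need no special handling.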
How can I do this?
Edit: expected output
```
Dt                       ItemRelation  DocumentNum  DocumentYear  value
2018-03-30 00:00:00.000  158043        1459         2018          -300
```
* -300 = 700 - median(200, 3600, 700, 1000, 2600)
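Note that if action category 1 has only one stuff value (700 in our example), no median is calculated and that value is used as is.
If there are two values, the median of stuff for category 1 is calculated.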
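As a quick check of the arithmetic, here is a small sketch in base R. The variable names are made up for illustration, and the second category 1 value (900) is a hypothetical number that does not appear in the data. Note that `median()` of a single value returns that value, so the one-value case may not need separate handling.

```r
# `stuff` for the four rows with action == 1 (copied from df)
stuff1 <- c(0, 0, 0, 700)
one_cat <- stuff1[stuff1 != 0]   # non-zero values of category 1: just 700

median(one_cat)                  # a single value is its own median: 700
median(c(700, 900))              # two values (900 is hypothetical): 800

# Last five non-zero values of category 0, as listed in the question
zero_last5 <- c(200, 3600, 700, 1000, 2600)

value <- median(one_cat) - median(zero_last5)
value                            # 700 - 1000 = -300
```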
```r
library(dplyr)

# Last five non-zero observations of action category 0, in date order.
# tail(5) is used instead of top_n(5): top_n() without `wt` ranks by the
# last column (action) rather than keeping the most recent rows.
df.0 <- df %>%
  filter(action == 0 & stuff != 0) %>%
  arrange(Dt) %>%
  tail(5)

# Non-zero observations of action category 1
df.1 <- df %>% filter(action == 1 & stuff != 0)

new.df <- rbind(df.0, df.1)

# Summarise the filtered rows (new.df), not the full df
result <- new.df %>%
  group_by(ItemRelation, num, year) %>%
  summarise(
    median.1 = median(stuff[action == 1], na.rm = TRUE),
    median.0 = median(stuff[action == 0], na.rm = TRUE)
  ) %>%
  ungroup() %>%
  mutate(
    value = median.1 - median.0,
    DocumentNum = num,
    DocumentYear = year
  ) %>%
  select(ItemRelation, DocumentNum, DocumentYear, value)

result
```
Does this help?