within group (the "name" variable), want cut value quartile. , create quartile label column variable "value". since group size varies, quartile range different group changes well. but below code, cut quartile overall value, resulting same quartile range groups.
dt<-data.frame(name=c(rep('a',8),rep('b',4),rep('c',5)),value=c(1:8,1:4,1:5)) dt dt.2<-dt%>% group_by(name)%>% mutate(newcol= cut(value,breaks=quantile(value,probs=seq(0,1,0.25),na.rm=true),include.lowest=true)) dt.2 str(dt.2) data:
name value 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 b 1 10 b 2 11 b 3 12 b 4 13 c 1 14 c 2 15 c 3 16 c 4 17 c 5 output above code. update: problem not newcol factor necol has same quartile range across different group. example name b, value 1-4 quartile range has 3-5, derived min(value) max(value) regardless of group.
name value newcol <fctr> <int> <fctr> 1 1 [1,2] 2 2 [1,2] 3 3 (2,3] 4 4 (3,5] 5 5 (3,5] 6 6 (5,8] 7 7 (5,8] 8 8 (5,8] 9 b 1 [1,2] 10 b 2 [1,2] 11 b 3 (2,3] 12 b 4 (3,5] 13 c 1 [1,2] 14 c 2 [1,2] 15 c 3 (2,3] 16 c 4 (3,5] 17 c 5 (3,5] desired output
name value newcol/quartile label 1 1 1 2 2 1 3 3 2 4 4 2 5 5 3 6 6 3 7 7 4 8 8 4 9 b 1 1 10 b 2 2 11 b 3 3 12 b 4 4 13 c 1 1 14 c 2 2 15 c 3 3 16 c 4 4 17 c 5 4
here's way can it, following split-apply-combine framework.
dt<-data.frame(name=c(rep('a',8),rep('b',4),rep('c',5)),value=c(1:8,1:4,1:5)) split_dt <- lapply(split(dt, dt$name), transform, quantlabel = as.numeric( cut(value, breaks = quantile(value, probs = seq(0,1,.25)), include.lowest = t))) dt <- unsplit(split_dt, dt$name) name value quantlabel 1 1 1 2 2 1 3 3 2 4 4 2 5 5 3 6 6 3 7 7 4 8 8 4 9 b 1 1 10 b 2 2 11 b 3 3 12 b 4 4 13 c 1 1 14 c 2 1 15 c 3 2 16 c 4 3 17 c 5 4 edit: there's data.table way
following this post, can use data.table package, if performance concern:
library(data.table) dt<-data.frame(name=c(rep('a',8),rep('b',4),rep('c',5)),value=c(1:8,1:4,1:5)) dt.t <- as.data.table(dt) dt.t[,quantlabels := as.numeric(cut(value, breaks = quantile(value, probs = seq(0,1,.25)), include.lowest = t)), name] name value quantlabels 1: 1 1 2: 2 1 3: 3 2 4: 4 2 5: 5 3 6: 6 3 7: 7 4 8: 8 4 9: b 1 1 10: b 2 2 11: b 3 3 12: b 4 4 13: c 1 1 14: c 2 1 15: c 3 2 16: c 4 3 17: c 5 4 edit: , there's dplyr way
we can follow @akrun's advice , use as.numeric (which we've done other solutions):
dt %>% group_by(name) %>% mutate(quantlabel = as.numeric( cut(value, breaks = quantile(value, probs = seq(0,1,.25)), include.lowest = t))) note if instead wanted labels themselves, use as.character:
dt %>% group_by(name) %>% mutate(quantlabel = as.character(cut(value, breaks = quantile(value, probs = seq(0,1,.25)), include.lowest = t))) source: local data frame [17 x 3] groups: name [3] name value quantlabel <fctr> <int> <chr> 1 1 [1,2.75] 2 2 [1,2.75] 3 3 (2.75,4.5] 4 4 (2.75,4.5] 5 5 (4.5,6.25] 6 6 (4.5,6.25] 7 7 (6.25,8] 8 8 (6.25,8] 9 b 1 [1,1.75] 10 b 2 (1.75,2.5] 11 b 3 (2.5,3.25] 12 b 4 (3.25,4] 13 c 1 [1,2] 14 c 2 [1,2] 15 c 3 (2,3] 16 c 4 (3,4] 17 c 5 (4,5]
Comments
Post a Comment