r - Label quantile by group with varying group sizes -


within group (the "name" variable), want cut value quartile. , create quartile label column variable "value". since group size varies, quartile range different group changes well. but below code, cut quartile overall value, resulting same quartile range groups.

dt<-data.frame(name=c(rep('a',8),rep('b',4),rep('c',5)),value=c(1:8,1:4,1:5)) dt dt.2<-dt%>% group_by(name)%>% mutate(newcol= cut(value,breaks=quantile(value,probs=seq(0,1,0.25),na.rm=true),include.lowest=true)) dt.2 str(dt.2) 

data:

   name value 1         1 2         2 3         3 4         4 5         5 6         6 7         7 8         8 9     b     1 10    b     2 11    b     3 12    b     4 13    c     1 14    c     2 15    c     3 16    c     4 17    c     5 

output above code. update: problem not newcol factor necol has same quartile range across different group. example name b, value 1-4 quartile range has 3-5, derived min(value) max(value) regardless of group.

 name value newcol    <fctr> <int> <fctr> 1           1  [1,2] 2           2  [1,2] 3           3  (2,3] 4           4  (3,5] 5           5  (3,5] 6           6  (5,8] 7           7  (5,8] 8           8  (5,8] 9       b     1  [1,2] 10      b     2  [1,2] 11      b     3  (2,3] 12      b     4  (3,5] 13      c     1  [1,2] 14      c     2  [1,2] 15      c     3  (2,3] 16      c     4  (3,5] 17      c     5  (3,5] 

desired output

   name value newcol/quartile label  1           1  1 2           2  1 3           3  2 4           4  2 5           5  3 6           6  3 7           7  4 8           8  4 9       b     1  1 10      b     2  2 11      b     3  3 12      b     4  4 13      c     1  1 14      c     2  2 15      c     3  3 16      c     4  4 17      c     5  4 

here's way can it, following split-apply-combine framework.

dt<-data.frame(name=c(rep('a',8),rep('b',4),rep('c',5)),value=c(1:8,1:4,1:5))  split_dt <- lapply(split(dt, dt$name),                     transform,                     quantlabel = as.numeric( cut(value, breaks = quantile(value, probs = seq(0,1,.25)), include.lowest = t)))  dt <- unsplit(split_dt, dt$name)     name value quantlabel 1         1          1 2         2          1 3         3          2 4         4          2 5         5          3 6         6          3 7         7          4 8         8          4 9     b     1          1 10    b     2          2 11    b     3          3 12    b     4          4 13    c     1          1 14    c     2          1 15    c     3          2 16    c     4          3 17    c     5          4 

edit: there's data.table way

following this post, can use data.table package, if performance concern:

library(data.table) dt<-data.frame(name=c(rep('a',8),rep('b',4),rep('c',5)),value=c(1:8,1:4,1:5)) dt.t <- as.data.table(dt) dt.t[,quantlabels := as.numeric(cut(value, breaks = quantile(value, probs = seq(0,1,.25)), include.lowest = t)), name]      name value quantlabels  1:        1           1  2:        2           1  3:        3           2  4:        4           2  5:        5           3  6:        6           3  7:        7           4  8:        8           4  9:    b     1           1 10:    b     2           2 11:    b     3           3 12:    b     4           4 13:    c     1           1 14:    c     2           1 15:    c     3           2 16:    c     4           3 17:    c     5           4 

edit: , there's dplyr way

we can follow @akrun's advice , use as.numeric (which we've done other solutions):

dt %>%     group_by(name) %>%     mutate(quantlabel =                 as.numeric(                    cut(value,                         breaks = quantile(value, probs = seq(0,1,.25)),                         include.lowest = t))) 

note if instead wanted labels themselves, use as.character:

dt %>%     group_by(name) %>%     mutate(quantlabel = as.character(cut(value, breaks = quantile(value, probs = seq(0,1,.25)), include.lowest = t)))  source: local data frame [17 x 3] groups: name [3]       name value quantlabel    <fctr> <int>      <chr> 1           1   [1,2.75] 2           2   [1,2.75] 3           3 (2.75,4.5] 4           4 (2.75,4.5] 5           5 (4.5,6.25] 6           6 (4.5,6.25] 7           7   (6.25,8] 8           8   (6.25,8] 9       b     1   [1,1.75] 10      b     2 (1.75,2.5] 11      b     3 (2.5,3.25] 12      b     4   (3.25,4] 13      c     1      [1,2] 14      c     2      [1,2] 15      c     3      (2,3] 16      c     4      (3,4] 17      c     5      (4,5] 

Comments