r - Run regression by category, bounded by +/- 10% of the category average


I have a data set with multiple categories. I'd like to run a linear regression on each category without having to subset the data into new data frames for each category. I've done this:

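    library(plyr)

    category = c(rep(c("a","b","c"),100))
    x = (rep(1:5,60))
    y = rnorm(300)*5
    df = data.frame(category,x,y)

    # fit one linear model per category
    models = dlply(df, "category", function(dflm) lm(y ~ x, data = dflm))

    # collect the coefficients into a data frame, one row per category
    lmcoefs = ldply(models, coef)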

In lmcoefs, I have the coefficients for each category stored.

However, I want to run these regressions only on values within +/- 50% of the average of each category. So if the average y value for category a is 10, I want to run the regression on y values between 5 and 15 for category a, and likewise for categories b and c.

Is there a way to do this without splitting the datasets and running individual regressions?

Thanks, Don

I would do it like this, though perhaps there is a shorter way.

The data:

    category = c(rep(c("a","b","c"),100))
    x = (rep(1:5,60))
    y = rnorm(300,10,3)  # made these positive values
    df = data.frame(category,x,y)

Based on the script you had:

    ddply(df, "category", function(d, perc=0.5){
      m = mean(d$y)
      range.min = m*(1-perc)
      range.max = m*(1+perc)
      # keep only rows whose y lies within +/- perc of the category mean
      d = d[d$y < range.max & d$y > range.min, ]
      coef(lm(y ~ x, data = d))
    })

    # result
      category (Intercept)            x
    1        a    10.04912 -0.042292670
    2        b    10.37061 -0.001489721
    3        c    10.04206  0.012238932

Instead of using dlply and ldply, it is easier to use ddply straight away.
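If you would rather avoid plyr, the same filter-then-fit idea can probably be done in base R as well. This is only a sketch under a few assumptions: the `set.seed` call and the names `perc`, `grp_mean`, `keep`, `trimmed` and `coefs` are mine and are not in the original post, and the bounds only make sense when the category means are positive (as with the `rnorm(300,10,3)` data here).

```r
# Base R sketch: keep rows whose y lies within +/- perc of the category mean,
# then fit one lm per category. No plyr needed.
set.seed(1)  # assumption: seed added only so the sketch is reproducible

df <- data.frame(
  category = rep(c("a", "b", "c"), 100),
  x = rep(1:5, 60),
  y = rnorm(300, 10, 3)
)

perc <- 0.5

# ave() returns, for every row, the mean of y within that row's category
grp_mean <- ave(df$y, df$category)

# same bounds as in the ddply version: mean * (1 - perc) < y < mean * (1 + perc)
keep <- df$y > grp_mean * (1 - perc) & df$y < grp_mean * (1 + perc)
trimmed <- df[keep, ]

# fit one model per category and stack the coefficients, one row per category
coefs <- t(sapply(split(trimmed, trimmed$category),
                  function(d) coef(lm(y ~ x, data = d))))
coefs
```

`coefs` is a matrix with one row per category and the columns `(Intercept)` and `x`. The data frame is still split internally by `split()`, but you never have to create the per-category data frames by hand.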

