I have a data set with multiple categories. I'd like to run a linear regression on each category without having to subset the data into new data frames for each category. I've done this:
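    library(plyr)  # provides dlply() and ldply()

    category = c(rep(c("a","b","c"),100))
    x = (rep(1:5,60))
    y = rnorm(300)*5
    df = data.frame(category,x,y)
    # fit one linear model per category, then collect the coefficients
    models = dlply(df, "category", function(dflm) lm(y ~ x, data = dflm))
    lmcoefs = ldply(models, coef)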
In lmcoefs, I have the coefficients for each category stored.
However, I want to run these regressions only on the values within +/- 50% of the average of each category. So if the average y value for category a is 10, I want to run the regression on y values between 5 and 15 for category a. The same goes for categories b and c.
Is there a way to do this without splitting the datasets and running individual regressions?
Thanks, Don
I would do it like this, though perhaps there is a shorter way.
The data:
    category = c(rep(c("a","b","c"),100))
    x = (rep(1:5,60))
    y = rnorm(300,10,3) # made these positive values
    df = data.frame(category,x,y)
Based on the script you had:
    ddply(df, "category", function(d,perc=0.5){
      m=mean(d$y)
      range.min=m*(1-perc)
      range.max=m*(1+perc)
      d=d[d$y< range.max & d$y> range.min ,]
      coef(lm(y ~ x, data = d))
    })
    #result
      category (Intercept)            x
    1        a    10.04912 -0.042292670
    2        b    10.37061 -0.001489721
    3        c    10.04206  0.012238932
Instead of using dlply and ldply, it is easier to use ddply straight away.
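If plyr is not available, the same per-category band filter can be sketched in base R. This is only an illustration, not part of the original answer: the names `grp_mean`, `keep`, `trimmed` and `coefs` and the `set.seed(1)` call are assumptions made for the example, and like the script above it assumes the category means are positive.

```r
# Sketch: per-category +/- 50% band filter using base R only (no plyr).
set.seed(1)  # assumed seed, only so the example is reproducible
category <- rep(c("a", "b", "c"), 100)
x <- rep(1:5, 60)
y <- rnorm(300, 10, 3)
df <- data.frame(category, x, y)

perc <- 0.5
# ave() returns the mean of y within each category, aligned row by row
grp_mean <- ave(df$y, df$category)
# keep rows strictly inside the band around their own category mean
keep <- df$y > grp_mean * (1 - perc) & df$y < grp_mean * (1 + perc)
trimmed <- df[keep, ]

# fit one model per category and stack the coefficients into a matrix
coefs <- t(sapply(split(trimmed, trimmed$category),
                  function(d) coef(lm(y ~ x, data = d))))
coefs
```

Here `coefs` has one row per category and the columns `(Intercept)` and `x`. The data is still split internally by `split()`, but no separate data frames have to be created or named by hand.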