i have data frame looks this:
x y value weight 10 1 red 1 3.4 5 blue 2 10 10 blue 0.5 3 8 blue 0.5 3 8 red 4 10 1 blue 1 3 8 blue 2 3 8 blue 0.25
i want recast each row unique combination of "x" , "y", , columns sums of "weight"s each different value of "value". if possible, i'd also have columns raw counts of values of "value". these data be:
x y red_count blue_count red_sum_of_weights blue_sum_of_weights 10 1 1 1 1 1 3.4 5 0 1 0 2 10 10 0 0 1 0.5 3 8 1 3 4 2.75
is there way reshape or reshape2? can counts of values with
dcast(data,x+y~value)
but haven't been able figure out how use weights in way want to. need work arbitrary number of possible levels value , arbitrary different number of rows per x*y combination in original dataset. i've written own code using loops this, takes extremely long time run - far has taken 6 hours first 15% of 600k row dataset, isn't practical! i'm sure there must way existing functions?
thanks in advance help!
you can combination of dplyr
, reshape2
functions. first, group x
, y
, value
(we change name of latter color
avoid duplicate column names after melting) , calculate count , sum each sub-group. melt
the result put newly calculated summaries in "long" format. finally, dcast
"wide" format asked for.
library(reshape2) library(dplyr) df %>% group_by(x,y,color=value) %>% summarise(count=n(), sum=sum(weight)) %>% melt(id.var=c("x","y","color")) %>% dcast(x + y ~ variable + color)
x y count_blue count_red sum_blue sum_red 1 3.0 8 3 1 2.75 4 2 3.4 5 1 na 2.00 na 3 10.0 1 1 1 1.00 1 4 10.0 10 1 na 0.50 na
Comments
Post a Comment