dataframe - Calculating percent of categorical responses (with grouping) in R


I have the following data frame:

    iv      device1     device2    device3
    color   same        same       missing
    color   different   same       missing
    color   same        unique     missing
    shape   same        missing    same
    shape   different   same       different

Explanation: each IV (independent variable) is composed of several measurements (the 'color' section is composed of 3 different measurements, while 'shape' is composed of 2).

Each data point has 1 of 4 possible categorical values: same/different/unique/missing. 'missing' means there is no measurement for that device in that case, while the other 3 values represent an existing measurement result.

Question: for each device, I want to calculate the percentage of times it has a same/different/unique value (thus generating 3 different percentages), out of the total number of values for each IV (not including cases where there is a 'missing' value).

For example, device 2 would have the following percentages:

  • color- 67% same, 0% different, 33% unique.
  • shape- 100% same, 0% different, 0% unique.
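
To make the arithmetic behind the 'color' figures explicit, here is a minimal sketch in base R. It assumes the three non-missing 'color' values for device 2 are typed in by hand (the vector `x` and the name `pct` are only illustrative); fixing the factor levels keeps 'different' in the result with a count of zero.

```r
# Device 2's three 'color' measurements, copied from the table above
x <- c("same", "same", "unique")

# Fix the levels so that 'different' still appears, with a count of zero
x <- factor(x, levels = c("same", "different", "unique"))

# Share of each level among the non-missing values, rounded to whole percentages
pct <- round(100 * prop.table(table(x)))
print(pct)
```

Two of the three values are 'same' and one is 'unique', which gives the 67% / 0% / 33% split listed above.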

Thank you!

This is not a tidy solution, but you can use it until someone else posts a better one:

    # replace "missing" with NAs
    df[df == "missing"] <- NA

    # create factor levels
    df[, -1] <- lapply(df[, -1], function(x) {
      factor(x, levels = c('same', 'different', 'unique'))
    })

    # custom function to calculate percent of categorical responses
    custom <- function(x) {
      y <- length(na.omit(x))
      if (y > 0)
        return(round((table(x) / y) * 100))
      else
        return(rep(0, 3))
    }

    library(purrr)

    # split the data frame on iv, remove the iv column, and apply the custom function
    final <- df %>% split(df$iv) %>%
      map(., function(x) {
        x <- x[, -1]
        t(sapply(x, custom))
      })

Output

final is a list of 2 matrices, one per iv:

    $color
            same different unique
    device1   67        33      0
    device2   67         0     33
    device3    0         0      0

    $shape
            same different unique
    device1   50        50      0
    device2  100         0      0
    device3   50        50      0
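
If a single table is easier to work with than a list, the per-iv results could be stacked into one data frame. The following is only a sketch in base R: it assumes `final` is a named list holding one device-by-response table per iv, as printed above, and rebuilds that list by hand so the snippet runs on its own. The names `devices`, `responses` and `stacked` are illustrative.

```r
# Hand-built copy of the result printed above, so the snippet is self-contained
devices   <- c("device1", "device2", "device3")
responses <- c("same", "different", "unique")
final <- list(
  color = matrix(c(67, 67, 0,  33, 0, 0,  0, 33, 0), nrow = 3,
                 dimnames = list(devices, responses)),
  shape = matrix(c(50, 100, 50,  50, 0, 50,  0, 0, 0), nrow = 3,
                 dimnames = list(devices, responses))
)

# Turn each table into a data frame tagged with its iv, then stack the pieces
stacked <- do.call(rbind, lapply(names(final), function(iv) {
  m <- final[[iv]]
  data.frame(iv = iv, device = rownames(m), m, row.names = NULL)
}))

print(stacked)
```

Each row of `stacked` then holds one iv/device pair with its three percentages, which is usually more convenient for plotting or further filtering.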

Data

    df <- structure(list(iv = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("color",
    "shape"), class = "factor"), device1 = structure(c(1L, 2L, 1L,
    1L, 2L), .Label = c("same", "different", "unique"), class = "factor"),
        device2 = structure(c(1L, 1L, 3L, NA, 1L), .Label = c("same",
        "different", "unique"), class = "factor"), device3 = structure(c(NA,
        NA, NA, 1L, 2L), .Label = c("same", "different", "unique"
        ), class = "factor")), .Names = c("iv", "device1", "device2",
    "device3"), row.names = c(NA, -5L), class = "data.frame")
