I have the following data frame:
    iv     device1    device2  device3
    color  same       same     missing
    color  different  same     missing
    color  same       unique   missing
    shape  same       missing  same
    shape  different  same     different
Explanation: each IV (independent variable) is composed of several measurements (the 'color' section is composed of 3 different measurements, while 'shape' is composed of 2).
Each data point has 1 of 4 possible categorical values: same/different/unique/missing. 'missing' means there is no measurement for that device in that case, while the other 3 values represent an existing measurement result.
Question: I want to calculate, for each device, the percentage of times it has a same/different/unique value (thus generating 3 different percentages), out of the total number of values for that IV (not including cases where the value is 'missing').
For example, device 2 would have the following percentages:
- color: 67% same, 0% different, 33% unique.
- shape: 100% same, 0% different, 0% unique.
Thank you!
This is not a tidy solution, but you can use it until someone else posts a better one:
    # replace "missing" with NAs
    df[df == "missing"] <- NA

    # create factor levels
    df[,-1] <- lapply(df[,-1], function(x) {
      factor(x, levels = c('same', 'different', 'unique'))
    })

    # custom function to calculate percent of categorical responses
    custom <- function(x) {
      y <- length(na.omit(x))
      if (y > 0) {
        return(round((table(x)/y)*100))
      } else {
        return(rep(0, 3))
      }
    }

    library(purrr)

    # split dataframe on iv, remove the iv column, and apply the custom function
    final <- df %>%
      split(df$iv) %>%
      map(., function(x) {
        x <- x[, -1]
        t(sapply(x, custom))
      })
Output
`final` is a list of 2 matrices, one per iv:
    $color
            same different unique
    device1   67        33      0
    device2   67         0     33
    device3    0         0      0

    $shape
            same different unique
    device1   50        50      0
    device2  100         0      0
    device3   50        50      0
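As a quick sanity check, the device2 row for color can be reproduced by hand in base R. This is only a minimal sketch: it hard-codes the three non-missing color values for device2 from the question, and the names `device2_color` and `pct` are made up for illustration.

```r
# device2's color measurements from the question (none of them are "missing")
device2_color <- factor(c("same", "same", "unique"),
                        levels = c("same", "different", "unique"))

# share of each level among the non-missing values, as rounded percentages
pct <- round(prop.table(table(device2_color)) * 100)
print(pct)
```

Fixing the factor levels up front is what keeps 'different' in the result with a 0 count instead of dropping it.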
Data

    df <- structure(list(iv = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("color",
    "shape"), class = "factor"), device1 = structure(c(1L, 2L, 1L,
    1L, 2L), .Label = c("same", "different", "unique"), class = "factor"),
        device2 = structure(c(1L, 1L, 3L, NA, 1L), .Label = c("same",
        "different", "unique"), class = "factor"), device3 = structure(c(NA,
        NA, NA, 1L, 2L), .Label = c("same", "different", "unique"
        ), class = "factor")), .Names = c("iv", "device1", "device2",
    "device3"), row.names = c(NA, -5L), class = "data.frame")