R - Function to perform a complex transformation on a dataset


I've been trying to automate part of my workflow in R. Periodically I have to apply transformations to the datasets I'm working with.

I have created a small function that uses optional arguments, so one can transform all or only part of the columns of the data frame passed to it.

The function currently looks like this:

    # function:
    #   transformdividethousand(dataframe, optional = vectorlistofvariables)
    #
    # definition: this function applies a transformation, dividing variables
    # by 1000. if no vector is passed, it applies the transformation to all
    # numeric variables in the dataframe.
    #
    # example: df <- transformdividethousand(cases, c("label1", "label2"))
    #
    # source: http://stackoverflow.com/a/36912017/4417072

    transformdividethousand <- function(data_frame, listofvars){
        if (missing(listofvars)) {
            data_frame[, sapply(data_frame, is.numeric)] =
                data_frame[, sapply(data_frame, is.numeric)]/1000
        } else {
            for (i in names(data_frame)) {
                if (i %in% listofvars) {
                    data_frame[,i] = data_frame[,i]/1000
                }
            }
        }
        return(data_frame)
    }

OK, now I face a problem where I have to apply a more complex transformation. This time I should:

  1. reflect the scores stored in the variables (i.e., find the largest value and subtract the other values from it);
  2. add 1 to the resulting score;
  3. take the square root of the resulting score;
  4. de-reflect the scores (now adding back the same value that was subtracted in the first step).
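
For illustration, the four steps above can be sketched on a single numeric vector. This is only a minimal sketch under stated assumptions: the helper name `reflect_sqrt` is hypothetical, "reflect" is taken to mean `max(x) - x`, and "de-reflect" is taken to mean subtracting the transformed score from that same maximum so the original ordering is restored. If de-reflection is defined differently in your setting, only the last line needs to change.

```r
# Hypothetical helper: reflect, add 1, take the square root, de-reflect.
# Assumption: "reflect" is max(x) - x; "de-reflect" subtracts the
# transformed score from the same maximum to restore the ordering.
reflect_sqrt <- function(x) {
  x_max <- max(x, na.rm = TRUE)  # step 1: largest value in the column
  reflected <- x_max - x         # step 1: reflect (all values >= 0)
  shifted <- reflected + 1       # step 2: add 1 (all values >= 1)
  rooted <- sqrt(shifted)        # step 3: square root (always defined)
  x_max - rooted                 # step 4: de-reflect
}

scores <- c(1, 5, 10)            # made-up scores, for illustration only
reflect_sqrt(scores)
```

Because the reflected values are never negative, adding 1 keeps every value at or above 1 and the square root never produces `NaN`.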

All of this should happen while maintaining the ability to run the function on all or only part of the columns of a given dataset.

I found a way of creating a subset of the dataframe with the largest values, using this small function:

    colmax <- function(data) sapply(data, max, na.rm = TRUE)
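
As a small illustration of how `colmax` behaves, here is a sketch on a made-up data frame (the data and the names `df`, `num_cols`, and `col_maxima` are assumptions for the example, not part of the real dataset). It restricts the call to numeric columns first.

```r
colmax <- function(data) sapply(data, max, na.rm = TRUE)

# Made-up example data, for illustration only
df <- data.frame(a = c(1, 4, NA), b = c(10, 2, 3), g = c("x", "y", "z"))

# Restrict to numeric columns before asking for the maximum
num_cols <- sapply(df, is.numeric)
col_maxima <- colmax(df[, num_cols, drop = FALSE])
col_maxima
```

Note that `sapply()` returns a named numeric vector here, not a data frame, so the maxima are accessed by name, for example `col_maxima["a"]`.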

But I am running into all sorts of problems while applying it in transformdividethousand.

Problem

I am struggling with the code. So far, trying to model the problem, I have reached the following point:

    transformplusonesqrt <- function(data_frame, listofvars){
        if (missing(listofvars)) {

            # find largest value
            data_frame_max <- data_frame
            colmax <- function(data) sapply(data, max)
            data_frame_max <- colmax(data_frame_max)

            # subtract previous value
            data_frame[, sapply(data_frame, is.numeric)] =
                data_frame[, sapply(data_frame, is.numeric)] -
                data_frame_max[,sapply(data_frame_max, is.numeric)]

            # plus 1
            data_frame[, sapply(data_frame, is.numeric)] =
                data_frame[, sapply(data_frame, is.numeric)] + 1

            # sqrt
            data_frame[, sapply(data_frame, is.numeric)] =
                sqrt(data_frame[, sapply(data_frame, is.numeric)])

            # now, dereflect
            data_frame[, sapply(data_frame, is.numeric)] =
                data_frame[, sapply(data_frame, is.numeric)] +
                data_frame_max[,sapply(data_frame_max, is.numeric)]

        } else {
            ### this part is still untouched
            for (i in names(data_frame)) {
                if (i %in% listofvars) {
                    data_frame[,i] = data_frame[,i]/1000
                }
            }
        }
        return(data_frame)
    }

But it does not work; I am getting:

    > teste <- transformplusonesqrt(semdti)
    Error in Summary.factor(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  :
      ‘max’ not meaningful for factors

Question

I would appreciate pointers on how to achieve this rather complex, multi-step transformation in one function. I am not looking for code, just pointers and suggestions.

Thanks.

The problem is that max(), and therefore colmax, doesn't work on data of class factor.
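
A minimal way to see this in isolation, using a made-up factor (the names `f` and `res` are only for this example), is to capture the error instead of letting it stop the session:

```r
# Made-up factor whose labels happen to look like numbers
f <- factor(c("3", "10", "7"))

# max() on an unordered factor signals an error; capture its message
res <- tryCatch(max(f), error = function(e) conditionMessage(e))
res

# The column is a factor, not numeric, so is.numeric() is FALSE
is.numeric(f)
```

This suggests that any column reaching `max()` should first be checked with `is.numeric()` or `is.factor()`.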

You have two choices:

  1. Test for factor-class data (if(class(data_frame[,i]) == "factor")) and convert it to numeric as appropriate.

  2. Use a function that takes the max of a factor variable based on frequency:

    maxtable <- function(invec, mult = FALSE) {
      if (!is.factor(invec)) invec <- factor(invec)
      a <- tabulate(invec)
      if (isTRUE(mult)) {
        levels(invec)[a == max(a)]
      } else {
        levels(invec)[which.max(a)]
      }
    }
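
Returning to the first choice, here is a rough sketch of the conversion step, assuming the factor labels really are numbers stored as text. The helper `to_numeric_if_factor` and the example data are hypothetical. The conversion goes through `as.character()` because calling `as.numeric()` directly on a factor returns the internal level codes rather than the labels.

```r
# Hypothetical helper: convert factor columns whose labels are numbers
to_numeric_if_factor <- function(x) {
  if (is.factor(x)) as.numeric(as.character(x)) else x
}

# Made-up data: column b holds numbers stored as a factor
df <- data.frame(a = c(1, 5, 10), b = factor(c("2", "8", "4")))

# listofvars mirrors the optional argument used in the question
listofvars <- c("a", "b")
df[listofvars] <- lapply(df[listofvars], to_numeric_if_factor)

# After conversion, the column maxima can be computed without error
colmax <- function(data) sapply(data, max, na.rm = TRUE)
colmax(df[listofvars])
```

If a factor holds genuine categories rather than numbers, `as.numeric(as.character(x))` produces `NA` with a warning, so such columns are better skipped than converted.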
