I've been trying to automate part of my workflow in R. Periodically I have to apply the same transformations to the datasets I am working with.
I have created a small function that uses optional arguments, so one can transform all or only part of the columns of the data frame passed to it.
The function currently looks like this:
    # Function:
    #   transformdividethousand(dataframe, optional = vectorlistofvariables)
    #
    # Definition: this function applies a transformation, dividing variables by
    #   1000. If a vector is passed, it applies the transformation only to the
    #   variables of the dataframe named in that vector.
    #
    # Example: df <- transformdividethousand(cases, c("label1", "label2"))
    #
    # Source: http://stackoverflow.com/a/36912017/4417072
    transformdividethousand <- function(data_frame, listofvars) {
      if (missing(listofvars)) {
        data_frame[, sapply(data_frame, is.numeric)] =
          data_frame[, sapply(data_frame, is.numeric)] / 1000
      } else {
        for (i in names(data_frame)) {
          if (i %in% listofvars) {
            data_frame[, i] = data_frame[, i] / 1000
          }
        }
      }
      return(data_frame)
    }
OK, now I face a problem where I have to apply a more complex transformation. This time I should:
- reflect the scores stored in the variables (i.e., find the largest value and subtract all other values from it);
- add 1 to the resulting score;
- take the square root of the resulting score;
- de-reflect the scores (now adding the same value that was subtracted in the first step).
All of this should happen while maintaining the ability to run the function on all or only part of the columns of a given dataset.
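Taken literally, the four steps listed above can be sketched for a single numeric vector as shown here. This is only an illustration under assumptions: reflect_sqrt, scores and df are hypothetical names with made-up data, and "reflect" is assumed to mean max - x.

```r
# Hypothetical helper: applies the four listed steps to one numeric vector.
reflect_sqrt <- function(x) {
  m <- max(x, na.rm = TRUE)   # largest value in the vector
  reflected <- m - x          # step 1: reflect (assumed to mean max - x)
  shifted <- reflected + 1    # step 2: add 1
  rooted <- sqrt(shifted)     # step 3: square root
  rooted + m                  # step 4: de-reflect as described, adding the max back
}

scores <- c(1, 4, 9)          # made-up data
reflect_sqrt(scores)

# Apply the helper to the numeric columns only, leaving the others untouched.
df <- data.frame(a = c(1, 4, 9), b = c("x", "y", "z"))
num <- sapply(df, is.numeric)
df[num] <- lapply(df[num], reflect_sqrt)
```

Working column by column keeps each column's own maximum, and the logical vector num can be narrowed further to a user-supplied set of names to cover the "part of the columns" case.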
I found, here on SO, a way of getting the largest value of each column of the data frame, using this small function:
    colmax <- function(data) sapply(data, max, na.rm = TRUE)
But I am running into all sorts of problems while applying it inside transformdividethousand.
Problem
I am struggling with the code. So far, trying to model the problem, I have reached the following point:
    transformplusonesqrt <- function(data_frame, listofvars) {
      if (missing(listofvars)) {
        # find the largest value
        data_frame_max <- data_frame
        colmax <- function(data) sapply(data, max)
        data_frame_max <- colmax(data_frame_max)
        # subtract the previous value
        data_frame[, sapply(data_frame, is.numeric)] =
          data_frame[, sapply(data_frame, is.numeric)] -
          data_frame_max[, sapply(data_frame_max, is.numeric)]
        # plus 1
        data_frame[, sapply(data_frame, is.numeric)] =
          data_frame[, sapply(data_frame, is.numeric)] + 1
        # sqrt
        data_frame[, sapply(data_frame, is.numeric)] =
          sqrt(data_frame[, sapply(data_frame, is.numeric)])
        # now, de-reflect
        data_frame[, sapply(data_frame, is.numeric)] =
          data_frame[, sapply(data_frame, is.numeric)] +
          data_frame_max[, sapply(data_frame_max, is.numeric)]
      } else {
        ### this part is still untouched
        for (i in names(data_frame)) {
          if (i %in% listofvars) {
            data_frame[, i] = data_frame[, i] / 1000
          }
        }
      }
      return(data_frame)
    }
But it does not work; I am getting:
    > teste <- transformplusonesqrt(semdti)
    Error in Summary.factor(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, :
      ‘max’ not meaningful for factors
Question
I would appreciate any pointers on how to achieve this rather complex, multi-step transformation in one function. I am not looking for code, just pointers and suggestions.
Thanks.
The problem is that max(), and therefore colmax, don't work on data of class factor.
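A minimal way to reproduce this is sketched below. The data frame df and its columns are made up for illustration; the point is only that max() stops on a factor column, and that restricting the call to numeric columns avoids it.

```r
# Made-up data frame with one numeric and one factor column.
df <- data.frame(score = c(10, 20, 30),
                 group = factor(c("a", "b", "a")))

# max() on a factor raises the error seen in the question;
# tryCatch captures the message instead of stopping the script.
res <- tryCatch(max(df$group), error = function(e) conditionMessage(e))
print(res)

# Listing the classes helps to spot the offending columns.
sapply(df, class)

# Restricting the call to numeric columns avoids the error.
num <- sapply(df, is.numeric)
sapply(df[num], max, na.rm = TRUE)
```

In the question's function, colmax is called on the whole data frame before any numeric filtering, so a single factor column is enough to trigger the error.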
You have two choices:
- Test for factor class data (if (class(data_frame[, i]) == "factor")) and convert it to numeric as appropriate.
- Use a function that takes the max of a factor variable based on frequency:

    maxtable <- function(invec, mult = FALSE) {
      if (!is.factor(invec)) invec <- factor(invec)
      a <- tabulate(invec)
      if (isTRUE(mult)) {
        levels(invec)[a == max(a)]
      } else levels(invec)[which.max(a)]
    }
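Both choices can be tried on a small example. The factor f below is made-up data, and maxtable is repeated from the answer so the snippet runs on its own.

```r
# A factor whose labels are numbers (made-up data).
f <- factor(c("10", "30", "20", "30"))

# Choice 1: as.numeric(f) alone would return the internal level codes,
# not the labels, so convert through character first.
as.numeric(as.character(f))

# Choice 2: the frequency-based function from the answer.
maxtable <- function(invec, mult = FALSE) {
  if (!is.factor(invec)) invec <- factor(invec)
  a <- tabulate(invec)              # count of each level
  if (isTRUE(mult)) {
    levels(invec)[a == max(a)]      # all levels tied for most frequent
  } else levels(invec)[which.max(a)]  # first most frequent level
}

maxtable(f)
maxtable(factor(c("a", "b", "a", "b")), mult = TRUE)
```

Note that maxtable returns the most frequent level, not the numerically largest one, so it only suits the reflection step if that is the intended meaning of "max" for the factor columns.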