I work in the healthcare industry, and I'm using machine learning algorithms to develop a model that predicts when patients will not show up for their appointments. I'm trying to create a new feature that is the sum of each patient's most recent consecutive no-shows. I've looked around a lot on StackOverflow and other resources, but cannot find what I'm looking for. For example, if a patient has no-showed for their past 2 most recent appointments, then every row of the new feature's column is filled in with 2's. If they no-showed 3 times but showed up for their most recent appointment, then the new column is filled in with 0's.
I tried using plyr's ddply with cumsum, but it did not give me the results I'm looking for. I used:
ddply(a, .(id), transform, consecutivenoshows = cumsum(noshow))
Here is an example data set ('1' signifies a no-show):
id noshow
 1      1
 1      1
 1      0
 1      0
 1      1
 2      0
 2      1
 2      1
 3      1
 3      0
 3      1
 3      1
 3      1
This is the desired outcome:
id noshow consecutivenoshows
 1      1                  2
 1      1                  2
 1      0                  2
 1      0                  2
 1      1                  2
 2      0                  0
 2      1                  0
 2      1                  0
 3      1                  1
 3      0                  1
 3      1                  1
 3      1                  1
 3      1                  1
I'd be grateful for any help. Thank you.
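The idea is to sum(), for each id, the number of noshow rows before the first 0 appears. This assumes the rows for each id are ordered most recent first, as the example data and desired outcome imply.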
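library(dplyr)

# cumsum(noshow == 0) >= 1 is TRUE from the first 0 onward within each id.
# In R, ! binds more loosely than >=, so the expression below is read as
# !(cumsum(noshow == 0) >= 1), and sum() counts the rows before the first 0.
df %>%
  group_by(id) %>%
  mutate(consecutivenoshows = sum(!cumsum(noshow == 0) >= 1))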
Which gives:
#Source: local data frame [13 x 3]
#Groups: id [3]
#
#      id noshow consecutivenoshows
#   <int>  <int>              <int>
#1      1      1                  2
#2      1      1                  2
#3      1      0                  2
#4      1      0                  2
#5      1      1                  2
#6      2      0                  0
#7      2      1                  0
#8      2      1                  0
#9      3      1                  1
#10     3      0                  1
#11     3      1                  1
#12     3      1                  1
#13     3      1                  1
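For readers who want to see the intermediate steps, or who would rather avoid the dplyr dependency, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: the helper name leading_noshows is hypothetical, the data frame is rebuilt from the example in the question, and the rows for each id are assumed to be ordered most recent first.

```r
# Example data from the question (assumed ordered most recent first per id).
df <- data.frame(
  id     = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  noshow = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1)
)

# Hypothetical helper: length of the leading run of no-shows in x.
leading_noshows <- function(x) {
  seen_show <- cumsum(x == 0) >= 1  # TRUE from the first 0 onward
  sum(!seen_show)                   # number of rows before the first 0
}

# Step by step for id 1:
x <- c(1, 1, 0, 0, 1)
cumsum(x == 0)        # 0 0 1 2 2
cumsum(x == 0) >= 1   # FALSE FALSE TRUE TRUE TRUE
leading_noshows(x)    # 2

# ave() applies the function within each id; rep() repeats the single
# count so that every row of that id receives the same value.
df$consecutivenoshows <- ave(
  df$noshow, df$id,
  FUN = function(x) rep(leading_noshows(x), length(x))
)
```

If an id has no 0 at all, the helper returns the number of rows for that id, which matches the intended meaning of "every recorded appointment was a no-show".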