I have a dataframe like this:
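import numpy as np
import pandas as pd
size = 100  # assumed value; the original post does not define size
df = pd.DataFrame({'prob': np.random.uniform(0, 1, size), 'target': np.random.randint(0, 2, size=size), 'pred': np.random.randint(0, 2, size=size)})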
I want to compute the cumsum of a groupby over the qcut bins:
df['box'] = pd.qcut(df['prob'], 10)
I expected this to calculate the cumulative function across the groups, in order, but instead it calculates a running sum for each element:
df['target_1'] = 1 - df['target']
ch_curve = df.groupby('box').target.cumsum() / float(df.target.sum())
nch_curve = df.groupby('box').target_1.cumsum() / float(df.target_1.sum())
This gives:
0     0.000000
1     0.018182
2     0.018182
3     0.018182
4     0.000000
5     0.018182
6     0.018182
7     0.018182
8     0.036364
9     0.018182
10    0.000000
11    0.018182
12    0.018182
13    0.036364
14    0.000000
15    0.036364
16    0.036364
17    0.036364
18    0.054545
19    0.000000
20    0.000000
21    0.018182
22    0.018182
23    0.05454
instead of something like:
'(0.0, 0.1)' 0.04
'(0.1, 0.2)' 0.12 # (0.08 + previous 0.04)
'(0.2, 0.3)' 0.17 # (0.05 + previous 0.12)
You want to calculate the percentage for each group, and then take the cumsum.
In the original code, df.groupby('box').target.cumsum() takes the cumsum within each group, so the result has one element for every row of the grouped dataframe. The division is then broadcast across all of these elements.
Instead, you want one summary statistic for each group, and then the cumsum across these statistics:
ch_curve = (df.groupby('box').target.sum() / df.target.sum()).cumsum()
nch_curve = (df.groupby('box').target_1.sum() / df.target_1.sum()).cumsum()
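To make the difference concrete, here is a minimal sketch on a small hand-made dataframe rather than the random one from the question. The six rows, the choice of 3 bins, and the names per_row and per_bin are all assumptions made for illustration; observed=True is also an addition, passed only so that the categorical grouping reports just the bins that actually occur.

```python
import pandas as pd

# Hypothetical toy data (not from the question): six rows, two per bin.
df = pd.DataFrame({
    "prob":   [0.05, 0.15, 0.35, 0.45, 0.75, 0.95],
    "target": [1, 0, 1, 1, 0, 1],
})
df["box"] = pd.qcut(df["prob"], 3)  # 3 quantile bins instead of 10

grouped = df.groupby("box", observed=True)["target"]
total = df["target"].sum()

# Approach from the question: running sum inside each bin,
# which yields one value per row of the dataframe.
per_row = grouped.cumsum() / total

# Approach from the answer: one sum per bin, then a running
# total across the bins, which yields one value per bin.
per_bin = (grouped.sum() / total).cumsum()

print(per_row)
print(per_bin)
```

With this data, per_row has six entries and restarts in every bin, while per_bin has three entries that climb to 1.0, which is the shape the question was expecting.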