python - Pandas cumsum on groupby not behaving as expected

I have a dataframe like this:

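import numpy as np
import pandas as pd

size = 1000  # sample size (not stated in the question; any value works)
df = pd.DataFrame({'prob': np.random.uniform(0, 1, size),
                   'target': np.random.randint(0, 2, size=size),
                   'pred': np.random.randint(0, 2, size=size)})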

I want to compute the cumsum of a groupby over qcut bins:

df['box'] = pd.qcut(df['prob'], 10)
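For context, pd.qcut cuts on quantiles, so every bin receives roughly the same number of rows. A small sketch with hypothetical toy data (20 evenly spaced values and 4 bins instead of the question's random data and 10 bins):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 0.00, 0.05, ..., 0.95 (not the question's data)
prob = pd.Series(np.arange(20) / 20.0)

# Quantile-based binning: each of the 4 bins gets the same share of rows
box = pd.qcut(prob, 4)

# Count rows per bin, keeping the bins in their natural order
counts = box.value_counts(sort=False)
print(counts)
```

Each bin ends up with 5 of the 20 rows, which is why a per-bin share of the target is a meaningful step of a cumulative curve.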

I expected it to calculate the cumulative function over the groups, in order, but instead it calculates a sum for each element:

df['target_1'] = 1 - df['target']
ch_curve = df.groupby('box').target.cumsum() / float(df.target.sum())
nch_curve = df.groupby('box').target_1.cumsum() / float(df.target_1.sum())

with the result:

0     0.000000
1     0.018182
2     0.018182
3     0.018182
4     0.000000
5     0.018182
6     0.018182
7     0.018182
8     0.036364
9     0.018182
10    0.000000
11    0.018182
12    0.018182
13    0.036364
14    0.000000
15    0.036364
16    0.036364
17    0.036364
18    0.054545
19    0.000000
20    0.000000
21    0.018182
22    0.018182
23    0.05454

instead of something like:

'(0.0, 0.1)'    0.04
'(0.1, 0.2)'    0.12  # (0.08 + previous 0.04)
'(0.2, 0.3)'    0.17  # (0.05 + previous 0.12)
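The expected numbers are per-bin shares added up in bin order. A minimal sketch using only the three shares implied by the comments in the expected output (0.04, 0.08, 0.05), with plain strings standing in for the real qcut interval labels:

```python
import pandas as pd

# One share per bin, taken from the expected output; labels are stand-ins
shares = pd.Series([0.04, 0.08, 0.05],
                   index=["(0.0, 0.1)", "(0.1, 0.2)", "(0.2, 0.3)"])

# Running total across bins: 0.04, 0.04 + 0.08, 0.12 + 0.05
running = shares.cumsum()
print(running.round(2))
```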

You want to calculate the percentage for each group, and then take the cumsum.

In your original code, df.groupby('box').target.cumsum() takes a running sum within each group, so the result has one element for each row of the grouped dataframe, not one per group. The division is then broadcast across all of these row-level elements.
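To see this on a tiny frame, here is a sketch with hypothetical data (two string bins "a" and "b" instead of qcut intervals). The result of groupby(...).cumsum() stays aligned to the original rows:

```python
import pandas as pd

# Hypothetical toy data: rows from bins "a" and "b" interleaved
df = pd.DataFrame({"box": ["a", "b", "a", "b", "a"],
                   "target": [1, 0, 1, 1, 0]})

# One value per ROW: the running total of target within that row's bin
per_row = df.groupby("box")["target"].cumsum()
print(per_row)

# Dividing by a scalar is broadcast over every row, so there is still no per-bin curve
per_row_share = per_row / float(df["target"].sum())
```

Rows 0, 2 and 4 (bin "a") get 1, 2, 2 and rows 1 and 3 (bin "b") get 0, 1, so the output has five entries rather than two.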

Instead, you want one summary statistic per group, and then a cumsum across these statistics:

ch_curve = (df.groupby('box').target.sum() / df.target.sum()).cumsum()
nch_curve = (df.groupby('box').target_1.sum() / df.target_1.sum()).cumsum()
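An end-to-end sketch of this fix, assuming a sample size of 1000 and a fixed seed (both arbitrary, chosen only so the run is reproducible). It passes observed=True to groupby so that newer pandas versions do not warn about the categorical qcut bins; that choice is an addition, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical setup mirroring the question; size and seed are arbitrary
size = 1000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "prob": rng.uniform(0, 1, size),
    "target": rng.integers(0, 2, size=size),
})
df["target_1"] = 1 - df["target"]
df["box"] = pd.qcut(df["prob"], 10)

# One statistic per bin (its share of all positives / negatives), then a running total over bins
grouped = df.groupby("box", observed=True)
ch_curve = (grouped["target"].sum() / df["target"].sum()).cumsum()
nch_curve = (grouped["target_1"].sum() / df["target_1"].sum()).cumsum()

print(ch_curve)
```

Both curves have one entry per bin and end at 1.0, since the per-bin shares of each column add up to the whole.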
