python - Multi-indexing - accessing the last time in every day


I am new to multi-indexing in pandas. I have data that looks like this:

date        time      value
2014-01-14  12:00:04   .424
            12:01:12   .342
            12:01:19   .341
            ...
            12:05:49   .23
2014-05-12  ...
            1:02:42    .23
....

For now, I want to access the last time for every single date and store the value in an array. I've made a MultiIndex like this:

    import pandas as pd

    df = pd.read_csv("df.csv", index_col=0)
    df.index = pd.to_datetime(df.index, infer_datetime_format=True)
    df.index = pd.MultiIndex.from_arrays([df.index.date, df.index.time], names=['date', 'time'])
    df = df[~df.index.duplicated(keep='first')]
    dates = df.index.get_level_values(0)

So I have the dates saved as an array. I want to iterate through the dates, but either I can't get the syntax right or I am accessing the values incorrectly. I've tried a for loop but can't get it to run (for date in dates), and I can't do direct access either (df.loc[dates[i]] or something like that). Also, the number of time values in each date varies. Is there a way to fix this?

This sounds like a groupby/max operation. More specifically, you want to group by the date and aggregate the times by taking the max. Since aggregation can only be done on column values, we'll need to change the time index level into a column (by using reset_index):

    import pandas as pd

    df = pd.DataFrame({
        'date': ['2014-01-14', '2014-01-14', '2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12'],
        'time': ['12:00:04', '12:01:12', '12:01:19', '12:05:49', '01:01:59', '01:02:42'],
        'value': [0.424, 0.342, 0.341, 0.23, 0.0, 0.23]})
    df['date'] = pd.to_datetime(df['date'])
    df = df.set_index(['date', 'time'])

    # move the 'time' level out of the index so it can be aggregated as a column
    df = df.reset_index('time', drop=False)
    max_times = df.groupby(level=0)['time'].max()
    print(max_times)

yields

    date
    2014-01-14    12:05:49
    2014-05-12    01:02:42
    Name: time, dtype: object

If you wish to select the entire row, then you could use idxmax, but there is a caveat. idxmax returns index labels. Therefore, the index must be unique for the labels to signify unique rows. Since the date level is not unique by itself, to use idxmax we'll need to call reset_index (to make an index of unique integers):

    import pandas as pd

    df = pd.DataFrame({
        'date': ['2014-01-14', '2014-01-14', '2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12'],
        'time': ['12:00:04', '12:01:12', '12:01:19', '12:05:49', '01:01:59', '1:02:42'],
        'value': [0.424, 0.342, 0.341, 0.23, 0.0, 0.23]})
    df['date'] = pd.to_datetime(df['date'])
    df['time'] = pd.to_timedelta(df['time'])
    df = df.set_index(['date', 'time'])

    # back to a default integer index, so every label identifies exactly one row
    df = df.reset_index()
    idx = df.groupby(['date'])['time'].idxmax()
    print(df.loc[idx])

yields

            date     time  value
    3 2014-01-14 12:05:49   0.23
    5 2014-05-12 01:02:42   0.23
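One likely benefit of converting the time column with pd.to_timedelta before taking the max is that strings are compared character by character, so a time string that is not zero-padded can sort in the wrong place. The sketch below illustrates this on a small made-up frame (the '9:59:59' entry is hypothetical and not part of the question's data):

```python
import pandas as pd

# Hypothetical sample: '9:59:59' and '1:02:42' are deliberately not zero-padded.
df = pd.DataFrame({
    'date': ['2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12'],
    'time': ['9:59:59', '12:05:49', '01:01:59', '1:02:42'],
})

# Max over strings is lexicographic: '9:59:59' beats '12:05:49' because '9' > '1'.
string_max = df.groupby('date')['time'].max()

# Max over timedeltas compares actual durations since midnight.
df['time'] = pd.to_timedelta(df['time'])
duration_max = df.groupby('date')['time'].max()

print(string_max)
print(duration_max)
```

If every time string is zero-padded to HH:MM:SS, both approaches should agree, since lexicographic order then matches chronological order.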

I don't see a good way to do this while keeping the MultiIndex. It is easier to perform the groupby operation before setting the MultiIndex. Moreover, it is probably preferable to preserve the datetimes as one value instead of splitting them into two parts. Note that given a datetime/period-like Series, the .dt accessor gives you easy access to the date and the time as needed. Thus you can group by the date without making a date column:

    import pandas as pd

    df = pd.DataFrame({
        'datetime': ['2014-01-14 12:00:04', '2014-01-14 12:01:12', '2014-01-14 12:01:19',
                     '2014-01-14 12:05:49', '2014-05-12 01:01:59', '2014-05-12 01:02:42'],
        'value': [0.424, 0.342, 0.341, 0.23, 0.0, 0.23]})
    df['datetime'] = pd.to_datetime(df['datetime'])
    # df = pd.read_csv('df.csv', parse_dates=[0])

    # group by calendar date; idxmax gives the row label of the latest timestamp per date
    idx = df.groupby(df['datetime'].dt.date)['datetime'].idxmax()
    result = df.loc[idx]
    print(result)

yields

                 datetime  value
    3 2014-01-14 12:05:49   0.23
    5 2014-05-12 01:02:42   0.23
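Since the question asks for the values to end up in an array, here is a minimal sketch of that last step, reusing the same sample data. The name last_values is only illustrative, and to_numpy is one of several ways to obtain a NumPy array from the selection:

```python
import pandas as pd

df = pd.DataFrame({
    'datetime': ['2014-01-14 12:00:04', '2014-01-14 12:01:12', '2014-01-14 12:01:19',
                 '2014-01-14 12:05:49', '2014-05-12 01:01:59', '2014-05-12 01:02:42'],
    'value': [0.424, 0.342, 0.341, 0.23, 0.0, 0.23]})
df['datetime'] = pd.to_datetime(df['datetime'])

# Row label of the latest timestamp within each calendar date.
idx = df.groupby(df['datetime'].dt.date)['datetime'].idxmax()

# Select only the 'value' column for those rows and convert it to a NumPy array.
last_values = df.loc[idx, 'value'].to_numpy()
print(last_values)
```

The array holds one entry per date, in the order of the grouped dates, regardless of how many time rows each date has.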
