new multiindexing in pandas. have data looks this
date time value 2014-01-14 12:00:04 .424 12:01:12 .342 12:01:19 .341 ... 12:05:49 .23 2014-05-12 ... 1:02:42 .23 .... for now, want access last time every single date , store value in array. i've made multiindex this
df= pd.read_csv("df.csv",index_col=0) df.index = pd.to_datetime(df.index,infer_datetime_format=true) df.index = pd.multiindex.from_arrays([df.index.date,df.index.time],names=['date','time']) df= df[~df.index.duplicated(keep='first')] dates = df.index.get_level_values(0) so have dates saved array. want iterate through dates can't either syntax right or accessing values incorrectly. i've tried loop can't run (for date in dates) , can't direct access either (df.loc[dates[i]] or that). number of time variables in each date varies. there way fix this?
this sounds groupby/max operation. more specifically, want group date , aggregate times taking max. since aggregation can done on column values, we'll need change time index level column (by using reset_index):
import pandas pd df = pd.dataframe({'date': ['2014-01-14', '2014-01-14', '2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12'], 'time': ['12:00:04', '12:01:12', '12:01:19', '12:05:49', '01:01:59', '01:02:42'], 'value': [0.42399999999999999, 0.34200000000000003, 0.34100000000000003, 0.23000000000000001, 0.0, 0.23000000000000001]}) df['date'] = pd.to_datetime(df['date']) df = df.set_index(['date', 'time']) df = df.reset_index('time', drop=false) max_times = df.groupby(level=0)['time'].max() print(max_times) yields
date 2014-01-14 12:05:49 2014-05-12 1:02:42 name: time, dtype: object if wish to select entire row, use idxmax -- there caveat. idxmax returns index labels. therefore, index must unique labels signify unique rows. since date level not unique, use idxmax we'll need reset_index (to make index of unique integers):
df = pd.dataframe({'date': ['2014-01-14', '2014-01-14', '2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12'], 'time': ['12:00:04', '12:01:12', '12:01:19', '12:05:49', '01:01:59', '1:02:42'], 'value': [0.42399999999999999, 0.34200000000000003, 0.34100000000000003, 0.23000000000000001, 0.0, 0.23000000000000001]}) df['date'] = pd.to_datetime(df['date']) df['time'] = pd.to_timedelta(df['time']) df = df.set_index(['date', 'time']) df = df.reset_index() idx = df.groupby(['date'])['time'].idxmax() print(df.loc[idx]) yields
date time value 3 2014-01-14 12:05:49 0.23 5 2014-05-12 01:02:42 0.23 i don't see way while keeping multiindex. easier perform groupby operation before setting multiindex. moreover, preferable preserve datetimes 1 value instead of splitting 2 parts. note given datetime/period-like series, .dt accessor gives easy access date , time needed. can group date without making date column:
df = pd.dataframe({'datetime': ['2014-01-14 12:00:04', '2014-01-14 12:01:12', '2014-01-14 12:01:19', '2014-01-14 12:05:49', '2014-05-12 01:01:59', '2014-05-12 01:02:42'], 'value': [0.42399999999999999, 0.34200000000000003, 0.34100000000000003, 0.23000000000000001, 0.0, 0.23000000000000001]}) df['datetime'] = pd.to_datetime(df['datetime']) # df = pd.read_csv('df.csv', parse_dates=[0]) idx = df.groupby(df['datetime'].dt.date)['datetime'].idxmax() result = df.loc[idx] print(result) yields
datetime value 3 2014-01-14 12:05:49 0.23 5 2014-05-12 01:02:42 0.23
Comments
Post a Comment