new multiindexing in pandas. have data looks this
date time value 2014-01-14 12:00:04 .424 12:01:12 .342 12:01:19 .341 ... 12:05:49 .23 2014-05-12 ... 1:02:42 .23 ....
for now, want access last time every single date , store value in array. i've made multiindex this
df= pd.read_csv("df.csv",index_col=0) df.index = pd.to_datetime(df.index,infer_datetime_format=true) df.index = pd.multiindex.from_arrays([df.index.date,df.index.time],names=['date','time']) df= df[~df.index.duplicated(keep='first')] dates = df.index.get_level_values(0)
so have dates saved array. want iterate through dates can't either syntax right or accessing values incorrectly. i've tried loop can't run (for date in dates
) , can't direct access either (df.loc[dates[i]]
or that). number of time variables in each date varies. there way fix this?
this sounds groupby/max
operation. more specifically, want group date
, aggregate time
s taking max
. since aggregation can done on column values, we'll need change time
index level column (by using reset_index
):
import pandas pd df = pd.dataframe({'date': ['2014-01-14', '2014-01-14', '2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12'], 'time': ['12:00:04', '12:01:12', '12:01:19', '12:05:49', '01:01:59', '01:02:42'], 'value': [0.42399999999999999, 0.34200000000000003, 0.34100000000000003, 0.23000000000000001, 0.0, 0.23000000000000001]}) df['date'] = pd.to_datetime(df['date']) df = df.set_index(['date', 'time']) df = df.reset_index('time', drop=false) max_times = df.groupby(level=0)['time'].max() print(max_times)
yields
date 2014-01-14 12:05:49 2014-05-12 1:02:42 name: time, dtype: object
if wish to select entire row, use idxmax
-- there caveat. idxmax
returns index labels. therefore, index must unique labels signify unique rows. since date
level not unique, use idxmax
we'll need reset_index
(to make index of unique integers):
df = pd.dataframe({'date': ['2014-01-14', '2014-01-14', '2014-01-14', '2014-01-14', '2014-05-12', '2014-05-12'], 'time': ['12:00:04', '12:01:12', '12:01:19', '12:05:49', '01:01:59', '1:02:42'], 'value': [0.42399999999999999, 0.34200000000000003, 0.34100000000000003, 0.23000000000000001, 0.0, 0.23000000000000001]}) df['date'] = pd.to_datetime(df['date']) df['time'] = pd.to_timedelta(df['time']) df = df.set_index(['date', 'time']) df = df.reset_index() idx = df.groupby(['date'])['time'].idxmax() print(df.loc[idx])
yields
date time value 3 2014-01-14 12:05:49 0.23 5 2014-05-12 01:02:42 0.23
i don't see way while keeping multiindex. easier perform groupby
operation before setting multiindex. moreover, preferable preserve datetimes 1 value instead of splitting 2 parts. note given datetime/period-like series, .dt
accessor gives easy access date
, time
needed. can group date
without making date
column:
df = pd.dataframe({'datetime': ['2014-01-14 12:00:04', '2014-01-14 12:01:12', '2014-01-14 12:01:19', '2014-01-14 12:05:49', '2014-05-12 01:01:59', '2014-05-12 01:02:42'], 'value': [0.42399999999999999, 0.34200000000000003, 0.34100000000000003, 0.23000000000000001, 0.0, 0.23000000000000001]}) df['datetime'] = pd.to_datetime(df['datetime']) # df = pd.read_csv('df.csv', parse_dates=[0]) idx = df.groupby(df['datetime'].dt.date)['datetime'].idxmax() result = df.loc[idx] print(result)
yields
datetime value 3 2014-01-14 12:05:49 0.23 5 2014-05-12 01:02:42 0.23
Comments
Post a Comment