I have a performance problem replacing the values in a list of arrays using a dictionary.
Let's say this is my dictionary:
    # create sample dictionary
    keys = [1, 2, 3, 4]
    values = [5, 6, 7, 8]
    dictionary = dict(zip(keys, values))
And this is my list of arrays:
    # import
    import numpy as np

    # list of arrays
    listvalues = []
    arr1 = np.array([1, 3, 2])
    arr2 = np.array([1, 1, 2, 4])
    arr3 = np.array([4, 3, 2])
    listvalues.append(arr1)
    listvalues.append(arr2)
    listvalues.append(arr3)

    listvalues
    # > [array([1, 3, 2]), array([1, 1, 2, 4]), array([4, 3, 2])]
I use the following function to replace the values in an n-dimensional NumPy array using a dictionary:
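    # replace function
    def replace(arr, rep_dict):
        # sort the items by key and split them into parallel key/value arrays
        rep_keys, rep_vals = np.array(list(zip(*sorted(rep_dict.items()))))
        # position of each element of arr among the sorted keys
        idces = np.digitize(arr, rep_keys, right=True)
        return rep_vals[idces]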
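For reference, here is a minimal sketch of what that lookup does on one of the sample arrays. It assumes every value in the array is also a key: with `right=True`, `np.digitize` maps a value that is not a key to the position of the next larger key, and a value above the largest key raises an `IndexError` on the final indexing step.

```python
import numpy as np

dictionary = {1: 5, 2: 6, 3: 7, 4: 8}

# Sort the items by key and split them into two parallel arrays.
rep_keys, rep_vals = np.array(list(zip(*sorted(dictionary.items()))))

arr = np.array([1, 3, 2])

# With right=True, digitize returns i such that rep_keys[i-1] < x <= rep_keys[i],
# so a value equal to a key lands on that key's position.
idces = np.digitize(arr, rep_keys, right=True)
result = rep_vals[idces]

print(idces)   # [0 2 1]
print(result)  # [5 7 6]
```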
This function is fast, but I need to iterate over the list of arrays to apply it to each array:
    replaced = []
    for i in range(len(listvalues)):  # xrange on Python 2
        replaced.append(replace(listvalues[i], dictionary))
This is the bottleneck of my process, because it needs to iterate over thousands of arrays. How can I achieve the same result without using a for-loop? It is important that the result is in the same format as the input (a list of arrays with the replaced values).
Many thanks, guys!!
This does the trick efficiently, using the numpy_indexed package. It can be further simplified if all values in 'listvalues' are guaranteed to be present in 'keys'; I'll leave that as an exercise to the reader.
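    import numpy_indexed as npi

    # flatten the list into one array so the lookup runs once
    arr = np.concatenate(listvalues)
    # index of each element of arr in keys; elements not found are masked
    idx = npi.indices(keys, arr, missing='mask')
    remap = np.logical_not(idx.mask)
    # replace only the elements that were found in keys
    arr[remap] = np.array(values)[idx[remap]]
    # split back into arrays with the original lengths
    replaced = np.array_split(arr, np.cumsum([len(a) for a in listvalues][:-1]))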
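As a rough sketch of the simplified case, here is the same concatenate / look up once / split idea in plain NumPy. It swaps `npi.indices` for `np.searchsorted`, so it is not the numpy_indexed approach itself, and it assumes every value in 'listvalues' is present in 'keys' (a missing value would be silently mapped to a neighbouring key or raise an `IndexError`). The helper name `replace_all` is made up for illustration.

```python
import numpy as np

keys = [1, 2, 3, 4]
values = [5, 6, 7, 8]
listvalues = [np.array([1, 3, 2]), np.array([1, 1, 2, 4]), np.array([4, 3, 2])]

def replace_all(arrays, keys, values):
    """Hypothetical helper: map every array in one vectorized pass."""
    keys = np.asarray(keys)
    values = np.asarray(values)
    # searchsorted needs sorted keys; reorder the values to match
    order = np.argsort(keys)
    sorted_keys = keys[order]
    sorted_vals = values[order]
    # flatten the list so the lookup runs once instead of once per array
    flat = np.concatenate(arrays)
    # ASSUMPTION: every element of flat occurs in keys
    pos = np.searchsorted(sorted_keys, flat)
    mapped = sorted_vals[pos]
    # split back at the original array boundaries
    split_points = np.cumsum([len(a) for a in arrays])[:-1]
    return np.split(mapped, split_points)

replaced = replace_all(listvalues, keys, values)
print(replaced)
# [array([5, 7, 6]), array([5, 5, 6, 8]), array([8, 7, 6])]
```

The result keeps the input format: a list of arrays with the same lengths as the originals.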