python - Replace values in a large list of arrays (performance)


I have a performance problem when replacing the values in a list of arrays using a dictionary.

Let's say this is the dictionary:

    # create sample dictionary
    keys = [1, 2, 3, 4]
    values = [5, 6, 7, 8]
    dictionary = dict(zip(keys, values))

And this is the list of arrays:

    import numpy as np

    # list of arrays
    listvalues = []

    arr1 = np.array([1, 3, 2])
    arr2 = np.array([1, 1, 2, 4])
    arr3 = np.array([4, 3, 2])

    listvalues.append(arr1)
    listvalues.append(arr2)
    listvalues.append(arr3)

    listvalues
    >[array([1, 3, 2]), array([1, 1, 2, 4]), array([4, 3, 2])]

I use the following function to replace values in an n-dimensional NumPy array using a dictionary:

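    # replace function
    def replace(arr, rep_dict):
        # sort the dictionary by key and split it into key and value arrays
        rep_keys, rep_vals = np.array(list(zip(*sorted(rep_dict.items()))))
        # look up the position of each element of arr among the sorted keys
        idces = np.digitize(arr, rep_keys, right=True)
        return rep_vals[idces]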
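For anyone following along, here is a minimal self-contained sketch of what that function does on a single array. The sample dictionary is the one from the question; the name `single` is only for illustration. Note that `np.digitize` does a range lookup rather than an exact match, so this only behaves like a dictionary lookup when every element of the array is actually one of the keys.

```python
import numpy as np

def replace(arr, rep_dict):
    # Sort the dictionary by key and split it into parallel key/value arrays.
    rep_keys, rep_vals = np.array(list(zip(*sorted(rep_dict.items()))))
    # With right=True, digitize returns i such that keys[i-1] < x <= keys[i],
    # so an element equal to keys[i] maps to index i.
    idces = np.digitize(arr, rep_keys, right=True)
    return rep_vals[idces]

dictionary = {1: 5, 2: 6, 3: 7, 4: 8}
single = replace(np.array([1, 3, 2]), dictionary)
print(single)  # [5 7 6]
```

An element larger than the largest key would produce an index past the end of `rep_vals` and raise an `IndexError`, and an element that falls between two keys would silently take the value of the next higher key.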

This function is fast, but I need to iterate over the list of arrays to apply it to each array:

    replaced = []
    for i in xrange(len(listvalues)):
        replaced.append(replace(listvalues[i], dictionary))

This is the bottleneck of my process, since it needs to iterate over thousands of arrays. How can I achieve the same result without using a for-loop? It is important that the result is in the same format as the input (a list of arrays with the replaced values).

Many thanks, guys!!

This does the trick efficiently, using the numpy_indexed package. It can be further simplified if all values in 'listvalues' are guaranteed to be present in 'keys'; I'll leave that as an exercise for the reader.

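    import numpy_indexed as npi

    # work on one flat array instead of looping over thousands of small ones
    arr = np.concatenate(listvalues)
    # position of each element of arr in keys; elements not found are masked
    idx = npi.indices(keys, arr, missing='mask')
    remap = np.logical_not(idx.mask)
    arr[remap] = np.array(values)[idx[remap]]
    # split back into arrays of the original lengths
    replaced = np.array_split(arr, np.cumsum([len(a) for a in listvalues][:-1]))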
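For the simplified case mentioned above, a NumPy-only sketch of the same concatenate, replace, split idea might look like the following. This is not the numpy_indexed approach itself: it swaps in `np.searchsorted` for the lookup and assumes that every value in `listvalues` is present in `keys`. The names `key_arr`, `val_arr`, `flat` and `lengths` are only for illustration.

```python
import numpy as np

keys = [1, 2, 3, 4]
values = [5, 6, 7, 8]
listvalues = [np.array([1, 3, 2]), np.array([1, 1, 2, 4]), np.array([4, 3, 2])]

# Sort the keys once so searchsorted can locate each element.
key_arr = np.asarray(keys)
val_arr = np.asarray(values)
order = np.argsort(key_arr)
sorted_keys = key_arr[order]
sorted_vals = val_arr[order]

# Flatten all arrays into one and replace in a single vectorized step.
# ASSUMPTION: every element of flat is one of the keys.
flat = np.concatenate(listvalues)
flat_replaced = sorted_vals[np.searchsorted(sorted_keys, flat)]

# Split back at the original boundaries to get a list of arrays again.
lengths = [len(a) for a in listvalues]
replaced = np.split(flat_replaced, np.cumsum(lengths)[:-1])
print(replaced)
```

The result has the same format as the input: a list of arrays whose lengths match the originals. If some values might be missing from `keys`, the masked lookup shown above is the safer choice, since `searchsorted` would quietly return the position of a neighbouring key.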
