python - Check if list A contains a prefix of an item in list B -


i have 2 list, can call a , b. need check items in list a , see if items in b starts item a , stop check.

example of content in a:

https://some/path http://another/path http://another.some/path 

example of content in b:

http://another/path http://this/wont/match/anything 

currently i'm doing this:

def check_comps(self, comps):    in self.a:       b in comps:          if b.startswith(a):             return 

is there better way this?

your solution has worst-case o(nm) time complexity, o(n^2) if n ~ m. can reduce o(n log(n)) , o(log(n)). here how.

consider list of words (your comps attrubute) , target (your b)

words = ['abdc', 'abd', 'acb', 'abcabc', 'abc'] target = "abcd" 

observe, sorting list of words in lexicographical order, list of prefixes

prefixes = ['abc', 'abcabc', 'abd', 'abdc', 'acb'] 

it degenerate, because prefixes[0] prefix of prefixes[1], hence starts prefixes[1] starts prefixes[0] well. bit problematic. let's see why. let's use fast (binary) search find proper place of target in prefix list.

import bisect   bisect.bisect(prefixes, target)  #  -> 2 

this because target , prefixes[1] share prefix, target[3] > prefixes[1][3], hence lexicographically should go after. hence, if there prefix of target in prefixes, should left of index 2. obviously, target doesn't start prefixes[1] hence in worst case have search way left find whether there prefix. observe, if transform these prefixes nondegenerate list, possible prefix of target left of position returned bisect.bisect. let's reduce list of prefixes , write helper function check whether there prefix of target.

from functools import reduce   def minimize_prefixes(prefixes):     """     note! `prefixes` must sorted lexicographically !     """     def accum_prefs(prefixes, prefix):         if not prefix.startswith(prefixes[-1]):             return prefixes.append(prefix) or prefixes         return prefixes     prefs_iter = iter(prefixes)     return reduce(accum_prefs, prefs_iter, [next(prefs_iter)]) if prefixes else []   def hasprefix(minimized_prefixes, target):     position = bisect.bisect(minimized_prefixes, target)     return target.startswith(minimized_prefixes[position-1]) if position else false 

now let's see

min_prefixes = minimize_prefixes(prefixes) print(min_prefixes)  # -> ['abc', 'abd', 'acb'] hasprefix(min_prefixes, target)  # -> true 

let's make test must fail:

min_prefs_fail = ["abcde"] hasprefix(min_prefs_fail, target)  # -> false 

this way o(n log(n)) search, asymptotically faster o(n^2) solution. note! can (and should) store minimize_prefixes(sorted(comps)) prefix set attribute in object, making prefix search o(log (n)), more faster have now.


Comments