i have 2 list, can call a
, b
. need check items in list a
, see if items in b
starts item a
, stop check.
example of content in a:
https://some/path http://another/path http://another.some/path
example of content in b:
http://another/path http://this/wont/match/anything
currently i'm doing this:
def check_comps(self, comps): in self.a: b in comps: if b.startswith(a): return
is there better way this?
your solution has worst-case o(nm) time complexity, o(n^2) if n ~ m. can reduce o(n log(n)) , o(log(n)). here how.
consider list of words (your comps
attrubute) , target (your b
)
words = ['abdc', 'abd', 'acb', 'abcabc', 'abc'] target = "abcd"
observe, sorting list of words in lexicographical order, list of prefixes
prefixes = ['abc', 'abcabc', 'abd', 'abdc', 'acb']
it degenerate, because prefixes[0]
prefix of prefixes[1]
, hence starts prefixes[1]
starts prefixes[0]
well. bit problematic. let's see why. let's use fast (binary) search find proper place of target in prefix
list.
import bisect bisect.bisect(prefixes, target) # -> 2
this because target
, prefixes[1]
share prefix, target[3] > prefixes[1][3]
, hence lexicographically should go after. hence, if there prefix of target
in prefixes
, should left of index 2
. obviously, target
doesn't start prefixes[1]
hence in worst case have search way left find whether there prefix. observe, if transform these prefixes
nondegenerate list, possible prefix of target left of position returned bisect.bisect
. let's reduce list of prefixes , write helper function check whether there prefix of target.
from functools import reduce def minimize_prefixes(prefixes): """ note! `prefixes` must sorted lexicographically ! """ def accum_prefs(prefixes, prefix): if not prefix.startswith(prefixes[-1]): return prefixes.append(prefix) or prefixes return prefixes prefs_iter = iter(prefixes) return reduce(accum_prefs, prefs_iter, [next(prefs_iter)]) if prefixes else [] def hasprefix(minimized_prefixes, target): position = bisect.bisect(minimized_prefixes, target) return target.startswith(minimized_prefixes[position-1]) if position else false
now let's see
min_prefixes = minimize_prefixes(prefixes) print(min_prefixes) # -> ['abc', 'abd', 'acb'] hasprefix(min_prefixes, target) # -> true
let's make test must fail:
min_prefs_fail = ["abcde"] hasprefix(min_prefs_fail, target) # -> false
this way o(n log(n)) search, asymptotically faster o(n^2) solution. note! can (and should) store minimize_prefixes(sorted(comps))
prefix set attribute in object, making prefix search o(log (n)), more faster have now.
Comments
Post a Comment