This is related to a previous question, but I realised my objective is more complicated:
I have a sentence: "forbes asia 200 best under 500 billion 2011"
I have tokens like:
oldtokens = [u'forbes', u'asia', u'200', u'best', u'under', u'500', u'billion', u'2011']
and the indices where a previous parser has figured out there should be location or number slots:
numbertokenids = {(7,): 2011.0, (2,): 200.0, (5,6): 500000000000.00}
locationtokenids = {(0, 1): u'forbes asia'}
The token ids correspond to the indices of the tokens where there are locations or numbers. The objective is to obtain a new set of tokens like:
newtokens = [u'asia', u'200', u'best', u'under', u'500', u'2011']
with the new number and location token ids perhaps like this (to avoid index out of bounds exceptions):
numbertokenids = {(5,): 2011.0, (1,): 200.0, (4,): 500000000000.00}
locationtokenids = {(0,): u'forbes asia'}
Essentially, I would like to go through the new, reduced set of tokens and be able to create a new sentence:
"location_slot number_slot best under number_slot number_slot"
by going through the new set of tokens and replacing the token at each token id with either "location_slot" or "number_slot". If I did this with the current set of number and location token ids, I would get:
"location_slot location_slot number_slot best under number_slot number_slot number_slot".
How can I do this?
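One way to get the slot sentence without remapping any indices is to walk the old tokens once and emit a single label per span. This is only a minimal sketch: the function name `slot_sentence` and its parameter names are made up for illustration, and it assumes every span is a tuple of consecutive, ascending indices and that spans do not overlap.

```python
def slot_sentence(tokens, number_ids, location_ids):
    """Replace every span of token indices with one slot label."""
    # Map the first index of each span to (label, span length).
    starts = {}
    for span in number_ids:
        starts[span[0]] = ("number_slot", len(span))
    for span in location_ids:
        starts[span[0]] = ("location_slot", len(span))

    out = []
    i = 0
    while i < len(tokens):
        if i in starts:
            label, length = starts[i]
            out.append(label)
            i += length  # jump over the remaining tokens of the span
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


tokens = [u'forbes', u'asia', u'200', u'best', u'under', u'500', u'billion', u'2011']
number_ids = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
location_ids = {(0, 1): u'forbes asia'}
print(slot_sentence(tokens, number_ids, location_ids))
# location_slot number_slot best under number_slot number_slot
```

Because the span length decides how far the index advances, spans of three or more tokens are covered by the same loop.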
Another example:
location token ids are: (0, 1)
number token ids are: (3, 4)
old sampletokens: [u'united', u'kingdom', u'usd', u'1.240', u'billion']
where I want to both delete tokens and change the location and number token ids, so that I am able to replace tokens in the sentence like:
sampletokens[numbertokenid] = "number_slot"
sampletokens[locationtokenid] = "location_slot"
such that the replaced tokens are [u'location_slot', u'usd', u'number_slot']
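For this second example, one option is slice assignment applied from the rightmost span to the leftmost, so that shrinking the list never invalidates an index that is still needed. A rough sketch follows; `replace_spans` and the `spans` list of (span, label) pairs are hypothetical names, and the spans are assumed to be contiguous and non-overlapping.

```python
def replace_spans(tokens, spans_with_labels):
    """Return a copy of tokens with each span of indices replaced by its label."""
    result = list(tokens)
    # Process spans in descending order of their first index: replacing a span
    # shortens the list, which only shifts positions to the right of it.
    ordered = sorted(spans_with_labels, key=lambda item: item[0][0], reverse=True)
    for span, label in ordered:
        result[span[0]:span[-1] + 1] = [label]
    return result


sampletokens = [u'united', u'kingdom', u'usd', u'1.240', u'billion']
spans = [((0, 1), "location_slot"), ((3, 4), "number_slot")]
print(" ".join(replace_spans(sampletokens, spans)))
# location_slot usd number_slot
```

The original list is left untouched because the function works on a copy.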
Not elegant, but a working solution:
oldtokens = [u'forbes', u'asia', u'200', u'best', u'under', u'500', u'billion', u'2011']
numbertokenids = {(7,): 2011.0, (2,): 200.0, (5,6): 500000000000.00}
locationtokenids = {(0, 1): u'forbes asia'}

newtokens = []
newnumbertokenids = {}
newlocationtokenids = {}
new_ind = 0
skip = False

for ind in range(len(oldtokens)):
    if skip:
        skip = False
        continue
    for loc_ind in locationtokenids.keys():
        if ind in loc_ind:
            # Keeps the token after the matched one, so this assumes a two-token location
            newtokens.append(oldtokens[ind+1])
            newlocationtokenids[(new_ind,)] = locationtokenids[loc_ind]
            new_ind += 1
            if len(loc_ind) > 1:  # skip next position if there are 2 elements in the tuple
                skip = True
            break
    else:
        # No location matched (the for loop did not break), so try the numbers
        for num_ind in numbertokenids.keys():
            if ind in num_ind:
                newtokens.append(oldtokens[ind])
                newnumbertokenids[(new_ind,)] = numbertokenids[num_ind]
                new_ind += 1
                if len(num_ind) > 1:
                    skip = True
                break
        else:
            # Neither a location nor a number: copy the token unchanged
            newtokens.append(oldtokens[ind])
            new_ind += 1

newtokens
Out[37]: [u'asia', u'200', u'best', u'under', u'500', u'2011']

newnumbertokenids
Out[38]: {(1,): 200.0, (4,): 500000000000.0, (5,): 2011.0}

newlocationtokenids
Out[39]: {(0,): u'forbes asia'}
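The loop above skips only one position after a match, so it appears to handle spans of at most two tokens. A possible generalisation is sketched below; it is not the original approach. The name `collapse_spans` is hypothetical, spans are assumed to be contiguous and non-overlapping, and the sketch keeps the first token of each span (so u'forbes' rather than u'asia'), which should not matter once the tokens are replaced by slot labels.

```python
def collapse_spans(tokens, number_ids, location_ids):
    """Collapse each multi-token span to one token and remap the span keys."""
    # Index every span by its first position and remember which dict it came from.
    starts = {}
    for span, value in number_ids.items():
        starts[span[0]] = ("number", span, value)
    for span, value in location_ids.items():
        starts[span[0]] = ("location", span, value)

    new_tokens, new_numbers, new_locations = [], {}, {}
    i = 0
    while i < len(tokens):
        if i in starts:
            kind, span, value = starts[i]
            target = new_numbers if kind == "number" else new_locations
            # The new key is the position this token will take in new_tokens.
            target[(len(new_tokens),)] = value
            new_tokens.append(tokens[i])  # assumption: keep the span's first token
            i += len(span)                # skip the rest of the span
        else:
            new_tokens.append(tokens[i])
            i += 1
    return new_tokens, new_numbers, new_locations


oldtokens = [u'forbes', u'asia', u'200', u'best', u'under', u'500', u'billion', u'2011']
numbertokenids = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
locationtokenids = {(0, 1): u'forbes asia'}
newtokens, newnumbertokenids, newlocationtokenids = collapse_spans(
    oldtokens, numbertokenids, locationtokenids)
```

With the example data this yields the same remapped keys as the loop above, namely (1,), (4,) and (5,) for the numbers and (0,) for the location.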