python - Regrouping lines of a text file


I am using a Python script to generate Stata commands. The output is a text file. I would like to group the lines belonging to the same observation, which is currently not the case, using Python.

A typical line in the file (let's call it file.txt) is of this sort:

[something something] if a == 1 & b == 2 & c == 3 & [other things]

where a, b, and c are identifying variables. Each (a, b, c) triplet uniquely identifies an observation. I am trying to sort file.txt by grouping the lines related to the same observation together.

For instance, I would like to go from:

replace k = 1 if a == 1 & b == 2 & c == 3 & comments_1 == "i wish better @ python"
replace k = 2 if a == 1 & b == 3 & c == 4 & comments_1 == ""
replace g = "example" if a == 1 & b == 2 & c == 3 & comments_1 == "i wish better @ python"

to:

replace k = 1 if a == 1 & b == 2 & c == 3 & comments_1 == "i wish better @ python"
replace g = "example" if a == 1 & b == 2 & c == 3 & comments_1 == "i wish better @ python"
replace k = 2 if a == 1 & b == 3 & c == 4 & comments_1 == ""

Lines 1 and 3 of the input are next to each other in the output because they relate to the same observation (the same a, b, c triplet). This is different from sorting alphabetically, so I cannot simply use sort().

My plan would be:

Create an empty dictionary of type dict[tuple[int]:set[str]].

Read each line of the text file. For each line, get the triplet by searching for the characters after 'a == ' and before ' b ==', and so forth.

If the triplet is already in the dictionary, add the line string to the set the triplet points to. If not, create the entry and add the string.

For each string in the set of each entry, write the string to the output file.

I believe this would sort the file.

Would this work? Is there a better way to do it?

Thanks!

Sounds good to me. You can use a regex to extract the observations. For example, assuming the observations are made of positive integers, you can use:

import re

line = 'replace k = 1 if a == 1 & b == 2 & c == 3 & comments_1 == "test"'
m = re.search(r'a == (\d+) & b == (\d+) & c == (\d+)', line)
observation = tuple(map(int, m.groups()))
print(observation)

This prints the tuple (1, 2, 3).
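Putting the regex together with the dictionary plan from the question, the whole regrouping could be sketched roughly as below. This is only a sketch under some assumptions: a, b and c are non-negative integers, the function name `group_lines` and the `\b` word boundary are my own additions, and it uses lists instead of sets, since a set does not keep the commands of one observation in their original order and would drop duplicate lines.

```python
import re
from collections import defaultdict

# Assumption: a, b and c are non-negative integers, as in the example lines.
# The leading \b keeps a name that merely ends in "a" from matching.
TRIPLET = re.compile(r'\ba == (\d+) & b == (\d+) & c == (\d+)')


def group_lines(lines):
    """Regroup lines by their (a, b, c) triplet, in first-seen order."""
    groups = defaultdict(list)  # triplet -> its lines, in input order
    unmatched = []              # lines with no triplet go to the end
    for line in lines:
        m = TRIPLET.search(line)
        if m is None:
            unmatched.append(line)
            continue
        groups[tuple(map(int, m.groups()))].append(line)
    # Dictionaries keep insertion order (Python 3.7+), so observations
    # come out in the order in which they first appeared in the input.
    regrouped = [line for key in groups for line in groups[key]]
    return regrouped + unmatched


sample = [
    'replace k = 1 if a == 1 & b == 2 & c == 3 & comments_1 == "test"',
    'replace k = 2 if a == 1 & b == 3 & c == 4 & comments_1 == ""',
    'replace g = "example" if a == 1 & b == 2 & c == 3 & comments_1 == "test"',
]
regrouped = group_lines(sample)
print("\n".join(regrouped))
```

For the real file, you would read the lines from file.txt (for example with `open('file.txt').read().splitlines()`), pass them to `group_lines`, and write the result to a new file. Lines that do not contain a triplet are kept and placed at the end, which may or may not be what you want.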

