python - How to check for and discard invalid multi-line JSON log requests in log files?


I'm writing a script to parse our requests, and I need to be able to handle cases where we have malformed or incomplete requests. For example, a typical request would come in the following format:

log-prefix: {json request data}\n

All on a single line, etc...

Then I found out that we have a character buffer limit of 1024 in our writer, so requests can be spread across many lines, like so:

log-prefix: {first line of data
log-prefix: second line of requests data
log-prefix: final line of log data}\n

I'm able to handle this by calling next on the iterator I'm using, removing the prefix, concatenating the requests, and passing the result to json.loads to return the dictionary I need for writing to a file.

I'm doing it in the following way:

lines = (line.strip('\n') for line in inf.readlines())
for line in lines:
    if not line.endswith('}'):
        # The request was split across several lines: collect them all.
        bad_lines = [line]
        while not line.endswith('}'):
            line = next(lines)
            bad_lines.append(line)
        form_line = malformed_data_handler(bad_lines)
    else:
        form_line = parse_out_json(line)

And the functions used in the above code are:

import json
import logging
from typing import Sequence

logger = logging.getLogger(__name__)


def malformed_data_handler(lines: Sequence) -> dict:
    """
    Takes n malformed lines of bridge log data (where the json response has
    been split across n lines, all containing prefixes) and correctly
    delegates the parsing to parse_out_json before returning the concatenated
    result as a dictionary.

    :param lines: An iterable with malformed lines as the elements
    :return: A dictionary ready for writing.
    """
    logger.debug('Handling malformed data.')
    parsed = ''
    logger.debug(lines)
    print(lines)
    for line in lines:
        logger.info('{}'.format(line))
        parsed += parse_out_malformed(line)
    logger.debug(parsed)
    # The encoding keyword of json.loads was removed in Python 3.9.
    return json.loads(parsed)


def parse_out_json(line: str) -> dict:
    """
    Parses out the json response returned from the Apache bridge logs. Takes
    a line and removes the prefix, returning a dictionary.

    :param line:
    :return:
    """
    data = slice(line.find('{'), None)
    return json.loads(line[data])


def parse_out_malformed(line: str) -> str:
    prefix = 'bridge-rails: '
    data = slice(line.find(prefix), None)
    parsed = line[data].replace(prefix, '')
    return parsed

So, to my problem: I've found instances where the log data can look like this:

log-prefix: {first line of data ....
log-prefix: last line of data (no closing brace)
log-prefix: {new request}

My first thought was to handle this by adding some sort of check to see if '{' is in the line. But since I'm using a generator to process lines for scalability, I don't know that I have found one of these requests until I have called next and pulled the line out of the line generator, and at that point I can't re-append it, and I'm not sure how to efficiently tell my process to start from that line and continue normally.
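To make the '{' check concrete, here is a rough sketch of one way it might work without pushing a line back: keep a buffer of fragments in a single loop, and treat any payload that starts with '{' as the start of a new request, throwing away whatever was still buffered. The names iter_requests, strip_prefix and PREFIX are hypothetical, and the sketch assumes that a continuation fragment never begins with '{' itself, which may not hold if the 1024-character split lands right before a nested object.

```python
import json
from typing import Iterable, Iterator

# Assumed prefix, borrowed from parse_out_malformed above.
PREFIX = 'bridge-rails: '


def strip_prefix(line: str) -> str:
    """Return the text after the log prefix, or the whole line if absent."""
    _, found, payload = line.partition(PREFIX)
    return payload if found else line


def iter_requests(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per complete request, dropping truncated ones."""
    buffer = []  # fragments of the request currently being assembled
    for raw in lines:
        payload = strip_prefix(raw.rstrip('\n'))
        if payload.startswith('{'):
            # A new request starts here, so anything still buffered never
            # got its closing brace: drop it and start over from this line.
            buffer = [payload]
        elif buffer:
            buffer.append(payload)
        else:
            continue  # continuation line with no opening line: skip it
        if payload.endswith('}'):
            try:
                record = json.loads(''.join(buffer))
            except json.JSONDecodeError:
                # Possibly a nested '}' at a fragment boundary: keep buffering.
                continue
            buffer = []
            yield record
```

Because the current line is inspected before anything is consumed from the iterator, the loop never needs next() or a way to re-append a line; the file object can be passed in directly, e.g. for record in iter_requests(inf).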

