How to create a user-specific file with a unique name in the reducer phase of the Hadoop MapReduce framework (in Python)
I have written a reducer that reads the mapper's output and creates a new file named after each key, so that all values corresponding to the same key are stored in one file.
My code is:
    #!/usr/bin/env python
    import sys

    last_key = None  # key of the previous input line
    fp = None        # file currently open for last_key

    for input_line in sys.stdin:
        input_line = input_line.strip()
        data = input_line.split("\t")
        this_key = data[0]
        if len(data) == 2:
            value = data[1]
        else:
            value = None
        if last_key != this_key:
            # Key changed: close the previous file and open one for the new key
            if last_key:
                fp.close()
            fp = open('%s.txt' % this_key, 'a')
            last_key = this_key
        if value:
            fp.write('{0}\n'.format(value))

    # Close the file of the final key
    if last_key:
        fp.close()
But it is not creating the files.
So my question is: which function do I need to use to create new files in HDFS?
There is no straightforward solution to achieve this. You may follow one of the approaches below to achieve it using MapReduce:
Approach 1: Using a Partitioner
- Find out the number of unique files needed, e.g. count the unique values of '%this_key%' in the input.
- Set the number of reducers to the result of the previous step in the MapReduce driver [one file per reducer].
- Use a Partitioner to send each map output to a particular reducer.
- Have the reducer emit only %value%.
- At the end of the job, each output file holds the values of a single key, and you can rename the reducer output files.
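The steps above can be sketched locally in Python. This is only an illustration of the logic, not a working Hadoop job: in a real job the partitioner is normally a Java class configured in the driver, and the helper names here (assign_partitions, get_partition, reduce_values, rename_plan) are hypothetical. The sketch also assumes that reducer output files are named part-00000, part-00001, and so on, which can differ depending on the API and output format used.

```python
#!/usr/bin/env python
"""Local sketch of Approach 1: one reducer (and so one output file) per key.

All function names are hypothetical helpers, not part of any Hadoop API.
"""


def assign_partitions(keys):
    """Map each distinct key to its own reducer index (steps 1 and 2)."""
    unique_keys = sorted(set(keys))
    return {key: index for index, key in enumerate(unique_keys)}


def get_partition(key, partitions):
    """Mimic a partitioner: return the reducer index for a key (step 3)."""
    return partitions[key]


def reduce_values(lines):
    """Mimic the reducer: drop the key and emit only the value (step 4)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2 and fields[1]:
            yield fields[1]


def rename_plan(partitions):
    """Pair each assumed reducer output file with a key-based name (step 5)."""
    return {"part-%05d" % index: "%s.txt" % key
            for key, index in partitions.items()}


if __name__ == "__main__":
    # Made-up sample records in the mapper's "key<TAB>value" format
    records = ["alice\t1", "bob\t2", "alice\t3"]
    keys = [record.split("\t")[0] for record in records]
    partitions = assign_partitions(keys)
    print(len(partitions))          # number of reducers to configure
    print(rename_plan(partitions))  # which output file belongs to which key
```

The number of entries in the mapping is the reducer count to set in the driver, and the rename plan shows which output file would be renamed to which key once the job finishes.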
Approach 2: If the number of files is small, use MultipleOutputs.