How to create a user-specific file with a unique name in the reducer phase of the Hadoop MapReduce framework (in Python)


I have written a reducer that reads the mapper's output. For each key it should create a new file named after the key, and all values belonging to the same key should be stored in that one file.

My code is:

    #!/usr/bin/env python
    import sys

    last_key = None  # key of the previous input line
    fp = None        # file handle for the key currently being written

    for input_line in sys.stdin:
        input_line = input_line.strip()
        data = input_line.split("\t")

        this_key = data[0]
        if len(data) == 2:
            value = data[1]
        else:
            value = None

        if last_key != this_key:
            # New key: close the previous file and open one named after this key.
            if fp:
                fp.close()
            fp = open('%s.txt' % this_key, 'a')
            last_key = this_key

        if value:
            fp.write('{0}\n'.format(value))

    # Close the file of the last key.
    if fp:
        fp.close()

But it is not creating any files.

So my question is: which function do I need to use to create new files in HDFS?

There is no straightforward solution to achieve this. You may follow one of the approaches below to achieve it using MapReduce:

Approach 1: Using a Partitioner

  1. Find out the number of unique files needed, e.g. count the distinct values of this_key in the input.
  2. In the MapReduce driver, set the number of reducers to the result of the previous step (one file per reducer).
  3. Use a partitioner to send each map output record to a particular reducer.
  4. Have the reducer emit only the value.
  5. At the end of the job, each reducer output file holds the values of a single key, and you can rename the reducer output files accordingly.
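
The bookkeeping behind these steps can be sketched in plain Python. This is only an illustration of the logic, not Hadoop API code: the function names are hypothetical, and the rename plan assumes reducer outputs follow the part-00000, part-00001, ... naming convention. In a real job the key-to-reducer mapping would have to be applied by the job's partitioner, and the renames would be carried out on HDFS afterwards.

```python
def build_partition_map(keys):
    """Give every distinct key its own reducer index (0..n-1), in sorted order."""
    return {key: index for index, key in enumerate(sorted(set(keys)))}


def get_partition(key, partition_map):
    """Return the reducer index that all records with this key should go to."""
    return partition_map[key]


def rename_plan(partition_map, suffix=".txt"):
    """Map each reducer output file name to a per-key file name.

    Assumes reducer i writes its output to 'part-' + i zero-padded to 5 digits.
    """
    return {
        "part-%05d" % index: key + suffix
        for key, index in partition_map.items()
    }


if __name__ == "__main__":
    partition_map = build_partition_map(["b", "a", "b", "c"])
    print("reducers needed:", len(partition_map))
    for part_file, target in sorted(rename_plan(partition_map).items()):
        print(part_file, "->", target)
```

The number of reducers for step 2 is simply the size of the map, so this approach only scales as far as the cluster can run one reducer per distinct key.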

Approach 2: If the number of files is small, use MultipleOutputs.
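
MultipleOutputs is a class in Hadoop's Java API, so it cannot be called directly from a Python streaming reducer. The sketch below only mimics its idea in Python: it groups the sorted reducer input by key and pairs each group with an output name derived from the key. The function name route_by_key and the ".txt" suffix are assumptions made for illustration, and the sketch deliberately stops short of writing to HDFS.

```python
from itertools import groupby


def route_by_key(lines, suffix=".txt"):
    """Yield (output_name, values) for each run of consecutive lines sharing a key.

    Lines are expected as 'key<TAB>value'; the input must already be sorted
    by key, which is what a reducer receives from the shuffle phase.
    """
    records = (
        line.rstrip("\n").split("\t", 1)
        for line in lines
        if line.strip()
    )
    for key, group in groupby(records, key=lambda record: record[0]):
        # Keep only non-empty values; a line without a tab contributes nothing.
        values = [record[1] for record in group if len(record) == 2 and record[1]]
        yield key + suffix, values


if __name__ == "__main__":
    sample = ["a\t1\n", "a\t2\n", "b\t3\n"]
    for output_name, values in route_by_key(sample):
        print(output_name, values)
```

Opening output_name with Python's built-in open() would create the file on the local disk of the node running the reducer, not in HDFS, which is consistent with the symptom described in the question. Getting the data into HDFS needs a separate step that is not shown here.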

