Changing The Output File Prefix Of Hadoop MapReduce Job - Big Data In Real World


A Hadoop job can have multiple reducers, and by default each reducer writes its output to a file named part-r-xxxxx, where xxxxx is the zero-padded reducer (partition) number. The first reducer creates part-r-00000, the second creates part-r-00001, and so on.
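
The naming pattern can be sketched in a few lines of plain Java. This is only an illustration of the pattern described above, not Hadoop's own code; the class OutputFileNames and the method reducerFile are hypothetical names introduced for the example.

```java
import java.util.Locale;

// Hypothetical helper that mirrors the naming pattern of reducer output files.
final class OutputFileNames {

    // Joins the base name, the "r" marker for reducer output,
    // and the partition number zero-padded to five digits.
    static String reducerFile(String baseName, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseName, partition);
    }

    public static void main(String[] args) {
        // Print the names the first three reducers would use with the default prefix.
        for (int partition = 0; partition < 3; partition++) {
            System.out.println(reducerFile("part", partition));
        }
    }
}
```

With the default base name "part", the loop prints part-r-00000, part-r-00001 and part-r-00002.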

What if you don’t like the default prefix “part” and would like to change the prefix?

Changing The Output File Prefix

We are in luck: Hadoop has the mapreduce.output.basename property, which sets the prefix of the output files. You can pass the property on the command line when you run the MapReduce job, like below.

hadoop jar ~/MaxClosePriceTool-1.0-ARG.jar com.hirw.maxcloseprice.MaxClosePriceTool -D mapreduce.output.basename=custombase /user/hirw/input/stocks output/mapreduce/stocks

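For the above command to work, your driver program needs to implement the Tool interface and be launched through ToolRunner, which parses generic options such as -D and applies them to the configuration. The job must then be created from getConf() so that it picks up the property. Here is a sample.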

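import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxClosePriceTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {

        .....

    }

    // Typical entry point (assumed, not shown in the original sample).
    // ToolRunner applies the -D options to the Configuration before calling run().
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MaxClosePriceTool(), args));
    }
}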

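Here is the output. With the command above, the reducer output files are named custombase-r-00000, custombase-r-00001, and so on (one per reducer) instead of part-r-xxxxx.

[Image: Changing Output Filename Prefix Of MapReduce Job]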
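
Why the Tool interface matters can be illustrated with a toy sketch in plain Java, with no Hadoop dependency. It mimics, in a very simplified way, what happens to the command line before run() is called: each -D key=value pair is pulled out into a set of properties, and the remaining arguments (the input and output paths) are passed on. The class GenericOptionsSketch and its members are hypothetical; in a real job this work is done by Hadoop's ToolRunner and GenericOptionsParser, which handle more options and forms than this sketch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for generic option handling; not Hadoop's implementation.
final class GenericOptionsSketch {

    // Properties collected from "-D key=value" pairs.
    final Map<String, String> properties = new LinkedHashMap<>();
    // Everything else, in order (for example the input and output paths).
    final List<String> remainingArgs = new ArrayList<>();

    GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            boolean isProperty = args[i].equals("-D")
                    && i + 1 < args.length
                    && args[i + 1].contains("=");
            if (isProperty) {
                // Consume the next argument and split it at the first '='.
                String[] keyValue = args[++i].split("=", 2);
                properties.put(keyValue[0], keyValue[1]);
            } else {
                remainingArgs.add(args[i]);
            }
        }
    }

    // Falls back to "part" when the property was not supplied.
    String outputBaseName() {
        return properties.getOrDefault("mapreduce.output.basename", "part");
    }

    public static void main(String[] args) {
        String[] commandLine = {
            "-D", "mapreduce.output.basename=custombase",
            "/user/hirw/input/stocks", "output/mapreduce/stocks"
        };
        GenericOptionsSketch parsed = new GenericOptionsSketch(commandLine);
        System.out.println(parsed.outputBaseName()); // custombase
        System.out.println(parsed.remainingArgs);    // [/user/hirw/input/stocks, output/mapreduce/stocks]
    }
}
```

A driver that skips this step would receive -D and the property as ordinary arguments, and the output prefix would stay at part.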

 

 
