Changing Number Of Mappers - Big Data In Real World


The number of mappers always equals the number of input splits. That said, you can control the number of splits by changing mapred.min.split.size, which sets the minimum input split size. (Newer Hadoop releases name the same setting mapreduce.input.fileinputformat.split.minsize.)

Assume the block size is 64 MB, mapred.min.split.size is set to 128 MB, and the maximum split size is 256 MB. Hadoop uses the formula below to calculate the split size, which comes out to 128 MB.

Input Split Size = max(minimumSize, min(maximumSize, blockSize))

Let's now substitute the values:

max(128 MB, min(256 MB, 64 MB)) = max(128 MB, 64 MB) = 128 MB

The size of InputSplit will be 128 MB even though the block size is 64 MB.
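
The same calculation can be sketched in a few lines of Python. This is only an illustration of the formula above, not Hadoop's own code; the function name compute_split_size and the MB constant are made up for this example.

```python
MB = 1024 * 1024  # bytes per megabyte


def compute_split_size(min_size: int, max_size: int, block_size: int) -> int:
    """Apply max(minimumSize, min(maximumSize, blockSize)) to sizes in bytes.

    Hypothetical helper that mirrors the formula in the text.
    """
    return max(min_size, min(max_size, block_size))


# Values from the example: 64 MB blocks, 128 MB minimum, 256 MB maximum.
split = compute_split_size(min_size=128 * MB, max_size=256 * MB, block_size=64 * MB)
print(split // MB)  # prints 128
```

If the minimum is left below the block size, the same function returns the block size, so each split then lines up with a single block.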

Is There a Benefit In Doing This?

There is usually no real benefit in forcing the split size to be greater than the block size. Doing so decreases the number of mappers, but at the expense of data locality: an InputSplit now comprises data from at least two blocks, and both blocks may not be available on the same DataNode.
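
To make the trade-off concrete, here is a rough sketch that estimates the mapper count for a hypothetical 1 GB file at both split sizes. It assumes a single splittable file whose splits start on block boundaries, and it ignores the small tolerance Hadoop applies when sizing the last split, so treat the numbers as an approximation. The helper names are invented for this example.

```python
MB = 1024 * 1024  # bytes per megabyte


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return (a + b - 1) // b


def estimate_mappers(file_size: int, split_size: int) -> int:
    """Approximate mapper count: one mapper per input split (assumption: one splittable file)."""
    return ceil_div(file_size, split_size)


def blocks_per_split(split_size: int, block_size: int) -> int:
    """Blocks a split spans, assuming the split starts on a block boundary."""
    return ceil_div(split_size, block_size)


file_size = 1024 * MB  # hypothetical 1 GB input file
block_size = 64 * MB

for split_size in (64 * MB, 128 * MB):
    mappers = estimate_mappers(file_size, split_size)
    blocks = blocks_per_split(split_size, block_size)
    print(f"split={split_size // MB} MB mappers={mappers} blocks_per_split={blocks}")
# prints:
# split=64 MB mappers=16 blocks_per_split=1
# split=128 MB mappers=8 blocks_per_split=2
```

Halving the mapper count means every mapper reads two blocks, and at most one of them is guaranteed to be local to the node running the task.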

 
