What is Speculative Execution?

Sometimes you will notice that a job with 3 input splits ran 4 mappers and killed the fourth one. The job still completes successfully, but have you ever wondered why it ran a fourth mapper when there are only 3 input splits, and why it killed that mapper at the end?

What you see is called Speculative Execution.

When the Hadoop framework notices that a task (mapper or reducer) is running noticeably slower than the other tasks of the same job, it clones the “long running” task and runs the copy on another node. This is called Speculative Execution: Hadoop is speculating that something is wrong with the “long running” task, so it hedges by running a duplicate elsewhere. The slowness could be caused by faulty hardware, network congestion, or a node that is simply busy. In many cases this turns out to be a false alarm, and the task that was considered slow or problematic completes successfully. Whichever attempt finishes first is the one Hadoop keeps, and it kills the other. In the false-alarm case, that means the cloned task is killed and the job proceeds with the results of the original task.
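The decision described above can be illustrated with a toy model. This is a simplified sketch, not Hadoop's actual speculator (the real one uses its own runtime estimates and caps how many speculative attempts may run at once). The class and function names, the assumption that progress grows linearly with time, and the 2x slowness threshold are all hypothetical choices made for illustration.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskAttempt:
    """Hypothetical snapshot of one running task attempt."""
    task_id: str
    node: str
    progress: float  # fraction of the input split processed, 0.0 to 1.0
    elapsed: float   # seconds since this attempt started

    def remaining(self) -> float:
        """Estimated seconds left, assuming a constant processing rate."""
        if self.progress <= 0.0 or self.elapsed <= 0.0:
            return float("inf")  # no progress yet, so no meaningful estimate
        rate = self.progress / self.elapsed
        return (1.0 - self.progress) / rate

    def total(self) -> float:
        """Estimated end-to-end duration of this attempt."""
        return self.elapsed + self.remaining()


def pick_straggler(attempts: List[TaskAttempt],
                   slowness_factor: float = 2.0) -> Optional[TaskAttempt]:
    """Return the attempt worth cloning, or None if nothing looks slow.

    A task is flagged when its estimated remaining time exceeds
    slowness_factor times the median remaining time of the job's tasks.
    """
    typical = statistics.median(a.remaining() for a in attempts)
    slowest = max(attempts, key=lambda a: a.remaining())
    if slowest.remaining() > slowness_factor * typical:
        return slowest
    return None


def typical_duration(attempts: List[TaskAttempt], exclude: TaskAttempt) -> float:
    """Guess how long a fresh clone would take: median total of the other tasks."""
    return statistics.median(a.total() for a in attempts if a is not exclude)


def race(original: TaskAttempt, clone_duration: float) -> str:
    """Original and clone run side by side; the first to finish wins."""
    if original.remaining() <= clone_duration:
        return f"{original.task_id}: original on {original.node} wins, clone killed"
    return f"{original.task_id}: clone wins, original on {original.node} killed"


if __name__ == "__main__":
    attempts = [
        TaskAttempt("m_000000", "node1", progress=0.90, elapsed=90.0),
        TaskAttempt("m_000001", "node2", progress=0.85, elapsed=85.0),
        TaskAttempt("m_000002", "node3", progress=0.60, elapsed=90.0),
    ]
    straggler = pick_straggler(attempts)
    if straggler is None:
        print("no speculative attempt needed")
    else:
        clone_time = typical_duration(attempts, straggler)
        print(race(straggler, clone_time))
```

With these sample numbers the third attempt is flagged, because it has about 60 s left while the median is about 15 s. A fresh clone would still need about 100 s, so the original finishes first and the clone is killed. This is the same false-alarm outcome as the 3 splits / 4 mappers example. If you drop the third attempt's progress to 0.30, it has about 210 s left and the clone wins instead.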

Is Speculative Execution Always Beneficial?

In some cases this is beneficial. In a cluster with hundreds of nodes, problems like hardware failure or network congestion are common. Starting a duplicate task early means the job does not have to wait for the problematic task to finish.

In other cases, certain map or reduce tasks are expected to run longer than others, for example because they process more data. Speculatively executing those tasks is not advisable, since the duplicate attempts take up cluster resources without making the job finish any sooner.

How Can I Enable/Disable Speculative Execution?

You can enable or disable speculative execution separately for map and reduce tasks using the boolean properties mapreduce.map.speculative and mapreduce.reduce.speculative. Both default to true, so speculative execution stays on unless you turn it off.
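For a single job, these properties can be passed as generic -D options on the command line. The sketch below builds a Hadoop Streaming command in Python. The streaming_command helper, the jar location and the HDFS paths are hypothetical placeholders. The mapper and reducer (/bin/cat and /usr/bin/wc) are just the trivial commands used in the Hadoop Streaming documentation. Hadoop requires generic options such as -D to appear before the streaming-specific options.

```python
import shlex
from typing import List


def streaming_command(jar: str, input_path: str, output_path: str,
                      mapper: str, reducer: str,
                      map_speculative: bool = True,
                      reduce_speculative: bool = True) -> List[str]:
    """Build a hypothetical Hadoop Streaming command with per-job speculation flags."""

    def as_flag(value: bool) -> str:
        # Boolean Hadoop properties are given as the strings "true" / "false".
        return "true" if value else "false"

    return [
        "hadoop", "jar", jar,
        # Generic options (-D) must come before the streaming options.
        "-D", f"mapreduce.map.speculative={as_flag(map_speculative)}",
        "-D", f"mapreduce.reduce.speculative={as_flag(reduce_speculative)}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


if __name__ == "__main__":
    cmd = streaming_command(
        jar="/path/to/hadoop-streaming.jar",  # placeholder: adjust to your install
        input_path="/user/me/input",          # placeholder HDFS paths
        output_path="/user/me/output",
        mapper="/bin/cat",
        reducer="/usr/bin/wc",
        map_speculative=False,    # turn off map-side speculation
        reduce_speculative=True,  # keep reduce-side speculation (the default)
    )
    # Only prints the command; on a machine with Hadoop installed,
    # subprocess.run(cmd, check=True) would actually submit the job.
    print(shlex.join(cmd))
```

Java drivers that run through ToolRunner accept the same -D options. A driver can also set these values in code with job.setMapSpeculativeExecution(false) and job.setReduceSpeculativeExecution(false). To change the default for every job, set the properties cluster-wide in mapred-site.xml.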

Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
