Question - What is speculative execution in Hadoop?
Answer -
One limitation of Hadoop is that by distributing the tasks on several nodes, there are chances that a few slow nodes limit the rest of the program. There are various reasons for the tasks to be slow, which are sometimes not easy to detect. Instead of identifying and fixing the slow-running tasks, Hadoop tries to detect when the task runs slower than expected and then launches other equivalent tasks as a backup. This backup mechanism in Hadoop is speculative execution.
Speculative execution creates a duplicate task on another disk. The same input can be processed multiple times in parallel. When most tasks in a job come to completion, the speculative execution mechanism schedules duplicate copies of the remaining tasks, which are slower, across the nodes that are free currently. When these tasks are finished, it is intimated to the JobTracker. If other copies are executing speculatively, then Hadoop notifies the TaskTrackers to quit those tasks and reject their output.
Speculative execution is, by default, true in Hadoop. To disable it, mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options can be set to false.