• +91 9723535972
  • info@interviewmaterial.com

Hadoop Interview Questions and Answers

Question - Explain the major difference between HDFS block and InputSplit

Answer -

In simple terms, HDFS block is the physical representation of data, while InputSplit is the logical representation of the data present in the block. InputSplit acts as an intermediary between the block and the mapper.

Suppose there are two blocks:

Block 1: ii nntteell

Block 2: Ii ppaatt

Now considering the map, it will read Block 1 from ii to ll but does not know how to process Block 2 at the same time. InputSplit comes into play here, which will form a logical group of Block 1 and Block 2 as a single block.

It then forms a key-value pair using InputFormat and records the reader and sends the map for further processing with InputSplit. If you have limited resources, then you can increase the split size to limit the number of maps. For instance, if there are 10 blocks of 640 MB, 64 MB each, and limited resources, then you can assign the split size as 128 MB. This will form a logical group of 128 MB, with only five maps executing at a time.

However, if the split size property is set to false, then the whole file will form one InputSplit and will be processed by a single map, consuming more time when the file is bigger.

Comment(S)

Show all Coment

Leave a Comment




NCERT Solutions

 

Share your email for latest updates

Name:
Email:

Our partners