
Hadoop Interview Questions and Answers

Question - 31 : - What are the main components of YARN? Can you explain them?

Answer - 31 : -

The main components of YARN are explained below:

  • Resource Manager: It runs as the master daemon and is responsible for controlling resource allocation across the cluster.
  • Node Manager: It runs on the slave daemons and is responsible for executing a task on every single data node.
  • Application Master: It is an important component of YARN as it manages the life cycle of a user job and the resource demands of individual applications. The Application Master works with the Node Manager to monitor task execution.
  • Container: It is a bundle of Hadoop resources, which may include RAM, CPU, network, HDD, etc., on a single node.
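As a small, hedged illustration of these components, the Java sketch below uses the YarnClient API to ask the Resource Manager for a report of the Node Managers it tracks. The class name ListYarnNodes is illustrative, and the Hadoop YARN client libraries are assumed to be on the classpath.

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListYarnNodes {
    public static void main(String[] args) throws Exception {
        // The client talks to the Resource Manager configured in yarn-site.xml.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        // Each NodeReport describes one Node Manager and its container capacity.
        for (NodeReport node : yarnClient.getNodeReports()) {
            System.out.println(node.getNodeId() + " -> " + node.getCapability());
        }
        yarnClient.stop();
    }
}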

Question - 32 : - What are the most common input formats in Hadoop?

Answer - 32 : -

There are three most common input formats in Hadoop:

  • Text Input Format: The default input format in Hadoop; each line of the file becomes a record.
  • Key-value Input Format: Used for plain text files where each line is split into a key and a value at a separator character (a tab by default).
  • Sequence File Input Format: Used for reading sequence files, Hadoop's binary key-value file format.
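As a hedged sketch of how an input format is selected in practice, the MapReduce driver fragment below configures the key-value input format; the job name and input path are hypothetical, and the tab separator shown is already the default.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class InputFormatDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Each line is split into key and value at the first tab (the default separator).
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");
        Job job = Job.getInstance(conf, "kv-input-example");
        // TextInputFormat or SequenceFileInputFormat could be set here instead.
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path("/user/hadoop/input"));  // hypothetical path
        // Mapper/reducer setup and job submission are omitted from this sketch.
    }
}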

Question - 33 : - What are the most common output formats in Hadoop?

Answer - 33 : -

The following are the commonly used output formats in Hadoop:

  • TextOutputFormat: TextOutputFormat is the default output format in Hadoop; it writes records as lines of text.
  • MapFileOutputFormat: MapFileOutputFormat writes the output as map files in Hadoop.
  • DBOutputFormat: DBOutputFormat writes the output to relational databases and HBase.
  • SequenceFileOutputFormat: SequenceFileOutputFormat is used for writing sequence files.
  • SequenceFileAsBinaryOutputFormat: SequenceFileAsBinaryOutputFormat is used for writing keys and values to a sequence file in binary format.
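As a brief, hedged counterpart on the output side, the fragment below wires SequenceFileOutputFormat into a job; the output path is hypothetical, and any of the formats above could be substituted.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class OutputFormatDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "seqfile-output-example");
        // The job's output key/value classes determine what the sequence file stores.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/output"));  // hypothetical path
        // Mapper/reducer setup and job submission are omitted from this sketch.
    }
}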

Question - 34 : - How to execute a Pig script?

Answer - 34 : -

The three methods listed below enable users to execute a Pig script:

  • Grunt shell (interactive mode)
  • Embedded script (Pig Latin run from a host language such as Java)
  • Script file (batch mode)
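The Grunt shell is started by running the pig command with no arguments, and a script file is run by passing it on the command line (for example, pig myscript.pig, where the file name is hypothetical). For the embedded method, the hedged Java sketch below drives Pig through the PigServer class; the paths and the alias name are illustrative, and the Pig libraries are assumed to be on the classpath.

import org.apache.pig.PigServer;

public class EmbeddedPigScript {
    public static void main(String[] args) throws Exception {
        // "local" runs against the local file system; "mapreduce" targets the cluster.
        PigServer pig = new PigServer("local");
        pig.registerQuery("lines = LOAD '/tmp/input.txt' AS (line:chararray);");
        // store() triggers execution of the registered pipeline.
        pig.store("lines", "/tmp/pig-output");
    }
}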

Question - 35 : - What is Apache Pig and why is it preferred over MapReduce?

Answer - 35 : -

Apache Pig is a Hadoop-based platform that allows professionals to analyze large sets of data and represent them as data flows. Pig reduces the complexity of writing a program in MapReduce, which gives it an edge over MapReduce.

The following are some of the reasons why Pig is preferred over MapReduce:

  • While Pig is a high-level data-flow language, MapReduce is a low-level data-processing paradigm.
  • Results that require complex Java code in MapReduce can be achieved easily in Pig.
  • Pig reduces code length by roughly 20 times and development time by about 16 times compared with MapReduce.
  • Pig offers built-in operators for numerous operations, such as joins, filters, and ordering, which are extremely difficult to implement in MapReduce.
  • Unlike MapReduce, Pig provides nested data types such as bags, maps, and tuples.

Question - 36 : - What are the components of the Apache Pig architecture?

Answer - 36 : -

The components of the Apache Pig architecture are as follows:

  • Parser: It handles Pig scripts, checking the syntax of the script and producing a logical plan in the form of a DAG.
  • Optimizer: It receives the logical plan (DAG) and carries out logical optimizations such as projection pushdown.
  • Compiler: It is responsible for converting the logical plan into a series of MapReduce jobs.
  • Execution Engine: In the execution engine, the MapReduce jobs are submitted to Hadoop in sorted order.
  • Execution Mode: The execution modes in Apache Pig are local mode and MapReduce mode, and the choice between them depends on where the data is stored and where you want to run the Pig script (see the command-line illustration after this list).
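As a hedged command-line illustration of the two execution modes (the script name is hypothetical):

pig -x local myscript.pig
This runs the script in local mode, against the local file system in a single JVM.

pig -x mapreduce myscript.pig
This runs the script in MapReduce mode, submitting jobs to the Hadoop cluster; this is the default when -x is omitted.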

Question - 37 : - Mention some commands in YARN to check application status and to kill an application.

Answer - 37 : -

The YARN commands are mentioned below as per their functionalities:

1. yarn application -status ApplicationID
This command allows professionals to check the application status.

2. yarn application -kill ApplicationID
This command enables users to kill or terminate a particular application.
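The ApplicationID needed by both commands can be discovered by listing the applications known to the Resource Manager:

yarn application -list
This prints, for each application, its ID along with details such as its name, state, and tracking URL.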

Question - 38 : - What are the different components of Hive query processors?

Answer - 38 : -

The components used in the Hive query processor are mentioned below:

  • User-defined functions
  • Semantic analyzer
  • Optimizer
  • Physical plan generation
  • Logical plan generation
  • Type checking
  • Execution engine
  • Parser
  • Operators

Question - 39 : - What are the commands to restart NameNode and all the daemons in Hadoop?

Answer - 39 : -

The following commands can be used to restart NameNode and all the daemons:

The NameNode can be stopped with the ./sbin/hadoop-daemon.sh stop namenode command and started again with the ./sbin/hadoop-daemon.sh start namenode command.
All the daemons can be stopped with the ./sbin/stop-all.sh command and started again with the ./sbin/start-all.sh command.
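On newer Hadoop 3.x releases, the per-daemon scripts are deprecated in favor of the hdfs --daemon interface, so the NameNode restart above would typically be written as hdfs --daemon stop namenode followed by hdfs --daemon start namenode.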

Question - 40 : - Define DataNode. How does NameNode tackle DataNode failures?

Answer - 40 : -

A DataNode stores data in HDFS; it is the node where the actual data of the file system resides. Every DataNode sends a heartbeat message to the NameNode to signal that it is alive. If the NameNode does not receive a heartbeat from a DataNode for 10 minutes, it considers that DataNode to be dead and starts replicating the blocks that were hosted on it onto other DataNodes. The NameNode knows exactly which blocks to replicate because every DataNode periodically sends a BlockReport, which contains a list of all the blocks on that DataNode.

The NameNode manages the replication of the data blocks from one DataNode to another. In this process, the replicated data is transferred directly between DataNodes, so the data never passes through the NameNode.
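As a hedged pointer for observing this in practice, the live and dead DataNodes, together with per-node capacity and last-heartbeat details, are typically inspected with the following command:

hdfs dfsadmin -report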

