Hadoop Interview Questions and Answers

Question - 51 : - List the two types of metadata that are stored by the NameNode server

Answer - 51 : -

The NameNode server stores metadata both on disk and in RAM. The two types of metadata that the NameNode server stores are:

  • EditLogs: They record all the changes made to the file system namespace since the most recent FsImage was created.
  • FsImage: It is a complete snapshot of the file system namespace, stored on disk.
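
As a rough illustration, the sketch below uses the standard HDFS Java API to run a pure metadata operation (a directory listing); such a request is answered from the namespace the NameNode holds in RAM, which is rebuilt at startup from the FsImage plus the EditLogs. The path and configuration used here are only placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListNamespace {
    public static void main(String[] args) throws Exception {
        // Default configuration; picks up core-site.xml/hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // listStatus() is a metadata-only call: it is served by the NameNode's
        // in-memory namespace (persisted as FsImage + EditLogs), not by DataNodes.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
        fs.close();
    }
}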

Question - 52 : - Explain the architecture of YARN and how it allocates various resources to applications?

Answer - 52 : -

An application, API, or client communicates with the ResourceManager, which handles the allocation of resources in the cluster. The ResourceManager is aware of the resources available on each NodeManager and has two internal components: the ApplicationManager and the Scheduler. The Scheduler is responsible for allocating resources to the numerous applications running in parallel, based on their requirements; however, it does not track or monitor the status of the applications.

The ApplicationManager accepts job submissions and restarts an application's ApplicationMaster if it fails. The per-application ApplicationMaster manages the application's demands for resources and communicates with the Scheduler to get the needed resources; it interacts with the NodeManagers to execute and monitor the tasks of the running job, and it also monitors the resources utilized by each container.

A container consists of a set of resources on a single node, including CPU, RAM, and network bandwidth. It allows an application to use a predefined amount of resources.

As soon as a job is submitted, the ResourceManager requests a NodeManager to reserve resources for processing it, and the NodeManager then assigns an available container to carry out the processing. The ResourceManager starts the ApplicationMaster, which runs in one of the allocated containers and deals with the execution; the remaining containers are used for the actual execution process. This is, overall, how YARN allocates resources to applications through its architecture.
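
A minimal sketch of this flow from the client side is shown below, assuming the standard YARN client API (org.apache.hadoop.yarn.client.api.YarnClient) is on the classpath; the application name, queue, resource request, and launch command are placeholder values.

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitToYarn {
    public static void main(String[] args) throws Exception {
        // The client talks to the ResourceManager through YarnClient
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ResourceManager for a new application
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
        context.setApplicationName("demo-app");             // placeholder name
        context.setQueue("default");                        // scheduler queue

        // Resources requested for the container that will run the ApplicationMaster
        context.setResource(Resource.newInstance(1024, 1)); // 1024 MB, 1 vcore

        // Command the NodeManager will run inside that container (placeholder)
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                null, null, Collections.singletonList("sleep 60"), null, null, null);
        context.setAMContainerSpec(amContainer);

        // The Scheduler now finds a NodeManager with free resources and the
        // ApplicationMaster is launched in one of its containers.
        yarnClient.submitApplication(context);
        yarnClient.stop();
    }
}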

Question - 53 : - What are the differences between Sqoop and Flume?

Answer - 53 : -

The following are the various differences between Sqoop and Flume:

  • Sqoop works with RDBMS and NoSQL databases for importing and exporting data, whereas Flume works with streaming data that is regularly generated in the Hadoop environment.
  • In Sqoop, loading data is not event-driven, whereas in Flume, loading data is event-driven.
  • Sqoop deals with structured data sources, and Sqoop connectors help in extracting data from them, whereas Flume extracts streaming data from application or web servers.
  • Sqoop takes data from an RDBMS, imports it into HDFS, and exports it back to the RDBMS, whereas with Flume, data from multiple sources flows into HDFS.

Question - 54 : - Can you name the port numbers for JobTracker, NameNode, and TaskTracker?

Answer - 54 : -

JobTracker: The port number for JobTracker is Port 50030

NameNode: The port number for NameNode is Port 50070

TaskTracker: The port number for TaskTracker is Port 50060

Question - 55 : - What are the components of the architecture of Hive?

Answer - 55 : -

  • User Interface: It calls the execute interface of the driver and builds a session for the query; the query is then sent to the compiler to create an execution plan for it.
  • Metastore: It stores the metadata and sends it to the compiler for the execution of a query.
  • Compiler: It creates the execution plan. The plan consists of a DAG of stages, where each stage can be a map or reduce job, a metadata operation, or an operation on HDFS.
  • Execution Engine: It bridges the gap between Hadoop and Hive and helps in processing the query. It communicates bidirectionally with the metastore in order to perform various tasks.
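
As an illustration of this flow from the client's side, the hedged sketch below submits a query to HiveServer2 over JDBC (assuming the Hive JDBC driver is on the classpath and the server listens on the default port 10000); the driver, compiler, and execution engine then handle it as described above. The host, credentials, and table name are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // Hive JDBC driver; ships with the Hive client libraries
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Connect to HiveServer2 (placeholder host/port/database)
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hiveuser", "");
             Statement stmt = con.createStatement()) {

            // The driver builds a session, the compiler produces the DAG of
            // stages, and the execution engine runs them on the cluster.
            ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM users");
            while (rs.next()) {
                System.out.println("row count = " + rs.getLong(1));
            }
        }
    }
}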

Question - 56 : - Why does Hive not store metadata in HDFS?

Answer - 56 : -

Hive stores its data in HDFS, while the metadata is stored in an RDBMS or locally. HDFS does not store this metadata because read and write operations in HDFS take a lot of time. This is why Hive uses an RDBMS to store the metadata in the metastore rather than in HDFS; this makes the process faster and enables you to achieve low latency.

Question - 57 : - What are the significant components in the execution environment of Pig?

Answer - 57 : -

The main components of a Pig execution environment are as follows:

  • Pig Scripts: They are written in Pig Latin with the help of UDFs and built-in operators and are then submitted to the execution environment.
  • Parser: It checks the script syntax and completes type checking. The parser's output is a directed acyclic graph (DAG).
  • Optimizer: It carries out optimizations, such as transform and merge, to minimize the amount of data in the pipeline.
  • Compiler: It automatically converts the optimized code into MapReduce jobs.
  • Execution Engine: The MapReduce jobs are submitted to this engine in order to get the required output.
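
A hedged sketch of this pipeline, using Pig's embedded Java API (PigServer) in local mode, is shown below; the input path, schema, and output path are placeholder values. Each registered statement passes through the parser, optimizer, and compiler described above before the store() call triggers execution.

import org.apache.pig.PigServer;

public class PigPipeline {
    public static void main(String[] args) throws Exception {
        // "local" runs Pig against the local file system; use "mapreduce" on a cluster
        PigServer pig = new PigServer("local");

        // Each statement is parsed and type-checked; the resulting DAG is
        // optimized and compiled into MapReduce jobs when output is requested.
        pig.registerQuery("logs = LOAD 'input/logs.txt' USING PigStorage(',') "
                + "AS (user:chararray, bytes:long);");
        pig.registerQuery("grouped = GROUP logs BY user;");
        pig.registerQuery("totals = FOREACH grouped GENERATE group, SUM(logs.bytes);");

        // store() hands the compiled jobs to the execution engine
        pig.store("totals", "output/totals");
        pig.shutdown();
    }
}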

Question - 58 : - What is the command used to open a connection in HBase?

Answer - 58 : -

The command mentioned below can be used to open a connection in HBase:

// Note: HTableInterface and HTable belong to the older (pre-1.0) HBase client API
Configuration myConf = HBaseConfiguration.create();
HTableInterface usersTable = new HTable(myConf, "users");
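
On HBase 1.0 and later, a connection is usually opened through ConnectionFactory instead. A minimal sketch, using the same placeholder table name, might look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class OpenHBaseConnection {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();

        // Connection is heavyweight and thread-safe; Table handles are lightweight
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table usersTable = connection.getTable(TableName.valueOf("users"))) {
            // issue Get/Put/Scan operations against usersTable here
        }
    }
}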

Question - 59 : - What is the use of RecordReader in Hadoop?

Answer - 59 : -

Though an InputSplit defines a slice of work, it does not describe how to access the data in it. This is where the RecordReader class comes into the picture: it takes the byte-oriented data from its source and converts it into the record-oriented key-value pairs that the Mapper task reads. The InputFormat is responsible for creating the RecordReader instance for each split.
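
As a hedged sketch, the custom InputFormat below (a hypothetical class name) simply reuses Hadoop's LineRecordReader, the standard RecordReader that turns a split's bytes into (byte offset, line of text) pairs for the Mapper; this mirrors what the built-in TextInputFormat does.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Hypothetical InputFormat that delegates record reading to LineRecordReader
public class SimpleTextInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
                                                               TaskAttemptContext context) {
        // The RecordReader converts the split's byte stream into
        // (byte offset, line) key-value pairs consumed by the Mapper.
        return new LineRecordReader();
    }
}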

Question - 60 : - How does Sqoop import or export data between HDFS and RDBMS?

Answer - 60 : -

The steps followed by Sqoop's architecture to import and export data between HDFS and an RDBMS are listed below:

  • Import: Sqoop first queries the database to collect the metadata of the source table.
  • Import: It then splits the input dataset and uses map jobs to push these splits into HDFS.
  • Export: Sqoop again queries the database to collect the metadata of the target table.
  • Export: It splits the dataset and uses map jobs to push these splits to the RDBMS, exporting the Hadoop files back to the RDBMS tables.
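
A hedged sketch of these two directions is given below, using Sqoop 1.x's programmatic entry point (org.apache.sqoop.Sqoop.runTool, assuming the Sqoop client jar is on the classpath); the JDBC URL, credentials, table names, and HDFS paths are placeholders, and the same flags can be passed to the sqoop command line instead.

import org.apache.sqoop.Sqoop;

public class SqoopTransfer {
    public static void main(String[] args) {
        // Import: Sqoop reads the table metadata, splits the data, and map
        // tasks write the splits into HDFS (placeholder connection values).
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://dbhost:3306/sales",
            "--username", "dbuser", "--password", "dbpass",
            "--table", "orders",
            "--target-dir", "/user/hadoop/orders",
            "--num-mappers", "4"
        };
        int importStatus = Sqoop.runTool(importArgs);

        // Export: the HDFS files are split again and map tasks push the
        // records back into the RDBMS table.
        String[] exportArgs = {
            "export",
            "--connect", "jdbc:mysql://dbhost:3306/sales",
            "--username", "dbuser", "--password", "dbpass",
            "--table", "orders_copy",
            "--export-dir", "/user/hadoop/orders"
        };
        int exportStatus = Sqoop.runTool(exportArgs);

        System.out.println("import=" + importStatus + ", export=" + exportStatus);
    }
}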

