
Hadoop Interview Questions and Answers

Question - 81 : - Can we write the output of MapReduce in different formats?

Answer - 81 : -

Yes. Hadoop supports several output formats for MapReduce jobs, such as:

  • TextOutputFormat - This is the default output format and it writes records as lines of text. 
  • SequenceFileOutputFormat - This is used to write sequence files when the output files need to be fed into another MapReduce job as input files.
  • MapFileOutputFormat - This is used to write the output as map files. 
  • SequenceFileAsBinaryOutputFormat - This is a variant of SequenceFileOutputFormat. It writes keys and values to a sequence file in raw binary format.
  • DBOutputFormat - This is used for writing to relational databases. It sends the reduce output to a SQL table over JDBC. (Output to HBase goes through HBase's own TableOutputFormat.)
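
To make the default format concrete, here is a minimal plain-Java sketch of the line layout that TextOutputFormat produces: the key, a separator (a tab by default), and the value on one line. It does not use the Hadoop API. The class name TextOutputSketch and the helper formatRecord are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch (no Hadoop dependency) of the line layout used by text output.
class TextOutputSketch {
    // Hypothetical helper: builds one output line from a key and a value.
    // A null key or value is skipped, and the separator is only written
    // when both parts are present.
    static String formatRecord(Object key, Object value, String separator) {
        boolean hasKey = key != null;
        boolean hasValue = value != null;
        StringBuilder line = new StringBuilder();
        if (hasKey) {
            line.append(key);
        }
        if (hasKey && hasValue) {
            line.append(separator);
        }
        if (hasValue) {
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Stand-in for reducer output, for example word counts.
        Map<String, Integer> wordCounts = new LinkedHashMap<>();
        wordCounts.put("hadoop", 3);
        wordCounts.put("hive", 1);
        for (Map.Entry<String, Integer> entry : wordCounts.entrySet()) {
            System.out.println(formatRecord(entry.getKey(), entry.getValue(), "\t"));
        }
    }
}
```

In a real job the format is chosen in the driver rather than written by hand; the sketch only shows what ends up in the output file.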

Question - 82 : - What are the different components of a Hive architecture?

Answer - 82 : -

The main components of the Hive architecture are:

  • User Interface: This calls the execute interface of the driver. The driver creates a session for the query and sends the query to the compiler to generate an execution plan.
  • Metastore: This stores the metadata information and sends it to the compiler for the execution of a query.
  • Compiler: This generates the execution plan, which is a DAG of stages. Each stage is a metadata operation, a map or reduce job, or an operation on HDFS.
  • Execution Engine: This acts as a bridge between Hive and Hadoop to process the query. The Execution Engine communicates bidirectionally with the Metastore to perform operations such as creating or dropping tables.

Question - 83 : - What is a partition in Hive and why is partitioning required in Hive?

Answer - 83 : -

Partitioning is the process of grouping similar data together based on the values of one or more columns, called partition keys. Each table can have one or more partition keys that identify a particular partition.

Partitioning lets a query address a Hive table at a finer granularity. It reduces query latency because only the relevant partitions are scanned instead of the entire data set. For example, we can partition a bank's transaction data by month (January, February, and so on). Any operation on a particular month, say February, then only has to scan the February partition rather than the entire table.
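
The effect of scanning a single partition can be illustrated with a small plain-Java sketch. This is not Hive code: the class PartitionPruningSketch, its methods, and the sample amounts are hypothetical, and a map from month to rows stands in for the partition directories that Hive keeps on HDFS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of partition pruning; this is not Hive code.
class PartitionPruningSketch {
    // Partition key (month) -> transaction amounts stored in that partition.
    private final Map<String, List<Double>> partitions = new LinkedHashMap<>();
    // Counts the rows actually read, to compare with the table size.
    int rowsScanned = 0;

    // Stores a row in the partition that matches its month.
    void insert(String month, double amount) {
        partitions.computeIfAbsent(month, m -> new ArrayList<Double>()).add(amount);
    }

    // Reads only the partition for the requested month and sums its amounts.
    double totalForMonth(String month) {
        double total = 0;
        for (double amount : partitions.getOrDefault(month, new ArrayList<Double>())) {
            rowsScanned++;
            total += amount;
        }
        return total;
    }

    public static void main(String[] args) {
        PartitionPruningSketch table = new PartitionPruningSketch();
        table.insert("January", 100.0);
        table.insert("January", 50.0);
        table.insert("February", 20.0);
        table.insert("February", 30.0);
        table.insert("March", 75.0);
        System.out.println("February total: " + table.totalForMonth("February"));
        System.out.println("Rows scanned: " + table.rowsScanned + " of 5");
    }
}
```

The query for February touches only the rows stored under that key, which is the saving that partitioning provides on a much larger scale.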

Question - 84 : - What are the components used in Hive query processors?

Answer - 84 : -

The components used in Hive query processors are:

  • Parser
  • Semantic Analyzer
  • Execution Engine
  • User-Defined Functions
  • Logical Plan Generation
  • Physical Plan Generation
  • Optimizer
  • Operators
  • Type checking

Question - 85 : - What are the different ways of executing a Pig script?

Answer - 85 : -

The different ways of executing a Pig script are as follows:

  • Grunt shell
  • Script file
  • Embedded script

Question - 86 : - What are the major components of a Pig execution environment?

Answer - 86 : -

The major components of a Pig execution environment are:

  • Pig Scripts: They are written in Pig Latin using built-in operators and UDFs, and submitted to the execution environment.
  • Parser: Completes type checking and checks the syntax of the script. The output of the parser is a Directed Acyclic Graph (DAG).
  • Optimizer: Performs optimization using merge, transform, split, etc. Optimizer aims to reduce the amount of data in the pipeline.
  • Compiler: Converts the optimized code into MapReduce jobs automatically.
  • Execution Engine: MapReduce jobs are submitted to execution engines to generate the desired results.

Question - 87 : - State the usage of the group, order by, and distinct keywords in Pig scripts.

Answer - 87 : -

The group statement collects various records with the same key and groups the data in one or more relations.

Example: Group_data = GROUP Relation_name BY age;

The order statement sorts the contents of a relation based on one or more fields, in ascending or descending order.

Example: Relation_2 = ORDER Relation_name1 BY field_name (ASC|DESC);

The distinct statement removes duplicate records. It works only on entire records, not on individual fields.

Example: Relation_2 = DISTINCT Relation_name1;
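
The three keywords can be compared side by side in a plain-Java sketch of what each one computes. This is not Pig code: the class PigKeywordSketch, the Person record type, and the sample data are hypothetical stand-ins for a Pig relation with name and age fields.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java sketch of what GROUP, ORDER, and DISTINCT compute; this is not Pig code.
class PigKeywordSketch {
    // Hypothetical record type standing in for a Pig tuple (name, age).
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        // Two records are equal only when every field matches.
        @Override
        public boolean equals(Object o) {
            return o instanceof Person
                    && ((Person) o).name.equals(name)
                    && ((Person) o).age == age;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }

        @Override
        public String toString() {
            return "(" + name + "," + age + ")";
        }
    }

    // Like GROUP ... BY age: one entry per age, holding the records with that age.
    static Map<Integer, List<Person>> groupByAge(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(p -> p.age, TreeMap::new, Collectors.toList()));
    }

    // Like ORDER ... BY age DESC: the same records, sorted on one field.
    static List<Person> orderByAgeDesc(List<Person> people) {
        return people.stream()
                .sorted(Comparator.comparingInt((Person p) -> p.age).reversed())
                .collect(Collectors.toList());
    }

    // Like DISTINCT: drops records that match another record on every field.
    static List<Person> distinct(List<Person> people) {
        return people.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("asha", 30), new Person("ben", 25),
                new Person("asha", 30), new Person("chen", 30));
        System.out.println(groupByAge(people));
        System.out.println(orderByAgeDesc(people));
        System.out.println(distinct(people));
    }
}
```

Note that distinct keeps both "asha" and "chen" even though they share an age, because it compares whole records rather than a single field.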

Question - 88 : - What are the relational operators in Pig?

Answer - 88 : -

The relational operators in Pig are as follows:

COGROUP
This groups two or more relations by a common key. For each key, the result holds a separate bag of matching tuples from each relation.

CROSS
This is used to compute the cross product (cartesian product) of two or more relations.

FOREACH
This will iterate through the tuples of a relation, generating a data transformation.

JOIN
This is used to join two or more relations on a common field.

LIMIT
This will limit the number of output tuples.

SPLIT
This will split the relation into two or more relations.

UNION
This will merge the contents of two or more relations.

ORDER
This is used to sort a relation based on one or more fields.
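
JOIN and COGROUP are the pair most often confused, so the following plain-Java sketch contrasts their output shapes. This is not Pig code: the class PigJoinSketch, its methods, and the sample users and orders data are hypothetical, and each relation is modelled as a list of {key, value} string pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch contrasting JOIN and COGROUP on a shared key; this is not Pig code.
class PigJoinSketch {
    // Like an inner JOIN: one flat output row per matching pair of tuples.
    static List<String> join(List<String[]> a, List<String[]> b) {
        List<String> out = new ArrayList<>();
        for (String[] left : a) {
            for (String[] right : b) {
                if (left[0].equals(right[0])) {
                    out.add(left[0] + ":" + left[1] + "," + right[1]);
                }
            }
        }
        return out;
    }

    // Like COGROUP: one row per key, with a separate bag of values per relation.
    // Keys found in only one relation still appear, with an empty bag for the other.
    static Map<String, List<List<String>>> cogroup(List<String[]> a, List<String[]> b) {
        Map<String, List<List<String>>> out = new TreeMap<>();
        collect(out, a, 0);
        collect(out, b, 1);
        return out;
    }

    // Adds each tuple's value to the bag for its key; slot selects the relation.
    private static void collect(Map<String, List<List<String>>> out,
                                List<String[]> relation, int slot) {
        for (String[] row : relation) {
            List<List<String>> bags = out.get(row[0]);
            if (bags == null) {
                bags = new ArrayList<List<String>>();
                bags.add(new ArrayList<String>());
                bags.add(new ArrayList<String>());
                out.put(row[0], bags);
            }
            bags.get(slot).add(row[1]);
        }
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
                new String[] {"1", "asha"}, new String[] {"2", "ben"});
        List<String[]> orders = Arrays.asList(
                new String[] {"1", "book"}, new String[] {"1", "pen"},
                new String[] {"3", "lamp"});
        System.out.println(join(users, orders));
        System.out.println(cogroup(users, orders));
    }
}
```

The join output contains only key 1, once per matching order, while the cogroup output has one row for each of the keys 1, 2, and 3.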

Question - 89 : - Write the code needed to open a connection in HBase.

Answer - 89 : -

The following code is used to open a connection in HBase:

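// Load HBase settings such as hbase-site.xml from the classpath.
Configuration myConf = HBaseConfiguration.create();

// Open a handle to the "users" table.
HTableInterface usersTable = new HTable(myConf, "users");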

Question - 90 : - What does replication mean in terms of HBase?

Answer - 90 : -

The replication feature in HBase provides a mechanism to copy data between clusters. This feature can be used as a disaster recovery solution that provides high availability for HBase.

The following commands alter the hbase1 table and set REPLICATION_SCOPE to 1 for one of its column families, which enables replication for that family. A REPLICATION_SCOPE of 0 indicates that the column family is not replicated.

disable 'hbase1'

alter 'hbase1', {NAME => 'family_name', REPLICATION_SCOPE => '1'}

enable 'hbase1'

