Hadoop Interview Questions and Answers
Question - 101 : - Where is table data stored in Apache Hive by default?
Answer - 101 : -
By default, table data in Apache Hive is stored in HDFS under: hdfs://namenode_server/user/hive/warehouse. This location is controlled by the hive.metastore.warehouse.dir configuration property.
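As a minimal sketch, assuming the Hive libraries and a hive-site.xml are on the classpath, the effective warehouse location can be read through HiveConf:

    import org.apache.hadoop.hive.conf.HiveConf;

    public class WarehouseDir {
        public static void main(String[] args) {
            // HiveConf picks up hive-site.xml from the classpath.
            HiveConf conf = new HiveConf();
            // hive.metastore.warehouse.dir defaults to /user/hive/warehouse on HDFS.
            System.out.println(conf.getVar(HiveConf.ConfVars.METASTOREWAREHOUSE));
        }
    }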
Question - 102 : - What is the default file format to import data using Apache Sqoop?
Answer - 102 : -
There are basically two file formats in which Sqoop allows data to be imported:
- Delimited Text File Format (the default, selected explicitly with --as-textfile)
- Sequence File Format (selected with --as-sequencefile)
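As a hedged sketch of how the format choice shows up in practice, Sqoop 1's Java entry point (org.apache.sqoop.Sqoop.runTool) can drive an import programmatically; the JDBC URL, table, user, and target directory below are hypothetical:

    import org.apache.sqoop.Sqoop;

    public class SequenceFileImport {
        public static void main(String[] args) {
            String[] importArgs = {
                "import",
                "--connect", "jdbc:mysql://dbhost/sales", // hypothetical database
                "--table", "orders",                      // hypothetical table
                "--username", "etl_user",                 // hypothetical user
                "--as-sequencefile",  // or --as-textfile, the default
                "--target-dir", "/user/etl/orders"        // hypothetical HDFS dir
            };
            // runTool parses the arguments and runs the import job.
            System.exit(Sqoop.runTool(importArgs));
        }
    }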
Question - 103 : - Are Multiline Comments supported in Hive? Why?
Answer - 103 : -
No, as of now multiline comments are not supported in Hive; only single-line comments, introduced with a double hyphen (--), are supported.
Question - 104 : - Explain a metastore in Hive?
Answer - 104 : -
The metastore is the central repository of Apache Hive metadata. It stores the metadata for Hive tables (such as their schema and location) and partitions in a relational database, using an RDBMS together with an open-source ORM layer that converts the object representation into a relational schema. Clients access this information through the metastore service API. Disk storage for the Hive metadata is separate from HDFS storage.
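As a small illustration, the metastore can be queried directly through its Java client API instead of touching HDFS; the database and table names below are hypothetical:

    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.Table;

    public class MetastoreLookup {
        public static void main(String[] args) throws Exception {
            HiveConf conf = new HiveConf(); // reads hive-site.xml (metastore URI, etc.)
            HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
            // Schema and location come from the metastore's relational database,
            // not from HDFS; "default" and "employees" are hypothetical names.
            Table t = client.getTable("default", "employees");
            System.out.println("Location: " + t.getSd().getLocation());
            client.close();
        }
    }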
Question - 105 : - What applications are supported by Apache Hive?
Answer - 105 : -
The applications that are supported by Apache Hive are client applications written in Java, PHP, Python, Ruby, and C++, which connect to Hive through its Thrift service as well as the JDBC and ODBC drivers.
Question - 106 : - List the actions that happen when a DataNode fails.
Answer - 106 : -
- Both the JobTracker and the NameNode detect the failure, since the failed DataNode stops sending heartbeats, and identify which blocks were stored on it.
- All tasks that were running on the failed node are rescheduled by locating other DataNodes with copies of these blocks.
- The NameNode replicates the user's data to another node to maintain the configured replication factor (the sketch after this list shows how to inspect a file's replica locations).
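As a minimal sketch using the HDFS Java API (the file path below is hypothetical), you can list which DataNodes currently hold each block's replicas; after a DataNode failure, re-replication becomes visible here:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockReplicas {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // reads core-site.xml/hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/user/data/sample.txt"); // hypothetical file
            FileStatus st = fs.getFileStatus(p);
            // Each BlockLocation lists the DataNodes holding a replica of that block;
            // the NameNode re-replicates until the replication factor is met again.
            for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println(b); // offset, length, hosts
            }
        }
    }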
Question - 107 : - What are the Benefits of using zookeeper?
Answer - 107 : -
- Simple distributed coordination process: The coordination process among all nodes in Zookeeper is straightforward.
- Synchronization: Mutual exclusion and co-operation among server processes.
- Ordered messages: ZooKeeper stamps each update with a number denoting its order, and with the help of this stamping, all messages are ordered.
- Serialization: Encodes the data according to specific rules, ensuring that your application runs consistently.
- Reliability: ZooKeeper is very reliable; once an update has been applied, it persists until a client overwrites it.
- Atomicity: Data transfer either succeeds or fails, but no transaction is partial.
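A minimal sketch of the ZooKeeper Java client shows several of these guarantees in action; the ensemble address and znode path below are hypothetical, and the create call is atomic (the whole znode is created or nothing is):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical ensemble address, 3000 ms session timeout.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 3000,
                    event -> System.out.println("Event: " + event.getState()));
            // Atomic, ordered write (fails if /demo already exists).
            zk.create("/demo", "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            // Reliable read: the data persists until a client overwrites it.
            byte[] data = zk.getData("/demo", false, null);
            System.out.println(new String(data));
            zk.close();
        }
    }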
Question - 108 : - What is Yarn?
Answer - 108 : -
YARN stands for Yet Another Resource Negotiator. It is the resource management layer of Hadoop and was introduced in Hadoop 2.x. YARN allows different data processing engines, such as graph processing, batch processing, interactive processing, and stream processing, to execute and process data stored in the Hadoop Distributed File System. YARN also offers job scheduling. It extends the capability of Hadoop to other evolving technologies so that they can take advantage of HDFS and economic clusters.
Apache YARN is the data operating system for Hadoop 2.x. It consists of a master daemon known as the Resource Manager, a slave daemon called the Node Manager, and an Application Master.
Question - 109 : - List the YARN components.
Answer - 109 : -
Resource Manager: It runs on a master daemon and controls the resource allocation in the cluster.
Node Manager: It runs on the slave daemons and executes a task on every single DataNode.
Application Master: It controls the lifecycle of a user job and the resource needs of individual applications. It works with the Node Manager and monitors the execution of tasks.
Container: It is a combination of resources such as RAM, CPU, network, and disk on a single node.
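As a hedged sketch, assuming yarn-site.xml is on the classpath, the Resource Manager can be asked for every application it is tracking through the YarnClient API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;

    public class ListYarnApps {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // reads yarn-site.xml
            YarnClient client = YarnClient.createYarnClient();
            client.init(conf);
            client.start();
            // Each report comes from the Resource Manager and reflects the
            // Application Master's current state for that job.
            for (ApplicationReport app : client.getApplications()) {
                System.out.println(app.getApplicationId() + " "
                        + app.getYarnApplicationState());
            }
            client.stop();
        }
    }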
Question - 110 : - What is Apache Hive?
Answer - 110 : -
Hive is an open-source data warehouse system that processes structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and to make querying and analysis easy. Hive enables SQL developers to write Hive Query Language (HiveQL) statements, similar to standard SQL statements, for data query and analysis. It was created to make MapReduce programming easier, because with Hive you do not have to know Java or write lengthy Java code.
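As a minimal sketch, assuming the hive-jdbc driver is on the classpath and a HiveServer2 instance is reachable at a hypothetical localhost:10000 endpoint (the employees table is hypothetical too), an SQL-style query can be run over JDBC:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "hive", "");
            Statement stmt = conn.createStatement();
            // HiveQL looks like standard SQL; Hive compiles it into
            // MapReduce (or Tez/Spark) jobs behind the scenes.
            ResultSet rs = stmt.executeQuery(
                    "SELECT department, COUNT(*) FROM employees GROUP BY department");
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getLong(2));
            }
            conn.close();
        }
    }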