Hadoop Interview Questions and Answers

Question - 91 : - Can you import/export an HBase table?

Answer - 91 : -

Yes, it is possible to import and export tables from one HBase cluster to another. 

HBase export utility:

hbase org.apache.hadoop.hbase.mapreduce.Export "table name" "target export location"

Example: hbase org.apache.hadoop.hbase.mapreduce.Export "employee_table" "/export/employee_table"

HBase import utility:

The target table must already exist with matching column families before the import is run, so first create it from the HBase shell:

create 'emp_table_import', {NAME => 'myfam', VERSIONS => 10}

Then run the import:

hbase org.apache.hadoop.hbase.mapreduce.Import "table name" "target import location"

Example: hbase org.apache.hadoop.hbase.mapreduce.Import "emp_table_import" "/export/employee_table"

Question - 92 : - How does Bloom filter work?

Answer - 92 : -

The HBase Bloom filter is a mechanism to test whether an HFile contains a specific row or row-column cell. The Bloom filter is named after its creator, Burton Howard Bloom. It is a probabilistic data structure that indicates whether a given element may be a member of a set: it can return false positives but never false negatives. In HBase, Bloom filters provide an in-memory index structure that lets a read skip HFiles that definitely do not contain the requested row, thereby reducing disk reads.
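
Bloom filters are configured per column family. As a brief sketch (the table and family names here are hypothetical), a Bloom filter can be enabled from the HBase shell when creating or altering a table:

create 'employee_table', {NAME => 'emp_details', BLOOMFILTER => 'ROW'}

alter 'employee_table', {NAME => 'emp_details', BLOOMFILTER => 'ROWCOL'}

ROW checks only the row key, while ROWCOL also takes the column qualifier into account.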

Question - 93 : - Does HBase have any concept of the namespace?

Answer - 93 : -

A namespace is a logical grouping of tables, analogous to a database in an RDBMS. Creating an HBase namespace is comparable to creating a database schema in an RDBMS.

To create a namespace, use the command:

create_namespace 'namespace name'

To list all the tables that are members of the namespace, use the command:

list_namespace_tables 'default'

To list all the namespaces, use the command:

list_namespace
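
To place a table inside a namespace, prefix the table name with the namespace. A short sketch with hypothetical names:

create_namespace 'hr'

create 'hr:employee_table', 'emp_details'

list_namespace_tables 'hr'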

Question : - How does the Write Ahead Log (WAL) help when a RegionServer crashes?

Answer : -

If a RegionServer hosting a MemStore crashes, the data that existed in memory but had not yet been persisted is lost. HBase guards against that by writing each change to the WAL before the write completes. The HBase cluster keeps a WAL to record changes as they happen. If a RegionServer goes down, replaying its WAL recovers the data that had not yet been flushed from the MemStore to an HFile.
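
As an aside, the MemStore can also be flushed to HFiles on demand from the HBase shell, which reduces the amount of WAL replay needed after a crash (the table name below is hypothetical):

flush 'employee_table'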

Question - 94 : - Write the HBase command to list the contents and update the column families of a table.

Answer - 94 : -

The following code is used to list the contents of an HBase table:

scan 'table_name'

Example: scan 'employee_table'

To update column families in the table, use the following command:

alter 'table_name', 'column_family_name'

Example: alter 'employee_table', 'emp_address'
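
To verify the change, the table's schema, including its column families, can be inspected with:

describe 'employee_table'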

Question - 95 : - What are catalog tables in HBase?

Answer - 95 : -

The catalog consists of two tables: hbase:meta and -ROOT-.

The catalog table hbase:meta exists as an HBase table, although it is filtered out of the HBase shell's list command. It keeps a list of all the regions in the system, and the location of hbase:meta itself is stored in ZooKeeper. In older HBase versions, the -ROOT- table kept track of the location of the .META. table (the former name of hbase:meta); -ROOT- has since been removed.
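
Since hbase:meta is an ordinary table apart from being hidden from list, its contents can be inspected directly from the shell:

scan 'hbase:meta'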

Question - 96 : - What is hotspotting in HBase and how can it be avoided?

Answer - 96 : -

In HBase, all read and write requests should be uniformly distributed across all of the regions in the RegionServers. Hotspotting occurs when a given region serviced by a single RegionServer receives most or all of the read or write requests.

Hotspotting can be avoided by designing the row key so that writes are spread across multiple regions of the cluster. Below are the techniques to do so, with a salting sketch following the list:

  • Salting
  • Hashing
  • Reversing the key
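
As a sketch of the salting technique (all table, family, and key names here are hypothetical), the table is pre-split into regions and each row key is prefixed with a salt, such as hash(key) mod number of splits, so that otherwise-sequential keys land in different regions:

create 'events', 'cf', SPLITS => ['1', '2', '3']

put 'events', '2_20230101-0001', 'cf:payload', 'some value'
put 'events', '0_20230101-0002', 'cf:payload', 'some value'

A monotonically increasing key such as 20230101-0001 on its own would keep hitting a single region; the salt prefix (0 to 3 here) spreads the writes across the four regions created by the split points.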

Question - 97 : - Mention the consequences of Distributed Applications.

Answer - 97 : -

  • Heterogeneity: The design of applications should allow users to access services and run applications over a heterogeneous collection of computers and networks, taking into consideration hardware devices, operating systems, networks, and programming languages.
  • Transparency: Distributed system designers must hide the complexity of the system as much as they can. Some forms of transparency are location, access, migration, and relocation transparency.
  • Openness: A characteristic that determines whether the system can be extended and re-implemented in various ways.
  • Security: Distributed system designers must take care of confidentiality, integrity, and availability.
  • Scalability: A system is said to be scalable if it can handle the addition of users and resources without suffering a noticeable loss of performance.

Question - 98 : - Explain the architecture of Flume.

Answer - 98 : -

In general, the Apache Flume architecture is composed of the following components:

  • Flume Source
  • Flume Channel
  • Flume Sink
  • Flume Agent
  • Flume Event

  • Flume Source: A Flume source receives data from external data generators, such as web servers or social media platforms like Facebook and Twitter, and passes it to a Flume channel in the form of Flume events.
  • Flume Channel: The data from the Flume source is sent to an intermediate store that buffers the events until they are consumed by the sink; this intermediate store is called the Flume channel, and it acts as the bridge between a source and a sink. Flume supports both a memory channel and a file channel. The file channel is non-volatile: once data enters the channel, it is not lost unless you delete it. In contrast, the memory channel stores events in memory, so it is volatile and data may be lost, but it is very fast.
  • Flume Sink: A Flume sink takes events from the Flume channel and stores them in a specified destination, such as HDFS, delivering each event either to the final store or to another agent. Flume supports various sinks, such as the HDFS sink, Hive sink, and Thrift sink.
  • Flume Agent: A Flume agent is a JVM process that hosts a source, channel, sink combination. A Flume deployment may contain one or more agents, and connected, distributed agents collectively form a Flume data flow (see the configuration sketch after this list).
  • Flume Event: An event is the unit of data transported in Flume and the general representation of a data object in Flume. It consists of a byte-array payload with optional headers.
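
As a minimal sketch of how these components are wired together (the agent, source, channel, and sink names, the port, and the HDFS path here are all hypothetical), a Flume agent is defined in a properties file that binds a source, a channel, and a sink:

# flume.conf: one agent named agent1 with a netcat source,
# a memory channel, and an HDFS sink
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: listen for lines of text on a TCP port
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = localhost
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1

# Channel: buffer events in memory (fast but volatile)
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

# Sink: write events to HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /flume/events
agent1.sinks.sink1.channel = ch1

The agent could then be started with: flume-ng agent --conf conf --conf-file flume.conf --name agent1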

Question - 99 : - What is Apache Flume in Hadoop ?

Answer - 99 : -

Apache Flume is a tool/service/data-ingestion mechanism for collecting, aggregating, and moving large amounts of streaming data, such as log files and events, from various sources to a centralized data store.
Flume is a very stable, distributed, and configurable tool. It is primarily designed to copy streaming data (log data) from various web servers to HDFS.

Question - 100 : - If the source data gets updated every now and then, how will you synchronize the data in HDFS that is imported by Sqoop?

Answer - 100 : -

If the source data gets updated in a very short interval of time, the synchronization of data in HDFS that is imported by Sqoop is done with the help of incremental parameters.

Use incremental import in append mode when the table is continuously being extended with new rows: a check column (typically an increasing ID) is examined, and only rows whose value is greater than the last imported value are fetched. Use incremental import in lastmodified mode when existing rows are also updated: a date/timestamp column is examined, and all records modified after the last import are re-imported so that the updated values are reflected in HDFS.
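
A hedged sketch of such an import (the connection string, table, and column names are hypothetical):

sqoop import --connect jdbc:mysql://dbhost/sales --table orders --incremental lastmodified --check-column last_updated --last-value "2023-01-01 00:00:00" --merge-key order_id --target-dir /user/hadoop/orders

On subsequent runs, Sqoop reports the new --last-value to use, and --merge-key tells it how to reconcile updated rows with the rows already present in HDFS.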

