
Big Data Interview Questions and Answers

Question - 21 : - What is a block in HDFS and what is its default size in Hadoop 1 and Hadoop 2? Can we change the block size?

Answer - 21 : -

Blocks are the smallest contiguous units of data storage on a hard drive. In HDFS, files are split into blocks that are distributed (and replicated) across the Hadoop cluster.

  • The default block size in Hadoop 1 is: 64 MB
  • The default block size in Hadoop 2 is: 128 MB
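Yes, the block size can be changed: set the dfs.blocksize property (dfs.block.size in Hadoop 1) in hdfs-site.xml. A minimal sketch of overriding it programmatically, assuming the Hadoop 2 property name; the 256 MB value is only an example:

  import org.apache.hadoop.conf.Configuration;

  public class BlockSizeExample {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Files written with this configuration use 256 MB blocks instead of
      // the default; the value should be a multiple of 512 bytes.
      conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // 256 MB, example value
      System.out.println(conf.get("dfs.blocksize"));
    }
  }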

Question - 22 : - What are the configuration parameters in a “MapReduce” program?

Answer - 22 : -

The main configuration parameters in the “MapReduce” framework are listed below; the driver sketch after the list shows where each one is set:

  • Input locations of Jobs in the distributed file system
  • Output location of Jobs in the distributed file system
  • The input format of data
  • The output format of data
  • The class which contains the map function
  • The class which contains the reduce function
  • JAR file which contains the mapper, reducer and the driver classes
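A minimal driver sketch, with each parameter from the list marked in a comment. The class and path names (WordCountDriver, WordCountMapper, WordCountReducer, args[0]/args[1]) are illustrative assumptions; the mapper and reducer classes themselves are sketched under Question 28:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

  public class WordCountDriver {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "word count");
      job.setJarByClass(WordCountDriver.class);          // JAR containing mapper, reducer, driver
      job.setMapperClass(WordCountMapper.class);         // class containing the map function
      job.setReducerClass(WordCountReducer.class);       // class containing the reduce function
      job.setInputFormatClass(TextInputFormat.class);    // input format of data
      job.setOutputFormatClass(TextOutputFormat.class);  // output format of data
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));   // input location of the job in HDFS
      FileOutputFormat.setOutputPath(job, new Path(args[1])); // output location of the job in HDFS
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }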

Question - 23 : - What are the three running modes of Hadoop?

Answer - 23 : -

The three running modes of Hadoop are as follows:

i. Standalone or local: This is the default mode and does not need any configuration. In this mode, all of the following Hadoop components use the local file system and run in a single JVM –

  • NameNode
  • DataNode
  • ResourceManager
  • NodeManager

ii. Pseudo-distributed: In this mode, all the master and slave Hadoop services are deployed and executed on a single node.

iii. Fully distributed: In this mode, Hadoop master and slave services are deployed and executed on separate nodes.
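Which mode is active is determined purely by configuration, chiefly the fs.defaultFS and mapreduce.framework.name properties. A minimal sketch for inspecting them; the values in the comments are typical, not mandated:

  import org.apache.hadoop.conf.Configuration;

  public class ModeCheck {
    public static void main(String[] args) {
      Configuration conf = new Configuration(); // loads core-site.xml from the classpath
      // Standalone: file:/// (the built-in default) with mapreduce.framework.name=local.
      // Pseudo-distributed: an hdfs:// URI such as hdfs://localhost:9000, all daemons on one node.
      // Fully distributed: an hdfs:// URI pointing at a NameNode on a separate node.
      System.out.println(conf.get("fs.defaultFS"));
      System.out.println(conf.get("mapreduce.framework.name"));
    }
  }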

Question - 24 : - Explain JobTracker in Hadoop

Answer - 24 : -

JobTracker is a JVM process in Hadoop that submits and tracks MapReduce jobs.

JobTracker performs the following activities in Hadoop, in sequence –

  • JobTracker receives the jobs that a client application submits
  • JobTracker contacts the NameNode to determine the location of the data
  • JobTracker allocates TaskTracker nodes based on the available slots
  • JobTracker submits the work to the allocated TaskTracker nodes
  • JobTracker monitors the TaskTracker nodes
  • When a task fails, JobTracker is notified and decides how to reallocate the task
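On the client side of this sequence (Hadoop 1's old org.apache.hadoop.mapred API), a job reaches the JobTracker through JobClient. A minimal submission sketch, assuming illustrative MyMapper and MyReducer classes written against that old API:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class SubmitToJobTracker {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(SubmitToJobTracker.class);
      conf.setJobName("example-job");
      conf.setMapperClass(MyMapper.class);    // illustrative mapper (old API)
      conf.setReducerClass(MyReducer.class);  // illustrative reducer (old API)
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      // runJob() submits the job to the JobTracker and polls it until the
      // job completes, matching the sequence described above.
      JobClient.runJob(conf);
    }
  }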

Question - 25 : - What are the different configuration files in Hadoop?

Answer - 25 : -

The different configuration files in Hadoop are –

core-site.xml – This configuration file contains the Hadoop core configuration settings, for example, the I/O settings that are common to MapReduce and HDFS. It also specifies the hostname and port of the default file system (fs.defaultFS).

mapred-site.xml – This configuration file specifies the framework to use for MapReduce by setting the mapreduce.framework.name property.

hdfs-site.xml – This configuration file contains the configuration settings for the HDFS daemons. It also specifies the default block replication and permission checking on HDFS.

yarn-site.xml – This configuration file specifies the configuration settings for ResourceManager and NodeManager.
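All four files are XML lists of <property> name/value pairs that Hadoop merges into a single Configuration at startup. A minimal sketch of reading the merged settings; the /etc/hadoop/conf location is an assumption and varies by installation:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;

  public class ShowConfig {
    public static void main(String[] args) {
      Configuration conf = new Configuration(); // picks up core-site.xml from the classpath
      conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));   // assumed location
      conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml")); // assumed location
      conf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));   // assumed location
      System.out.println(conf.get("fs.defaultFS"));             // from core-site.xml
      System.out.println(conf.get("dfs.replication"));          // from hdfs-site.xml
      System.out.println(conf.get("mapreduce.framework.name")); // from mapred-site.xml
    }
  }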

Question - 26 : - How can you achieve security in Hadoop?

Answer - 26 : -

Kerberos is used to achieve security in Hadoop. At a high level, there are three steps to access a service while using Kerberos. Each step involves a message exchange with a server.

  • Authentication – In the first step, the client authenticates itself to the Authentication Server, which then provides a time-stamped TGT (Ticket-Granting Ticket) to the client.
  • Authorization – In this step, the client uses the received TGT to request a service ticket from the TGS (Ticket-Granting Server).
  • Service Request – The final step to achieve security in Hadoop: the client uses the service ticket to authenticate itself to the server hosting the service.
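On the client side, a Kerberos login in Hadoop code goes through the UserGroupInformation class. A minimal sketch; the principal name and keytab path are illustrative assumptions:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.security.UserGroupInformation;

  public class KerberosLogin {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.set("hadoop.security.authentication", "kerberos"); // enable Kerberos auth
      UserGroupInformation.setConfiguration(conf);
      // Obtains a TGT using the keytab instead of a password; service
      // tickets are then requested transparently on each secured RPC.
      UserGroupInformation.loginUserFromKeytab(
          "hdfsuser@EXAMPLE.COM",                  // illustrative principal
          "/etc/security/keytabs/hdfsuser.keytab"  // illustrative keytab path
      );
      System.out.println(UserGroupInformation.getLoginUser());
    }
  }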

Question - 27 : - What is commodity hardware?

Answer - 27 : - Commodity hardware is low-cost, readily available hardware that offers no special reliability or performance guarantees. Commodity hardware must include enough RAM, because Hadoop runs a number of services that need RAM to execute. One doesn’t require high-end hardware configuration or supercomputers to run Hadoop; it can be run on any commodity hardware.

Question - 28 : - How does Hadoop MapReduce work?

Answer - 28 : -

There are two phases in a MapReduce operation (see the word-count sketch after the list):

  • Map phase – In this phase, the input data is divided into splits, and each split is processed by a map task. The map tasks run in parallel and produce intermediate key-value pairs.
  • Reduce phase – In this phase, the intermediate data from all the map tasks is grouped by key, aggregated, and written out as the final result.
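The classic word-count example is a minimal sketch of both phases. The class names match the driver sketch under Question 22; each class would normally live in its own file:

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;

  // Map phase: each mapper processes one split of the input in parallel
  // and emits a (word, 1) pair for every token it sees.
  class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reduce phase: values for the same key are grouped across all mappers
  // and aggregated into a single count per word.
  class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }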

Question - 29 : - What is MapReduce? What is the syntax you use to run a MapReduce program?

Answer - 29 : -

MapReduce is a parallel programming model in Hadoop for processing large data sets, typically stored in HDFS, across a cluster of computers.

The syntax to run a MapReduce program is – hadoop jar hadoop_jar_file.jar [main_class] /input_path /output_path. For example (jar, class, and path names are illustrative): hadoop jar wordcount.jar WordCountDriver /user/hadoop/input /user/hadoop/output.

Question - 30 : - How to restart all the daemons in Hadoop?

Answer - 30 : -

To restart all the daemons, it is required to stop all of them first. The Hadoop directory contains an sbin directory that stores the script files to stop and start the daemons.

Use the sbin/stop-all.sh command to stop all the daemons, and then use the sbin/start-all.sh command to start them all again.

