
Hadoop Interview Questions and Answers


Question - 71 : - What are the three modes in which Hadoop can run?

Answer - 71 : -

The three modes in which Hadoop can run are listed below (a quick way to check which mode an installation is using follows the list):

  • Standalone mode: This is the default mode. It uses the local FileSystem and a single Java process to run the Hadoop services.
  • Pseudo-distributed mode: This uses a single-node Hadoop deployment to execute all Hadoop services.
  • Fully-distributed mode: This uses separate nodes to run Hadoop master and slave services.
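
A quick way to check which mode a given installation is configured for is to print the default FileSystem URI with the getconf tool. This is a minimal sketch; the exact URI returned depends on your core-site.xml:

# Prints file:/// in standalone mode and an hdfs:// URI (for example hdfs://localhost:9000)
# in pseudo-distributed or fully-distributed mode.
hdfs getconf -confKey fs.defaultFS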

Question - 72 : - What are the differences between regular FileSystem and HDFS?

Answer - 72 : -

  • Regular FileSystem: In a regular FileSystem, data is maintained on a single system. If the machine crashes, data recovery is difficult because fault tolerance is low. Seek time is higher, so processing the data takes longer.
  • HDFS: Data is distributed and maintained across multiple systems. If a DataNode crashes, the data can still be recovered from other nodes in the cluster. Reading data takes comparatively longer, because data is read from local disks on several machines and has to be coordinated across multiple systems.

Question - 73 : - What are the two types of metadata that a NameNode server holds?

Answer - 73 : -

The two types of metadata that a NameNode server holds are:

  • Metadata on disk - this includes the edit log and the FSImage (see the commands after this list for how to inspect these files)
  • Metadata in RAM - this includes the information about DataNodes
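
The on-disk metadata can be inspected with the offline viewers that ship with Hadoop. This is a minimal sketch; the fsimage and edits file names under the NameNode's current/ directory are placeholders:

# Dump an FSImage file to XML using the Offline Image Viewer
hdfs oiv -p XML -i fsimage_0000000000000000042 -o fsimage.xml

# Dump an edit log segment to XML using the Offline Edits Viewer
hdfs oev -i edits_0000000000000000001-0000000000000000042 -o edits.xml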

Question - 74 : - How can you restart NameNode and all the daemons in Hadoop?

Answer - 74 : -

The following commands will help you restart NameNode and all the daemons:

You can stop the NameNode with the ./sbin/hadoop-daemon.sh stop namenode command and then start it again with the ./sbin/hadoop-daemon.sh start namenode command.

You can stop all the daemons with the ./sbin/stop-all.sh command and then start them again with the ./sbin/start-all.sh command.
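
On recent Hadoop 3.x releases, the same restart can also be done with the per-daemon and per-service commands. This is a sketch; the scripts live in the Hadoop sbin directory:

# Restart only the NameNode daemon
hdfs --daemon stop namenode
hdfs --daemon start namenode

# Restart HDFS and YARN daemons separately instead of using stop-all.sh/start-all.sh
./sbin/stop-yarn.sh && ./sbin/stop-dfs.sh
./sbin/start-dfs.sh && ./sbin/start-yarn.sh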

Question - 75 : - Which command will help you find the status of blocks and FileSystem health?

Answer - 75 : -

To check the status of the blocks, use the command:

hdfs fsck / -files -blocks

To check the health status of FileSystem, use the command:

hdfs fsck / -files -blocks -locations > dfs-fsck.log

Question - 76 : - How do you copy data from the local system onto HDFS?

Answer - 76 : -

The following command will copy data from the local file system onto HDFS:

hadoop fs -copyFromLocal [source] [destination]

Example: hadoop fs -copyFromLocal /tmp/data.csv /user/test/data.csv

In the above syntax, the source is the local path and the destination is the HDFS path. Use the -f (force) option when you need to overwrite a file that already exists at the destination in HDFS.
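
For example, to copy the same file again and overwrite the copy that already exists in HDFS (using the paths from the example above):

hadoop fs -copyFromLocal -f /tmp/data.csv /user/test/data.csv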

Question - 77 : - When do you use the dfsadmin -refreshNodes and rmadmin -refreshNodes commands?

Answer - 77 : -

These commands are used to refresh the node information while commissioning new nodes or after decommissioning of nodes is completed.

dfsadmin -refreshNodes

This is run through the HDFS admin client and refreshes the node configuration for the NameNode, making it re-read the DataNode include/exclude (hosts) files.

rmadmin -refreshNodes

This performs the corresponding administrative refresh for the ResourceManager, making it re-read the NodeManager include/exclude lists.
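
As actually typed on the command line, the two refreshes are issued through the hdfs and yarn admin clients:

# Re-reads the DataNode include/exclude (hosts) files on the NameNode
hdfs dfsadmin -refreshNodes

# Re-reads the NodeManager include/exclude lists on the ResourceManager
yarn rmadmin -refreshNodes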

Question - 78 : - Is there any way to change the replication of files on HDFS after they are already written to HDFS?

Answer - 78 : -

Yes, the following are ways to change the replication of files on HDFS:

We can change the dfs.replication value to a particular number in the $HADOOP_HOME/conf/hdfs-site.xml file (hadoop-site.xml in very old releases), which sets the replication factor used for any new content that comes in.

If you want to change the replication factor for a particular file or directory, use:

$HADOOP_HOME/bin/hadoop fs -setrep -w 4 /path/to/the/file

Example: $HADOOP_HOME/bin/hadoop fs -setrep -w 4 /user/temp/test.csv
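
To confirm that the new replication factor took effect, the current value for a file can be printed with the fs -stat format string (path taken from the example above):

hadoop fs -stat "replication: %r" /user/temp/test.csv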

Question - 79 : - Explain the process of spilling in MapReduce.

Answer - 79 : -

Spilling is the process of copying data from the in-memory buffer to disk when buffer usage reaches a specific threshold. This happens when there is not enough memory to fit all of the mapper output. By default, a background thread starts spilling the contents from memory to disk once 80 percent of the buffer is filled.
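
The buffer size and the spill threshold are controlled by job configuration properties and can be tuned per job. This is a sketch; the jar and driver class names are hypothetical:

# mapreduce.task.io.sort.mb        - size of the map-side sort buffer in MB (default 100)
# mapreduce.map.sort.spill.percent - fraction of the buffer at which spilling starts (default 0.80)
hadoop jar wordcount.jar WordCount -D mapreduce.task.io.sort.mb=256 -D mapreduce.map.sort.spill.percent=0.80 /input /output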

Question - 80 : - How can you set the mappers and reducers for a MapReduce job?

Answer - 80 : -

The number of mappers and reducers can be set in the command line using:

-D mapred.map.tasks=5 -D mapred.reduce.tasks=2
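
Passed as part of a full job submission, the flags look like this (the jar and driver class names are hypothetical; newer releases use the property names mapreduce.job.maps and mapreduce.job.reduces):

hadoop jar wordcount.jar WordCount -D mapred.map.tasks=5 -D mapred.reduce.tasks=2 /input /output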

In the code, one can configure JobConf variables:

job.setNumMapTasks(5); // 5 mappers

job.setNumReduceTasks(2); // 2 reducers

