Question - What is HDFS and what are its main components?
Answer -
HDFS is a distributed file system that serves as Hadoop's default storage layer. It runs on low-cost commodity hardware while providing a high degree of fault tolerance. HDFS stores data of various types across a distributed environment and offers high throughput to applications with large data sets. HDFS follows a primary/secondary architecture, and each cluster contains the following two node types:
- NameNode. A single primary node that manages the file system namespace, regulates client access to files, and maintains the metadata for all the data blocks in HDFS.
- DataNode. A secondary node that manages the storage attached to the machine it runs on. A cluster typically contains many DataNode instances, usually one per physical node. Each DataNode serves read and write requests from the file system's clients, as the sketch after this list illustrates.
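
To make the division of labor concrete, here is a minimal sketch of writing a file with Hadoop's standard Java `FileSystem` API: the client contacts the NameNode for metadata (where to place blocks), then streams the bytes directly to DataNodes. The NameNode host, port, and file path below are placeholder assumptions, not values from the text.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the NameNode; host and port are placeholders.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/user/example/hello.txt"); // hypothetical path
            // create() asks the NameNode to allocate blocks; the returned
            // stream then writes the data directly to DataNodes.
            try (FSDataOutputStream out = fs.create(path)) {
                out.writeUTF("Hello, HDFS!");
            }
        }
    }
}
```

The same separation applies on reads: the NameNode returns block locations, and the client fetches the block data from the DataNodes themselves, keeping the NameNode off the data path.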