# Hadoop Interview Questions

In this blog post, I list common Hadoop interview questions, each with a short answer.

### **What is Hadoop?**

When big data emerged as a problem, Hadoop evolved as a solution to it.

Apache Hadoop is a framework that provides various services and tools to store and process big data.

It helps in analyzing big data and making business decisions from it, which is difficult to do with traditional systems.

### What are the main components of Hadoop?

The main components of Hadoop are:  
1\. Storage: HDFS (NameNode, DataNode)  
2\. Processing framework: YARN (ResourceManager, NodeManager)

### **What are active and passive NameNodes?**

In a high-availability architecture, there are two NameNodes:

1. The active NameNode is the one that runs in the cluster and serves client requests.
    
2. The passive NameNode is a standby that keeps the same metadata as the active NameNode.
    

When the active NameNode fails, the passive NameNode takes over as the active NameNode in the cluster.

### **How is HDFS fault tolerant?**

When a file is stored in HDFS, each of its blocks is replicated on several DataNodes; the replication factor defaults to 3. If a DataNode fails, the NameNode automatically has the affected blocks copied from the surviving replicas to other DataNodes, so the data stays available.
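To make the re-replication idea concrete, here is a minimal Python sketch. It is a toy model, not Hadoop code: the function `re_replicate` and the node names are made up for illustration, and it picks replacement nodes alphabetically, whereas real HDFS uses a rack-aware placement policy.

```python
REPLICATION_FACTOR = 3  # HDFS default replication factor


def re_replicate(block_locations, live_nodes, replication=REPLICATION_FACTOR):
    """Return a new block -> set-of-nodes map with lost replicas restored.

    block_locations: dict mapping a block id to the set of DataNodes holding it.
    live_nodes: set of DataNodes that are still alive.
    """
    repaired = {}
    for block, nodes in block_locations.items():
        survivors = nodes & live_nodes
        if not survivors:
            # Every replica was lost, so there is nothing left to copy from.
            repaired[block] = set()
            continue
        # Candidate targets are live nodes that do not hold this block yet.
        candidates = sorted(live_nodes - survivors)
        missing = max(0, replication - len(survivors))
        repaired[block] = survivors | set(candidates[:missing])
    return repaired


# Hypothetical cluster: dn2 has failed.
blocks = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn3", "dn4"}}
live = {"dn1", "dn3", "dn4", "dn5"}
repaired = re_replicate(blocks, live)
for block, nodes in sorted(repaired.items()):
    print(block, sorted(nodes))
```

After the call, every block is back to three replicas and none of them lives on the failed node.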

### **Why do we use HDFS for large files but not for a lot of small files?**

The NameNode keeps the file system metadata in RAM, so the amount of memory limits the number of files HDFS can hold. Too many files generate too much metadata, and storing all of it in RAM becomes a challenge. Hence HDFS works best with a small number of large files rather than a large number of small files.
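A rough back-of-the-envelope calculation shows the effect. The sketch below assumes the commonly quoted rule of thumb of about 150 bytes of NameNode memory per object (file or block) and a 128 MB block size; both numbers and the helper `namenode_objects` are assumptions for illustration, not exact figures.

```python
BYTES_PER_OBJECT = 150          # rule-of-thumb assumption, not an exact figure
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB block size


def namenode_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count NameNode objects: one per file plus one per block."""
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)  # ceiling division
        total += 1 + blocks
    return total


one_gib = 1024 ** 3
large = namenode_objects([one_gib])              # one 1 GiB file
small = namenode_objects([1024 * 1024] * 1024)   # 1024 files of 1 MiB each

print("one large file  :", large, "objects,", large * BYTES_PER_OBJECT, "bytes")
print("many small files:", small, "objects,", small * BYTES_PER_OBJECT, "bytes")
```

Both layouts hold the same 1 GiB of data, but the small-file layout needs 2048 objects instead of 9, so the metadata is more than 200 times larger under these assumptions.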

### How do you define block, and what is the default block size?

A block is the smallest contiguous unit of storage in which HDFS stores data; each file is split into blocks. The default block size is 64 MB in Hadoop 1 and 128 MB in Hadoop 2.
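As a quick illustration of how a file maps onto blocks, here is a small Python sketch. The helper `split_into_blocks` is hypothetical and only does the arithmetic; it assumes that the last block holds just the leftover bytes rather than being padded to the full block size.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes (in bytes) of the blocks a file would be split into."""
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        # The final block only holds whatever data is left over.
        sizes.append(remainder)
    return sizes


hadoop2 = split_into_blocks(300 * MB)           # 128 MB blocks (Hadoop 2 default)
hadoop1 = split_into_blocks(300 * MB, 64 * MB)  # 64 MB blocks (Hadoop 1 default)

print([size // MB for size in hadoop2])
print([size // MB for size in hadoop1])
```

A 300 MB file becomes three blocks with the Hadoop 2 default and five blocks with the Hadoop 1 default; in both cases the last block is 44 MB.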

### How does NameNode tackle data node failures?

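The NameNode periodically receives a heartbeat (signal) from each DataNode in the cluster, which tells it that the DataNode is functioning properly. If a DataNode stops sending heartbeats, the NameNode marks it as dead and re-replicates the blocks it held onto other DataNodes.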
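The heartbeat check can be sketched in a few lines of Python. This is a simplified model, not the NameNode's actual logic: `classify_datanodes` and the node names are invented for the example, and the constants reflect my understanding of the default settings (a heartbeat every 3 seconds, and a node treated as dead after roughly 10 minutes 30 seconds of silence), so treat them as assumptions.

```python
HEARTBEAT_INTERVAL = 3  # seconds between heartbeats (assumed default)
DEAD_TIMEOUT = 630      # seconds of silence before a node counts as dead (assumed default)


def classify_datanodes(last_heartbeat, now, timeout=DEAD_TIMEOUT):
    """Split DataNodes into live and dead based on their last heartbeat time.

    last_heartbeat: dict mapping a node name to the time (in seconds) it last reported.
    now: the current time in the same units.
    """
    live, dead = set(), set()
    for node, seen in last_heartbeat.items():
        if now - seen <= timeout:
            live.add(node)
        else:
            dead.add(node)
    return live, dead


# Hypothetical timestamps: dn2 has been silent for 700 seconds.
last_seen = {"dn1": 1000, "dn2": 300, "dn3": 998}
live, dead = classify_datanodes(last_seen, now=1000)
print("live:", sorted(live))
print("dead:", sorted(dead))
```

Any node that ends up in the dead set is a candidate for having its blocks re-replicated elsewhere.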

### What is a checkpoint?

Checkpointing is a process that takes an FsImage and an edit log and compacts them into a new FsImage. Thus, instead of replaying the edit log, the NameNode can load the final in-memory state directly from the FsImage.
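Here is a toy Python model of that merge. It is only a sketch under strong simplifications: the namespace is a plain set of paths, the edit log only knows hypothetical `create` and `delete` operations, and the names `apply_edit` and `checkpoint` are mine, not Hadoop's.

```python
def apply_edit(namespace, edit):
    """Apply one (operation, path) edit to a mutable set of paths."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge an FsImage with its edit log; return (new FsImage, empty edit log)."""
    new_image = set(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return frozenset(new_image), []


old_image = frozenset({"/a", "/b"})
edits = [("create", "/c"), ("delete", "/a")]
new_image, new_log = checkpoint(old_image, edits)
print(sorted(new_image), new_log)
```

After the checkpoint, the new image already contains the effect of every edit, so a restart can load it directly and start with an empty edit log.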

### What are the three modes in which Hadoop can run?

1. Standalone mode
    
2. Pseudo-distributed mode
    
3. Fully distributed mode