Anatomy of HDFS File Read and Write

I am a Tech Enthusiast having 13+ years of experience in ๐๐ as a ๐๐จ๐ง๐ฌ๐ฎ๐ฅ๐ญ๐๐ง๐ญ, ๐๐จ๐ซ๐ฉ๐จ๐ซ๐๐ญ๐ ๐๐ซ๐๐ข๐ง๐๐ซ, ๐๐๐ง๐ญ๐จ๐ซ, with 12+ years in training and mentoring in ๐๐จ๐๐ญ๐ฐ๐๐ซ๐ ๐๐ง๐ ๐ข๐ง๐๐๐ซ๐ข๐ง๐ , ๐๐๐ญ๐ ๐๐ง๐ ๐ข๐ง๐๐๐ซ๐ข๐ง๐ , ๐๐๐ฌ๐ญ ๐๐ฎ๐ญ๐จ๐ฆ๐๐ญ๐ข๐จ๐ง ๐๐ง๐ ๐๐๐ญ๐ ๐๐๐ข๐๐ง๐๐. I have ๐๐๐๐๐๐๐ ๐๐๐๐ ๐๐๐๐ 10,000+ ๐ฐ๐ป ๐ท๐๐๐๐๐๐๐๐๐๐๐๐ and ๐๐๐๐ ๐๐๐๐๐ ๐๐๐๐ ๐๐๐๐ 500+ ๐๐๐๐๐๐๐๐ ๐๐๐๐๐๐๐๐ in the areas of ๐๐จ๐๐ญ๐ฐ๐๐ซ๐ ๐๐๐ฏ๐๐ฅ๐จ๐ฉ๐ฆ๐๐ง๐ญ, ๐๐๐ญ๐ ๐๐ง๐ ๐ข๐ง๐๐๐ซ๐ข๐ง๐ , ๐๐ฅ๐จ๐ฎ๐, ๐๐๐ญ๐ ๐๐ง๐๐ฅ๐ฒ๐ฌ๐ข๐ฌ, ๐๐๐ญ๐ ๐๐ข๐ฌ๐ฎ๐๐ฅ๐ข๐ณ๐๐ญ๐ข๐จ๐ง๐ฌ, ๐๐ซ๐ญ๐ข๐๐ข๐๐ข๐๐ฅ ๐๐ง๐ญ๐๐ฅ๐ฅ๐ข๐ ๐๐ง๐๐ ๐๐ง๐ ๐๐๐๐ก๐ข๐ง๐ ๐๐๐๐ซ๐ง๐ข๐ง๐ . I am interested in ๐ฐ๐ซ๐ข๐ญ๐ข๐ง๐ ๐๐ฅ๐จ๐ ๐ฌ, ๐ฌ๐ก๐๐ซ๐ข๐ง๐ ๐ญ๐๐๐ก๐ง๐ข๐๐๐ฅ ๐ค๐ง๐จ๐ฐ๐ฅ๐๐๐ ๐, ๐ฌ๐จ๐ฅ๐ฏ๐ข๐ง๐ ๐ญ๐๐๐ก๐ง๐ข๐๐๐ฅ ๐ข๐ฌ๐ฌ๐ฎ๐๐ฌ, ๐ซ๐๐๐๐ข๐ง๐ ๐๐ง๐ ๐ฅ๐๐๐ซ๐ง๐ข๐ง๐ new subjects.
HDFS has a master and slave kind of architecture. Namenode acts as master and Datanodes as worker. All the metadata information is with namenode and the original data is stored on the datanodes. Keeping all these in mind the below figure will give idea about how data flow happens between the Client interacting with HDFS, i.e. the Namenode and the Datanodes.
Anatomy of File Read

The following steps are involved in reading the file from HDFS:
Letโs suppose a Client (a HDFS Client) wants to read a file from HDFS.
Step 1: First the Client will open the file by giving a call to open() method on FileSystem object, which for HDFS is an instance of DistributedFileSystem class.
Step 2: DistributedFileSystem calls the Namenode, using RPC (Remote Procedure Call), to determine the locations of the blocks for the first few blocks of the file. For each block, the NameNode returns the addresses of all the DataNode's that have a copy of that block. Client will interact with respective DataNode'sto read the file. NameNode also provide a token to the client which it shows to data node for authentication.
The DistributedFileSystem returns an object of FSDataInputStream(an input stream that supports file seeks) to the client for it to read data from FSDataInputStream in turn wraps a DFSInputStream, which manages the datanode and namenode I/O
Step 3: The client then calls read() on the stream. DFSInputStream, which has stored the DataNode addresses for the first few blocks in the file, then connects to the first closest DataNode for the first block in the file.
Step 4: Data is streamed from the DataNode back to the client, which calls read() repeatedly on the stream.
Step 5: When the end of the block is reached, DFSInputStream will close the connection to the DataNode , then find the best DataNode for the next block. This happens transparently to the client, which from its point of view is just reading a continuous stream.
Step 6: Blocks are read in order, with the DFSInputStream opening new connections to datanodes as the client reads through the stream. It will also call the namnode to retrieve the datanode locations for the next batch of blocks as needed. When the client has finished reading, it calls close() on the FSDataInputStream.
Anatomy of File Write

Step 1: The client creates the file by calling create() method on DistributedFileSystem.
Step 2: DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystemโs namespace, with no blocks associated with it.
The namenode performs various checks to make sure the file doesnโt already exist and that the client has the right permissions to create the file. If these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException. TheDistributedFileSystem returns an FSDataOutputStream for the client to start writing data to.
Step 3: As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a pipeline, and here weโll assume the replication level is three, so there are three nodes in the pipeline. TheDataStreamer streams the packets to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode in the pipeline.
Step 4: Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline.
Step 5: DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline.
Step 6: When the client has finished writing data, it calls close() on the stream.
This action flushes all the remaining packets to the datanode pipeline and waits for acknowledgments before contacting the namenode to signal that the file is complete The namenode already knows which blocks the file is made up of , so it only has to wait for blocks to be minimally replicated before returning successfully.



