Hadoop: Data is uploaded serially!!!

Arpit Bisane
3 min read · Oct 18, 2020

There is a widespread myth that when a client uploads data to a Hadoop cluster, the data is sent to the DataNodes in parallel.

Children are often told that an apple fell on Isaac Newton’s head and led him to state the law of gravity. This is, of course, a myth. What Newton discovered was that any two particles in the universe attract each other with a force that is proportional to the product of their masses and inversely proportional to the square of the distance between them. That is not learned from a falling apple but by studying large quantities of data and developing a mathematical theory that can be verified against additional data. The data gathered by Galileo on falling bodies and by Johannes Kepler on the motions of the planets were invaluable aids to Newton.

You are probably wondering why we are discussing something we already know, but this example alone shows the difference between fact and myth. If Newton had not taken the right approach, he would never have discovered the law of gravity, and the whole world would still be living with a myth. Likewise, our mentor Vimal Daga sir believes in facts, reality, proofs, and the right approach. The right approach is necessary for breaking the myths that exist in today’s technological world, so he always asks his students to dive deep into core concepts.

Even a Google search returns results claiming that Hadoop uploads data in parallel. We can’t trust any article or blog without proof.

But in reality, Hadoop does not upload data in parallel. It uploads data to the DataNodes serially, and here we are with proper proof.

> For this practical, our Hadoop setup has 2 DataNodes, 1 client, and 1 master.

> The tcpdump program lets us read packets on the network, which helps us track where the data goes.
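As a rough sketch of the capture step, the Python snippet below builds a tcpdump command line that filters on the DataNode data-transfer port. The interface name `eth0` and the port `50010` are assumptions and do not come from this setup: 50010 is the default DataNode transfer port in Hadoop 1.x and 2.x, while Hadoop 3.x uses 9866 by default. The helper name `build_tcpdump_command` is hypothetical.

```python
import shlex


def build_tcpdump_command(interface="eth0", port=50010):
    """Return an argv list for capturing DataNode traffic with tcpdump.

    Both defaults are assumptions: adjust the interface to your machine and
    the port to your cluster's DataNode data-transfer port.
    """
    # -i selects the capture interface, -n disables name resolution,
    # and the trailing words form the capture filter "tcp port <port>".
    return ["tcpdump", "-i", interface, "-n", "tcp", "port", str(port)]


# Print the command so it can be copied into a terminal (it usually needs root).
print(shlex.join(build_tcpdump_command()))
```

Running the same command on the client and on each DataNode makes it easier to see which machine sends data to which.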

> Now we upload a file named `lw.txt` from the client to the cluster.
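The upload itself can be sketched in the same way. The snippet below only builds the `hadoop fs -put` command for `lw.txt`; the destination directory `/` is an assumption, since the target path is not stated here, and `build_put_command` is a hypothetical helper name.

```python
import shlex


def build_put_command(local_file="lw.txt", hdfs_dir="/"):
    """Return an argv list that copies a local file into HDFS."""
    # "hadoop fs -put <local> <dest>" is the standard HDFS upload command.
    # The destination "/" is an assumption, not taken from the article.
    return ["hadoop", "fs", "-put", local_file, hdfs_dir]


# Print the command; on a configured client it could instead be passed
# to subprocess.run() to perform the upload.
print(shlex.join(build_put_command()))
```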

> As soon as we hit Enter to upload the file, tcpdump starts showing packets, and the capture clearly shows that the data is uploaded in serial order. We took a screenshot that shows this order.
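To make "serial order" concrete, here is a minimal sketch of how the source and destination pairs read off a capture could be classified. It assumes that "serial" refers to the HDFS write pipeline, where the client streams data to one DataNode and that DataNode forwards it to the next. The host names `client`, `dn1`, and `dn2` and the function `classify_upload` are hypothetical, and the example flows are illustrative inputs, not data from the capture above.

```python
def classify_upload(flows, client, datanodes):
    """Classify observed (source, destination) data flows.

    Returns "serial" when the client sends data to exactly one DataNode and
    every other DataNode receives it from another DataNode, "parallel" when
    the client sends to more than one DataNode directly, else "unknown".
    """
    datanodes = set(datanodes)
    # DataNodes that receive data straight from the client.
    from_client = {dst for src, dst in flows if src == client and dst in datanodes}
    # DataNodes that receive data forwarded by another DataNode.
    forwarded = {
        dst for src, dst in flows
        if src in datanodes and dst in datanodes and src != dst
    }
    if len(from_client) > 1:
        return "parallel"
    if len(from_client) == 1 and from_client | forwarded == datanodes:
        return "serial"
    return "unknown"


# Hypothetical flows for illustration only.
pipeline = [("client", "dn1"), ("dn1", "dn2")]
fan_out = [("client", "dn1"), ("client", "dn2")]
print(classify_upload(pipeline, "client", ["dn1", "dn2"]))  # serial
print(classify_upload(fan_out, "client", ["dn1", "dn2"]))   # parallel
```

The check is deliberately simple: it looks only at who talks to whom, not at timing or byte counts, which is enough to tell a chain from a fan-out.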

> This is how we prove that Hadoop uploads data serially.

>> THANKS FOR READING<<
