There are 0 datanode(s) running and no node(s)

You are trying to write a file to HDFS and this is the error you see in your logs. The error says that /user/ubuntu/test-dataset cannot be replicated to any node in the cluster. This error usually means no datanodes are connected to the namenode; without datanodes, HDFS is not functional.
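The message looks along these lines (the path and replication numbers will match your own cluster; this is a representative excerpt):

    org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/ubuntu/test-dataset
    could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s)
    running and no node(s) are excluded in this operation.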

If you dig through the datanode logs, you will see errors logged when the datanode attempted to connect to the namenode. In this log, the datanode is trying to connect to 192.168.10.12:9000.
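The failed connection attempts in the datanode log look something like this (the exact retry policy text varies by Hadoop version):

    INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.10.12/192.168.10.12:9000.
    Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)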

Check configuration

In the core-site.xml configuration file, fs.default.name points to where the namenode is running. In this setup, the namenode runs at 192.168.10.12 and listens on port 9000. So the datanode in the logs above is attempting to connect to the correct address, which means the issue is with the namenode itself.
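Here is what the relevant property looks like in core-site.xml, a minimal sketch using the values described above (on newer Hadoop versions this property is called fs.defaultFS):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.10.12:9000</value>
      </property>
    </configuration>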

Check running process & port

We know from the configuration file above that the namenode should be running and listening on port 9000. On the master, let's check whether any process is listening on that port by running sudo netstat -ntlp. The command prints a lot of lines; look closely for port 9000.
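The line of interest looks something like this (the PID 2742 is illustrative):

    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      2742/java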

In this output we see a java process running, but it is listening on 127.0.0.1:9000 and not on 192.168.10.12:9000. That is our issue: since the namenode is not listening on 192.168.10.12:9000, the datanodes get a connection timeout.

Check how NameNode is resolved

The /etc/hosts file in your system maps IP addresses to host names, and here is what we have in the file currently:
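An /etc/hosts file that produces this problem looks something like the following (illustrative; the key point is that 192.168.10.12 maps to localhost):

    127.0.0.1       localhost
    192.168.10.12   localhost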

Since we are using an IP address in the configuration file, 192.168.10.12 is translated to the host name localhost, and localhost in turn resolves to 127.0.0.1. This causes the namenode to start and listen on 127.0.0.1:9000, which is why the datanodes cannot reach it. An easy fix is to modify the /etc/hosts file as shown below, so that 192.168.10.12 resolves to "namenode" instead of localhost; the top entry in the file takes precedence.
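Here is the fixed file; the host name "namenode" is just an example, so use whatever name your cluster machines know the namenode by. With this entry at the top, 192.168.10.12 resolves to namenode even if another mapping for the same address appears further down:

    192.168.10.12   namenode
    127.0.0.1       localhost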

After the above change to the /etc/hosts file, restart the namenode and run netstat again. Now we can see the namenode process listening on 192.168.10.12:9000.
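The netstat output now shows the namenode bound to the right address (PID again illustrative):

    tcp        0      0 192.168.10.12:9000      0.0.0.0:*               LISTEN      3105/java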

Once you confirm that the namenode is listening on 192.168.10.12:9000, restart all your datanodes. This time the datanodes should be able to connect to the namenode with no issues.
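On a Hadoop 2.x installation using the bundled scripts, the restarts look something like this; paths and script names vary by version and distribution (Hadoop 3.x, for example, uses hdfs --daemon stop/start instead):

    # on the namenode machine, after editing /etc/hosts
    $HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
    $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode

    # on each datanode
    $HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode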

Hadoop Team
We are a group of Senior Hadoop Consultants who are passionate about Hadoop and Big Data technologies. Our collective experience spans finance, retail, social media, and gaming, and we have worked with Hadoop clusters ranging from 50 all the way to 800 nodes.
