There are 0 datanode(s) running and no node(s)

You are trying to write a file to HDFS and this is the error you see in your logs. The error says that /user/ubuntu/test-dataset cannot be replicated to any node in the cluster. This error usually means no datanodes are connected to the namenode; without datanodes, HDFS is not functional.
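The message looks along these lines (the path and replication numbers will match your own cluster; this is a representative excerpt):

    org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/ubuntu/test-dataset
    could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s)
    running and no node(s) are excluded in this operation.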

If you dig through the datanode logs, you will see errors logged when the datanode attempted to connect to the namenode. In this log, the datanode is trying to connect to 192.168.10.12:9000.
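The failed connection attempts in the datanode log look something like this (the exact retry policy text varies by Hadoop version):

    INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.10.12/192.168.10.12:9000.
    Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)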

Check configuration

In the core-site.xml configuration file, fs.default.name points to where the namenode is running. In this setup, the namenode runs at 192.168.10.12 and listens on port 9000. So the datanode in the logs above is attempting to connect to the correct address, which means the issue is with the namenode itself.
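Here is what the relevant property looks like in core-site.xml, a minimal sketch using the values described above (on newer Hadoop versions this property is called fs.defaultFS):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.10.12:9000</value>
      </property>
    </configuration>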

Check running process & port

We know from the configuration file above that the namenode should be running and listening on port 9000. On the master, let's check whether any process is listening on that port by running sudo netstat -ntlp. The command prints a lot of lines; look closely for port 9000.
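The line of interest looks something like this (the PID 2742 is illustrative):

    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      2742/java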

In this output we see a java process running, but it is listening on 127.0.0.1:9000 and not on 192.168.10.12:9000. That is our issue: since the namenode is not listening on 192.168.10.12:9000, the datanodes get a connection timeout.

Check how NameNode is resolved

The /etc/hosts file in your system maps IP addresses to host names, and here is what we have in the file currently:
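An /etc/hosts file that produces this problem looks something like the following (illustrative; the key point is that 192.168.10.12 maps to localhost):

    127.0.0.1       localhost
    192.168.10.12   localhost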

Since we are using an IP address in the configuration file, 192.168.10.12 is translated to the host name localhost, and localhost in turn resolves to 127.0.0.1. This causes the namenode to start and listen on 127.0.0.1:9000, which is why the datanodes cannot reach it. An easy fix is to modify the /etc/hosts file as shown below, so that 192.168.10.12 resolves to "namenode" instead of localhost; the top entry in the file takes precedence.
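Here is the fixed file; the host name "namenode" is just an example, so use whatever name your cluster machines know the namenode by. With this entry at the top, 192.168.10.12 resolves to namenode even if another mapping for the same address appears further down:

    192.168.10.12   namenode
    127.0.0.1       localhost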

After the above change to the /etc/hosts file, restart the namenode and run netstat again. Now we can see the namenode process listening on 192.168.10.12:9000.
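The netstat output now shows the namenode bound to the right address (PID again illustrative):

    tcp        0      0 192.168.10.12:9000      0.0.0.0:*               LISTEN      3105/java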

Once you confirm that the namenode is listening on 192.168.10.12:9000, restart all your datanodes. This time the datanodes should be able to connect to the namenode with no issues.
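On a Hadoop 2.x installation using the bundled scripts, the restarts look something like this; paths and script names vary by version and distribution (Hadoop 3.x, for example, uses hdfs --daemon stop/start instead):

    # on the namenode machine, after editing /etc/hosts
    $HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
    $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode

    # on each datanode
    $HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode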

Hadoop Team
We are a group of Senior Hadoop Consultants who are passionate about Hadoop and Big Data technologies. Our collective experience spans finance, retail, social media, and gaming, and we have worked with Hadoop clusters ranging from 50 all the way to 800 nodes.
