
Fixing org.apache.hadoop.security.AccessControlException: Permission denied


Hadoop uses the logged-in username of the client to determine permissions in the cluster. The user who started the Hadoop daemons rarely runs into access issues when running jobs or working with HDFS, because that user owns the relevant folders in HDFS and therefore has all the necessary permissions. You are most likely to hit the AccessControlException below when you run jobs or operate on HDFS as a different user and the permissions for that user have not been configured correctly.
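Hadoop in simple (non-Kerberos) authentication mode simply picks up the operating system user on the client, so a quick sanity check before submitting a job is to confirm who you are logged in as and who owns the folders the job will touch. The commands below are illustrative; substitute paths from your own cluster.

emily.ragland@ip-172-x-x-x:~$ whoami
emily.ragland
emily.ragland@ip-172-x-x-x:~$ hadoop fs -ls /tmp/hadoop-ubuntu/mapred

The listing shows the owner, group and permission bits (ubuntu, supergroup and rwxr-xr-x for the staging folder in our case) that HDFS will check against the logged-in user.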

AccessControlException

org.apache.hadoop.security.AccessControlException: Permission denied: user=emily.ragland, access=WRITE, inode="staging":ubuntu:supergroup:rwxr-xr-x
Exception in thread "main" org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=emily.ragland, access=WRITE, inode="staging":ubuntu:supergroup:rwxr-xr-x
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1459)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:362)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at com.jerry.WordCount.main(WordCount.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

From the exception above it is easy to see that the job is trying to create a directory, as user emily.ragland, under a directory named staging. The staging folder is owned by user ubuntu, who belongs to the group supergroup. Since other users have no write access to the staging folder (rwxr-xr-x), the write under staging fails for emily.ragland.

mapreduce.jobtracker.staging.root.dir

Let's first understand what the staging directory is. The mapreduce.jobtracker.staging.root.dir property in mapred-site.xml specifies the location of the staging directory in HDFS. When a job is submitted, the staging folder is used to store files common to the job, like the job's jar. For this discussion, let's assume that user ubuntu is running all the daemons in the cluster. When the mapreduce.jobtracker.staging.root.dir property is not specified, the staging folder defaults to /tmp/hadoop-ubuntu/mapred/staging.
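That default is not arbitrary. In Hadoop 1.x, mapred-default.xml ties the staging location to hadoop.tmp.dir, which itself defaults to /tmp/hadoop-${user.name}, hence /tmp/hadoop-ubuntu for user ubuntu:

<property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>${hadoop.tmp.dir}/mapred/staging</value>
</property>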

When a user submits a job, a folder named after the username is created (if not already present) under /tmp/hadoop-ubuntu/mapred/staging. After a few job executions, the listing of the directory for user ubuntu will look like this.

ubuntu@ip-172-x-x-x:~$ hadoop fs -ls /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging

Found 6 items

drwx------ - ubuntu supergroup 0 2014-01-23 13:01 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0034
drwx------ - ubuntu supergroup 0 2014-02-01 12:57 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0097
drwx------ - ubuntu supergroup 0 2014-02-01 12:58 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0098
drwx------ - ubuntu supergroup 0 2014-02-08 13:52 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0127
drwx------ - ubuntu supergroup 0 2014-02-08 14:19 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0133
drwx------ - ubuntu supergroup 0 2014-02-08 14:32 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0139

Going back to the exception, we can now see that when user emily.ragland tried executing a job, creating a directory under /tmp/hadoop-ubuntu/mapred/staging failed, because emily.ragland does not have permission to create folders under /tmp/hadoop-ubuntu/mapred/staging.

To keep things clean and for better control, let's specify the location of the staging directory explicitly by setting the mapreduce.jobtracker.staging.root.dir property in mapred-site.xml. After the property is set, restart the MapReduce daemons for it to take effect.

<property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
</property>
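Once the MapReduce daemons are restarted, newly submitted jobs stage their files under /user/<username>/.staging instead of the /tmp location. Assuming user ubuntu has run at least one job since the restart, you can confirm with:

ubuntu@ip-172-x-x-x:~$ hadoop fs -ls /user/ubuntu/.staging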

After the property takes effect, when emily.ragland tries to run a job, an attempt is made to create a folder like /user/emily.ragland in HDFS. Another reason I like to override the staging root with /user is that it aligns with the UNIX notion of a home folder. At this point a job run by emily.ragland would still fail, with the exception below: she still does not have access to the /user folder, which is expected, since we have not yet done anything to fix the permission issue.

org.apache.hadoop.security.AccessControlException: Permission denied: user=emily.ragland, access=WRITE, inode="/user":ubuntu:supergroup:rwxr-xr-x

I have seen several online suggestions to chmod /user to 777. This is not advisable, as doing so would let users delete or modify other users' files in HDFS. Instead, create a folder named emily.ragland under /user as the HDFS superuser (in our case, ubuntu). After creating the folder, change its ownership to emily.ragland.

ubuntu@ip-172-x-x-x:~$ hadoop fs -mkdir /user/emily.ragland
ubuntu@ip-172-x-x-x:~$ hadoop fs -chown emily.ragland:emily.ragland /user/emily.ragland
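Before re-running the job, verify that the ownership change took effect:

ubuntu@ip-172-x-x-x:~$ hadoop fs -ls /user

The entry for /user/emily.ragland should now show emily.ragland as both owner and group.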

Now that the permissions are set, jobs should run as emily.ragland without any issues.

ubuntu@ip-172-x-x-x:~$ hadoop fs -ls /user/emily.ragland
Found 3 items

drwx------ - emily.ragland emily.ragland 0 2014-03-03 13:02 /user/emily.ragland/.staging
drwxr-xr-x - emily.ragland emily.ragland 0 2014-03-03 13:02 /user/emily.ragland/RESULTS
-rw-r--r-- 1 emily.ragland emily.ragland 12 2014-02-26 12:49 /user/emily.ragland/emily.log
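For completeness, here is roughly what a job submission looks like once the home folder is in place. The jar name and the input and output paths below are placeholders; com.jerry.WordCount is the driver class from the stack trace above.

emily.ragland@ip-172-x-x-x:~$ hadoop jar wordcount.jar com.jerry.WordCount /user/emily.ragland/input /user/emily.ragland/output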

 
