Troubleshooting Memory Issues with MapReduce Jobs – Hadoop In Real World

We hosted this webinar on Saturday, October 28th, 2017. In it we discussed the most common memory-related issues with MapReduce jobs and how to address them. The participants were super engaging, and we answered a lot of their questions as well.

We host webinars like this quite often. Sign up below to get an invitation to join one of them.

Killing Container – Physical & Virtual Memory Limits

We started our discussion by explaining what a container is and touched on the architectural differences between MRv1 and MRv2. We also looked at why knowing how to troubleshoot failures is more important with MRv2.

Next, we discussed two slightly different errors that are all too common in MapReduce job execution. Can you see the difference between the two errors below? Look at the text right after "is running beyond": in the first error the container is killed for violating its physical memory limit, and in the second error it is killed for violating its virtual memory limit.

Container[pid=container_1406552545451_0009_01_000002,containerID=container_234132_0001_01_000001] is running beyond physical memory limits. Current usage: 569.1 MB of 512 MB physical memory used; 970.1 MB of 1.0 GB virtual memory used. Killing container.

Container [pid=791,containerID=container_1499942756442_0001_02_000001] is running beyond virtual memory limits. Current usage: 135.4 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
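To make the difference between the two messages concrete, here is a minimal Python sketch that pulls the numbers out of a "Killing container" line and reports which limit was crossed. It was not part of the webinar. The function name `classify_kill` and the regular expression are our own assumptions, written only against the two sample messages above; other Hadoop versions may word the message differently.

```python
import re

# One size token such as "569.1 MB" or "1 GB" (pattern is an assumption
# based only on the two sample messages above).
_SIZE = r"([\d.]+) ?([MG]B)"
_KILL = re.compile(
    r"running beyond (physical|virtual) memory limits\. Current usage: "
    + _SIZE + " of " + _SIZE + " physical memory used; "
    + _SIZE + " of " + _SIZE + " virtual memory used"
)


def _to_mb(value, unit):
    """Convert a parsed size to megabytes."""
    return float(value) * (1024 if unit == "GB" else 1)


def classify_kill(message):
    """Return (kind, used_mb, limit_mb) for the violated limit, or None.

    Hypothetical helper: kind is "physical" or "virtual".
    """
    match = _KILL.search(message)
    if match is None:
        return None
    kind = match.group(1)
    raw = match.groups()[1:]  # eight items: four (value, unit) pairs
    p_used, p_limit, v_used, v_limit = (
        _to_mb(raw[i], raw[i + 1]) for i in range(0, 8, 2)
    )
    if kind == "physical":
        return kind, p_used, p_limit
    return kind, v_used, v_limit
```

Fed the first message, the helper reports the physical limit (569.1 MB used against 512 MB). Fed the second, it reports the virtual limit (about 2.2 GB used against 2.1 GB).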

We discussed what virtual memory is and how it differs from physical memory. We also touched on swapping and aggressive swapping by the operating system. We looked at the properties that set the physical memory limits for mappers and reducers (mapreduce.map.memory.mb and mapreduce.reduce.memory.mb). We also looked at the properties that control the virtual memory limit (yarn.nodemanager.vmem-check-enabled and yarn.nodemanager.vmem-pmem-ratio).
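As a rough illustration of how these properties fit together, the virtual memory limit is the container's physical memory multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1. The sketch below is a simplification under stated assumptions, not the NodeManager's actual monitoring code. The function names are hypothetical, and the real check has extra rules that are ignored here.

```python
def virtual_memory_limit_mb(container_mb, vmem_pmem_ratio=2.1):
    """Virtual memory limit implied by the container size and the ratio."""
    return container_mb * vmem_pmem_ratio


def violated_limits(pmem_used_mb, vmem_used_mb, container_mb,
                    vmem_pmem_ratio=2.1, vmem_check_enabled=True):
    """List which limits a container exceeds (simplified, hypothetical).

    container_mb stands in for mapreduce.map.memory.mb or
    mapreduce.reduce.memory.mb; the other arguments mirror
    yarn.nodemanager.vmem-pmem-ratio and
    yarn.nodemanager.vmem-check-enabled.
    """
    violations = []
    if pmem_used_mb > container_mb:
        violations.append("physical")
    vmem_limit = virtual_memory_limit_mb(container_mb, vmem_pmem_ratio)
    if vmem_check_enabled and vmem_used_mb > vmem_limit:
        violations.append("virtual")
    return violations


# Second error above: a 1 GB container using 135.4 MB physical, 2.2 GB virtual.
second_error = violated_limits(135.4, 2.2 * 1024, container_mb=1024)
```

With the default ratio, a 1024 MB container gets a virtual limit of about 2150 MB, which is the "2.1 GB" shown in the second error. Turning the check off (`vmem_check_enabled=False`) makes the same usage pass in this sketch.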

java.lang.OutOfMemoryError: Java heap space

The error below is very different from a violation of the container’s physical or virtual memory limit. It is caused by running out of Java heap space. We looked at the JVM memory structure and explained what heap and non-heap space are. Finally, we saw how to control the heap space for both mappers and reducers (mapreduce.map.java.opts and mapreduce.reduce.java.opts).

java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at net.ripe.hadoop.pcap.PcapReader.nextPacket(PcapReader.java:208)
    at net.ripe.hadoop.pcap.PcapReader.access$0(PcapReader.java:173)
    at
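The heap set through mapreduce.map.java.opts or mapreduce.reduce.java.opts has to fit inside the container, leaving room for non-heap memory. The Python sketch below shows one way to reason about that. The 0.8 heap fraction is a commonly quoted rule of thumb and an assumption on our part, not a value from the webinar. Both helper names are hypothetical.

```python
import re


def suggest_java_opts(container_mb, heap_fraction=0.8):
    """Build an -Xmx option that leaves headroom inside the container.

    heap_fraction=0.8 is an assumed rule of thumb, not a Hadoop default.
    """
    return "-Xmx{}m".format(int(container_mb * heap_fraction))


def xmx_mb(java_opts):
    """Extract the -Xmx heap size in MB from a java.opts string, or None."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    if unit == "g":
        return value * 1024
    if unit == "k":
        return value / 1024
    return value


def heap_fits(java_opts, container_mb):
    """True when the configured heap is smaller than the container."""
    heap = xmx_mb(java_opts)
    return heap is not None and heap < container_mb
```

For a 1024 MB container, `suggest_java_opts(1024)` yields a heap of 819 MB. A setting such as `-Xmx2g` inside that same container fails `heap_fits`, because the heap could grow past the physical limit and trigger the "Killing container" error instead of an OutOfMemoryError.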

Cluster Level Memory Limits

With all these properties in place, we talked about what would stop a bad program, developer or user from requesting a ridiculous amount of memory and affecting the cluster as a whole. Here are the properties that control the maximum memory a NodeManager can allocate to containers (yarn.nodemanager.resource.memory-mb) and the minimum and maximum amount of memory anyone can request for a container (yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb).
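To see how these cluster-level limits interact, here is a small Python sketch. It assumes that requests are rounded up to a multiple of the minimum allocation and that requests above the maximum are rejected. That is our understanding of the usual scheduler behaviour, and some schedulers use a separate increment setting, so treat the rounding rule as an assumption. The function names are hypothetical. The default values shown (1024, 8192) are the stock yarn-default.xml values.

```python
import math


def normalize_request_mb(requested_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    """Approximate the container size YARN would grant for a request.

    min_alloc_mb and max_alloc_mb mirror yarn.scheduler.minimum-allocation-mb
    and yarn.scheduler.maximum-allocation-mb. Rounding to a multiple of the
    minimum is an assumption about scheduler behaviour.
    """
    if requested_mb > max_alloc_mb:
        raise ValueError(
            "request of {} MB exceeds the maximum allocation of {} MB".format(
                requested_mb, max_alloc_mb))
    rounded = math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb
    # Never below the minimum, never above the maximum.
    return min(max(rounded, min_alloc_mb), max_alloc_mb)


def containers_per_node(node_mb, container_mb):
    """How many containers of one size fit in yarn.nodemanager.resource.memory-mb."""
    return node_mb // container_mb
```

Under these assumptions a 1536 MB request is granted 2048 MB, and a node offering 8192 MB to YARN can run four such containers. A 16384 MB request is refused outright, which is what keeps a single job from asking for a ridiculous amount of memory.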

Here is the full recording of the webinar. Enjoy!

Hadoop Team
We are a group of Senior Hadoop Consultants who are passionate about Hadoop and Big Data technologies. Our collective experience spans finance, retail, social media and gaming. We have worked with Hadoop clusters ranging from 100 to over 1000 nodes.
