We hosted this webinar on Saturday, October 28th, 2017. In it, we discussed the most common memory-related issues with MapReduce jobs and how to address them. The participants were very engaged, and we answered a lot of their questions as well.
We host webinars like this quite often. Sign up below to get invitations to join one of our webinars.
We started our discussion by talking about what a container is and touched on the architectural differences between MRv1 and MRv2. We looked at why the know-how of troubleshooting failures is more important with MRv2.
Next, we discussed two slightly different errors that are all too common with MapReduce job execution. Do you see the difference between the two errors below? Look at the text after "is running beyond" in each message: in the first error the container is killed due to a physical memory limit violation, and in the second error the container is killed due to a virtual memory limit violation.
Container[pid=container_1406552545451_0009_01_000002,containerID=container_234132_0001_01_000001] is running beyond physical memory limits. Current usage: 569.1 MB of 512 MB physical memory used; 970.1 MB of 1.0 GB virtual memory used. Killing container.
Container [pid=791,containerID=container_1499942756442_0001_02_000001] is running beyond virtual memory limits. Current usage: 135.4 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
We discussed what virtual memory is and how it differs from physical memory. We also touched on swapping and aggressive swapping by the operating system. We looked at the properties that affect the physical memory limits for both mappers and reducers (mapreduce.map.memory.mb and mapreduce.reduce.memory.mb). We also looked at the properties that control the virtual memory limit (yarn.nodemanager.vmem-check-enabled and yarn.nodemanager.vmem-pmem-ratio).
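To make the relationship between the two limits concrete, here is a minimal sketch of the arithmetic, not the actual NodeManager code. It assumes the virtual memory limit is the container's physical memory multiplied by yarn.nodemanager.vmem-pmem-ratio (2.1 by default); the function names are hypothetical and exist only for illustration.

```python
# Sketch only: function names are hypothetical, not part of any Hadoop API.

def virtual_memory_limit_mb(container_memory_mb, vmem_pmem_ratio=2.1):
    """Virtual memory ceiling implied by yarn.nodemanager.vmem-pmem-ratio."""
    return container_memory_mb * vmem_pmem_ratio


def violated_limit(pmem_used_mb, vmem_used_mb, container_memory_mb,
                   vmem_pmem_ratio=2.1, vmem_check_enabled=True):
    """Return "physical", "virtual", or None for a container's memory usage."""
    if pmem_used_mb > container_memory_mb:
        return "physical"
    vmem_limit = virtual_memory_limit_mb(container_memory_mb, vmem_pmem_ratio)
    # The virtual check only applies when yarn.nodemanager.vmem-check-enabled is true.
    if vmem_check_enabled and vmem_used_mb > vmem_limit:
        return "virtual"
    return None


# First error above: 569.1 MB used by a 512 MB container.
print(violated_limit(569.1, 970.1, 512))
# Second error above: 2.2 GB virtual used by a 1 GB container (limit is about 2.1 GB).
print(violated_limit(135.4, 2.2 * 1024, 1024))
```

Plugging in the numbers from the two error messages shows why the second container was killed even though it used only 135.4 MB of physical memory: with a 1 GB container and the default ratio, the virtual memory ceiling is about 2.1 GB.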
The error below is very different from the container’s physical and virtual memory limit violations. It is caused by a heap space violation. We looked at the JVM memory structure and explained what heap and non-heap space are. Finally, we saw how to control the heap space for both mappers and reducers (mapreduce.map.java.opts and mapreduce.reduce.java.opts).
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) Caused by: java.lang.OutOfMemoryError: Java heap space at net.ripe.hadoop.pcap.PcapReader.nextPacket(PcapReader.java:208) at net.ripe.hadoop.pcap.PcapReader.access$0(PcapReader.java:173) at
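Because the JVM heap lives inside the container, the heap size set through mapreduce.map.java.opts or mapreduce.reduce.java.opts needs to stay below the container size so that non-heap memory still fits. The sketch below derives an -Xmx value from a container size. The 0.8 fraction is an assumption (a commonly used rule of thumb, not something stated in the webinar), and the helper name is hypothetical.

```python
# Sketch only: heap_opts_for_container is a hypothetical helper.

def heap_opts_for_container(container_memory_mb, heap_fraction=0.8):
    """Build a -Xmx option that leaves headroom for non-heap JVM memory.

    heap_fraction is an assumed rule of thumb; tune it for your workload.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(container_memory_mb * heap_fraction)
    return f"-Xmx{heap_mb}m"


# Example: a mapper with mapreduce.map.memory.mb=1024 and
# a reducer with mapreduce.reduce.memory.mb=2048.
print("mapreduce.map.java.opts    =", heap_opts_for_container(1024))
print("mapreduce.reduce.java.opts =", heap_opts_for_container(2048))
```

If the heap is set too small for the job, the task fails with the Java heap space error shown above; if it is set larger than the container, the container is likely to be killed for exceeding its physical memory limit instead.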
With all the properties in place, we talked about what would stop a bad program, a developer, or a user from requesting a ridiculous amount of memory and affecting the cluster as a whole. Here are the properties that control the total memory a NodeManager can allocate to containers (yarn.nodemanager.resource.memory-mb) and the minimum and maximum amount of memory anyone can request for a container (yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb).
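As a rough illustration of how these limits interact, the sketch below models a simplified version of the scheduler's behavior. It assumes requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb and rejected above yarn.scheduler.maximum-allocation-mb. The exact rounding depends on the scheduler in use, and the function names are hypothetical. The default values shown (1024, 8192, 8192) are the stock YARN defaults.

```python
import math

# Sketch only: a simplified model, not the real YARN scheduler logic.

def normalize_request_mb(requested_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    """Round a container request up to a multiple of the minimum allocation."""
    if requested_mb > max_alloc_mb:
        raise ValueError(
            f"requested {requested_mb} MB exceeds maximum allocation {max_alloc_mb} MB"
        )
    rounded = math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb
    # Never go below the minimum or above the maximum allocation.
    return min(max(rounded, min_alloc_mb), max_alloc_mb)


def containers_per_node(container_mb, node_memory_mb=8192):
    """How many containers of this size fit in yarn.nodemanager.resource.memory-mb."""
    return node_memory_mb // container_mb


granted = normalize_request_mb(1500)
print("granted MB:", granted)
print("containers per node:", containers_per_node(granted))
```

In this model a 1500 MB request is granted 2048 MB, so a node offering 8192 MB can host four such containers, while a 9000 MB request is refused outright.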
Here is the full recording of the webinar. Enjoy!