Datanode Block Scanner - Big Data In Real World

Datanode Block Scanner

In an earlier blog post we saw how HDFS detects and corrects data corruption using checksums. During a write operation, the datanode writing the data verifies the checksum of the data being written, to detect corruption during transmission. During a read operation, the client compares the checksum returned by the datanode against the checksum it calculates over the data it receives, to detect corruption caused by the disk while the data was stored on the datanode.
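
The read-side check can be sketched in a few lines of Python. This is a minimal illustration, not HDFS's actual implementation: `zlib.crc32` is used as a stand-in checksum, the 512-byte chunk size is an assumption, and the function names are hypothetical.

```python
import zlib
from typing import List

CHUNK_SIZE = 512  # assumed number of bytes covered by each checksum


def compute_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return one CRC32 value per chunk of data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data: bytes, expected: List[int],
                        chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Recompute checksums and return the indexes of chunks that do not match."""
    actual = compute_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Write side: checksums are computed when the data is stored.
original = bytes(range(256)) * 8  # 2048 bytes, i.e. 4 chunks
stored_checksums = compute_checksums(original)

# Simulate a disk error that flips one byte inside the third chunk.
damaged = bytearray(original)
damaged[1100] ^= 0xFF

# Read side: the reader recomputes the checksums and compares them.
corrupt = find_corrupt_chunks(bytes(damaged), stored_checksums)
print(corrupt)
```

Checksumming in small chunks rather than over the whole block means a mismatch points at the damaged region instead of only saying that something in the block is wrong.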

These checksum verifications are very helpful, but they are only done when a client attempts a read from (or a write to) HDFS. They don't find corruption proactively, before a client requests a read of corrupted data.

Every datanode runs a block scanner, which periodically verifies all the blocks stored on that datanode. This allows corrupted blocks to be identified and fixed before a client requests a read. With the block scanner service, HDFS can identify and fix corruption proactively.
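
The idea of a background scan can be sketched as follows. This is a simplified model under stated assumptions, not the datanode's real code: `BlockStore` and `scan_once` are hypothetical names, blocks are held in memory rather than on disk, and `zlib.crc32` stands in for the checksum.

```python
import zlib
from typing import Dict, List


class BlockStore:
    """Hypothetical in-memory stand-in for the blocks held by one datanode."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.checksums: Dict[str, int] = {}

    def write_block(self, block_id: str, data: bytes) -> None:
        # Store the data together with the checksum computed at write time.
        self.blocks[block_id] = data
        self.checksums[block_id] = zlib.crc32(data)


def scan_once(store: BlockStore) -> List[str]:
    """Verify every stored block and return the IDs that fail the check."""
    return sorted(
        block_id
        for block_id, data in store.blocks.items()
        if zlib.crc32(data) != store.checksums[block_id]
    )


store = BlockStore()
store.write_block("blk_1001", b"first block contents")
store.write_block("blk_1002", b"second block contents")

# Simulate silent on-disk corruption of one block after it was written.
store.blocks["blk_1002"] = b"second block c0ntents"

corrupt_blocks = scan_once(store)
print(corrupt_blocks)
```

Unlike the client-side check, no read request is involved here: the scan walks every block on its own, so a damaged block can be flagged even if no client has asked for it yet. A real scanner would repeat such a pass on the configured interval.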

How often does the Block Scanner scan for corrupted blocks?

The dfs.datanode.scan.period.hours property in hdfs-site.xml controls how often the block scanner runs and scans for corrupted blocks. Its value is the number of hours between block scanner runs. By default (in 2.7.0), dfs.datanode.scan.period.hours is set to 0, which means the block scanner is disabled. The default and the meaning of 0 differ in other Hadoop releases, so check hdfs-default.xml for your version.
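
As a rough sketch, the property would appear in hdfs-site.xml in the usual Hadoop `<property>` form. The snippet below embeds such a fragment and reads it back with Python's standard library; the value 504 (three weeks) is only an illustrative choice, and `get_property` is a hypothetical helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Example hdfs-site.xml content; the value 504 is illustrative only.
HDFS_SITE_XML = """<configuration>
  <property>
    <name>dfs.datanode.scan.period.hours</name>
    <value>504</value>
  </property>
</configuration>
"""


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the value of a Hadoop-style <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


scan_period_hours = get_property(HDFS_SITE_XML, "dfs.datanode.scan.period.hours")
print(scan_period_hours)
```

A check like this can be handy for confirming what a given datanode's configuration file actually sets before relying on the scanner being active.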

Block Scanner report

Every time the block scanner runs it produces a report, which is available at each datanode's URL:

http://datanode:50075/blockScannerReport

Here is a sample report

[Image: Datanode Block Scanner sample report]
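
The report can also be fetched from a script rather than a browser. The following is a minimal sketch using only Python's standard library. The helper names are hypothetical, `datanode` is a placeholder hostname as in the URL above, and the port is the one shown in that URL, which may differ on your cluster.

```python
from urllib.request import urlopen


def block_scanner_report_url(host: str, port: int = 50075) -> str:
    """Build the report URL for one datanode's HTTP address."""
    return f"http://{host}:{port}/blockScannerReport"


def fetch_block_scanner_report(host: str, port: int = 50075,
                               timeout: float = 10.0) -> str:
    """Download the report as text; requires network access to the datanode."""
    with urlopen(block_scanner_report_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Only the URL is built here; call fetch_block_scanner_report() against a
# reachable datanode to retrieve the actual report.
print(block_scanner_report_url("datanode"))
```

Looping over a list of datanode hostnames with `fetch_block_scanner_report` would be one way to collect the reports from a whole cluster in one place.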

