BLOG – Page 2 – Hadoop In Real World

BLOG

February 6, 2017

HDFS – Why another file system?

In the Understanding Big Data Problem post, we saw that HDFS, the Hadoop Distributed File System, takes care of all the storage-related complexities in Hadoop. In this […]

February 2, 2017

Finding the MAX tuple with Pig

Here is a sample dataset. Our goal is to find the record with the maximum record_value. Script […]

January 30, 2017

How to find directories in HDFS which are older than N days?

Cleaning up older or obsolete files in HDFS is important. Even if you have […]

January 26, 2017

How to use multi character delimiter in a Hive table?

Sometimes your data is too complex to delimit the individual columns with a single character like […]