Hadoop Archives - Page 4 of 6 - Big Data In Real World

December 18, 2015

Published by Big Data In Real World at December 18, 2015

Apache Pig Tutorial – Filter Records

The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All […]

December 16, 2015

Published by Big Data In Real World at December 16, 2015

Apache Pig Tutorial – Project and Manipulate Columns

The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy […]

December 14, 2015

Published by Big Data In Real World at December 14, 2015

Categories

Hadoop

Apache Pig Tutorial – Load Variations

The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts […]

December 7, 2015

Published by Big Data In Real World at December 7, 2015

Categories

Hadoop

Apache Pig Tutorial – Loading Datasets

The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All […]

October 13, 2015

Published by Big Data In Real World at October 13, 2015

Categories

Hadoop

Is Hive Good At Everything?

Hive is an awesome tool that takes in SQL-like queries and translates them into MapReduce. Hive is very helpful […]

October 11, 2015

Published by Big Data In Real World at October 11, 2015

Categories

Hadoop

How much memory your Namenode need?

This is going to be a very short post. When you are building a cluster from scratch, Hadoop developers and […]

October 5, 2015

Published by Big Data In Real World at October 5, 2015

Categories

Hadoop

Hadoop Archives (HAR)

Hadoop Archives (HAR) offer an effective way to deal with the small files problem. This post will explain: the problem with small […]

October 5, 2015

Published by Big Data In Real World at October 5, 2015

Categories

Hadoop

Pig vs. Hive

Apache Pig takes in a set of instructions written in Pig Latin, compiles them, produces a set of MapReduce jobs and executes […]

September 8, 2015

Published by Big Data In Real World at September 8, 2015

Categories

Hadoop

Datanode Block Scanner

In an earlier blog post we saw how HDFS handles and corrects data corruption using checksums. During a write operation the datanode […]

September 6, 2015

Published by Big Data In Real World at September 6, 2015

Categories

Hadoop

Dealing With Data Corruption In HDFS

Hadoop is designed to store and analyze huge volumes of data, and with huge volumes of data stored in HDFS […]

September 1, 2015

Published by Big Data In Real World at September 1, 2015

Categories

Hadoop

Can Reducer always be reused for Combiner?

A Combiner function is an optional intermediary function which is executed in the Map phase right after the execution […]

August 30, 2015

Published by Big Data In Real World at August 30, 2015

Categories

Hadoop

HDFS Federation

What is HDFS Federation? Namenode is responsible for the successful operation of HDFS. Namenode holds the entire metadata of HDFS, which includes information about files and […]

August 26, 2015

Published by Big Data In Real World at August 26, 2015

Categories

Hadoop

Reading A File From HDFS – Java Program

In the last post we saw how to write a file to HDFS by writing our own Java […]

August 23, 2015

Published by Big Data In Real World at August 23, 2015

Categories

Hadoop

Writing A File To HDFS – Java Program

Writing a file to HDFS is very easy; we can simply execute the hadoop fs -copyFromLocal command to copy […]

August 16, 2015

Published by Big Data In Real World at August 16, 2015

Categories

Hadoop

Speculative Execution

What is Speculative Execution? Sometimes you will notice that a Job which has 3 input splits executed 4 mappers and killed the 4th mapper. The job […]

August 11, 2015

Published by Big Data In Real World at August 11, 2015

Categories

Hadoop

Changing Number Of Reducers

In an earlier blog post we saw how we can change the number of mappers in a MapReduce execution. In this post, we […]

August 9, 2015

Published by Big Data In Real World at August 9, 2015

Categories

Hadoop

Changing Number Of Mappers

The number of mappers always equals the number of splits. Having said that, it is possible to control the number of splits […]

August 4, 2015

Published by Big Data In Real World at August 4, 2015

Categories

Hadoop

InputSplit vs Block

The central idea behind MapReduce is distributed processing, and hence the most important thing is to divide the dataset into chunks and […]

August 1, 2015

Published by Big Data In Real World at August 1, 2015

Categories

Hadoop

HDFS Block Placement Policy

When a file is uploaded into HDFS, it will be divided into blocks. HDFS will have to decide where to […]

July 28, 2015

Published by Big Data In Real World at July 28, 2015

Categories

Hadoop

Data Locality in Hadoop

Data Locality in Hadoop refers to the “proximity” of the data with respect to the Mapper tasks working on the data. Why […]