Apache Pig Tutorial - Map - Big Data In Real World

Apache Pig Tutorial – Map



The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts will be short and sweet. Most posts will have a (very short) “see it in action” video.

In the previous post, we saw two complex types: Tuple and Bag. In this post, we will look at another complex type in Pig: Map.

Sample Data

Take a look at a couple of records from the Department dataset. The first column has the department number and the second column has the department name. The third column has the address, but its structure looks odd, doesn’t it? It is a Map.

328;ADMIN HEARNG;[street#939 W El Camino,city#Chicago,state#IL]
43;ANIMAL CONTRL;[street#415 N Mary Ave,city#Chicago,state#IL]

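When you see square brackets, you can infer that the field is a Map. A Map is simply a set of key-value pairs. In this text form each key is separated from its value by #, and the pairs are separated by commas. Each record above has 3 key-value pairs: street, city and state.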
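To make the bracket-and-hash layout concrete, here is a minimal plain-Python sketch that turns the address field into a dictionary. It is not Pig's own parser. It assumes the simple case shown in the sample: pairs separated by commas, the first # in each pair separating key from value, and no commas inside keys or values. The function name parse_pig_map is hypothetical.

```python
def parse_pig_map(text):
    """Parse a map literal such as '[k1#v1,k2#v2]' into a dict.

    Simplified sketch (assumption): keys and values contain no ','
    and the first '#' in each pair separates the key from the value.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("not a map literal: " + text)
    body = text[1:-1]
    result = {}
    if not body:
        return result  # '[]' is an empty map
    for pair in body.split(","):
        key, _, value = pair.partition("#")  # split on the first '#'
        result[key] = value
    return result


addr = parse_pig_map("[street#939 W El Camino,city#Chicago,state#IL]")
print(addr["street"], addr["city"], addr["state"])
```

Running it prints the three values from the first sample record: street, city and state.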

Load & Project a Map

Now we know how to spot a Map. Let’s see how we can load, define & project a map.

grunt> departments = LOAD '/user/hirw/input/employee-pig/department_dataset_chicago' using PigStorage(';') AS (dept_id:int, dept_name:chararray, address:map[]);

grunt> dept_addr = FOREACH departments GENERATE dept_name, address#'street' as street, address#'city' as city, address#'state' as state;

Loading is easy: for the type, simply declare map[]. Because map[] does not declare a value type, the values are loaded as bytearray. The address column is a Map of key-value pairs. To project the value for the street key from the address column, write address#'street'. Similarly, for city, write address#'city'.
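As a rough mental model of the two grunt statements, the following plain-Python sketch mimics what LOAD with PigStorage(';') and the FOREACH ... GENERATE projection do to the two sample records. It is a hypothetical illustration, not how Pig executes a script. The helper names parse_map, load and project are made up. Values are kept as Python strings rather than Pig bytearrays. A lookup of a key that a record's map does not contain gives None, mirroring the null Pig returns in that case.

```python
# Hypothetical model of:
#   LOAD ... USING PigStorage(';') AS (dept_id:int, dept_name:chararray, address:map[])
#   FOREACH departments GENERATE dept_name, address#'street', address#'city', address#'state'

RAW_LINES = [
    "328;ADMIN HEARNG;[street#939 W El Camino,city#Chicago,state#IL]",
    "43;ANIMAL CONTRL;[street#415 N Mary Ave,city#Chicago,state#IL]",
]


def parse_map(text):
    """Simplified '[k#v,k#v]' parser (assumes no ',' inside keys or values)."""
    body = text.strip()[1:-1]
    pairs = (item.partition("#") for item in body.split(",")) if body else ()
    return {key: value for key, _, value in pairs}


def load(lines, delimiter=";"):
    """Split each line on the delimiter and apply the (dept_id, dept_name, address) schema."""
    for line in lines:
        dept_id, dept_name, address = line.split(delimiter)
        yield (int(dept_id), dept_name, parse_map(address))


def project(departments):
    """Keep dept_name and look up three keys; dict.get gives None for a missing key."""
    for _dept_id, dept_name, address in departments:
        yield (dept_name, address.get("street"), address.get("city"), address.get("state"))


dept_addr = list(project(load(RAW_LINES)))
for row in dept_addr:
    print(row)
```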

Display Results

grunt> top100 = LIMIT dept_addr 100;
grunt> DUMP top100;
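The sketch below is a loose plain-Python analogue of these two statements, assuming the projected rows from the sample data. LIMIT keeps at most 100 tuples. Note that Pig does not promise which tuples LIMIT returns unless an ORDER BY precedes it. DUMP then prints each tuple in parentheses with comma-separated fields. The helpers limit and format_tuple are hypothetical, and showing a null as an empty field is an assumption about DUMP's display.

```python
def limit(rows, n):
    """Keep at most n rows (Pig's LIMIT makes no ordering promise without ORDER BY)."""
    return rows[:n]


def format_tuple(row):
    """Render a row roughly as DUMP shows a tuple: (f1,f2,...), None as an empty field."""
    return "(" + ",".join("" if field is None else str(field) for field in row) + ")"


dept_addr = [
    ("ADMIN HEARNG", "939 W El Camino", "Chicago", "IL"),
    ("ANIMAL CONTRL", "415 N Mary Ave", "Chicago", "IL"),
]

top100 = limit(dept_addr, 100)
for row in top100:
    print(format_tuple(row))
```

With only two sample records, LIMIT 100 keeps both. The first printed line is (ADMIN HEARNG,939 W El Camino,Chicago,IL).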



Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.

