Apache Pig Tutorial – Map

The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts will be short and sweet. Most posts will have a (very short) “see it in action” video.

In the previous post, we saw 2 complex types – Tuple and Bag. In this post, we will see another complex type in Pig – Map.

Sample Data

Take a look at a couple of records from the Department dataset. The first column holds the department number, the second the department name, and the third the address. The structure of the third column looks weird, doesn’t it? It is a Map.

328;ADMIN HEARNG;[street#939 W El Camino,city#Chicago,state#IL]
43;ANIMAL CONTRL;[street#415 N Mary Ave,city#Chicago,state#IL]

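When you see square brackets around a field, you can infer it is a Map. A Map is nothing but a set of key-value pairs: each key is separated from its value by #, and the pairs are separated by commas. Each record above has 3 key-value pairs, with the keys street, city and state.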
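
To make the text layout concrete, here is a rough Python sketch that turns one address field into a dictionary. It only illustrates the format shown above and is not how Pig parses maps internally. The function name parse_map_literal is made up for this example, and it assumes keys and values contain no commas or # characters.

```python
def parse_map_literal(text):
    """Turn '[k1#v1,k2#v2]' into a dict.

    Simplified on purpose: assumes no commas or '#' inside keys or values.
    """
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("a map literal must be wrapped in square brackets")
    inner = inner[1:-1]  # drop the surrounding brackets

    result = {}
    if not inner:
        return result  # '[]' is an empty map
    for pair in inner.split(","):
        key, value = pair.split("#", 1)  # '#' separates key from value
        result[key] = value
    return result


address = parse_map_literal("[street#939 W El Camino,city#Chicago,state#IL]")
print(address)
# {'street': '939 W El Camino', 'city': 'Chicago', 'state': 'IL'}
```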

Load & Project a Map

Now we know how to spot a Map. Let’s see how we can load, define & project a map.

grunt> departments = LOAD '/user/hirw/input/employee-pig/department_dataset_chicago' USING PigStorage(';') AS (dept_id:int, dept_name:chararray, address:map[]);

grunt> dept_addr = FOREACH departments GENERATE dept_name, address#'street' AS street, address#'city' AS city, address#'state' AS state;

Loading is easy: for the type, simply say map[]. The address column is a Map of key-value pairs. To project the value for the street key from the address column, write address#'street'. Similarly, for city write address#'city'. The AS clause gives each projected value a column name.
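
The projection can be pictured in plain Python as well. The sketch below is an analogy only, with the same simplifying assumption that values contain no commas or # characters. The helper names parse_record and project are hypothetical. It splits a record on ;, reads the third field as a map, and looks up the requested keys. A key that is absent yields None, much as Pig returns null when a map has no such key.

```python
def parse_record(line):
    """Split 'id;name;[k#v,...]' into (int id, name, dict address)."""
    dept_id, dept_name, address_text = line.split(";", 2)
    address = {}
    body = address_text.strip()[1:-1]  # drop the square brackets
    if body:
        for pair in body.split(","):
            key, value = pair.split("#", 1)
            address[key] = value
    return int(dept_id), dept_name, address


def project(record, keys):
    """Mimic GENERATE dept_name, address#'k1', address#'k2', ..."""
    _, dept_name, address = record
    # dict.get returns None for a missing key instead of raising
    return (dept_name,) + tuple(address.get(k) for k in keys)


record = parse_record("43;ANIMAL CONTRL;[street#415 N Mary Ave,city#Chicago,state#IL]")
row = project(record, ["street", "city", "state"])
print(row)
# ('ANIMAL CONTRL', '415 N Mary Ave', 'Chicago', 'IL')
```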

Display Results

grunt> top100 = LIMIT dept_addr 100;
grunt> DUMP top100;
