Apache Pig Tutorial -Load Variations - Big Data In Real World

Apache Pig Tutorial – Load Variations

The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts will be short and sweet, and most will include a (very short) “see it in action” video.

In the previous post, we looked at how to load and display a dataset using Apache Pig. In this post we will look at different LOAD variations in Pig.

Variation 1 – Load Without Column Names or Types

grunt> stocks1 = LOAD '/user/hirw/input/stocks' USING PigStorage(',');
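
The PigStorage(',') argument tells Pig that the fields in the file are separated by commas. As a side note, when the USING clause is left out, Pig falls back to PigStorage with a tab delimiter. A minimal sketch, assuming a tab-separated copy of the data exists (the path and the relation name stocks_tab below are made up for illustration):

-- Hypothetical path: assumes a tab-separated copy of the stocks data.
-- No USING clause, so Pig uses PigStorage with its default tab delimiter.
grunt> stocks_tab = LOAD '/user/hirw/input/stocks_tsv';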

Variation 2 – Load With Column Names but No Types

grunt> stocks2 = LOAD '/user/hirw/input/stocks' USING PigStorage(',') as (exchange, symbol, date, open, high, low, close, volume, adj_close);

Variation 3 – Load With Column Names and Types

grunt> stocks3 = LOAD '/user/hirw/input/stocks' USING PigStorage(',') as (exchange:chararray, symbol:chararray, date:datetime, open:float, high:float, low:float, close:float, volume:int, adj_close:float);

The structure of stocks3 (Variation 3) is well defined. But what is the structure of stocks1 and stocks2? To look up the structure of a relation (e.g. stocks1), use the DESCRIBE operator.

Describe Operator

Pig cannot infer the structure of stocks1 because we provided neither column names nor types.

grunt> DESCRIBE stocks1;
Schema for stocks1 unknown.

With stocks2, Pig knows the column names and assigns every column the default type, bytearray.

grunt> DESCRIBE stocks2;
stocks2: {exchange: bytearray,symbol: bytearray,date: bytearray,open: bytearray,high: bytearray,low: bytearray,close: bytearray,volume: bytearray,adj_close: bytearray}
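
For comparison, running DESCRIBE on stocks3 should simply echo back the names and types declared in Variation 3. The output below follows the same format as the stocks2 output above:

grunt> DESCRIBE stocks3;
stocks3: {exchange: chararray,symbol: chararray,date: datetime,open: float,high: float,low: float,close: float,volume: int,adj_close: float}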

Even when the definition of a dataset is incomplete, Pig can still work with it. We will see that in the next post.
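
As a small preview, here is a minimal sketch of how the loosely defined relations could still be used. It assumes the column order from Variation 2 (so symbol is the second field and close is the seventh); the relation names closes1 and closes2 are made up for illustration.

-- stocks1 has no schema, so fields are addressed by position ($0 is the first column).
grunt> closes1 = FOREACH stocks1 GENERATE $1 AS symbol, (float)$6 AS close;
-- stocks2 has column names but only bytearray types, so cast where a number is needed.
grunt> closes2 = FOREACH stocks2 GENERATE symbol, (float)close AS close;

Running DESCRIBE on closes1 or closes2 afterwards is a quick way to confirm which names and types Pig ended up with.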

