Apache Pig Tutorial - Executing as a Script - Big Data In Real World

Apache Pig Tutorial – Executing as a Script


The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts will be short and sweet. Most posts will have a (very short) “see it in action” video.

So far in this series of lessons we have seen, step by step, how to calculate the average volume for stocks, and along the way we learned several key operators in Apache Pig. In this lesson we will see how to run Pig instructions as a script.

DUMP vs. STORE

The DUMP operator displays data on the screen, but more often than not we want to store the results in HDFS. The STORE operator does exactly that.

With STORE we can also specify which delimiter to use when writing the results. In the example below we are instructing Pig to store the records from the top10 relation into output/pig/avg-volume in HDFS, with the column delimiter specified through the PigStorage function. In this case the columns will be delimited by commas.

grunt> top10 = LIMIT avg_vol_ordered 10;
grunt> STORE top10 INTO 'output/pig/avg-volume' USING PigStorage(',');
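
Once STORE has run, the results sit in HDFS as one or more part files under the output directory rather than on the screen. As a rough sketch of how to look at them from the command line, the helper below lists the directory and prints the part files. The function name show_stored_results is hypothetical, and the sketch assumes the hadoop command is on your PATH and that the relative path resolves against your HDFS home directory.

```shell
# Hypothetical helper: list and print the files STORE wrote under an HDFS directory.
show_stored_results() {
  out_dir="${1:-output/pig/avg-volume}"

  # Assumption: the hadoop CLI is installed; if not, say so and stop quietly.
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop not found on PATH; cannot read $out_dir"
    return 0
  fi

  # List the part files that STORE created.
  hadoop fs -ls "$out_dir"

  # Print the comma-delimited records; the glob is quoted so HDFS expands it.
  hadoop fs -cat "$out_dir/part-*"
}

show_stored_results output/pig/avg-volume
```

Each printed line should hold one record from top10, with its columns separated by commas because of PigStorage(',').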

Running Instructions as a Script

Running a series of Pig instructions is very simple: save the instructions in a file. The .pig file extension is not mandatory, just a convention. Execute the file as shown below.

pig /hirw-workshop/pig/scripts/average-volume.pig
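
To make the idea concrete, here is a minimal sketch of what such a script file might contain and how it could be created and run. This is not the actual average-volume.pig from the workshop: the input path input/stocks, the two-column schema, the relation names stocks, grp and avg_vol, and the pig-demo directory are all assumptions made for illustration. Only top10, avg_vol_ordered and the STORE statement come from the lesson.

```shell
# Assumed scratch location for the demo script (not the workshop path).
script_dir="${TMPDIR:-/tmp}/pig-demo"
script="$script_dir/average-volume.pig"
mkdir -p "$script_dir"

# Save the Pig instructions in a file. The quoted 'EOF' keeps the shell
# from expanding anything inside the script body.
cat > "$script" <<'EOF'
-- Hypothetical input path and schema; adjust both to your own data.
stocks = LOAD 'input/stocks' USING PigStorage(',') AS (symbol:chararray, volume:long);
grp = GROUP stocks BY symbol;
avg_vol = FOREACH grp GENERATE group AS symbol, AVG(stocks.volume) AS avg_volume;
avg_vol_ordered = ORDER avg_vol BY avg_volume DESC;
top10 = LIMIT avg_vol_ordered 10;
STORE top10 INTO 'output/pig/avg-volume' USING PigStorage(',');
EOF

# Run the file only when Pig is installed; otherwise just show the command.
if command -v pig >/dev/null 2>&1; then
  pig "$script" || echo "pig exited with an error; check the input and output paths"
else
  echo "pig not found on PATH; would run: pig $script"
fi
```

Note that the script has no grunt> prompts; the statements are written exactly as they would be typed in the Grunt shell, one after another. STORE fails if the output directory already exists, so remove it before running the script a second time.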


Previous Lesson : Ordering Records

Next Lesson : Executing Script with Parameters

Big Data In Real World
