Building, Running and Testing Apache Oozie 4.0.0 - Big Data In Real World

January 15, 2014

This post explains the step-by-step procedure to build, run and test Apache Oozie 4.0.0. I will also list the issues I ran into during the process and the fixes I applied.

If you are just starting to install Oozie, go over the full post first to get an idea of the issues you might hit, and prepare your environment so you do not waste a lot of time. If you are stuck with an error in a step, go to the appropriate section; if you are lucky, you might find the fix for your error.

Environment

Oozie – 4.0.0
Hadoop – 1.2.1 (Vanilla Installation)
Linux – Ubuntu 12.04.3
Cluster – 3 Node cluster on Amazon EC2 (Free Tier)

Make sure java, javac, mvn and zip are in your command path.

Java – 1.6+
Maven – 3
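
A quick way to confirm these prerequisites before starting is sketched below. The helper name missing_tools is hypothetical (it is not part of Oozie); it relies only on the shell built-in command -v.

```shell
#!/bin/sh
# Print every required tool that is NOT on the PATH (hypothetical helper).
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# An empty result means all prerequisites were found.
missing=$(missing_tools java javac mvn zip)
if [ -n "$missing" ]; then
    echo "Missing from PATH:" $missing
else
    echo "All build prerequisites found"
fi
```

Running this first would have caught the missing zip command before the WAR preparation step later in this post.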

Download Oozie 4.0.0 from one of the mirrors

wget http://apache.mirrors.hoobly.com/oozie/4.0.0/oozie-4.0.0.tar.gz

Untar Oozie

tar -zxvf oozie-4.0.0.tar.gz

 

Building Oozie

mkdistro.sh in the bin directory orchestrates the build process. I will be skipping the tests during my build. If you prefer to execute the tests, remove -DskipTests from the below command.

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ./mkdistro.sh -DskipTests

Initially I had Maven 2 and when I executed the above command, I got the below error.

[INFO] Project 'org.apache.oozie:oozie-hadoop' is duplicated in the reactor

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.0:enforce (clean) on project oozie-main: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]

I switched to Maven 3. If you don't have Maven 3, run the below command to install it.

sudo apt-get install maven

Once it is installed, confirm the Maven version.

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ mvn --version

Apache Maven 3.0.4
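
If you want to script this check, a minimal sketch follows. The function name maven_major is hypothetical, and it assumes the first line of the output has the plain "Apache Maven x.y.z" form shown above.

```shell
#!/bin/sh
# Extract the major version number from `mvn --version` output read on stdin.
# Assumes a line of the form "Apache Maven 3.0.4 ..." (hypothetical helper).
maven_major() {
    sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Usage (requires mvn on the PATH):
#   major=$(mvn --version | maven_major)
#   [ "$major" -ge 3 ] || echo "Maven 3 is needed for this build"
```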

Restart the build with Maven 3.0.4. This time it failed for me with a Java version error.

[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Oozie Main 4.0.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-enforcer-plugin:1.0:enforce (clean) @ oozie-main ---

[WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed with message:
Detected JDK Version: 1.7.0-25 is not in the allowed range [1.6.0,1.6.1000}].
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Oozie Main ................................. FAILURE [0.878s]

 

The above failure indicates that Java 1.7 is not allowed for this build, even though the Oozie documentation states that it supports Java 1.6+. I installed Java 1.6 to proceed with the build (I was planning to install 1.6 for something else anyway), but if you prefer to build with 1.7, update the Java versions in the main pom.xml file, which is located in the base Oozie directory.

<properties>
     <javaVersion>1.6</javaVersion> <!-- Update the Java version here to 1.7 -->
     <targetJavaVersion>1.6</targetJavaVersion> <!-- Update the Java version here to 1.7 -->

     <distMgmtSnapshotsName>Apache Development Snapshot Repository</distMgmtSnapshotsName>
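
If you would rather script the pom.xml change than edit it by hand, the sketch below is one way to do it. The function name set_java_version is hypothetical, and it assumes each of the two elements sits on a single line, as in the excerpt above. I did not take this route myself, since I built with Java 1.6.

```shell
#!/bin/sh
# Rewrite <javaVersion> and <targetJavaVersion> in a pom.xml (hypothetical helper).
set_java_version() {
    pom=$1
    version=$2
    # -i.bak works with both GNU and BSD sed; the backup is removed on success.
    sed -i.bak \
        -e "s|<javaVersion>[^<]*</javaVersion>|<javaVersion>${version}</javaVersion>|" \
        -e "s|<targetJavaVersion>[^<]*</targetJavaVersion>|<targetJavaVersion>${version}</targetJavaVersion>|" \
        "$pom" && rm -f "$pom.bak"
}

# Usage, from the base Oozie directory:
#   set_java_version pom.xml 1.7
```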

Restart the build. The build progressed for a while and then failed with the below error.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Oozie Main ................................. SUCCESS [12.371s]
[INFO] Apache Oozie Client ............................... SUCCESS [43.664s]
[INFO] Apache Oozie Hadoop 1.1.1.oozie-4.0.0 ............. SUCCESS [2.590s]
[INFO] Apache Oozie Hadoop Distcp 1.1.1.oozie-4.0.0 ...... SUCCESS [0.261s]
[INFO] Apache Oozie Hadoop 1.1.1.oozie-4.0.0 Test ........ SUCCESS [2.267s]
[INFO] Apache Oozie Hadoop 2.2.0-SNAPSHOT.oozie-4.0.0 .... FAILURE [1.177s]
[INFO] Apache Oozie Hadoop 2.2.0-SNAPSHOT.oozie-4.0.0 Test  SKIPPED
[INFO] Apache Oozie Hadoop Distcp 2.2.0-SNAPSHOT.oozie-4.0.0  SKIPPED
[INFO] Apache Oozie Hadoop 0.23.5.oozie-4.0.0 ............ SKIPPED
…………..
…………..

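[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:06.044s
[INFO] Finished at: Sat Jan 04 16:16:03 UTC 2014
[INFO] Final Memory: 34M/81M
[INFO] ------------------------------------------------------------------------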

[ERROR] Failed to execute goal on project oozie-hadoop: Could not resolve dependencies for project org.apache.oozie:oozie-hadoop:jar:2.2.0-SNAPSHOT.oozie-4.0.0: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.2.0-SNAPSHOT in apache.snapshots.repo (https://repository.apache.org/content/groups/snapshots) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :oozie-hadoop

 

When Oozie 4.0.0 was released, the Hadoop Maven artifacts were available as version 2.2.0-SNAPSHOT. Hadoop 2.2.0 is now GA, so the artifact version is 2.2.0 (without SNAPSHOT). The next minor release, Oozie 4.0.1, has this change but is not released yet. To fix this, remove "-SNAPSHOT" from the pom files where it occurs. Find the list of occurrences with the below command.

$ grep -l "2.2.0-SNAPSHOT" `find . -name "pom.xml"`

./pom.xml
./hadooplibs/hadoop-distcp-2/pom.xml
./hadooplibs/hadoop-2/pom.xml
./hadooplibs/hadoop-test-2/pom.xml

Go to the above locations, look for all "2.2.0-SNAPSHOT" references and change them to just "2.2.0".

Restart the build. This time the build failed while building oozie-core, with the below error.
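
The same edit can be scripted. The sketch below is an assumption-laden shortcut rather than what I ran: the function name drop_hadoop_snapshot is hypothetical, and it blindly replaces every occurrence of the string in every pom.xml under the given directory.

```shell
#!/bin/sh
# Replace "2.2.0-SNAPSHOT" with "2.2.0" in all pom.xml files under a directory
# (hypothetical helper; defaults to the current directory).
drop_hadoop_snapshot() {
    root=${1:-.}
    # List only the pom.xml files that contain the string, then edit each in place.
    find "$root" -name pom.xml -exec grep -l "2.2.0-SNAPSHOT" {} + |
    while IFS= read -r pom; do
        sed -i.bak 's/2\.2\.0-SNAPSHOT/2.2.0/g' "$pom" && rm -f "$pom.bak"
        echo "updated $pom"
    done
}

# Usage, from the base Oozie directory:
#   drop_hadoop_snapshot .
```

Re-running the grep command afterwards should list no files.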

[INFO]
[INFO] <<< maven-javadoc-plugin:2.7:javadoc (default) @ oozie-core <<<
[INFO]
[INFO] --- maven-javadoc-plugin:2.7:javadoc (default) @ oozie-core ---
./mkdistro.sh: line 71: 19631 Killed                  mvn clean package assembly:single ${MVN_OPTS} "$@"

ERROR, Oozie distro creation failed

 

"Killed" during a build means the Maven process was killed, most likely because the machine ran out of memory. Check out the below link and increase the memory.

http://maven.apache.org/plugins/maven-compiler-plugin/compile-mojo.html#maxmem

If it still fails, make sure you have enough memory to run the process. Run the below command to check the status of memory.

ubuntu@ip-172-x-x-x:~/oozietar/oozie-4.0.0$ watch -n 5 free -m
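
For a one-shot reading instead of a live view, the sketch below prints free physical memory and free swap in MB. The function name free_mb is hypothetical, and it assumes a Linux system that exposes /proc/meminfo.

```shell
#!/bin/sh
# Report free physical memory and free swap in MB (Linux only, hypothetical helper).
free_mb() {
    awk '/^MemFree:/  { mem  = int($2 / 1024) }
         /^SwapFree:/ { swap = int($2 / 1024) }
         END { printf "mem_free_mb=%d swap_free_mb=%d\n", mem, swap }' /proc/meminfo
}

free_mb
```

A swap_free_mb value of 0 means the instance has no swap space available.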

I was running on Amazon EC2 (free tier) and the process was failing because the instance ran out of memory, so I had to add 1 GB of virtual memory to the instance to proceed with the build. Follow this link for step-by-step instructions to add additional memory.

The build succeeded after adding memory.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Oozie Main ................................. SUCCESS [4.695s]
[INFO] Apache Oozie Client ............................... SUCCESS [6:50.417s]
[INFO] Apache Oozie Hadoop 1.1.1.oozie-4.0.0 ............. SUCCESS [2:32.798s]
[INFO] Apache Oozie Hadoop Distcp 1.1.1.oozie-4.0.0 ...... SUCCESS [7.205s]
[INFO] Apache Oozie Hadoop 1.1.1.oozie-4.0.0 Test ........ SUCCESS [12.262s]
[INFO] Apache Oozie Hadoop 2.2.0.oozie-4.0.0 ............. SUCCESS [4:16.581s]
[INFO] Apache Oozie Hadoop 2.2.0.oozie-4.0.0 Test ........ SUCCESS [8.372s]
[INFO] Apache Oozie Hadoop Distcp 2.2.0.oozie-4.0.0 ...... SUCCESS [6.188s]
[INFO] Apache Oozie Hadoop 0.23.5.oozie-4.0.0 ............ SUCCESS [3:53.121s]
[INFO] Apache Oozie Hadoop 0.23.5.oozie-4.0.0 Test ....... SUCCESS [13.817s]
[INFO] Apache Oozie Hadoop Distcp 0.23.5.oozie-4.0.0 ..... SUCCESS [10.536s]
[INFO] Apache Oozie Hadoop Libs .......................... SUCCESS [2:30.953s]
[INFO] Apache Oozie Hbase 0.94.2.oozie-4.0.0 ............. SUCCESS [44.285s]
[INFO] Apache Oozie Hbase Libs ........................... SUCCESS [31.122s]
[INFO] Apache Oozie HCatalog 0.5.0.oozie-4.0.0 ........... SUCCESS [1:33.194s]
[INFO] Apache Oozie HCatalog 0.6.0.oozie-4.0.0 ........... SUCCESS [8.138s]
[INFO] Apache Oozie HCatalog Libs ........................ SUCCESS [10.392s]
[INFO] Apache Oozie Share Lib Oozie ...................... SUCCESS [1:49.388s]
[INFO] Apache Oozie Share Lib HCatalog ................... SUCCESS [3:48.252s]
[INFO] Apache Oozie Core ................................. SUCCESS [12:50.519s]
[INFO] Apache Oozie Docs ................................. SUCCESS [5:08.878s]
[INFO] Apache Oozie Share Lib Pig ........................ SUCCESS [5:54.704s]
[INFO] Apache Oozie Share Lib Hive ....................... SUCCESS [9:55.869s]
[INFO] Apache Oozie Share Lib Sqoop ...................... SUCCESS [2:00.752s]
[INFO] Apache Oozie Share Lib Streaming .................. SUCCESS [11.923s]
[INFO] Apache Oozie Share Lib Distcp ..................... SUCCESS [32.880s]
[INFO] Apache Oozie WebApp ............................... SUCCESS [6:50.050s]
[INFO] Apache Oozie Examples ............................. SUCCESS [7:54.814s]
[INFO] Apache Oozie Share Lib ............................ SUCCESS [6:35.959s]
[INFO] Apache Oozie Tools ................................ SUCCESS [12:31.376s]
[INFO] Apache Oozie MiniOozie ............................ SUCCESS [54.695s]
[INFO] Apache Oozie Distro ............................... SUCCESS [11:39.100s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:52:58.988s
[INFO] Finished at: Sun Jan 05 20:48:19 UTC 2014
[INFO] Final Memory: 47M/255M
[INFO] ------------------------------------------------------------------------

Oozie distro created, DATE[2014.01.05-18:55:14GMT] VC-REV[unavailable], available at [/home/ubuntu/oozie-4.0.0/distro/target]

 

Check for the distribution created in the distro/target folder.

 

Building Oozie War and preparing Oozie

Untar the distribution that was created by the build. We will be working in the expanded directory for the below steps.

Create a libext/ directory in the directory where Oozie was expanded.

By default, Oozie creates a build with Hadoop 1.1.1, but my Hadoop environment is 1.2.1.

Copy your Hadoop jars (hadoop-client-1.2.1.jar and hadoop-core-1.2.1.jar) into the libext directory.

Also copy ext-2.2.zip and the other jars needed into the libext directory. You can download ext-2.2 from the below location.

http://extjs.com/deploy/ext-2.2.zip

/home/ubuntu/oozie-4.0.0/ is where I have the expanded distribution, and my libext looks like below. Make sure yours looks similar.

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/libext$ ls -ltr
total 16280
-rw-rw-r-- 1 ubuntu ubuntu     414 Jan  6 01:31 hadoop-client-1.2.1.jar
-rw-rw-r-- 1 ubuntu ubuntu 4203147 Jan  6 01:32 hadoop-core-1.2.1.jar
-rw-rw-r-- 1 ubuntu ubuntu   15010 Jan  6 01:33 xmlenc-0.52.jar
-rw-rw-r-- 1 ubuntu ubuntu  188671 Jan  6 01:35 commons-beanutils-1.7.0.jar
-rw-rw-r-- 1 ubuntu ubuntu   41123 Jan  6 01:35 commons-cli-1.2.jar
-rw-rw-r-- 1 ubuntu ubuntu  206035 Jan  6 01:35 commons-beanutils-core-1.8.0.jar
-rw-rw-r-- 1 ubuntu ubuntu   58160 Jan  6 01:35 commons-codec-1.4.jar
-rw-rw-r-- 1 ubuntu ubuntu  575389 Jan  6 01:35 commons-collections-3.2.1.jar
-rw-rw-r-- 1 ubuntu ubuntu   13619 Jan  6 01:35 commons-daemon-1.0.1.jar
-rw-rw-r-- 1 ubuntu ubuntu  298829 Jan  6 01:35 commons-configuration-1.6.jar
-rw-rw-r-- 1 ubuntu ubuntu  112341 Jan  6 01:35 commons-el-1.0.jar
-rw-rw-r-- 1 ubuntu ubuntu  143602 Jan  6 01:35 commons-digester-1.8.jar
-rw-rw-r-- 1 ubuntu ubuntu  279781 Jan  6 01:35 commons-httpclient-3.0.1.jar
-rw-rw-r-- 1 ubuntu ubuntu  163151 Jan  6 01:35 commons-io-2.1.jar
-rw-rw-r-- 1 ubuntu ubuntu   26202 Jan  6 01:35 commons-logging-api-1.0.4.jar
-rw-rw-r-- 1 ubuntu ubuntu   60686 Jan  6 01:35 commons-logging-1.1.1.jar
-rw-rw-r-- 1 ubuntu ubuntu  261809 Jan  6 01:35 commons-lang-2.4.jar
-rw-rw-r-- 1 ubuntu ubuntu  832410 Jan  6 01:35 commons-math-2.1.jar
-rw-rw-r-- 1 ubuntu ubuntu  273370 Jan  6 01:35 commons-net-3.1.jar
-rw-rw-r-- 1 ubuntu ubuntu  391834 Jan  6 01:35 log4j-1.2.15.jar
-rw-rw-r-- 1 ubuntu ubuntu   65261 Jan  6 01:36 oro-2.0.8.jar
-rw-rw-r-- 1 ubuntu ubuntu  706710 Jan  6 01:36 hsqldb-1.8.0.10.jar
-rw-rw-r-- 1 ubuntu ubuntu  227500 Jan  6 01:37 jackson-core-asl-1.8.8.jar
-rw-rw-r-- 1 ubuntu ubuntu  668564 Jan  6 01:37 jackson-mapper-asl-1.8.8.jar
-rw-rw-r-- 1 ubuntu ubuntu 6800612 Jan  6 14:47 ext-2.2.zip
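
The libext preparation can be sketched as a small helper. The function name stage_libext is hypothetical, and it assumes the hadoop-core and hadoop-client jars sit in the top level of your Hadoop directory. It copies only those two jars and the ExtJS archive; the other jars in the listing above still need to be copied from wherever you have them.

```shell
#!/bin/sh
# Copy the Hadoop jars and the ExtJS archive into Oozie's libext directory.
# All three arguments are placeholders for your own paths (hypothetical helper).
stage_libext() {
    oozie_home=$1    # expanded Oozie distribution
    hadoop_home=$2   # Hadoop installation (jars assumed to be in its top level)
    extjs_zip=$3     # downloaded ext-2.2.zip

    mkdir -p "$oozie_home/libext" || return 1
    cp "$hadoop_home"/hadoop-core-*.jar "$hadoop_home"/hadoop-client-*.jar \
       "$extjs_zip" "$oozie_home/libext/"
}

# Usage:
#   stage_libext ~/oozie-4.0.0 ~/hadoop-1.2.1 ~/ext-2.2.zip
```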

 

Add the below configuration to Hadoop's core-site.xml.

<!-- OOZIE -->
<property>
   <name>hadoop.proxyuser.[OOZIE_SERVER_USER].hosts</name>
   <value>[OOZIE_SERVER_HOSTNAME]</value>
</property>
<property>
   <name>hadoop.proxyuser.[OOZIE_SERVER_USER].groups</name>
   <value>[USER_GROUPS_THAT_ALLOW_IMPERSONATION]</value>
</property>
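
To avoid typos when filling in the placeholders, the two properties can be generated from the shell. This is only a sketch: the function name proxyuser_properties is hypothetical, and you still paste its output into core-site.xml yourself.

```shell
#!/bin/sh
# Print the two proxyuser properties with the placeholders filled in
# (hypothetical helper; arguments are user, host and group list).
proxyuser_properties() {
    user=$1
    host=$2
    group_list=$3
    cat <<EOF
<property>
   <name>hadoop.proxyuser.${user}.hosts</name>
   <value>${host}</value>
</property>
<property>
   <name>hadoop.proxyuser.${user}.groups</name>
   <value>${group_list}</value>
</property>
EOF
}

# Usage with the values from my environment:
#   proxyuser_properties ubuntu ec2-54-x-x-x.compute-1.amazonaws.com ubuntu
```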

My core-site.xml looks like below

ubuntu@ip-172-x-x-x:~/hadoop-1.2.1/conf$ vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://ec2-54-x-x-x.compute-1.amazonaws.com:9000</value>
     </property>
  <property>
    <name>hadoop.proxyuser.ubuntu.hosts</name>
    <value>ec2-54-x-x-x.compute-1.amazonaws.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.ubuntu.groups</name>
    <value>ubuntu</value>
  </property>
</configuration>

Now we are ready to build the Oozie war file. Run the below command to build the war.

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ./oozie-setup.sh prepare-war

If you get the below error, install zip with the command sudo apt-get install zip

INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/xmlenc-0.52.jar
./oozie-setup.sh: line 302: zip: command not found
Failed: creating new Oozie WAR

If everything goes well in creating the war file, you should see the below.

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ./oozie-setup.sh prepare-war
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-beanutils-1.7.0.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-beanutils-core-1.8.0.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-cli-1.2.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-codec-1.4.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-collections-3.2.1.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-configuration-1.6.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-daemon-1.0.1.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-digester-1.8.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-el-1.0.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-httpclient-3.0.1.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-io-2.1.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-lang-2.4.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-logging-1.1.1.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-logging-api-1.0.4.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-math-2.1.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/commons-net-3.1.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/hadoop-client-1.2.1.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/hadoop-core-1.2.1.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/hsqldb-1.8.0.10.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/jackson-core-asl-1.8.8.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/jackson-mapper-asl-1.8.8.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/log4j-1.2.15.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/oro-2.0.8.jar
INFO: Adding extension: /home/ubuntu/oozie-4.0.0/libext/xmlenc-0.52.jar

New Oozie WAR file with added 'ExtJS library, JARs' at /home/ubuntu/oozie-4.0.0/oozie-server/webapps/oozie.war

INFO: Oozie is ready to be started

Upload sharelib directory to HDFS

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ./oozie-setup.sh sharelib create -fs hdfs://ec2-54-x-x-x.compute-1.amazonaws.com:9000

 

Create database for Oozie

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ./oozie-setup.sh db create -run

setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '4.0.0'

 

Start Oozie

We are now ready to start oozie. Start using the command oozied.sh start

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ./oozied.sh start

Setting OOZIE_HOME:          /home/ubuntu/oozie-4.0.0
Setting OOZIE_CONFIG:        /home/ubuntu/oozie-4.0.0/conf
Sourcing:                    /home/ubuntu/oozie-4.0.0/conf/oozie-env.sh
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Setting OOZIE_CONFIG_FILE:   oozie-site.xml
Setting OOZIE_DATA:          /home/ubuntu/oozie-4.0.0/data
Setting OOZIE_LOG:           /home/ubuntu/oozie-4.0.0/logs
Setting OOZIE_LOG4J_FILE:    oozie-log4j.properties
Setting OOZIE_LOG4J_RELOAD:  10
Setting OOZIE_HTTP_HOSTNAME: ip-172-x-x-x.ec2.internal
Setting OOZIE_HTTP_PORT:     11000
Setting OOZIE_ADMIN_PORT:     11001
Setting OOZIE_HTTPS_PORT:     11443
Setting OOZIE_BASE_URL:      http://ip-172-x-x-x.ec2.internal:11000/oozie
Setting CATALINA_BASE:       /home/ubuntu/oozie-4.0.0/oozie-server
Setting OOZIE_HTTPS_KEYSTORE_FILE:     /home/ubuntu/.keystore
Setting OOZIE_HTTPS_KEYSTORE_PASS:     password
Setting CATALINA_OUT:        /home/ubuntu/oozie-4.0.0/logs/catalina.out
Setting CATALINA_PID:        /home/ubuntu/oozie-4.0.0/oozie-server/temp/oozie.pid

Using   CATALINA_OPTS:        -Xmx1024m -Dderby.stream.error.file=/home/ubuntu/oozie-4.0.0/logs/derby.log
Adding to CATALINA_OPTS:     -Doozie.home.dir=/home/ubuntu/oozie-4.0.0 -Doozie.config.dir=/home/ubuntu/oozie-4.0.0/conf -Doozie.log.dir=/home/ubuntu/oozie-4.0.0/logs -Doozie.data.dir=/home/ubuntu/oozie-4.0.0/data -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=ip-172-x-x-x.ec2.internal -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://ip-172-x-x-x.ec2.internal:11000/oozie -Doozie.https.keystore.file=/home/ubuntu/.keystore -Doozie.https.keystore.pass=password -Djava.library.path=

Using CATALINA_BASE:   /home/ubuntu/oozie-4.0.0/oozie-server
Using CATALINA_HOME:   /home/ubuntu/oozie-4.0.0/oozie-server
Using CATALINA_TMPDIR: /home/ubuntu/oozie-4.0.0/oozie-server/temp
Using JRE_HOME:        /usr/lib/jvm/java-6-openjdk-amd64/
Using CLASSPATH:       /home/ubuntu/oozie-4.0.0/oozie-server/bin/bootstrap.jar
Using CATALINA_PID:    /home/ubuntu/oozie-4.0.0/oozie-server/temp/oozie.pid

Once started, check the status of Oozie.

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ./oozie admin -oozie http://localhost:11000/oozie -status

System mode: NORMAL
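
The server can take a little while to come up, so a script may want to poll the status command until it reports NORMAL. The sketch below is one way to do that; the function name wait_for_oozie and the OOZIE_BIN and POLL_SECS variables are hypothetical, and only the admin -status command itself comes from this post.

```shell
#!/bin/sh
# Poll `oozie admin -status` until it reports "System mode: NORMAL".
# Arguments: Oozie URL and maximum number of attempts (hypothetical helper).
wait_for_oozie() {
    url=$1
    tries=${2:-30}
    oozie_bin=${OOZIE_BIN:-./oozie}   # path to the oozie client script
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$oozie_bin" admin -oozie "$url" -status 2>/dev/null |
               grep -q 'System mode: NORMAL'; then
            return 0
        fi
        i=$((i + 1))
        sleep "${POLL_SECS:-2}"
    done
    return 1
}

# Usage, from the Oozie bin directory:
#   wait_for_oozie http://localhost:11000/oozie 30 && echo "Oozie is up"
```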

Just to be sure, check that the Oozie process is running.

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ps aux |grep oozie

ubuntu   20747 20.7 26.0 1939200 157632 pts/3  Sl   01:01   2:22 /usr/lib/jvm/java-6-openjdk-amd64//bin/java -Djava.util.logging.config.file=/home/ubuntu/oozie-4.0.0/oozie-server/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xmx1024m -Dderby.stream.error.file=/home/ubuntu/oozie-4.0.0/logs/derby.log -Doozie.home.dir=/home/ubuntu/oozie-4.0.0 -Doozie.config.dir=/home/ubuntu/oozie-4.0.0/conf -Doozie.log.dir=/home/ubuntu/oozie-4.0.0/logs -Doozie.data.dir=/home/ubuntu/oozie-4.0.0/data -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=ip-172-x-x-x.ec2.internal -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://ip-172-x-x-x.ec2.internal:11000/oozie -Doozie.https.keystore.file=/home/ubuntu/.keystore -Doozie.https.keystore.pass=password -Djava.library.path= -Djava.endorsed.dirs=/home/ubuntu/oozie-4.0.0/oozie-server/endorsed -classpath /home/ubuntu/oozie-4.0.0/oozie-server/bin/bootstrap.jar -Dcatalina.base=/home/ubuntu/oozie-4.0.0/oozie-server -Dcatalina.home=/home/ubuntu/oozie-4.0.0/oozie-server -Djava.io.tmpdir=/home/ubuntu/oozie-4.0.0/oozie-server/temp org.apache.catalina.startup.Bootstrap start

ubuntu   20912  0.0  0.1   8104   924 pts/3    S+   01:13   0:00 grep --color=auto oozie

 

Testing Oozie

 

We are now ready to test Oozie. We are going to test a MapReduce example that comes with Oozie. Edit the job.properties file to include your environment-specific properties. The job.properties file can be found under the below directory.

/home/ubuntu/oozie-4.0.0/examples/target/oozie-examples-4.0.0-examples/examples/apps/map-reduce

My job.properties looks like below.

nameNode=hdfs://ec2-54-x-x-x.compute-1.amazonaws.com:9000
jobTracker=ec2-54-x-x-x.compute-1.amazonaws.com:54311
queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce

Upload the example directory to HDFS.

$HADOOP_HOME/bin/hadoop fs -put examples examples

Start the Oozie job as below. You will see that we are referencing the job.properties file we just edited. The command will return the job id. We will be using the job id to query the status of the job.

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ./oozie job -oozie http://localhost:11000/oozie -config /home/ubuntu/oozie-4.0.0/examples/target/oozie-examples-4.0.0-examples/examples/apps/map-reduce/job.properties -run

job: 0000000-140106144925417-oozie-ubun-W
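
If you are scripting the submission, the job id can be captured from this output. A minimal sketch, assuming the "job: <id>" line format shown above; the function name extract_job_id is hypothetical.

```shell
#!/bin/sh
# Read `oozie job ... -run` output on stdin and print just the job id
# (hypothetical helper; assumes a line of the form "job: <id>").
extract_job_id() {
    sed -n 's/^job: *//p'
}

# Usage, from the Oozie bin directory:
#   job_id=$(./oozie job -oozie http://localhost:11000/oozie -config job.properties -run | extract_job_id)
```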

Check the status of the job with the below command. Use the job id you got from the above command.

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ ./oozie job -oozie http://localhost:11000/oozie -info 0000000-140106144925417-oozie-ubun-W
Job ID : 0000000-140106144925417-oozie-ubun-W
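------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : map-reduce-wf
App Path      : hdfs://ec2-54-x-x-x.compute-1.amazonaws.com:9000/user/ubuntu/examples/apps/map-reduce
Status        : SUCCEEDED
Run           : 0
User          : ubuntu
Group         : -
Created       : 2014-01-07 00:22 GMT
Started       : 2014-01-07 00:22 GMT
Last Modified : 2014-01-07 00:23 GMT
Ended         : 2014-01-07 00:23 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-140106144925417-oozie-ubun-W@:start:                                  OK        -                      OK         -
------------------------------------------------------------------------------------------------------------------------------------
0000000-140106144925417-oozie-ubun-W@mr-node                                  OK        job_201401052231_0002  SUCCEEDED  -
------------------------------------------------------------------------------------------------------------------------------------
0000000-140106144925417-oozie-ubun-W@end                                      OK        -                      OK         -
------------------------------------------------------------------------------------------------------------------------------------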
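
To check the result from a script instead of reading the report, the Status line can be pulled out of the -info output. A minimal sketch, assuming the "Status        : VALUE" layout shown above; the function name workflow_status is hypothetical.

```shell
#!/bin/sh
# Read `oozie job ... -info <id>` output on stdin and print the workflow status
# (hypothetical helper). Only the first line starting with "Status" is used,
# so the Status column of the Actions table is ignored.
workflow_status() {
    sed -n 's/^Status *: *\([A-Z_]*\).*/\1/p' | head -n 1
}

# Usage, from the Oozie bin directory:
#   status=$(./oozie job -oozie http://localhost:11000/oozie -info "$job_id" | workflow_status)
#   [ "$status" = "SUCCEEDED" ] && echo "workflow finished successfully"
```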

 

You can also check the output file in HDFS

ubuntu@ip-172-x-x-x:~/oozie-4.0.0/bin$ $HADOOP_HOME/bin/hadoop fs -cat /user/ubuntu/examples/output-data/map-reduce/part-00000
Warning: $HADOOP_HOME is deprecated.

0       To be or not to be, that is the question;
42      Whether ’tis nobler in the mind to suffer
84      The slings and arrows of outrageous fortune,
129     Or to take arms against a sea of troubles,
172     And by opposing, end them. To die, to sleep;
………….

Conclusion

That's it, we are done! We have successfully built, installed and tested Oozie 4.0.0.
