Driven Agent 1.3 Installation Guide - Using Driven Agent with Apache Oozie

version 1.3-20160202

Using Driven Agent with Apache Oozie

Apache Oozie is a popular workflow management solution in the Hadoop ecosystem that supports running a variety of technologies. Oozie operates by executing computational tasks called actions, which are arranged in directed acyclic graphs (DAGs) referred to as workflows. The Driven Agent can monitor the execution of HiveActions and MapReduceActions that are managed by Oozie.

Oozie uses a client-server architecture, but the Oozie server does not run any user code itself. Instead, the server uses a LauncherMapper to drive each action in a given workflow.

The LauncherMapper is a single Map task, which is sent cluster-side and acts as the driving program for the action. Any node of the cluster can potentially be the machine that drives a given Hive query or runs a MapReduce job. Therefore, every machine in the cluster must have access to the Driven Agent for Hive and every machine must be able to communicate with the Driven Server. Your firewall rules should be set accordingly.

Apache Oozie users should always install and configure the Hive agent bundle for Driven.

Driven Agent Bundle Configuration

Instead of installing the Driven Agent JAR files on every machine of the cluster, you can install the agent once in Oozie’s sharelib on HDFS.

Given a sharelib directory on HDFS of /user/oozie/share/lib/lib_20150721160609, the Driven Agent could be installed as follows:

> hadoop fs -mkdir /user/oozie/share/lib/lib_20150721160609/driven
> hadoop fs -copyFromLocal /path/to/driven-agent-<framework>-bundle-<version>.jar \
  /user/oozie/share/lib/lib_20150721160609/driven

Some distributions require a restart of the Oozie server after modifying the sharelib. Check the documentation of your distribution.

Now that the Driven Agent is available on HDFS, the agent must be configured either globally for the workflow or in the XML of a single action.

The following property adds a -javaagent option to the JVM options of the Oozie launcher so that the agent is loaded. The JAR file name must match the one on HDFS.

<property>
    <name>oozie.launcher.mapred.child.java.opts</name>
    <value>-javaagent:$PWD/driven-agent-<framework>-bundle-<version>.jar</value>
</property>

The following property configures HiveActions to include the JAR files from the hive and driven subdirectories of the currently active sharelib.

<property>
    <name>oozie.action.sharelib.for.hive</name>
    <value>hive,driven</value>
</property>
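
As a rough illustration of where these two properties could sit when they are set for a single action, the sketch below places them in the configuration block of a Hive action. The action name, script name, transition targets, schema version, and the ${jobTracker} and ${nameNode} parameters are assumptions made for illustration only and are not taken from this guide. The <framework> and <version> placeholders must be replaced with the values from the JAR file name on HDFS before the XML is well-formed.

<action name="hive-example">
    <!-- Schema version is an assumption; use the one your Oozie release supports -->
    <hive xmlns="uri:oozie:hive-action:0.5">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Load the Driven Agent in the launcher JVM -->
            <property>
                <name>oozie.launcher.mapred.child.java.opts</name>
                <value>-javaagent:$PWD/driven-agent-<framework>-bundle-<version>.jar</value>
            </property>
            <!-- Add the driven sharelib subdirectory next to the hive one -->
            <property>
                <name>oozie.action.sharelib.for.hive</name>
                <value>hive,driven</value>
            </property>
        </configuration>
        <!-- Hypothetical script name -->
        <script>example.hql</script>
    </hive>
    <!-- Hypothetical transition targets -->
    <ok to="end"/>
    <error to="fail"/>
</action>

Set this way, the agent is only loaded for this one action; other actions in the same workflow are unaffected.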

For MapReduceActions, verify that the map-reduce directory exists in the sharelib and create it if necessary. The configuration should look like this:

<property>
    <name>oozie.action.sharelib.for.map-reduce</name>
    <value>map-reduce,driven</value>
</property>

Finally, the following properties configure the Driven Server location and API key for the bundled agent to use. Depending on your deployment and needs, you can set these properties at either the workflow level or the action level. Setting the properties at the workflow level enables the agent for all supported actions in that workflow. Setting the properties at the action level enables the agent only for that specific action.

<property>
    <name>cascading.management.document.service.hosts</name>
    <value>http://<hostname>:<port>/</value>
</property>
<property>
    <name>cascading.management.document.service.apikey</name>
    <value><apikey></value>
</property>
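
For the workflow-level variant, one possible arrangement is to put the properties into the global section of the workflow definition, as sketched below. This assumes a workflow schema version that supports the global element and an Oozie release that applies it to the action types in use. The workflow name, schema version, and node names are hypothetical, and the actions themselves are omitted.

<workflow-app xmlns="uri:oozie:workflow:0.5" name="driven-example-wf">
    <global>
        <configuration>
            <!-- Driven Server location; replace the placeholders -->
            <property>
                <name>cascading.management.document.service.hosts</name>
                <value>http://<hostname>:<port>/</value>
            </property>
            <!-- API key used by the bundled agent; replace the placeholder -->
            <property>
                <name>cascading.management.document.service.apikey</name>
                <value><apikey></value>
            </property>
        </configuration>
    </global>
    <!-- Hypothetical node names; the action definitions are omitted here -->
    <start to="first-action"/>
    <end name="end"/>
</workflow-app>

To limit the agent to one action instead, the same two property elements could be moved into the configuration block of that action.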