Driven Administration Guide

version 1.1.1

Installing the Driven Plugin

The Driven Plugin is the component that collects data from a running Cascading application and sends it to a Driven Server for processing and analysis. It must be installed and accessible to the Cascading process.

Configuration Options

General use of Driven requires no customization beyond making the plugin available to the running job. There are two main ways of configuring the plugin: a properties file and environment variables.

Place the properties file in the Hadoop classpath; the easiest way to do this is to add it to Hadoop’s configuration directory. Set the environment variables for the process that runs the Cascading job.
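
As a minimal sketch of the environment-variable route, the wrapper below sets a variable only for the child process that runs the job and leaves the calling shell untouched. The function name run_with_driven_hosts is hypothetical and not part of Driven; DRIVEN_SERVER_HOSTS is the variable covered under Driven Server URL.

```shell
# Run a command with DRIVEN_SERVER_HOSTS set only in that command's environment.
# run_with_driven_hosts is a hypothetical helper, not part of Driven.
run_with_driven_hosts() {
  _driven_hosts="${1:-}"
  if [ -z "$_driven_hosts" ] || [ "$#" -lt 2 ]; then
    echo "usage: run_with_driven_hosts <server-url> <command> [args...]" >&2
    return 1
  fi
  shift
  # env(1) adds the variable to the child's environment only.
  env DRIVEN_SERVER_HOSTS="$_driven_hosts" "$@"
}
```

For instance, run_with_driven_hosts followed by a server URL and the command that launches the Cascading job would start that job with the variable set, without exporting it in the surrounding session.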

Driven Plugin Path

The Driven plugin must be placed in the Cascading job classpath.

or, Hadoop classpath

$ export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/path/to/driven-plugin.jar"
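
A plain export appends the jar even when the file is missing or already listed. The sketch below, under the assumption that a small guard is wanted, checks both cases before updating HADOOP_CLASSPATH. The function name add_to_hadoop_classpath is hypothetical, and the jar path is whatever the caller supplies.

```shell
# Append a jar to HADOOP_CLASSPATH if the file exists and is not already listed.
# add_to_hadoop_classpath is a hypothetical helper, not part of Driven or Hadoop.
add_to_hadoop_classpath() {
  _jar="${1:-}"
  if [ ! -f "$_jar" ]; then
    echo "plugin jar not found: $_jar" >&2
    return 1
  fi
  case ":${HADOOP_CLASSPATH:-}:" in
    *":$_jar:"*) ;;                                  # already present
    "::") HADOOP_CLASSPATH="$_jar" ;;                # variable was empty or unset
    *) HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$_jar" ;; # append to existing entries
  esac
  export HADOOP_CLASSPATH
}
```

Calling the function twice with the same path leaves the classpath unchanged the second time, which keeps repeated shell initialization from growing the variable.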

Driven Server URL

The self-hosted version of Driven Server requires the plugin to be configured with the server’s URL, in the form http<s>://hostname<:port>

or, Environment Variable

$ export DRIVEN_SERVER_HOSTS=http<s>://hostname<:port>
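
Because a malformed URL is easy to miss, a quick format check before exporting can help. The sketch below tests a single value against the http<s>://hostname<:port> form. The function name is_driven_server_url is hypothetical, and the pattern is an assumption about what a reasonable value looks like, not the plugin's own validation.

```shell
# Succeed if the argument looks like http(s)://hostname or http(s)://hostname:port.
# is_driven_server_url is a hypothetical helper, not part of Driven.
is_driven_server_url() {
  printf '%s\n' "${1:-}" | grep -Eq '^https?://[A-Za-z0-9.-]+(:[0-9]+)?$'
}
```

One way to use it is to export DRIVEN_SERVER_HOSTS only when the check succeeds, and print an error otherwise.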

API Key

In Driven, API keys map to Teams (refer to Configuring Teams for Collaboration in the Driven User Guide). An app that uses an API key is searchable by members of the team that owns the key.

or, Environment Variable


Default App Tags

In addition to app tags added by the developer, default tags can be added automatically via configuration.


or, Environment Variable


Slice Data Suppression

Slice data transmission can be suppressed by setting a property in the plugin's properties file. This property overrides the server-side configuration if present.


or, Environment Variable


Slice Data on Completion

This option sends slice data on completion. It is true by default.


or, Environment Variable


Java Command Suppression

The Java command for a Cascading job can be suppressed by setting a property in the plugin's properties file. This property overrides the server-side configuration if present.


or, Environment Variable


Archive Mode

The Driven plugin can be run in archive mode. When archive mode is enabled, all records sent to the Driven Server are written to disk, even if the server is unreachable. This is useful when the Driven Server is unavailable, because the archive can be replayed later once the server is reachable. Sending data to the server is idempotent, so replaying an archive does not corrupt data that has already been recorded.

or, Environment Variable

$ export DRIVEN_ARCHIVE_DIR=/path/to/archive/directory
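
Since this guide advises against tilde (~) expansion for the archive path, the sketch below rejects a value that still starts with a literal tilde, creates the directory, and exports it as an absolute path. The function name set_driven_archive_dir is hypothetical, and resolving the path to an absolute one is a design assumption, not a documented requirement of the plugin.

```shell
# Create the archive directory and export it as an absolute path.
# set_driven_archive_dir is a hypothetical helper, not part of Driven.
set_driven_archive_dir() {
  _dir="${1:-}"
  case "$_dir" in
    "")
      echo "usage: set_driven_archive_dir <directory>" >&2
      return 1
      ;;
    "~"*)
      echo "unexpanded tilde in archive path: $_dir" >&2
      return 1
      ;;
  esac
  mkdir -p "$_dir" || return 1
  # Resolve to an absolute path so the value does not depend on the working directory.
  DRIVEN_ARCHIVE_DIR=$(cd "$_dir" && pwd) || return 1
  export DRIVEN_ARCHIVE_DIR
}
```

Passing a path built from "$HOME" instead of a tilde gives the same location without relying on shell expansion.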

Amazon EMR

To use the Driven plugin with Amazon Elastic MapReduce (EMR), pass the following argument on the EMR client command line.

--bootstrap-action //driven-plugin/ --args "--host,${driven_server_hosts},--api-key,${driven_api_key}"

In archive mode, the compressed archive files are written to the configured archive path and labeled with the timestamp of when the Cascading job started. Do not use tilde (~) expansion in the path.
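
The value passed to --args in the EMR bootstrap action above is a comma-separated list, so an empty variable or a stray comma would produce a malformed argument. The sketch below assembles that value from a server URL and an API key and fails early in either case. The function name driven_emr_args is hypothetical, and the comma check is an assumption based on the comma-separated form shown in the command.

```shell
# Build the comma-separated value for --args of the Driven bootstrap action.
# driven_emr_args is a hypothetical helper, not part of Driven or the EMR client.
driven_emr_args() {
  _host="${1:-}"
  _key="${2:-}"
  if [ -z "$_host" ] || [ -z "$_key" ]; then
    echo "usage: driven_emr_args <server-url> <api-key>" >&2
    return 1
  fi
  # Commas separate the arguments, so a comma inside a value would split it.
  case "$_host$_key" in
    *,*)
      echo "values must not contain commas" >&2
      return 1
      ;;
  esac
  printf '%s\n' "--host,$_host,--api-key,$_key"
}
```

The printed string can then be quoted and supplied after --args in place of the inline variable references.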


