Driven Agent 1.3 Installation Guide - Getting Started

version 1.3-20160202

Getting Started

The Driven Agent is a collection of JVM libraries that enables monitoring of Hadoop applications like Apache Hive or native MapReduce jobs with Driven.

The Driven Agent is available for the following frameworks:

  • Apache Hive

  • native MapReduce

The agent is compatible with the following scheduler:

  • Apache Oozie

The Driven Agent is a Java agent wrapper for the Driven Plugin. There is one JAR file for installing the agent for Hive and another JAR file for installing the agent for MapReduce. Both JAR files are bundled with the Driven Plugin.

If you are monitoring only Cascading applications with Driven, the Driven Agent is not necessary. See the Driven documentation for details.

Downloading the Driven Agent

Select the agent library that is applicable to your framework.

# latest Hive agent bundle
> wget -i http://files.concurrentinc.com/driven-agent/1.3/latest-driven-agent-hive-bundle.txt

# latest MapReduce agent bundle
> wget -i http://files.concurrentinc.com/driven-agent/1.3/latest-driven-agent-mr-bundle.txt

Installing the Driven Agent

Note for Apache Oozie users: Follow the Driven Agent with Apache Oozie installation documentation instead of the following procedure.

The following steps assume that Hadoop applications are being launched from:

  • the command line, via bin/yarn jar …​ or bin/hadoop jar …​

  • an interactive shell like Beeline when using Hive with a "thick" client

  • a long-running server like Hive Server, or an application server like Tomcat, JBoss, or Spring

Driven defines an application context as the JVM instance driving and orchestrating the client side of Hadoop applications. Each Hive query or MapReduce job appears as a single Unit of Work in that application. A single application context can contain thousands of queries. A shutdown and restart of the JVM constitutes a new instance of the application.

Variables are used in many of the commands in the following sections:

  • [framework] stands for Hive (hive) or MapReduce (mr)

  • <version> stands for the current agent version

Agent Quick Start:

Step 1: Create a new directory named driven-agent in your home directory.

Step 2: Copy the downloaded installation JAR file into the driven-agent directory.

Step 3: Create a driven-agent.properties file with the appropriate settings for your environment. See the Configuring the Driven Agent section to properly configure the drivenHosts and drivenAPIKey settings (if an API key is required).

Creating a different driven-agent.properties file for each unique application enables the display of application-specific values (like name and tags) in Driven and lets you assign applications to specific teams via the Driven team API key.
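
For example, a minimal driven-agent.properties might contain only the connection settings from Step 3; the host and API key below are placeholders for your environment:

# Driven Server host(s) to send data to
drivenHosts=https://driven.example.com:8080
# team API key, if your Driven Server requires one
drivenAPIKey=REPLACE-WITH-YOUR-API-KEY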

Step 4: In the current console or within a bash script, use either export HADOOP_OPTS or export YARN_CLIENT_OPTS (depending on your environment) to pass the options in the following command:

export YARN_CLIENT_OPTS="-javaagent:/path/to/driven-agent-[framework]-<version>.jar=optionsFile=driven-agent.properties"

Step 5: Run your application.
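
For example, a MapReduce job launched from the command line might look like the following sketch; the agent path, application JAR, main class, and arguments are placeholders:

# attach the agent and point it at the properties file created in Step 3
export HADOOP_OPTS="-javaagent:$HOME/driven-agent/driven-agent-mr-<version>.jar=optionsFile=$HOME/driven-agent/driven-agent.properties"
# run the application as usual
bin/hadoop jar my-application.jar com.example.MyJob input/ output/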

After installing the agent and running your application, log in to the Driven Server to see your application’s performance information.

The URL to the current application will be printed in the logs.

Putting the agent on the runtime CLASSPATH will have no effect. Be sure to place the -javaagent:/path/to/driven-agent-[framework]-<version>.jar switch on the JVM command line before the application JAR.

Configuring the Driven Agent

The Driven Agent accepts various configuration options after the path to the Driven Agent JAR file.

java -javaagent:/path/to/driven-agent-[framework]-<version>.jar[=key1=value1;key2=value2,value3] <other java arguments>
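
For example, options can be passed directly on the agent argument instead of through a properties file; the host and application name below are placeholders, and the argument is quoted so the shell does not interpret the semicolon:

java "-javaagent:/path/to/driven-agent-mr-<version>.jar=drivenHosts=driven.example.com:8080;appName=nightly-etl" <other java arguments>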

Available agent options can be printed to the console by running the Driven Agent JAR with the following command:

java -jar /path/to/driven-agent-[framework]-<version>.jar

The agent also accepts a properties file via the optionsFile option. To generate a template file with defaults, run the following command (with a dash as the only argument):

java -jar /path/to/driven-agent-[framework]-<version>.jar - > driven-agent.properties

This creates a driven-agent.properties template in the current directory.

The path given by optionsFile is resolved relative to the JVM's current working directory. If the file is not found there, it is resolved relative to the directory containing the Driven Agent JAR, unless the path is absolute.

Some of the following configuration options might not be available for all frameworks.

Agent-Specific Options

optionsFile

Specifies a file that provides option values for the Driven Agent. Values in the file take precedence over values passed as agent arguments. The path is resolved relative to the current working directory; if the file is not found there, it is resolved relative to the agent's JAR directory.

agentDisableLogging

Disables all Driven Agent and Driven Plugin logging.

agentDebugLogging

Enables debug logging in the Driven Agent and the Driven Plugin.

agentExitOnlyOnJobCompletion

Forces the JVM to remain active until the monitored jobs complete, fail, or are killed. When this option is enabled, the appCompletionTimeout option is not supported. The default is true.

agentKillJobsOnExit

Kills all running jobs when the JVM exits. Work is marked as STOPPED if System.exit is called when detaching the client.

agentLogSystemExitCalls

Enables logging of the stack trace of any code making System.exit or Runtime.exit calls. The option also installs a custom SecurityManager if no other SecurityManager has been installed.

agentLogSubmitStackTrace

Enables logging of the stack trace of the code making submit() calls to the cluster, which helps in diagnosing the root main class and function.
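
As an illustration, the two logging options above could be enabled together in driven-agent.properties; the true values assume boolean options accept true/false in the properties file:

# log who submits jobs and who calls System/Runtime exit (values assumed to be true/false)
agentLogSubmitStackTrace=true
agentLogSystemExitCalls=true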

Plugin-Specific Options

drivenHosts

Specifies the server host names and ports where data is to be sent. Values should be entered in this format: host1:80,host2:8080. The http:// or https:// prefix may be placed before the host name.

If you are using the Early Access Program (EAP) or the Hosted Trial, drivenHosts must be set to https://driven.cascading.io/ or https://trial.driven.io/, respectively.

drivenAPIKey

Specifies the API key that is associated with application executions.

If you are using the EAP or the Hosted Trial, drivenAPIKey must be set in order to see your applications in Driven after logging in. This requires an account, which you can get on the Driven Trial Options website.

drivenArchiveDir

Indicates the local directory where copies of transmitted data are to be stored.

drivenDisabled

Disables the sending of data to the Driven Server.

drivenSuppressSliceData

Disables sending slice-level data and detailed low-level performance visualizations; overrides server settings. This option can reduce network traffic, load on any history servers, and indexing latency.

drivenContinuousSliceData

Enables frequent updates of slice-level data before slice completion (update on completion is the default); overrides server settings. This option can increase network traffic, load on any history server, and indexing latency.

Some platforms do not support retrieving intermediate results at this level.

drivenSuppressJavaCommandData

Disables sending command-line argument data; overrides server settings. This option prevents sending sensitive information that might appear on the command line.

Application-Specific Options

appName

Names an application. The default name is the JAR file name without version information.

appVersion

Specifies the version of an application. The default version is parsed from the JAR file name.

appTags

Assigns tags that should be associated with the application, for example: cluster:prod,dept:engineering

appCompletionTimeout

Specifies the timeout (in milliseconds) to wait for all completed application details to be sent before shutdown.

appFailedOnAnyUoWFail

Indicates that if any Unit of Work fails, then the application is marked as FAILED. The default is to mark an app as FAILED only if the last Unit of Work fails.

appFailedOnAnyUoWPending

Indicates that if any Unit of Work is not started, then the application is marked as FAILED. The default is to mark an app as FAILED only if the last Unit of Work does not start.
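
As an illustration, several application-level options could be set together in a driven-agent.properties file. The values below are placeholders, and the true value assumes boolean options accept true/false in the properties file:

# application identity and tags shown in Driven
appName=nightly-etl
appVersion=2.1.0
appTags=cluster:prod,dept:engineering
# mark the application FAILED if any Unit of Work fails (value assumed to be true/false)
appFailedOnAnyUoWFail=true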