Driven Agent 1.3 Installation Guide - Getting Started
version 1.3-20160202
Getting Started
The Driven Agent is a collection of JVM libraries that enables monitoring of Hadoop applications like Apache Hive or native MapReduce jobs with Driven.
The Driven Agent is available for the following frameworks: Apache Hive and native MapReduce.
The agent is compatible with the following scheduler: Apache Oozie.
The Driven Agent is a Java agent wrapper for the Driven Plugin. There is one JAR file for installing the agent for Hive, and another JAR file for installing the agent for MapReduce.
Both the JAR file to install the agent for Hive applications and the JAR file for native MapReduce are bundled with the Driven Plugin.
To monitor only Cascading applications with Driven, the Driven Agent is not necessary. See the Driven documentation for details.
Downloading the Driven Agent
Select the agent library that is applicable to your framework.
# latest Hive agent bundle
> wget -i http://files.concurrentinc.com/driven-agent/1.3/latest-driven-agent-hive-bundle.txt
# latest MapReduce agent bundle
> wget -i http://files.concurrentinc.com/driven-agent/1.3/latest-driven-agent-mr-bundle.txt
Installing the Driven Agent
Note for Apache Oozie users: Use the Driven Agent with Apache Oozie installation documentation instead of the following procedure.
The following steps assume that Hadoop applications are being launched from:
- the command line, via bin/yarn jar … or bin/hadoop jar …
- an interactive shell like Beeline when using Hive with a "thick" client
- jobs that start from a long-running server like Hive Server or from an application server like Tomcat, JBoss, Spring, etc.
Driven defines an application context as the JVM instance driving and orchestrating the client side of Hadoop applications. Each Hive query or MapReduce job appears as a single Unit of Work in that application. In a single application context, there can be thousands of queries. Each instance of the application entails a shutdown and restart.
Variables are used in many of the commands in the following sections:
- [framework] stands for Hive (hive) or MapReduce (mr)
- <version> stands for the current agent version
Agent Quick Start:
Step 1: Create a new directory named driven-agent in your home directory.
Step 2: Copy the downloaded installation JAR file into the driven-agent directory.
Step 3: Create a driven-agent.properties file with the appropriate settings for your environment. See the Configuring the Driven Agent section to properly configure both the drivenHosts and drivenAPIKey settings (if an API key is required).
Creating a different driven-agent.properties file for each unique application enables the display of application-specific values (like name and tags) in Driven and lets you assign applications to specific teams via the Driven team API key.
Step 4: In the current console or within a bash script, use either export HADOOP_OPTS or export YARN_CLIENT_OPTS (depending on your environment) to pass the options in the following command:
export YARN_CLIENT_OPTS="-javaagent:/path/to/driven-agent-[framework]-<version>.jar=optionsFile=driven-agent.properties"
Step 5: Run your application.
After installing the agent and running your application, log in to the Driven Server to see your application’s performance information.
The URL to the current application will be printed in the logs.
Putting the agent on the runtime CLASSPATH will have no effect. Be sure to place the -javaagent:/path/to/driven-agent-[framework]-<version>.jar switch on the JVM command line before the application jar.
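The Quick Start steps above can be sketched as one short bash script. This is an illustration under stated assumptions, not an official installer: the framework choice, the X.Y.Z version string, and the host names are placeholders, and the key=value syntax of the properties file is assumed (the generated template is the authoritative reference for the format).

```shell
#!/usr/bin/env bash
# Sketch of the Quick Start steps. All concrete values are placeholders.

framework="hive"                 # "hive" or "mr"
agent_version="X.Y.Z"            # hypothetical; use the version of the JAR you downloaded
agent_dir="${HOME}/driven-agent"
agent_jar="${agent_dir}/driven-agent-${framework}-${agent_version}.jar"
options_file="${agent_dir}/driven-agent.properties"

# Step 1: create the driven-agent directory in the home directory.
mkdir -p "${agent_dir}"

# Step 2: copy the downloaded JAR into it. Left commented out because the
# download location is specific to your machine.
# cp /path/to/download/driven-agent-${framework}-${agent_version}.jar "${agent_dir}/"

# Step 3: write a minimal properties file unless one already exists.
# key=value syntax is an assumption; host names are examples only.
if [ ! -f "${options_file}" ]; then
  cat > "${options_file}" <<'EOF'
drivenHosts=host1:80,host2:8080
EOF
fi

# Step 4: hand the agent to the client JVM. Use HADOOP_OPTS instead if that
# is what your environment reads.
export YARN_CLIENT_OPTS="-javaagent:${agent_jar}=optionsFile=${options_file}"

# Step 5: run the application as usual; here we only print the setting.
echo "${YARN_CLIENT_OPTS}"
```

The sketch passes an absolute path to optionsFile so that the result does not depend on the directory from which the application is launched.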
Configuring the Driven Agent
The Driven Agent accepts various configuration options after the path to the Driven Agent JAR file.
java -javaagent:/path/to/driven-agent-[framework]-<version>.jar[=key1=value1;key2=value2,value3] <other java arguments>
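As a sketch of the inline form, the following builds an option string from the example values this guide gives for drivenHosts and appTags. The variable name agent_opts is only an illustration, and the JAR path keeps the guide's placeholders. Because ; is a command separator in most shells, the whole switch is quoted.

```shell
# Options are separated by ';' and a single value may itself contain ','.
agent_opts="drivenHosts=host1:80,host2:8080;appTags=cluster:prod,dept:engineering"

# Quote the whole switch so the shell does not split the command at ';'.
# [framework] and <version> are the guide's placeholders; replace them before use.
export HADOOP_OPTS="-javaagent:/path/to/driven-agent-[framework]-<version>.jar=${agent_opts}"

echo "${HADOOP_OPTS}"
```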
Available agent options can be printed to the console by running the Driven Agent JAR with the following command:
java -jar /path/to/driven-agent-[framework]-<version>.jar
The agent also accepts a properties file via the optionsFile option. To generate a template file with defaults, run the following command (with a dash as the only argument):
java -jar /path/to/driven-agent-[framework]-<version>.jar - > driven-agent.properties
This creates a driven-agent.properties template in the current directory.
Unless the path is absolute, the file specified by optionsFile is resolved relative to the JVM's current working directory. If the file is not found there, it is resolved relative to the Driven Agent directory.
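The lookup order for optionsFile can be illustrated with a small shell function. This is a sketch of the documented rule only, not the agent's own code; the function name resolve_options_file and its three arguments are hypothetical.

```shell
# Illustration of the documented lookup order; this is not the agent's code.
# Usage: resolve_options_file FILE WORKING_DIR AGENT_DIR
resolve_options_file() {
  file="$1"
  working_dir="$2"
  agent_dir="$3"
  case "${file}" in
    /*)
      # Absolute path: used exactly as given.
      printf '%s\n' "${file}"
      ;;
    *)
      if [ -f "${working_dir}/${file}" ]; then
        # Found in the JVM working directory.
        printf '%s\n' "${working_dir}/${file}"
      else
        # Otherwise taken relative to the agent directory.
        printf '%s\n' "${agent_dir}/${file}"
      fi
      ;;
  esac
}

resolve_options_file driven-agent.properties "${PWD}" "${HOME}/driven-agent"
```

Passing an absolute path in optionsFile sidesteps the lookup entirely, which can be helpful when jobs are launched from varying directories.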
Some of the following configuration options might not be available for all frameworks.
Agent-Specific Options
optionsFile
-
Specifies the file that provides option values for the Driven Agent. All values in the file take precedence over the agent argument values. The file is relative to the current directory. If the file is not found in the current directory, it is relative to the agent's JAR directory.
agentDisableLogging
-
Disables all Driven Agent and Driven Plugin logging.
agentDebugLogging
-
Enables debug logging in the Driven Agent and the Driven Plugin.
agentExitOnlyOnJobCompletion
-
Forces the JVM to remain active until the monitored jobs complete, fail, or are killed. The appCompletionTimeout option is not supported. Default is TRUE.
agentKillJobsOnExit
-
Kills all running jobs when the JVM exits. Work is marked as STOPPED if System.exit is called when detaching the client.
agentLogSystemExitCalls
-
Enables logging of the stack trace making System and Runtime exit calls. The option also installs a custom SecurityManager if no other SecurityManager has been installed.
agentLogSubmitStackTrace
-
Enables logging of the stack trace making submit() calls to the cluster, which helps in diagnosing the root main class and function.
Plugin-Specific Options
drivenHosts
-
Specifies the server host names and ports where data is to be sent. Values should be entered in this format: host1:80,host2:8080. The http:// or https:// prefix may be placed before the host name.
If you are using the Early Access Program (EAP) or the Hosted Trial, drivenHosts must be set to https://driven.cascading.io/ or https://trial.driven.io/, respectively.
drivenAPIKey
-
Specifies the API key that is associated with application executions.
If you are using the EAP or the Hosted Trial, drivenAPIKey must be set in order to see your applications in Driven after logging in. This requires an account, which you can get on the Driven Trial Options website.
drivenArchiveDir
-
Indicates the local directory where copies of transmitted data are to be stored.
drivenDisabled
-
Disables the sending of data to the Driven Server.
drivenSuppressSliceData
-
Disables sending slice-level data and detailed low-level performance visualizations; overrides server settings. This option can reduce network traffic, load on any history servers, and indexing latency.
drivenContinuousSliceData
-
Enables frequent updates of slice-level data before slice completion (update on completion is the default); overrides server settings. This option can increase network traffic, load on any history server, and indexing latency.
Some platforms do not support retrieving intermediate results at this level.
drivenSuppressJavaCommandData
-
Disables sending command-line argument data; overrides server settings. This option prevents sending sensitive information that might appear on the command line.
Application-Specific Options
appName
-
Names an application. The default name is the JAR file name without version information.
appVersion
-
Specifies the version of an application. The default version is parsed from the JAR file name.
appTags
-
Assigns tags that should be associated with the application, for example:
cluster:prod,dept:engineering
appCompletionTimeout
-
Specifies timeout (in milliseconds) to wait to send all completed application details before shutdown.
appFailedOnAnyUoWFail
-
Indicates that if any Unit of Work fails, then the application is marked as FAILED. The default is to mark an app as FAILED only if the last Unit of Work fails.
appFailedOnAnyUoWPending
-
Indicates that if any Unit of Work is not started, then the application is marked as FAILED. The default is to mark an app as FAILED only if the last Unit of Work does not start.
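A per-application properties file that combines plugin and application options might look like the following sketch. The file name, the application name nightly-report, the version, and the timeout are hypothetical, and key=value syntax is assumed; compare against the template generated by the agent JAR before relying on it.

```shell
# Hypothetical application name and values; key=value syntax is an assumption.
props_file="driven-agent-nightly-report.properties"

cat > "${props_file}" <<'EOF'
drivenHosts=host1:80,host2:8080
appName=nightly-report
appVersion=1.0
appTags=cluster:prod,dept:engineering
appCompletionTimeout=60000
EOF

# Show what was written.
cat "${props_file}"
```

Keeping one such file per application lets each application report its own name and tags, as described in the Quick Start.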