Driven Administration Guide

version 1.1.1

Extracting Driven data through Scope

In addition to using the Driven client for backups, you can also use it to extract data from the Driven application for integration with third-party monitoring applications. For this, the ‘scope’ option of the Driven client is used.

$ driven scope --help

java driven.management.scope.Scope [options...]

Optional:
 env vars: DRIVEN_CLUSTER, DRIVEN_HOSTS

Option                                                       Description
------                                                       -----------
--between <natural language date/time>
--by-parent
--cause, --with-cause [cause or filter with * or ?]          all unique failure causes, or only those match filter
--child-id, --with-child-id [id or partial id]
--cluster                                                    driven cluster name (default: driven)
--counter <group and counter name, eg. 'foo:bar.counter'>
--debug [Boolean]                                            enable debugging (default: false)
--display-width <Integer>                                    width of display (default: 80)
--duration [[pending, started, submitted, running,           interval to calculate duration from (default: [started,
  finished]]                                                   finished])
--duration-interval                                          time period to filter values, eg. 5min:25min
--duration-period                                            time period to bucket values (default: PT15M)
--entity                                                     entity IDs to constrain results
--fields                                                     output field names, '*' denotes defaults (default: [type,
                                                               id, name, status, duration])
--from <Integer: offset from which to begin returning        (default: 0)
  results>
--help
--hosts                                                      driven server host(s) (default: localhost)
--id, --with-id [id or partial id]
--jmx
--json [Options$JsonOpts]                                    output data as json (default: values)
--limit <Integer: limit the number of results>               (default: 1000000)
--name, --with-name [name or filter with * or ?]             all unique names of type, or only those match filter
--no-header
--owner, --with-owner [owner or filter with * or ?]          all unique owners of type, or only those match filter
--parent-id, --with-parent-id [id or partial id]
--parent-name, --with-parent-name <name of parent>
--parent-status, --with-parent-status <Invertible:
  [pending, skipped, started, submitted, running,
  successful, stopped, failed, engaged, finished, all]>
--parent-type, --with-parent-type <ProcessType: [cluster,
  app, cascade, flow, step, slice, undefined]>
--print                                                      print query parameters
--since <natural language date/time, default 2 days from
  'till'>
--sort                                                       sort field names - default is none
--status, --with-status [Invertible: [pending, skipped,
  started, submitted, running, successful, stopped,
  failed, engaged, finished, all]]
--status-time [[pending, started, submitted, running,        date/time field to filter against. one of: [pending,
  finished]]                                                   started, submitted, running, finished] (default:
                                                               started)
--tag, --with-tag [tag name]                                 unique tags of type, or only those that match
--text-search                                                full search of pre-defined text fields - currently: ID,
                                                               name, owner
--till <natural language date/time, default is now>
--type [ProcessType: [cluster, app, cascade, flow, step,     the process type (default: app)
  slice, undefined]]
--verbose                                                    logging level (default: info)
--version

With the scope option, you can retrieve information about current and historical processes, where a process can be an application, cascade, flow, step, or slice (a generalization of a Hadoop task).

This option can be used for two roles: Discovery and Monitoring. Discovery is finding specific process instances based on any metadata, while Monitoring is observing changes in the metadata of specific process instances, for example, whether a flow has "failed". The tool also allows you to report on a target process type while refining the results based on parent and target metadata.

For example, to list all “skipped” flows, enter the following:

$ driven scope --type flow --status skipped

Or, list all “skipped” flows in a “running” application:

$ driven scope --type flow --status skipped --parent-type app --parent-status running

Or, to see the current statuses of all flows in all “running” applications:

$ driven scope --type flow --status --parent-type app --parent-status running

Or more specifically, for each “running” application, what are the statuses of their child flows, grouped by application:

$ driven scope --type flow --status --parent-type app --parent-status running --by-parent

Common Switches

Many switches begin with "with", for example, --with-name. These can be abbreviated by dropping the "with" prefix, which gives --name.
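
For example, the following two queries are equivalent (the application name shown is hypothetical):

$ driven scope --with-name "sample-app"
$ driven scope --name "sample-app"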

Filters

Use the following command-line switches as filters:

--type = app, cascade, flow, step, slice

--with-tag = user-defined data for filtering

--with-status = one or more of the following values: pending, started, submitted, running, successful, failed, stopped, skipped. If blank, all status values will be displayed as a chart.

A ^ (caret) before a value means “not”, for example, “^running” (see the example after this list).

--with-id = filter for an identifier

--with-name = name or name filter

--with-parent-name = used in tandem with --parent-type

--with-parent-status = used in tandem with --parent-type

--with-parent-id = lists children of the given type that have the given parent ID; --parent-type is ignored

--status-time = which status time to filter against; one of: pending, started, submitted, running, finished

--till = filter results up to a date/time

--since = filter results from a date/time

--between = filter results between two dates/times
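
For example, to list all flows since a day ago that are not “successful” (the date expression is illustrative natural language):

$ driven scope --type flow --status ^successful --since "1 day ago"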

Status

Most processes can be in one of the following states:

pending - when the process is created

started - when the process has been notified that it may start work

submitted - when the process, or a child process, has been submitted to a cluster

running - when a process is actually executing the data pipeline

successful - when a process has completed successfully

failed - when a process has failed

stopped - when a process, or a child process, received a stop notification

skipped - when a Flow was not executed, usually because its sinks were not stale

The --status switch, given without a value, shows a summary of all status values for each instance.
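
For example, to show a summary of all status values across applications:

$ driven scope --type app --status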

Duration

To show a timeline of all durations, grouped by period, use the following switches:

--duration = the interval to calculate duration from, for example, started:finished

--duration-period = the time period in which to bucket the results. For example, 10sec, 15min, 2hrs, 1wk

--duration-interval = the range of time to display. For example, 15min:30min
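
For example, to bucket flow durations into 15-minute periods and display only the 15min:30min range (the switch values follow the formats described above):

$ driven scope --type flow --duration started:finished --duration-period 15min --duration-interval 15min:30min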

Cookbook

How do I monitor a job in progress?

If you have already identified a step or a flow that you wish to monitor, enter:

$ driven scope --type slice --parent-type step --parent-id XXXXX --status

This command summarizes all the slice statuses for the requested step.

How do I list all users currently running applications?

To list all known users or process owners, enter:

$ driven scope --owner

To filter the list to include owners with running apps:

$ driven scope --owner --status running

Where in the code did the job fail?

If you have the app instance ID, you can list all the causes of the failure by entering:

$ driven scope --parent-id xyz --type slice --cause

This command returns a list of all the exceptions and messages thrown.

For additional detailed information, enter the command:

$ driven scope --type slice --status failed \
  --fields id,failedBranchName,failedPipeLocation,failedCause,failedMethodLocation

Integrating with Nagios

While Driven provides a superior interface to visualize your Cascading application in development and operational stages, it is often necessary to relay higher-level insights to a common monitoring platform.

Nagios is used for monitoring IT infrastructure, including Hadoop clusters. With Nagios, you can use either the bash shell or the JMX interface to develop custom plugins that monitor Cascading applications through the Driven interface.

For more information about developing Nagios plugins, refer to the Nagios documentation. Multiple tutorials in the Nagios community describe how to build bash-based plugins; from such a plugin you can call driven scope commands directly.
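
As a starting point, here is a minimal sketch of a bash check that reports CRITICAL when any Cascading application has failed (Nagios interprets plugin exit codes 0, 1, 2, and 3 as OK, WARNING, CRITICAL, and UNKNOWN):

#!/bin/bash
# Count failed applications; --no-header excludes the header line from the count.
FAILED=$(driven scope --type app --status failed --no-header | wc -l)
if [ "$FAILED" -gt 0 ]; then
  echo "CRITICAL - $FAILED failed Cascading application(s)"
  exit 2
fi
echo "OK - no failed Cascading applications"
exit 0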

Nagios Bash Plugin Examples

Here are some simple examples of Nagios bash plugins built with the driven scope capability:

This is an example of tracking all the Cascading applications that did not complete successfully.

#!/bin/bash
# Print type, id, name, owner, status, and failure cause for every
# process whose status column is not SUCCESSFUL.
driven scope --fields type,id,name,owner,status,failedCause | awk '$5!="SUCCESSFUL" { print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 }'
# Exit code 1 reports WARNING to Nagios.
exit 1

To track the longest-running Cascading applications:

#!/bin/bash
# Label a ninth column in the header row; for data rows (row 3 onward),
# compute the difference between columns 7 and 6, then sort descending
# on that column and show the top ten rows.
driven scope --fields type,id,name,owner,status,startTime:submitTime,startTime:finishedTime,duration | awk 'BEGIN { OFS = "\t" } NR == 1 { $9 = "diff." } NR >= 3 { $9 = $7 - $6 } 1' | sort -rgk 9 | head -n 10
# Exit code 1 reports WARNING to Nagios.
exit 1

To track the applications that consume the most storage on the cluster:

#!/bin/bash
# Print each process with its tuplesWritten and bytesWritten counts,
# sorted descending by bytes written; show the top ten rows.
driven scope --fields type,id,name,owner,status,tuplesWritten,bytesWritten | awk '{ print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 }' | sort -rgk 7 | head -n 10
# Exit code 1 reports WARNING to Nagios.
exit 1
