Driven Agent 1.3 Installation Guide - Driven Agent for Hive

version 1.3-20160202

Driven Agent for Hive

The Driven Agent for Hive is a JVM-level agent library that enables monitoring of Apache Hive queries in Driven. The agent runs in parallel with the main execution of Hive and sends telemetry data to a remote Driven server. Any type of Hive deployment is supported, such as the fat client (hive), the newer Beeline client, HiveServer2, or Apache Oozie workflows containing Hive queries. Queries that applications send through JDBC or ODBC can be monitored as well.

Hive Version Requirements

The Driven Agent for Hive can be used with any version of Hive newer than 0.13.0 when Hive runs on the MapReduce execution engine. With the Tez execution engine, you need at least Hive 0.14.0 and Tez 0.5.2, and you must also ensure that the YARN Application Timeline Server (ATS) is properly configured. Hive itself works without ATS, but the Driven Agent requires a functioning ATS to monitor all resource usage. Refer to the Tez project documentation for how to configure ATS.

Hive on Apache Spark is currently not supported.

Metadata Support

Driven can recognize application metadata, such as name, version number, or tags, that is sent along with other telemetry data from an application. The Driven Agent for Hive supports this through the properties shown in the following table:

Table 1. Properties for sending metadata to Driven
Name	Example	Explanation
driven.app.name	driven.app.name=tps-report	Name of the application
driven.app.version	driven.app.version=1.1.5	Version of the application
driven.app.tags	driven.app.tags=cluster:prod,tps,dept:marketing	Comma-separated list of tags

The Getting Started guide explains how to pass these properties on the agent command line. If that is not flexible enough for your use case, the Driven Agent for Hive offers more options:

You can set the properties with set commands in a HiveQL script or in an initialization file, or pass them on the Hive command line. You can also add them to the hive-site.xml file. With HiveServer2, you can additionally pass the properties as JDBC parameters. In general, any mechanism you would normally use to pass parameters to a Hive query is supported.
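As a rough illustration, the following bash sketch prepares the metadata values from Table 1 for three of these mechanisms: an initialization file of set commands loaded with hive -i, --hiveconf options on the hive command line, and Hive configuration settings placed after the ? in a HiveServer2 JDBC URL. It only prints the resulting commands rather than running them. The host hs2.example.com, the script tps_report.hql, and the init file name driven-init.hql are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: attach Driven metadata (values from Table 1) to a Hive run in three ways.
# Host, script, and file names below are hypothetical placeholders.
set -euo pipefail

QUERY_FILE="tps_report.hql"                       # hypothetical HiveQL script
INIT_FILE="${TMPDIR:-/tmp}/driven-init.hql"       # hypothetical init file path

# 1) Initialization file of set commands, loaded with "hive -i".
cat > "$INIT_FILE" <<'EOF'
set driven.app.name=tps-report;
set driven.app.version=1.1.5;
set driven.app.tags=cluster:prod,tps,dept:marketing;
EOF

# 2) The same properties as --hiveconf options on the hive command line.
HIVECONF_ARGS=(
  --hiveconf driven.app.name=tps-report
  --hiveconf driven.app.version=1.1.5
  --hiveconf driven.app.tags=cluster:prod,tps,dept:marketing
)

# 3) HiveServer2: Hive configuration settings follow '?' in the JDBC URL,
#    separated by ';'.
JDBC_URL="jdbc:hive2://hs2.example.com:10000/default?driven.app.name=tps-report;driven.app.version=1.1.5"

# Dry run: print the commands instead of executing them.
echo "hive -i $INIT_FILE -f $QUERY_FILE"
echo "hive ${HIVECONF_ARGS[*]} -f $QUERY_FILE"
echo "beeline -u \"$JDBC_URL\" -f $QUERY_FILE"
```

The tags property is left out of the JDBC URL in this sketch; whether its commas and colons need escaping there may depend on your Hive JDBC driver version, so test it before relying on it.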

Using the Hive Agent Artifact (Unbundled Form)

To download the latest Driven Agent for Hive, follow the instructions in Getting Started.

Enable the agent by extending the HADOOP_OPTS environment variable before starting a hive fat client, an embedded Beeline client, or HiveServer2. Use the following command format:

export HADOOP_OPTS="$HADOOP_OPTS -javaagent:/path/to/driven-agent-hive-<version>.jar"

You must set the HADOOP_OPTS variable. Setting the YARN_OPTS variable instead has no effect, even on a YARN-based cluster.
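A minimal sketch of the extension step, assuming a bash shell: the function appends the -javaagent flag to HADOOP_OPTS while keeping any options already set, and skips the append if the same flag is already present, so sourcing the file more than once does not add the flag twice. The default jar path /opt/driven/driven-agent-hive.jar is a hypothetical placeholder; substitute the location and version of your download. The file deliberately avoids set -e and set -u because it is meant to be sourced into an existing shell or startup script.

```shell
#!/usr/bin/env bash
# Sketch: append the Driven agent to HADOOP_OPTS without discarding existing options.
# DRIVEN_AGENT_JAR is a hypothetical path; point it at the jar you downloaded.
# Use HADOOP_OPTS here; setting YARN_OPTS instead has no effect on the agent.

DRIVEN_AGENT_JAR="${DRIVEN_AGENT_JAR:-/opt/driven/driven-agent-hive.jar}"

add_driven_agent() {
  local flag="-javaagent:${DRIVEN_AGENT_JAR}"
  # Append only if the exact flag is not already present, so sourcing this
  # from a startup script more than once does not add the agent twice.
  case " ${HADOOP_OPTS:-} " in
    *" ${flag} "*) ;;
    *) export HADOOP_OPTS="${HADOOP_OPTS:+${HADOOP_OPTS} }${flag}" ;;
  esac
}

add_driven_agent
echo "HADOOP_OPTS=${HADOOP_OPTS}"
```

Source the file (for example with . /path/to/driven-agent-env.sh, a hypothetical name) in the shell or startup script that launches hive, Beeline, or HiveServer2.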

The agent must be installed and configured on the host from which Hive queries are submitted to the cluster. For the fat client, it is sufficient to set the environment variable in the shell where hive will be launched. The same applies to the newer Beeline client when it is used without HiveServer2.

In a HiveServer2 deployment, the agent must be installed on the machine where the server runs, and the HADOOP_OPTS variable must be set in the environment of the server process. Typically this involves modifying the HiveServer2 startup script. Some distributions ship with graphical cluster administration tools that let you customize the hive-env.sh script used to administer HiveServer2.
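After restarting HiveServer2 with the modified environment, one way to confirm that the server JVM actually received the agent is to inspect its command line. The following bash sketch does this with ps. It assumes that the server's main class is org.apache.hive.service.server.HiveServer2 and that the agent jar's file name contains driven-agent-hive, as in the export example; verify both assumptions on your cluster.

```shell
#!/usr/bin/env bash
# Sketch: check whether running HiveServer2 JVMs were started with the Driven agent.
# Assumptions: HiveServer2's main class is org.apache.hive.service.server.HiveServer2,
# and the agent jar name contains "driven-agent-hive" (as in the export example).
set -euo pipefail

# Return 0 if a JVM command line contains a -javaagent flag for the Driven Hive agent.
has_driven_agent() {
  local word
  local -a words
  read -ra words <<< "$1"
  for word in ${words[@]+"${words[@]}"}; do
    case "$word" in
      -javaagent:*driven-agent-hive*.jar | -javaagent:*driven-agent-hive*.jar=*) return 0 ;;
    esac
  done
  return 1
}

# Inspect every HiveServer2 process on this host; return 0 only if all have the agent.
check_hiveserver2() {
  local line found=0 missing=0
  # "-ww" asks ps not to truncate long JVM command lines; "[o]rg" keeps grep
  # from matching its own command line.
  while IFS= read -r line; do
    found=$((found + 1))
    if has_driven_agent "$line"; then
      echo "HiveServer2 process runs with the Driven agent."
    else
      echo "HiveServer2 process runs WITHOUT the Driven agent; check its HADOOP_OPTS."
      missing=$((missing + 1))
    fi
  done < <(ps -ww -eo args | grep '[o]rg\.apache\.hive\.service\.server\.HiveServer2' || true)

  if [ "$found" -eq 0 ]; then
    echo "No HiveServer2 process found on this host."
    return 1
  fi
  [ "$missing" -eq 0 ]
}

check_hiveserver2 || echo "Driven agent not confirmed for HiveServer2 on this host."
```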

With the Driven Agent for Hive, an application is defined as one JVM. As a result, each HiveServer2 instance appears in Driven as a single long-running application, starting from the time the first query is executed on the server, and all queries that the instance runs are displayed as processes of that same application.

Using the Hive Agent Bundle Artifact

The Hive agent bundle has the same functionality as the plain agent, but the bundle simplifies the installation and configuration of the agent in certain deployment scenarios. Users of Oozie should always use the Hive agent bundle. See Using Driven Agent with Apache Oozie for further information.

To download the latest Driven Agent for Hive, see the Getting Started documentation.

What Cluster Work Is Monitored

The Driven Agent for Hive monitors all queries that use cluster resources (CPU and memory). Queries that do not consume cluster computing resources are currently not tracked, even if they modify the state of the system. Examples of untracked statements are DDL statements and statements such as select current_database() or use somedb.