Driven User Guide

version 1.1.1

Visualizing Hive Applications

Driven for Hive delivers the end-to-end visibility required to manage and monitor your Hive applications. Because your Hive queries run within the resilient Cascading execution framework, your Hive applications gain the benefits of the Cascading platform: dynamic management of all Hive objects, visibility into the end-to-end flow of the application, instrumentation, orchestration of your Hive modules for error recovery, and seamless integration with major third-party systems such as Elasticsearch and Teradata.

Figure 1: Driven displays your Hive application as a DAG representation, which allows you to drill down to a specific flow such as CalculateAverageListPrice.

Using HiveFlow

With Driven for Hive, you can move your Hive Query Language (HQL) queries into production by combining the HiveFlow API with the runtime monitoring capabilities of Driven.

HiveFlow is a Java wrapper that simplifies chaining multiple HQL statements into a single maintainable application. It transparently sends telemetry to Driven so that an HQL-based application can be managed and monitored in real time.

Driven provides the current status of running applications and a searchable history of past application executions.

With HiveFlow, even applications based on multiple technologies, such as Hive, custom MapReduce, Cascading, and Scalding, can be chained together within the same application (Apache Hadoop job jar), thereby simplifying testing, deployment, maintenance, and monitoring.
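To make the idea of chaining dependent HQL statements concrete, the following is a minimal sketch in plain Java. It does not use the HiveFlow or Driven APIs, and it does not run any Hive query. The class `HqlChain`, its methods, and the statement names `load_products` and `load_prices` are hypothetical and exist only for illustration. The sketch assumes each statement declares the statements it depends on, and derives an execution order with a standard topological sort (Kahn's algorithm).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: NOT part of HiveFlow, Cascading, or Driven.
public class HqlChain {
    // Statement name -> names of the statements it depends on.
    private final Map<String, Set<String>> dependencies = new LinkedHashMap<>();

    // Registers a statement and the statements that must run before it.
    public HqlChain add(String name, String... dependsOn) {
        dependencies.put(name, new LinkedHashSet<>(Arrays.asList(dependsOn)));
        return this;
    }

    // Returns an order in which every statement follows its dependencies.
    public List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : dependencies.entrySet()) {
            for (String dep : entry.getValue()) {
                if (!dependencies.containsKey(dep)) {
                    throw new IllegalArgumentException("Unknown dependency: " + dep);
                }
            }
            remaining.put(entry.getKey(), entry.getValue().size());
        }

        // Start with statements that have no unmet dependencies.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : remaining.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String done = ready.remove();
            order.add(done);
            // Release every statement that was waiting on the finished one.
            for (Map.Entry<String, Set<String>> entry : dependencies.entrySet()) {
                if (entry.getValue().contains(done)) {
                    int left = remaining.merge(entry.getKey(), -1, Integer::sum);
                    if (left == 0) {
                        ready.add(entry.getKey());
                    }
                }
            }
        }

        if (order.size() != dependencies.size()) {
            throw new IllegalStateException("Cyclic dependency among statements");
        }
        return order;
    }

    public static void main(String[] args) {
        HqlChain chain = new HqlChain()
            .add("CalculateAverageListPrice", "load_products", "load_prices")
            .add("load_products")
            .add("load_prices");
        System.out.println(chain.executionOrder());
    }
}
```

In this sketch, `CalculateAverageListPrice` is ordered after both of its hypothetical upstream statements, even though it was registered first. A real application would hand the actual work to HiveFlow rather than to a helper like this.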

Driven for Hive

Using Driven, you can perform critical tasks necessary for operationalizing and maintaining your Hive applications:

  • Visualize all queries being executed, along with their dependencies, to understand your end-to-end Hive application

  • Run full-text searches over HQL to find all applications that run a given query

  • Organize and compare your applications using tags and other search parameters

  • Get real-time and historical operational insights to identify bottlenecks and tune your application

  • Track current and historical operational metrics for audits, governance and lineage

  • Automatically orchestrate dependent Hive scripts with fault recovery to make your application robust

  • Correlate your application behavior with other simultaneous events in the cluster
