Driven Administration Guide

version 1.1.1

Setting up persistence cluster

Driven persists all historical operational details about your Hadoop applications. Size the persistence cluster according to how many applications you run, how often you run them, and how much data they process.

Note
Driven uses Elasticsearch for persistence. For configuration changes specific to Elasticsearch, refer to the Elasticsearch documentation. Most application-specific storage parameters can be set as properties in the driven.properties file.

Step 1: Specify the directory used by Driven to save data

driven.storage.data.path=./driven

Step 2: Give your persistence cluster a name

driven.storage.cluster.name=driven

Step 3: Specify shards and replicas for persistence scalability

In Elasticsearch, an index is roughly equivalent to a database in a relational DBMS. Just as a relational database has a schema, an Elasticsearch index has a mapping. An index is divided into shards so that its data can be distributed across nodes and the cluster can scale. Replicas are copies of the shards that keep the data available if a node is lost.

To define the settings for your Driven cluster, set these values:

driven.storage.index.cascading.shards=<NUMBER_OF_SHARDS>
driven.storage.index.cascading.replicas=<NUMBER_OF_REPLICAS>
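
As an illustration only, the fragment below fills in the two placeholders with concrete numbers. The values are assumptions chosen for the example, not sizing recommendations from Driven. In Elasticsearch, each replica is a full copy of every primary shard, so 5 shards with 1 replica produce 10 shard copies in total, spread across the nodes of the cluster.

```properties
# Illustrative values only (assumed for this example).
# 5 primary shards x (1 primary + 1 replica) = 10 shard copies in the cluster.
driven.storage.index.cascading.shards=5
driven.storage.index.cascading.replicas=1
```

On a single-node system a replica cannot be placed on a different node, so a replica count of 0 may be the more practical choice there.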

Step 4: Define the hosts that participate in the Elasticsearch cluster (leave the value blank for single-node systems)

driven.storage.cluster.discovery.unicast.hosts=
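
For a multi-node cluster, the value might look like the sketch below. The host names are hypothetical, and the comma-separated host:port form is an assumption based on the convention Elasticsearch uses for its own unicast host list; confirm the accepted format for your Driven version. Port 9300 is the default Elasticsearch transport port.

```properties
# Hypothetical host names; replace them with the nodes of your own cluster.
# Assumed format: comma-separated host:port entries, as in Elasticsearch unicast discovery.
driven.storage.cluster.discovery.unicast.hosts=es-node1.example.com:9300,es-node2.example.com:9300,es-node3.example.com:9300
```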
