Driven Plugin FAQ

version 1.1.1

Driven Server

Can I run my own Driven server locally on-premise or remotely at my cloud or hosting provider?

Yes, you can download Driven here: http://cascading.io/trial/

Driven Plugin

I am seeing OOM errors. What are the possible causes?

Out-of-memory (OOM) messages from the Driven plugin are a symptom of an error rather than an error in themselves. They occur when the Driven plugin is unable to send data to the Driven server, or when the application publishes data to the plugin faster than the plugin can send it to the server. The Driven plugin maintains an internal backlog queue, which it regularly purges to manage its memory consumption.

There can be many possible reasons:

The Driven server is not up
The Cascading application is unable to connect to the Driven server
The cluster is inadequately provisioned for the namenode processes and machine (the Driven plugin pings the namenode for slice information). Try setting the Xms and Xmx settings of your Cascading application and namenode processes to the following: -Xms4096m -Xmx4096m
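As a rough sketch of the heap settings suggested above, the flags can be passed through environment variables. This assumes the application is launched with the `hadoop` command (which reads HADOOP_CLIENT_OPTS) and that the namenode JVM options are taken from HADOOP_NAMENODE_OPTS, usually set in hadoop-env.sh; both variable names depend on your Hadoop version and distribution, so check them against your installation.

```shell
# Assumption: the Cascading application is started with `hadoop jar ...`,
# whose client JVM picks up HADOOP_CLIENT_OPTS.
export HADOOP_CLIENT_OPTS="${HADOOP_CLIENT_OPTS:-} -Xms4096m -Xmx4096m"

# Assumption: the namenode reads its JVM flags from HADOOP_NAMENODE_OPTS
# (normally placed in hadoop-env.sh on the namenode host; a restart is needed).
export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS:-} -Xms4096m -Xmx4096m"
```

Make sure the machines have enough physical memory for a 4 GB heap before applying these values.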

If your Cascading application executes successfully without the Driven plugin, it will execute successfully whether or not the Driven plugin finishes publishing, because publishing data to the Driven server is decoupled from the execution of the Cascading MapReduce application. The only caveat is that if your application uses YARN and YARN crashes, the Cascading application will be affected.

What versions of Hadoop are supported?

Driven works with any version of Hadoop that’s compatible with Cascading. See http://www.cascading.org/support/compatibility/ for the full list.

What are the benefits of registering for Driven?

When you sign up at http://cascading.io/register/ for the cloud version of Driven, or use the self-hosted version of Driven and configure the Driven API key, you get access to additional features such as the ability to organize applications into teams. This is useful when you want to categorize applications by line of business or class of customer, or to compare historical execution behavior across the same class of applications to identify outliers and trends.

What versions of the JDK are supported?

The matrix of supported JDKs is at http://www.cascading.org/support/compatibility/

How do I see what data is being transmitted?

The plugin can be run in 'archive mode'.

If archive mode is set, all records sent to the Driven server are also written to disk, even if the server is unreachable. This is useful when the server is unavailable, because the archive can later be 'replayed' once the server is reachable. Sending data to the server is idempotent, so replaying data does not corrupt already recorded data. To use archive mode, pass a JVM property named 'driven.archive.path' when Cascading is invoked.

With Apache Hadoop, you can pass the property through the HADOOP_OPTS environment variable.

> export HADOOP_OPTS="$HADOOP_OPTS -Dcascading.management.document.service.archive.dir=/somepath/archive"

The compressed archive files will be located under the configured path, marked with the timestamp of when the Cascading job started. Do not use ~ expansion for the path.
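The archive setup can be sketched end to end as follows. The directory /tmp/driven-archive is a hypothetical location chosen for illustration, and the property name is copied from the HADOOP_OPTS example above; the point is only to show an absolute path being created and passed to the JVM.

```shell
# Hypothetical archive directory; use an absolute path, not one starting with ~.
ARCHIVE_DIR="${ARCHIVE_DIR:-/tmp/driven-archive}"

# Create the directory up front so the location is known to exist and be writable.
mkdir -p "$ARCHIVE_DIR"

# Append the archive property (as used in the example above) to any existing options.
export HADOOP_OPTS="${HADOOP_OPTS:-} -Dcascading.management.document.service.archive.dir=$ARCHIVE_DIR"

# After the Cascading job has run, inspect what was written.
ls -l "$ARCHIVE_DIR"
```

Listing the directory after a run is a quick way to confirm that archive mode was actually picked up by the job.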

Project and contact information: http://www.concurrentinc.com/