9.7 Java Expression Operations

Cascading supports dynamically compiled Java expressions in both Functions and Filters. This capability is provided by the Janino embedded Java compiler, which compiles the expressions into bytecode for fast evaluation. Janino is documented in detail on its website, http://www.janino.net/.

This capability allows an Operation to evaluate a suitable one-line Java expression, such as a + 3 * 2 or a < 7, where the value of each variable (here, a) is passed in as a Tuple field. The result of the Operation is the evaluated result of the expression: a number in the first example and a Boolean value in the second.
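To make the two result types concrete, here is a minimal plain-Java sketch of what the two example expressions compute once a is bound to a field value. It does not use Cascading or Janino: the class ExpressionSemanticsSketch and its field names are hypothetical, and the lambdas merely stand in for the compiled expressions.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

final class ExpressionSemanticsSketch {
    // Stand-in for the compiled expression "a + 3 * 2" (numeric result).
    static final IntUnaryOperator ARITHMETIC = a -> a + 3 * 2;

    // Stand-in for the compiled expression "a < 7" (Boolean result).
    static final IntPredicate COMPARISON = a -> a < 7;

    public static void main(String[] args) {
        int a = 4; // value that would arrive in the Tuple field "a"
        System.out.println(ARITHMETIC.applyAsInt(a)); // 4 + (3 * 2)
        System.out.println(COMPARISON.test(a));       // is 4 less than 7?
    }
}
```

Normal Java operator precedence applies, so the multiplication in the first expression is evaluated before the addition.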

ExpressionFunction

The function cascading.operation.expression.ExpressionFunction evaluates a Java expression at run time, assigning argument Tuple values to variables in the expression and returning the result in the declared field.

// incoming -> "ip", "time", "method", "event", "status", "size"

String exp =
  "\"this \" + method + \" request was \" + size + \" bytes\"";
Fields fields = new Fields( "pretty" );
ExpressionFunction function =
  new ExpressionFunction( fields, exp, String.class );

assembly =
  new Each( assembly, new Fields( "method", "size" ), function );

// outgoing -> "pretty" = "this GET request was 1282652 bytes"

Above, the expression builds a new String from values in the current Tuple and returns it in the "pretty" field. Note that you must declare the type of the input Tuple fields so that the expression compiler knows how to treat the variables in the expression. In this example a single type, String.class, is declared for both method and size.
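The role of the declared type can be sketched in plain Java. The example below is only an illustration under stated assumptions, not Cascading's implementation: the class TypedBindingSketch and its coerce helper are hypothetical, and size is declared as a long here purely to show how a declared type changes the way a raw log value is treated.

```java
final class TypedBindingSketch {
    // Hypothetical helper: converts a raw field value to its declared type,
    // roughly what must happen before a value is bound to a typed variable.
    static Object coerce(Object raw, Class<?> declared) {
        String text = String.valueOf(raw);
        if (declared == String.class) return text;
        if (declared == int.class || declared == Integer.class) return Integer.valueOf(text);
        if (declared == long.class || declared == Long.class) return Long.valueOf(text);
        throw new IllegalArgumentException("unsupported type: " + declared);
    }

    // Plain-Java equivalent of the expression used above.
    static String pretty(String method, long size) {
        return "this " + method + " request was " + size + " bytes";
    }

    public static void main(String[] args) {
        String method = (String) coerce("GET", String.class);
        long size = (Long) coerce("1282652", long.class);
        System.out.println(pretty(method, size));
    }
}
```

With size declared as a numeric type, an expression could also do arithmetic on it (for example, size / 1024), which would not compile if the variable were declared as a String.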

ExpressionFilter

The filter cascading.operation.expression.ExpressionFilter evaluates a Boolean expression, assigning argument Tuple values to variables in the expression. If the expression returns true, the Tuple is removed from the stream.

// incoming -> "ip", "time", "method", "event", "status", "size"

ExpressionFilter filter =
  new ExpressionFilter( "status != 200", Integer.TYPE );

assembly = new Each( assembly, new Fields( "status" ), filter );

// outgoing -> "ip", "time", "method", "event", "status", "size"

In this example, every line in the Apache log that does not have a status of "200" is filtered out. ExpressionFilter coerces the value into the specified type if necessary to make the comparison - in this case, coercing the status String into an int.
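The remove-on-true rule is easy to get backwards, so here is a plain-Java sketch of the same logic without Cascading. The class StatusFilterSketch and its methods are hypothetical; the status values arrive as Strings and are parsed to int before the comparison, mirroring the coercion described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StatusFilterSketch {
    // Mirrors the expression "status != 200": true means "remove this entry".
    static boolean isRemoved(String rawStatus) {
        int status = Integer.parseInt(rawStatus.trim()); // String -> int coercion
        return status != 200;
    }

    // Returns the statuses that survive the filter, in their original order.
    static List<String> keepOk(List<String> statuses) {
        List<String> kept = new ArrayList<>(statuses);
        kept.removeIf(StatusFilterSketch::isRemoved);
        return kept;
    }

    public static void main(String[] args) {
        // Only the entries with status 200 remain after filtering.
        System.out.println(keepOk(Arrays.asList("200", "404", "200", "500")));
    }
}
```

To keep only the failed requests instead, the expression would be inverted to status == 200, since the filter removes whatever the expression matches.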

As of Cascading 2.2, two operations that support multi-line Java code have been added alongside cascading.operation.expression.ExpressionFilter and cascading.operation.expression.ExpressionFunction: cascading.operation.expression.ScriptFilter and cascading.operation.expression.ScriptFunction. See the relevant Javadoc for details on usage.

Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.