cascading.cascade
Class Cascade

java.lang.Object
  extended by cascading.cascade.Cascade
All Implemented Interfaces:
UnitOfWork<CascadeStats>

public class Cascade
extends Object
implements UnitOfWork<CascadeStats>

A Cascade is an assembly of Flow instances that share or depend on equivalent Tap instances and are executed as a single group. The most common case is where one Flow instance depends on a Tap created by a second Flow instance. This dependency chain can continue as practical.

Note: Flow instances that have no shared dependencies will be executed in parallel.

Additionally, a Cascade allows for incremental builds of complex data processing workflows. If a given source Tap is newer than a subsequent sink Tap in the assembly, the connecting Flow(s) will be executed when the Cascade is executed. If all the targets (sinks) are up to date, the Cascade exits immediately and does nothing.

The concept of 'stale' is pluggable; see the FlowSkipStrategy class.
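The staleness rule described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: FlowTimes, SkipRule, and SkipRules are hypothetical names invented for the illustration and are not the library's FlowSkipStrategy API, and a modified time of 0 is assumed to mean that the resource does not exist.

```java
import java.util.List;

// Hypothetical stand-in for a flow: only the modification times matter here.
final class FlowTimes {
    final List<Long> sourceModifiedTimes;
    final List<Long> sinkModifiedTimes;

    FlowTimes(List<Long> sourceModifiedTimes, List<Long> sinkModifiedTimes) {
        this.sourceModifiedTimes = sourceModifiedTimes;
        this.sinkModifiedTimes = sinkModifiedTimes;
    }
}

// Hypothetical pluggable rule; NOT the library's FlowSkipStrategy interface.
interface SkipRule {
    boolean skip(FlowTimes flow);
}

final class SkipRules {
    // Skip only when every sink exists and is at least as new as the newest source.
    static final SkipRule IF_SINK_NOT_STALE = flow -> {
        long newestSource = flow.sourceModifiedTimes.stream()
                .mapToLong(Long::longValue).max().orElse(0L);
        long oldestSink = flow.sinkModifiedTimes.stream()
                .mapToLong(Long::longValue).min().orElse(0L);
        return oldestSink != 0L && oldestSink >= newestSource;
    };

    // Skip whenever every sink already exists, regardless of its age.
    static final SkipRule IF_SINK_EXISTS = flow ->
            !flow.sinkModifiedTimes.isEmpty()
                    && flow.sinkModifiedTimes.stream().allMatch(t -> t != 0L);
}
```

Swapping one rule for the other changes which flows are re-run without touching the code that walks the flows, which is the point of making the concept pluggable.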

When a Cascade starts up, it first verifies which Flow instances have stale sinks. If the sinks are not stale, the method BaseFlow.deleteSinksIfNotUpdate() is called. Before appends/updates were (logically) supported, the Cascade deleted all the sinks in a Flow.

A consequence of this is that if the Cascade fails after completing a Flow that appended or updated data, re-running the Cascade (and with it the successful append/update Flow) will apply that append or update again. Some systems are idempotent and will show no side effects; others are not, so plan accordingly.

Use the CascadeListener to receive any events on the life-cycle of the Cascade as it executes. Any Tap instances owned by managed Flows also implementing CascadeListener will automatically be added to the set of listeners.

See Also:
CascadeListener, Flow, FlowSkipStrategy

Nested Class Summary
protected  class Cascade.CascadeJob
          Class CascadeJob manages Flow execution in the current Cascade instance.
 
Constructor Summary
protected Cascade()
          for testing
 
Method Summary
 void addListener(CascadeListener flowListener)
           
 void cleanup()
           
 void complete()
          Method complete begins the current Cascade process if method start() was not previously called.
 List<Flow> findFlows(String regex)
          Method findFlows returns a List of flows whose names match the given regex pattern.
 Collection<Flow> findFlowsSinkingTo(String identifier)
          Method findFlowsSinkingTo returns all Flow instances that write to a sink with the given identifier.
 Collection<Flow> findFlowsSourcingFrom(String identifier)
          Method findFlowsSourcingFrom returns all Flow instances that read from a source with the given identifier.
protected  void fireOnStarting()
           
protected  void fireOnStopping()
           
 Collection<Tap> getAllTaps()
          Method getAllTaps returns all source, sink, and checkpoint Tap instances associated with the managed Flow instances in this Cascade instance.
 CascadeStats getCascadeStats()
          Method getCascadeStats returns the cascadeStats of this Cascade object.
 Collection<Tap> getCheckpointsTaps()
          Method getCheckpointsTaps returns all checkpoint Tap instances from all the Flow instances in this Cascade instance.
protected  FlowGraph getFlowGraph()
           
 List<Flow> getFlows()
          Method getFlows returns the flows managed by this Cascade object.
 FlowSkipStrategy getFlowSkipStrategy()
          Method getFlowSkipStrategy returns the current FlowSkipStrategy used by this Cascade.
 Collection<Flow> getHeadFlows()
          Method getHeadFlows returns all Flow instances that are at the "head" of the flow graph.
 String getID()
          Method getID returns the ID of this Cascade object.
protected  IdentifierGraph getIdentifierGraph()
           
 Collection<Flow> getIntermediateFlows()
          Method getIntermediateFlows returns all Flow instances that are at neither the "head" nor the "tail" of the flow graph.
 Collection<Tap> getIntermediateTaps()
          Method getIntermediateTaps returns all Tap instances that are at neither the source nor the sink of the flow graph.
 String getName()
          Method getName returns the name of this Cascade object.
 Collection<Flow> getPredecessorFlows(Flow flow)
          Method getPredecessorFlows returns a Collection of all the Flow instances that will be executed before the given Flow instance.
 Collection<Tap> getSinkTaps()
          Method getSinkTaps returns all sink Tap instances in this Cascade instance.
 Collection<Tap> getSourceTaps()
          Method getSourceTaps returns all source Tap instances in this Cascade instance.
 UnitOfWorkSpawnStrategy getSpawnStrategy()
           
 CascadeStats getStats()
           
 Collection<Flow> getSuccessorFlows(Flow flow)
          Method getSuccessorFlows returns a Collection of all the Flow instances that will be executed after the given Flow instance.
 String getTags()
          Method getTags returns the tags associated with this Cascade object.
 Collection<Flow> getTailFlows()
          Method getTailFlows returns all Flow instances that are at the "tail" of the flow graph.
protected  TapGraph getTapGraph()
           
 boolean hasListeners()
           
 void prepare()
           
protected  void printElementGraph(String filename, org.jgrapht.graph.SimpleDirectedGraph<String,BaseFlow.FlowHolder> graph)
           
 boolean removeListener(CascadeListener flowListener)
           
 FlowSkipStrategy setFlowSkipStrategy(FlowSkipStrategy flowSkipStrategy)
          Method setFlowSkipStrategy sets a new FlowSkipStrategy; the current strategy, if any, is returned.
 void setSpawnStrategy(UnitOfWorkSpawnStrategy spawnStrategy)
           
 void start()
          Method start begins the current Cascade process.
 void stop()
           
 String toString()
           
 void writeDOT(String filename)
          Method writeDOT writes this element graph to a DOT file for easy visualization and debugging.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Cascade

protected Cascade()
for testing

Method Detail

getName

public String getName()
Method getName returns the name of this Cascade object.

Specified by:
getName in interface UnitOfWork<CascadeStats>
Returns:
the name (type String) of this Cascade object.

getID

public String getID()
Method getID returns the ID of this Cascade object.

The ID value is a long HEX String used to identify this instance globally. Subsequent Cascade instances created with identical parameters will not return the same ID.

Specified by:
getID in interface UnitOfWork<CascadeStats>
Returns:
the ID (type String) of this Cascade object.

getTags

public String getTags()
Method getTags returns the tags associated with this Cascade object.

Specified by:
getTags in interface UnitOfWork<CascadeStats>
Returns:
the tags (type String) of this Cascade object.

hasListeners

public boolean hasListeners()

addListener

public void addListener(CascadeListener flowListener)

removeListener

public boolean removeListener(CascadeListener flowListener)

fireOnStopping

protected void fireOnStopping()

fireOnStarting

protected void fireOnStarting()

getCascadeStats

public CascadeStats getCascadeStats()
Method getCascadeStats returns the cascadeStats of this Cascade object.

Returns:
the cascadeStats (type CascadeStats) of this Cascade object.

getStats

public CascadeStats getStats()
Specified by:
getStats in interface UnitOfWork<CascadeStats>

getFlowGraph

protected FlowGraph getFlowGraph()

getIdentifierGraph

protected IdentifierGraph getIdentifierGraph()

getFlows

public List<Flow> getFlows()
Method getFlows returns the flows managed by this Cascade object. The returned Flow instances will be in topological order.

Returns:
the flows (type List) of this Cascade object.
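To make "topological order" concrete, here is a minimal standalone sketch using Kahn's algorithm. FlowSpec and FlowOrdering are hypothetical names for this illustration, and the assumption that one flow depends on another whenever it reads an identifier the other writes is a simplification; the library's own graph construction may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a flow: a name plus the identifiers it reads and writes.
final class FlowSpec {
    final String name;
    final Set<String> sources;
    final Set<String> sinks;

    FlowSpec(String name, Set<String> sources, Set<String> sinks) {
        this.name = name;
        this.sources = sources;
        this.sinks = sinks;
    }
}

final class FlowOrdering {
    // True when the consumer reads at least one identifier the producer writes.
    static boolean dependsOn(FlowSpec consumer, FlowSpec producer) {
        for (String sink : producer.sinks) {
            if (consumer.sources.contains(sink)) {
                return true;
            }
        }
        return false;
    }

    // Kahn's algorithm: repeatedly emit flows whose producers have all been emitted.
    static List<String> topologicalOrder(List<FlowSpec> flows) {
        Map<FlowSpec, Integer> pending = new HashMap<>();
        for (FlowSpec flow : flows) {
            int count = 0;
            for (FlowSpec other : flows) {
                if (other != flow && dependsOn(flow, other)) {
                    count++;
                }
            }
            pending.put(flow, count);
        }
        Deque<FlowSpec> ready = new ArrayDeque<>();
        for (FlowSpec flow : flows) {
            if (pending.get(flow) == 0) {
                ready.add(flow);
            }
        }
        List<String> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            FlowSpec done = ready.remove();
            ordered.add(done.name);
            for (FlowSpec flow : flows) {
                if (flow != done && dependsOn(flow, done)) {
                    int left = pending.merge(flow, -1, Integer::sum);
                    if (left == 0) {
                        ready.add(flow);
                    }
                }
            }
        }
        if (ordered.size() != flows.size()) {
            throw new IllegalStateException("cyclic dependency between flows");
        }
        return ordered;
    }
}
```

In this model, flows that never appear in each other's dependency chain become ready at the same time, which corresponds to the flows that can be executed in parallel.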

findFlows

public List<Flow> findFlows(String regex)
Method findFlows returns a List of flows whose names match the given regex pattern.

Parameters:
regex - of type String
Returns:
List
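As an illustration of name matching by regular expression, the following standalone sketch filters a list of flow names. NameFilter is a hypothetical helper, and the choice of whole-name matching (rather than substring search) is an assumption made for the example, not a statement about how findFlows is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper operating on plain names instead of Flow instances.
final class NameFilter {
    // Returns the names that match the regex in full, preserving input order.
    static List<String> findByName(List<String> flowNames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String name : flowNames) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

Under the whole-name assumption, a pattern such as "import-.*" selects every name starting with "import-", while the bare pattern "import" selects nothing unless a flow is named exactly that.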

getHeadFlows

public Collection<Flow> getHeadFlows()
Method getHeadFlows returns all Flow instances that are at the "head" of the flow graph.

That is, they are the first to execute and have no Tap source dependencies with Flow instances in this Cascade instance.

Returns:
Collection

getTailFlows

public Collection<Flow> getTailFlows()
Method getTailFlows returns all Flow instances that are at the "tail" of the flow graph.

That is, they are the last to execute and have no Tap sink dependencies with Flow instances in this Cascade instance.

Returns:
Collection

getIntermediateFlows

public Collection<Flow> getIntermediateFlows()
Method getIntermediateFlows returns all Flow instances that are at neither the "head" nor the "tail" of the flow graph.

Returns:
Collection
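The head, tail, and intermediate roles can be sketched with a small standalone model. FlowNode and FlowRoles are hypothetical names for this illustration; the rule that one flow feeds another when it writes an identifier the other reads is an assumption about how the dependency is derived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of a flow: a name plus the identifiers it reads and writes.
final class FlowNode {
    final String name;
    final Set<String> sources;
    final Set<String> sinks;

    FlowNode(String name, Set<String> sources, Set<String> sinks) {
        this.name = name;
        this.sources = sources;
        this.sinks = sinks;
    }
}

final class FlowRoles {
    // True when the producer writes at least one identifier the consumer reads.
    static boolean feeds(FlowNode producer, FlowNode consumer) {
        for (String sink : producer.sinks) {
            if (consumer.sources.contains(sink)) {
                return true;
            }
        }
        return false;
    }

    static boolean hasUpstream(FlowNode flow, List<FlowNode> all) {
        for (FlowNode other : all) {
            if (other != flow && feeds(other, flow)) {
                return true;
            }
        }
        return false;
    }

    static boolean hasDownstream(FlowNode flow, List<FlowNode> all) {
        for (FlowNode other : all) {
            if (other != flow && feeds(flow, other)) {
                return true;
            }
        }
        return false;
    }

    // Head flows: nothing in the group feeds them.
    static List<String> heads(List<FlowNode> all) {
        List<String> result = new ArrayList<>();
        for (FlowNode flow : all) {
            if (!hasUpstream(flow, all)) {
                result.add(flow.name);
            }
        }
        return result;
    }

    // Tail flows: they feed nothing else in the group.
    static List<String> tails(List<FlowNode> all) {
        List<String> result = new ArrayList<>();
        for (FlowNode flow : all) {
            if (!hasDownstream(flow, all)) {
                result.add(flow.name);
            }
        }
        return result;
    }

    // Intermediate flows: neither head nor tail.
    static List<String> intermediates(List<FlowNode> all) {
        List<String> result = new ArrayList<>();
        for (FlowNode flow : all) {
            if (hasUpstream(flow, all) && hasDownstream(flow, all)) {
                result.add(flow.name);
            }
        }
        return result;
    }
}
```

Note that in this model an isolated flow, one that shares no identifiers with any other, counts as both a head and a tail.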

getTapGraph

protected TapGraph getTapGraph()

getSourceTaps

public Collection<Tap> getSourceTaps()
Method getSourceTaps returns all source Tap instances in this Cascade instance.

That is, none of the returned Tap instances are the sinks of other Flow instances in this Cascade.

All CompositeTap instances are unwound if addressed directly by a managed Flow instance.

Returns:
Collection

getSinkTaps

public Collection<Tap> getSinkTaps()
Method getSinkTaps returns all sink Tap instances in this Cascade instance.

That is, none of the returned Tap instances are the sources of other Flow instances in this Cascade.

All CompositeTap instances are unwound if addressed directly by a managed Flow instance.

This method will return checkpoint Taps managed by Flow instances if not used as a source by other Flow instances.

Returns:
Collection

getCheckpointsTaps

public Collection<Tap> getCheckpointsTaps()
Method getCheckpointsTaps returns all checkpoint Tap instances from all the Flow instances in this Cascade instance.

Returns:
Collection

getIntermediateTaps

public Collection<Tap> getIntermediateTaps()
Method getIntermediateTaps returns all Tap instances that are at neither the source nor the sink of the flow graph.

This method does consider checkpoint Taps managed by Flow instances in this Cascade instance.

Returns:
Collection
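The distinction between source, sink, and intermediate taps can be expressed as simple set arithmetic. The sketch below is a standalone model only: FlowIo and TapRoles are hypothetical names, taps are reduced to identifier strings, and checkpoint taps and CompositeTap unwinding are deliberately left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a flow's inputs and outputs as identifier strings.
final class FlowIo {
    final Set<String> reads;
    final Set<String> writes;

    FlowIo(Set<String> reads, Set<String> writes) {
        this.reads = reads;
        this.writes = writes;
    }
}

final class TapRoles {
    static Set<String> allReads(List<FlowIo> flows) {
        Set<String> result = new LinkedHashSet<>();
        for (FlowIo flow : flows) {
            result.addAll(flow.reads);
        }
        return result;
    }

    static Set<String> allWrites(List<FlowIo> flows) {
        Set<String> result = new LinkedHashSet<>();
        for (FlowIo flow : flows) {
            result.addAll(flow.writes);
        }
        return result;
    }

    // Read by some flow, written by none.
    static Set<String> sourceTaps(List<FlowIo> flows) {
        Set<String> result = allReads(flows);
        result.removeAll(allWrites(flows));
        return result;
    }

    // Written by some flow, read by none.
    static Set<String> sinkTaps(List<FlowIo> flows) {
        Set<String> result = allWrites(flows);
        result.removeAll(allReads(flows));
        return result;
    }

    // Written by one flow and read by another.
    static Set<String> intermediateTaps(List<FlowIo> flows) {
        Set<String> result = allReads(flows);
        result.retainAll(allWrites(flows));
        return result;
    }
}
```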

getAllTaps

public Collection<Tap> getAllTaps()
Method getAllTaps returns all source, sink, and checkpoint Tap instances associated with the managed Flow instances in this Cascade instance.

Returns:
Collection

getSuccessorFlows

public Collection<Flow> getSuccessorFlows(Flow flow)
Method getSuccessorFlows returns a Collection of all the Flow instances that will be executed after the given Flow instance.

Parameters:
flow - of type Flow
Returns:
Collection

getPredecessorFlows

public Collection<Flow> getPredecessorFlows(Flow flow)
Method getPredecessorFlows returns a Collection of all the Flow instances that will be executed before the given Flow instance.

Parameters:
flow - of type Flow
Returns:
Collection

findFlowsSourcingFrom

public Collection<Flow> findFlowsSourcingFrom(String identifier)
Method findFlowsSourcingFrom returns all Flow instances that read from a source with the given identifier.

Parameters:
identifier - of type String
Returns:
Collection

findFlowsSinkingTo

public Collection<Flow> findFlowsSinkingTo(String identifier)
Method findFlowsSinkingTo returns all Flow instances that write to a sink with the given identifier.

Parameters:
identifier - of type String
Returns:
Collection

getFlowSkipStrategy

public FlowSkipStrategy getFlowSkipStrategy()
Method getFlowSkipStrategy returns the current FlowSkipStrategy used by this Cascade.

Returns:
FlowSkipStrategy

setFlowSkipStrategy

public FlowSkipStrategy setFlowSkipStrategy(FlowSkipStrategy flowSkipStrategy)
Method setFlowSkipStrategy sets a new FlowSkipStrategy; the current strategy, if any, is returned. If a strategy is given, it will be used as the strategy for all BaseFlow instances managed by this Cascade instance. To revert to consulting the strategies associated with each Flow instance, reset this value to null, its default value.

FlowSkipStrategy instances define when a Flow instance should be skipped. The default strategy is FlowSkipIfSinkNotStale and is inherited from the Flow instance in question. An alternative strategy would be FlowSkipIfSinkExists.

A FlowSkipStrategy will not be consulted when executing a Flow directly through start().

Parameters:
flowSkipStrategy - of type FlowSkipStrategy
Returns:
FlowSkipStrategy
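The override semantics described for setFlowSkipStrategy (a group-wide value wins when set, null falls back to each flow's own value, and the setter returns the previous value) can be sketched in isolation. StrategyHolder is a hypothetical class written for this illustration and is generic so that it does not depend on any library type.

```java
// Hypothetical holder modelling a group-wide override with per-item fallback.
final class StrategyHolder<S> {
    // null means "ask each flow for its own strategy".
    private S override;

    // Installs a new override and returns the previous one, which may be null.
    S setOverride(S next) {
        S previous = override;
        override = next;
        return previous;
    }

    // The override wins when set; otherwise the flow's own strategy applies.
    S resolve(S flowStrategy) {
        return override != null ? override : flowStrategy;
    }
}
```

Returning the previous value lets a caller install a temporary strategy and restore the earlier one afterwards.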

prepare

public void prepare()
Specified by:
prepare in interface UnitOfWork<CascadeStats>

start

public void start()
Method start begins the current Cascade process. It returns immediately. See method complete() to block until the Cascade completes.

Specified by:
start in interface UnitOfWork<CascadeStats>

complete

public void complete()
Method complete begins the current Cascade process if method start() was not previously called. This method blocks until the process completes.

Specified by:
complete in interface UnitOfWork<CascadeStats>
Throws:
RuntimeException - wrapping any exception thrown internally.
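The relationship between a non-blocking start() and a blocking complete() can be sketched with a plain thread. BackgroundWork is a hypothetical class written for this illustration; it assumes a single worker thread and says nothing about how the Cascade schedules its Flow instances internally.

```java
// Hypothetical unit of work with a non-blocking start and a blocking complete.
final class BackgroundWork {
    private final Runnable body;
    private Thread thread;
    private volatile Throwable failure;

    BackgroundWork(Runnable body) {
        this.body = body;
    }

    // Launches the work once and returns immediately; later calls are no-ops.
    synchronized void start() {
        if (thread != null) {
            return;
        }
        thread = new Thread(() -> {
            try {
                body.run();
            } catch (Throwable t) {
                failure = t; // remembered so complete() can report it
            }
        }, "background-work");
        thread.start();
    }

    // Starts the work if needed, then blocks until it has finished.
    void complete() {
        start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while waiting", e);
        }
        if (failure != null) {
            throw new RuntimeException("work failed", failure);
        }
    }
}
```

A caller that only needs the result can call complete() alone, while a caller that wants to do other work in the meantime can call start() first and complete() later.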

stop

public void stop()
Specified by:
stop in interface UnitOfWork<CascadeStats>

cleanup

public void cleanup()
Specified by:
cleanup in interface UnitOfWork<CascadeStats>

writeDOT

public void writeDOT(String filename)
Method writeDOT writes this element graph to a DOT file for easy visualization and debugging.

Parameters:
filename - of type String

printElementGraph

protected void printElementGraph(String filename,
                                 org.jgrapht.graph.SimpleDirectedGraph<String,BaseFlow.FlowHolder> graph)

toString

public String toString()
Overrides:
toString in class Object

getSpawnStrategy

public UnitOfWorkSpawnStrategy getSpawnStrategy()
Specified by:
getSpawnStrategy in interface UnitOfWork<CascadeStats>

setSpawnStrategy

public void setSpawnStrategy(UnitOfWorkSpawnStrategy spawnStrategy)
Specified by:
setSpawnStrategy in interface UnitOfWork<CascadeStats>


Copyright © 2007-2015 Concurrent, Inc. All Rights Reserved.