cascading.flow
Interface Flow<Config>

All Superinterfaces:
UnitOfWork<FlowStats>
All Known Implementing Classes:
BaseFlow

public interface Flow<Config>
extends UnitOfWork<FlowStats>

A Flow is a logical unit of work declared by an assembly of Pipe instances connected to source and sink Tap instances.

A Flow is then executed to push the incoming source data through the assembly into one or more sinks.

A Flow sub-class instance may not be instantiated directly in most cases, see sub-classes of FlowConnector class for supported platforms.

Note that Pipe assemblies can be reused in multiple Flow instances. They maintain no state regarding the Flow execution. Subsequently, Pipe assemblies can be given parameters through its calling Flow so they can be built in a generic fashion.

When a Flow is created, an optimized internal representation is created that is then executed on the underlying execution platform. This is typically done by creating one or more FlowStep instances.

Flows are submitted in order of dependency when used with a Cascade. If two or more steps do not share the same dependencies and all can be scheduled simultaneously, the getSubmitPriority() value determines the order in which all steps will be submitted for execution. The default submit priority is 5.

Use the FlowListener to receive any events on the life-cycle of the Flow as it executes. Any Tap instances owned by the Flow also implementing FlowListener will automatically be added to the set of listeners.

See Also:
FlowListener, FlowConnector

Field Summary
static String CASCADING_FLOW_ID
           
 
Method Summary
 void addListener(FlowListener flowListener)
          Method addListener registers the given flowListener with this instance.
 void addStepListener(FlowStepListener flowStepListener)
          Method addStepListener registers the given flowStepListener with this instance.
 boolean areSinksStale()
          Method areSinksStale returns true if any of the sinks referenced are out of date in relation to the sources.
 boolean areSourcesNewer(long sinkModified)
          Method areSourcesNewer returns true if any source is newer than the given sinkModified date value.
 void cleanup()
           
 void complete()
          Method complete starts the current Flow instance if it has not be previously started, then block until completion.
 String getCascadeID()
           
 List<String> getCheckpointNames()
           
 Map<String,Tap> getCheckpoints()
          Method getCheckpoints returns the checkpoint taps of this Flow object.
 Collection<Tap> getCheckpointsCollection()
          Method getCheckpointsCollection returns a Collection of checkpoint Taps for this Flow object.
 Config getConfig()
          Method getConfig returns the internal configuration object.
 Map<Object,Object> getConfigAsProperties()
          Method getConfiAsProperties converts the internal configuration object into a Map of key value pairs.
 Config getConfigCopy()
          Method getConfigCopy returns a copy of the internal configuration object.
 Map<String,String> getFlowDescriptor()
          Returns an immutable map of properties giving more details about the Flow object.
 FlowProcess<Config> getFlowProcess()
           
 FlowSkipStrategy getFlowSkipStrategy()
          Method getFlowSkipStrategy returns the current FlowSkipStrategy used by this Flow.
 FlowStats getFlowStats()
          Method getFlowStats returns the flowStats of this Flow object.
 List<FlowStep<Config>> getFlowSteps()
          Method getFlowSteps returns the flowSteps of this Flow object.
 FlowStepStrategy getFlowStepStrategy()
          Returns the current FlowStepStrategy instance.
 String getID()
          Method getID returns the ID of this Flow object.
 String getName()
          Method getName returns the name of this Flow object.
 PlatformInfo getPlatformInfo()
           
 String getProperty(String key)
           
 String getRunID()
           
 Tap getSink()
          Method getSink returns the first sink of this Flow object.
 Tap getSink(String name)
           
 long getSinkModified()
          Method getSinkModified returns the youngest modified date of any sink Tap managed by this Flow instance.
 List<String> getSinkNames()
           
 Map<String,Tap> getSinks()
          Method getSinks returns the sinks of this Flow object.
 Collection<Tap> getSinksCollection()
          Method getSinksCollection returns a Collection of sink Taps for this Flow object.
 Tap getSource(String name)
           
 List<String> getSourceNames()
           
 Map<String,Tap> getSources()
          Method getSources returns the sources of this Flow object.
 Collection<Tap> getSourcesCollection()
          Method getSourcesCollection returns a Collection of source Taps for this Flow object.
 int getSubmitPriority()
          Method getSubmitPriority returns the submitPriority of this Flow object.
 String getTags()
           
 List<String> getTrapNames()
           
 Map<String,Tap> getTraps()
          Method getTraps returns the traps of this Flow object.
 Collection<Tap> getTrapsCollection()
          Method getTrapsCollection returns a Collection of trap Taps for this Flow object.
 boolean hasListeners()
          Method hasListeners returns true if FlowListener instances have been registered.
 boolean hasStepListeners()
          Method hasStepListeners returns true if FlowStepListener instances have been registered with any of the FlowSteps belonging to this instance
 boolean isSkipFlow()
          Method isSkipFlow returns true if the parent Cascade should skip this Flow instance.
 boolean isStopJobsOnExit()
          Method isStopJobsOnExit returns the stopJobsOnExit of this Flow object.
 TupleEntryIterator openSink()
          Method openSink opens the first sink Tap.
 TupleEntryIterator openSink(String name)
          Method openSink opens the named sink Tap.
 TupleEntryIterator openSource()
          Method openSource opens the first source Tap.
 TupleEntryIterator openSource(String name)
          Method openSource opens the named source Tap.
 TupleEntryIterator openTapForRead(Tap tap)
          Method openTapForRead return a TupleEntryIterator for the given Tap instance.
 TupleEntryCollector openTapForWrite(Tap tap)
          Method openTapForWrite returns a (@link TupleCollector} for the given Tap instance.
 TupleEntryIterator openTrap()
          Method openTrap opens the first trap Tap.
 TupleEntryIterator openTrap(String name)
          Method openTrap opens the named trap Tap.
 void prepare()
          Method prepare is used by a Cascade to notify the given Flow it should initialize or clear any resources necessary for start() to be called successfully.
 boolean removeListener(FlowListener flowListener)
          Method removeListener removes the given flowListener from this instance.
 boolean removeStepListener(FlowStepListener flowStepListener)
          Method removeStepListener removes the given flowStepListener from this instance.
 boolean resourceExists(Tap tap)
          Method resourceExists returns true if the resource represented by the given Tap instance exists.
 FlowSkipStrategy setFlowSkipStrategy(FlowSkipStrategy flowSkipStrategy)
          Method setFlowSkipStrategy sets a new FlowSkipStrategy, the current strategy is returned.
 void setFlowStepStrategy(FlowStepStrategy flowStepStrategy)
          Sets a default FlowStepStrategy instance.
 void setSubmitPriority(int submitPriority)
          Method setSubmitPriority sets the submitPriority of this Flow object.
 void start()
          Method start begins the execution of this Flow instance.
 boolean stepsAreLocal()
          Method jobsAreLocal returns true if all jobs are executed in-process as a single map and reduce task.
 void stop()
          Method stop stops all running jobs, killing any currently executing.
 void writeDOT(String filename)
          Method writeDOT writes this Flow instance to the given filename as a DOT file for import into a graphics package.
 void writeStepsDOT(String filename)
          Method writeStepsDOT writes this Flow step graph to the given filename as a DOT file for import into a graphics package.
 
Methods inherited from interface cascading.management.UnitOfWork
getSpawnStrategy, getStats, setSpawnStrategy
 

Field Detail

CASCADING_FLOW_ID

static final String CASCADING_FLOW_ID
See Also:
Constant Field Values
Method Detail

getName

String getName()
Method getName returns the name of this Flow object.

Specified by:
getName in interface UnitOfWork<FlowStats>
Returns:
the name (type String) of this Flow object.

prepare

void prepare()
Method prepare is used by a Cascade to notify the given Flow it should initialize or clear any resources necessary for start() to be called successfully.

Specifically, this implementation calls BaseFlow.deleteSinksIfNotUpdate() && BaseFlow.deleteTrapsIfNotUpdate().

Specified by:
prepare in interface UnitOfWork<FlowStats>
Throws:
IOException - when

start

void start()
Method start begins the execution of this Flow instance. It will return immediately. Use the method complete() to block until this Flow completes.

Specified by:
start in interface UnitOfWork<FlowStats>

stop

void stop()
Method stop stops all running jobs, killing any currently executing.

Specified by:
stop in interface UnitOfWork<FlowStats>

complete

void complete()
Method complete starts the current Flow instance if it has not be previously started, then block until completion.

Specified by:
complete in interface UnitOfWork<FlowStats>

cleanup

void cleanup()
Specified by:
cleanup in interface UnitOfWork<FlowStats>

getConfig

Config getConfig()
Method getConfig returns the internal configuration object.

Any changes to this object will not be reflected in child steps. See FlowConnector for setting default properties visible to children. Or see FlowStepStrategy for setting properties on individual steps before they are executed.

Returns:
the default configuration of this Flow

getConfigCopy

Config getConfigCopy()
Method getConfigCopy returns a copy of the internal configuration object. This object can be safely modified.

Returns:
a copy of the default configuration of this Flow

getConfigAsProperties

Map<Object,Object> getConfigAsProperties()
Method getConfiAsProperties converts the internal configuration object into a Map of key value pairs.

Returns:
a Map of key/value pairs

getProperty

String getProperty(String key)

getID

String getID()
Method getID returns the ID of this Flow object.

The ID value is a long HEX String used to identify this instance globally. Subsequent Flow instances created with identical parameters will not return the same ID.

Specified by:
getID in interface UnitOfWork<FlowStats>
Returns:
the ID (type String) of this Flow object.

getFlowDescriptor

Map<String,String> getFlowDescriptor()
Returns an immutable map of properties giving more details about the Flow object.

See FlowDef.addDescription(String, String) to set values on a given Flow.

Flow descriptions provide meta-data to monitoring systems describing the workload a given Flow represents. For known description types, see FlowDescriptors.

Returns:
Map

getTags

String getTags()
Specified by:
getTags in interface UnitOfWork<FlowStats>

getSubmitPriority

int getSubmitPriority()
Method getSubmitPriority returns the submitPriority of this Flow object.

10 is lowest, 1 is the highest, 5 is the default.

Returns:
the submitPriority (type int) of this FlowStep object.

setSubmitPriority

void setSubmitPriority(int submitPriority)
Method setSubmitPriority sets the submitPriority of this Flow object.

10 is lowest, 1 is the highest, 5 is the default.

Parameters:
submitPriority - the submitPriority of this FlowStep object.

getFlowProcess

FlowProcess<Config> getFlowProcess()

getFlowStats

FlowStats getFlowStats()
Method getFlowStats returns the flowStats of this Flow object.

Returns:
the flowStats (type FlowStats) of this Flow object.

hasListeners

boolean hasListeners()
Method hasListeners returns true if FlowListener instances have been registered.

Returns:
boolean

addListener

void addListener(FlowListener flowListener)
Method addListener registers the given flowListener with this instance.

Parameters:
flowListener - of type FlowListener

removeListener

boolean removeListener(FlowListener flowListener)
Method removeListener removes the given flowListener from this instance.

Parameters:
flowListener - of type FlowListener
Returns:
true if the listener was removed

hasStepListeners

boolean hasStepListeners()
Method hasStepListeners returns true if FlowStepListener instances have been registered with any of the FlowSteps belonging to this instance

Returns:
boolean

addStepListener

void addStepListener(FlowStepListener flowStepListener)
Method addStepListener registers the given flowStepListener with this instance.

Parameters:
flowStepListener - of type addStepListener

removeStepListener

boolean removeStepListener(FlowStepListener flowStepListener)
Method removeStepListener removes the given flowStepListener from this instance.

Parameters:
flowStepListener - of type FlowStepListener
Returns:
true if the listener was removed from all the FlowStep belonging to this instance

getSources

Map<String,Tap> getSources()
Method getSources returns the sources of this Flow object.

Returns:
the sources (type Map) of this Flow object.

getSourceNames

List<String> getSourceNames()

getSource

Tap getSource(String name)

getSourcesCollection

Collection<Tap> getSourcesCollection()
Method getSourcesCollection returns a Collection of source Taps for this Flow object.

Returns:
the sourcesCollection (type Collection) of this Flow object.

getSinks

Map<String,Tap> getSinks()
Method getSinks returns the sinks of this Flow object.

Returns:
the sinks (type Map) of this Flow object.

getSinkNames

List<String> getSinkNames()

getSink

Tap getSink(String name)

getSinksCollection

Collection<Tap> getSinksCollection()
Method getSinksCollection returns a Collection of sink Taps for this Flow object.

Returns:
the sinkCollection (type Collection) of this Flow object.

getSink

Tap getSink()
Method getSink returns the first sink of this Flow object.

Returns:
the sink (type Tap) of this Flow object.

getTraps

Map<String,Tap> getTraps()
Method getTraps returns the traps of this Flow object.

Returns:
the traps (type Map) of this Flow object.

getTrapNames

List<String> getTrapNames()

getTrapsCollection

Collection<Tap> getTrapsCollection()
Method getTrapsCollection returns a Collection of trap Taps for this Flow object.

Returns:
the trapsCollection (type Collection) of this Flow object.

getCheckpoints

Map<String,Tap> getCheckpoints()
Method getCheckpoints returns the checkpoint taps of this Flow object.

Returns:
the traps (type Map) of this Flow object.

getCheckpointNames

List<String> getCheckpointNames()

getCheckpointsCollection

Collection<Tap> getCheckpointsCollection()
Method getCheckpointsCollection returns a Collection of checkpoint Taps for this Flow object.

Returns:
the trapsCollection (type Collection) of this Flow object.

getFlowSkipStrategy

FlowSkipStrategy getFlowSkipStrategy()
Method getFlowSkipStrategy returns the current FlowSkipStrategy used by this Flow.

Returns:
FlowSkipStrategy

setFlowSkipStrategy

FlowSkipStrategy setFlowSkipStrategy(FlowSkipStrategy flowSkipStrategy)
Method setFlowSkipStrategy sets a new FlowSkipStrategy, the current strategy is returned.

FlowSkipStrategy instances define when a Flow instance should be skipped. The default strategy is FlowSkipIfSinkNotStale. An alternative strategy would be FlowSkipIfSinkExists.

A FlowSkipStrategy will not be consulted when executing a Flow directly through start() or complete(). Only when the Flow is executed through a Cascade instance.

Parameters:
flowSkipStrategy - of type FlowSkipStrategy
Returns:
FlowSkipStrategy

isSkipFlow

boolean isSkipFlow()
                   throws IOException
Method isSkipFlow returns true if the parent Cascade should skip this Flow instance. True is returned if the current FlowSkipStrategy returns true.

Returns:
the skipFlow (type boolean) of this Flow object.
Throws:
IOException - when

areSinksStale

boolean areSinksStale()
                      throws IOException
Method areSinksStale returns true if any of the sinks referenced are out of date in relation to the sources. Or if any sink method Tap.isReplace() returns true.

Returns:
boolean
Throws:
IOException - when

areSourcesNewer

boolean areSourcesNewer(long sinkModified)
                        throws IOException
Method areSourcesNewer returns true if any source is newer than the given sinkModified date value.

Parameters:
sinkModified - of type long
Returns:
boolean
Throws:
IOException - when

getSinkModified

long getSinkModified()
                     throws IOException
Method getSinkModified returns the youngest modified date of any sink Tap managed by this Flow instance.

If zero (0) is returned, at least one of the sink resources does not exist. If minus one (-1) is returned, atleast one of the sinks are marked for delete (returns true).

Returns:
the sinkModified (type long) of this Flow object.
Throws:
IOException - when

getFlowStepStrategy

FlowStepStrategy getFlowStepStrategy()
Returns the current FlowStepStrategy instance.

Returns:
FlowStepStrategy

setFlowStepStrategy

void setFlowStepStrategy(FlowStepStrategy flowStepStrategy)
Sets a default FlowStepStrategy instance.

Use a FlowStepStrategy to change FlowStep configuration properties before the properties are submitted to the underlying platform for the step unit of work.

Parameters:
flowStepStrategy - The FlowStepStrategy to use.

getFlowSteps

List<FlowStep<Config>> getFlowSteps()
Method getFlowSteps returns the flowSteps of this Flow object. They will be in topological order.

Returns:
the steps (type List) of this Flow object.

openSource

TupleEntryIterator openSource()
                              throws IOException
Method openSource opens the first source Tap.

Returns:
TupleIterator
Throws:
IOException - when

openSource

TupleEntryIterator openSource(String name)
                              throws IOException
Method openSource opens the named source Tap.

Parameters:
name - of type String
Returns:
TupleIterator
Throws:
IOException - when

openSink

TupleEntryIterator openSink()
                            throws IOException
Method openSink opens the first sink Tap.

Returns:
TupleIterator
Throws:
IOException - when

openSink

TupleEntryIterator openSink(String name)
                            throws IOException
Method openSink opens the named sink Tap.

Parameters:
name - of type String
Returns:
TupleIterator
Throws:
IOException - when

openTrap

TupleEntryIterator openTrap()
                            throws IOException
Method openTrap opens the first trap Tap.

Returns:
TupleIterator
Throws:
IOException - when

openTrap

TupleEntryIterator openTrap(String name)
                            throws IOException
Method openTrap opens the named trap Tap.

Parameters:
name - of type String
Returns:
TupleIterator
Throws:
IOException - when

resourceExists

boolean resourceExists(Tap tap)
                       throws IOException
Method resourceExists returns true if the resource represented by the given Tap instance exists.

Parameters:
tap - of type Tap
Returns:
boolean
Throws:
IOException - when

openTapForRead

TupleEntryIterator openTapForRead(Tap tap)
                                  throws IOException
Method openTapForRead return a TupleEntryIterator for the given Tap instance.

Note the returned iterator will return the same instance of TupleEntry on every call, thus a copy must be made of either the TupleEntry or the underlying Tuple instance if they are to be stored in a Collection.

Parameters:
tap - of type Tap
Returns:
TupleIterator
Throws:
IOException - when there is an error opening the resource

openTapForWrite

TupleEntryCollector openTapForWrite(Tap tap)
                                    throws IOException
Method openTapForWrite returns a (@link TupleCollector} for the given Tap instance.

Parameters:
tap - of type Tap
Returns:
TupleCollector
Throws:
IOException - when there is an error opening the resource

writeDOT

void writeDOT(String filename)
Method writeDOT writes this Flow instance to the given filename as a DOT file for import into a graphics package.

Parameters:
filename - of type String

writeStepsDOT

void writeStepsDOT(String filename)
Method writeStepsDOT writes this Flow step graph to the given filename as a DOT file for import into a graphics package.

Parameters:
filename - of type String

getCascadeID

String getCascadeID()

getRunID

String getRunID()

getPlatformInfo

PlatformInfo getPlatformInfo()

stepsAreLocal

boolean stepsAreLocal()
Method jobsAreLocal returns true if all jobs are executed in-process as a single map and reduce task.

Returns:
boolean

isStopJobsOnExit

boolean isStopJobsOnExit()
Method isStopJobsOnExit returns the stopJobsOnExit of this Flow object. Defaults to true.

Returns:
the stopJobsOnExit (type boolean) of this Flow object.


Copyright © 2007-2015 Concurrent, Inc. All Rights Reserved.