public abstract class Tap<Config,Input,Output> extends java.lang.Object implements ScopedElement, FlowElement, java.io.Serializable, Traceable
Flow
.
That is, a source Tap is the head end of a connected Pipe
and Tuple
stream, and
a sink Tap is the tail end. Kinds of Tap types are used to manage files from a local disk,
distributed disk, remote storage like Amazon S3, or via FTP. It simply abstracts
out the complexity of connecting to these types of data sources.
A Tap takes a Scheme
instance, which is used to identify the type of resource (text file, binary file, etc).
A Tap is responsible for how the resource is reached.
By default when planning a Flow, Tap equality is a function of the getIdentifier()
and getScheme()
values. That is, two Tap instances are the same Tap instance if they sink/source the same resource and sink/source
the same fields.
Some more advanced taps, like a database tap, may need to extend equality to include any filtering, like the
where
clause in a SQL statement so two taps reading from the same SQL table aren't considered equal.
Taps are also used to determine dependencies between two or more Flow
instances when used with a
Cascade
. In that case the getFullIdentifier(Object)
value is used and the Scheme
is ignored.Modifier | Constructor and Description |
---|---|
protected |
Tap() |
protected |
Tap(Scheme<Config,Input,Output,?,?> scheme) |
protected |
Tap(Scheme<Config,Input,Output,?,?> scheme,
SinkMode sinkMode) |
Modifier and Type | Method and Description |
---|---|
boolean |
commitResource(Config conf)
Method commitResource allows the underlying resource to be notified when all write processing is
successful so that any additional cleanup or processing may be completed.
|
abstract boolean |
createResource(Config conf)
Method createResource creates the underlying resource.
|
boolean |
createResource(FlowProcess<? extends Config> flowProcess)
Method createResource creates the underlying resource.
|
abstract boolean |
deleteResource(Config conf)
Method deleteResource deletes the resource represented by this instance.
|
boolean |
deleteResource(FlowProcess<? extends Config> flowProcess)
Method deleteResource deletes the resource represented by this instance.
|
boolean |
equals(java.lang.Object object) |
void |
flowConfInit(Flow<Config> flow)
Method flowInit allows this Tap instance to initialize itself in context of the given
Flow instance. |
ConfigDef |
getConfigDef()
Returns a
ConfigDef instance that allows for local properties to be set and made available via
a resulting FlowProcess instance when the tap is invoked. |
java.lang.String |
getFullIdentifier(Config conf)
Method getFullIdentifier returns a fully qualified resource identifier.
|
java.lang.String |
getFullIdentifier(FlowProcess<? extends Config> flowProcess)
Method getFullIdentifier returns a fully qualified resource identifier.
|
abstract java.lang.String |
getIdentifier()
Method getIdentifier returns a String representing the resource this Tap instance represents.
|
abstract long |
getModifiedTime(Config conf)
Method getModifiedTime returns the date this resource was last modified.
|
long |
getModifiedTime(FlowProcess<? extends Config> flowProcess)
Method getModifiedTime returns the date this resource was last modified.
|
ConfigDef |
getNodeConfigDef()
Returns a
ConfigDef instance that allows for process level properties to be set and made available via
a resulting FlowProcess instance when the tap is invoked. |
Scheme<Config,Input,Output,?,?> |
getScheme()
Method getScheme returns the scheme of this Tap object.
|
Fields |
getSinkFields()
Method getSinkFields returns the sinkFields of this Tap object.
|
SinkMode |
getSinkMode()
Method getSinkMode returns the
SinkMode }of this Tap object. |
Fields |
getSourceFields()
Method getSourceFields returns the sourceFields of this Tap object.
|
ConfigDef |
getStepConfigDef()
Returns a
ConfigDef instance that allows for process level properties to be set and made available via
a resulting FlowProcess instance when the tap is invoked. |
java.lang.String |
getTrace()
Method getTrace returns a String that pinpoints the caller that created this instance.
|
boolean |
hasConfigDef()
Returns
true if there are properties in the configDef instance. |
int |
hashCode() |
boolean |
hasNodeConfigDef()
Returns
true if there are properties in the nodeConfigDef instance. |
boolean |
hasStepConfigDef()
Returns
true if there are properties in the stepConfigDef instance. |
static java.lang.String |
id(Tap tap)
Creates and returns a unique ID for the given Tap, this value is cached and may be used to uniquely identify
the Tap instance in properties files etc.
|
boolean |
isKeep()
Method isKeep indicates whether the resource represented by this instance should be kept if it
already exists when the Flow is started.
|
boolean |
isReplace()
Method isReplace indicates whether the resource represented by this instance should be deleted if it
already exists when the Flow is started.
|
boolean |
isSink()
Method isSink returns true if this Tap instance can be used as a sink.
|
boolean |
isSource()
Method isSource returns true if this Tap instance can be used as a source.
|
boolean |
isTemporary()
Method isTemporary returns true if this Tap is temporary (used for intermediate results).
|
boolean |
isUpdate()
Method isUpdate indicates whether the resource represented by this instance should be updated if it already
exists.
|
TupleEntryIterator |
openForRead(FlowProcess<? extends Config> flowProcess)
Method openForRead opens the resource represented by this Tap instance for reading.
|
abstract TupleEntryIterator |
openForRead(FlowProcess<? extends Config> flowProcess,
Input input)
Method openForRead opens the resource represented by this Tap instance for reading.
|
TupleEntryCollector |
openForWrite(FlowProcess<? extends Config> flowProcess)
Method openForWrite opens the resource represented by this Tap instance for writing.
|
abstract TupleEntryCollector |
openForWrite(FlowProcess<? extends Config> flowProcess,
Output output)
Method openForWrite opens the resource represented by this Tap instance for writing.
|
Scope |
outgoingScopeFor(java.util.Set<Scope> incomingScopes)
Method outgoingScopeFor returns the Scope this FlowElement hands off to the next FlowElement.
|
boolean |
prepareResourceForRead(Config conf)
Method prepareResourceForRead allows the underlying resource to be notified when reading will begin.
|
boolean |
prepareResourceForWrite(Config conf)
Method prepareResourceForWrite allows the underlying resource to be notified when writing will begin.
|
void |
presentSinkFields(FlowProcess<? extends Config> flowProcess,
Fields fields) |
void |
presentSourceFields(FlowProcess<? extends Config> flowProcess,
Fields fields) |
Fields |
resolveIncomingOperationArgumentFields(Scope incomingScope)
Method resolveIncomingOperationArgumentFields returns the Fields outgoing from the previous FlowElement that
are consumable by this FlowElement when preparing Operation arguments.
|
Fields |
resolveIncomingOperationPassThroughFields(Scope incomingScope)
Method resolveIncomingOperationPassThroughFields returns the Fields outgoing from the previous FlowElement that
are consumable by this FlowElement when preparing the Pipe outgoing tuple.
|
abstract boolean |
resourceExists(Config conf)
Method resourceExists returns true if the path represented by this instance exists.
|
boolean |
resourceExists(FlowProcess<? extends Config> flowProcess)
Method resourceExists returns true if the path represented by this instance exists.
|
Fields |
retrieveSinkFields(FlowProcess<? extends Config> flowProcess)
A hook for allowing a Scheme to lazily retrieve its sink fields.
|
Fields |
retrieveSourceFields(FlowProcess<? extends Config> flowProcess)
A hook for allowing a Scheme to lazily retrieve its source fields.
|
boolean |
rollbackResource(Config conf)
Method rollbackResource allows the underlying resource to be notified when any write processing has failed or
was stopped so that any cleanup may be started.
|
protected void |
setScheme(Scheme<Config,Input,Output,?,?> scheme) |
void |
sinkConfInit(FlowProcess<? extends Config> flowProcess,
Config conf)
Method sinkConfInit initializes this instance as a sink.
|
void |
sourceConfInit(FlowProcess<? extends Config> flowProcess,
Config conf)
Method sourceConfInit initializes this instance as a source.
|
static Tap[] |
taps(Tap... taps)
Convenience function to make an array of Tap instances.
|
java.lang.String |
toString() |
protected Tap()
public static Tap[] taps(Tap... taps)
taps
- of type Tappublic static java.lang.String id(Tap tap)
tap
- of type Tappublic Scheme<Config,Input,Output,?,?> getScheme()
public java.lang.String getTrace()
Traceable
public void flowConfInit(Flow<Config> flow)
Flow
instance.
This method is guaranteed to be called before the Flow is started and the
FlowListener.onStarting(cascading.flow.Flow)
event is fired.
This method will be called once per Flow, and before sourceConfInit(cascading.flow.FlowProcess, Object)
and
sinkConfInit(cascading.flow.FlowProcess, Object)
methods.flow
- of type Flowpublic void sourceConfInit(FlowProcess<? extends Config> flowProcess, Config conf)
Flow
instance or if it participates in multiple times in a given Flow or across different Flows in
a Cascade
.
In the context of a Flow, it will be called after
FlowListener.onStarting(cascading.flow.Flow)
Note that no resources or services should be modified by this method.flowProcess
- of type FlowProcessconf
- of type Configpublic void sinkConfInit(FlowProcess<? extends Config> flowProcess, Config conf)
Flow
instance or if it participates in multiple times in a given Flow or across different Flows in
a Cascade
.
Note this method will be called in context of this Tap being used as a traditional 'sink' and as a 'trap'.
In the context of a Flow, it will be called after
FlowListener.onStarting(cascading.flow.Flow)
Note that no resources or services should be modified by this method. If this Tap instance returns true for
isReplace()
, then deleteResource(Object)
will be called by the parent Flow.flowProcess
- of type FlowProcessconf
- of type Configpublic abstract java.lang.String getIdentifier()
public Fields getSourceFields()
public Fields getSinkFields()
public abstract TupleEntryIterator openForRead(FlowProcess<? extends Config> flowProcess, Input input) throws java.io.IOException
input
value may be null, if so, sub-classes must inquire with the underlying Scheme
via Scheme.sourceConfInit(cascading.flow.FlowProcess, Tap, Object)
to get the proper
input type and instantiate it before calling super.openForRead()
.
Note the returned iterator will return the same instance of TupleEntry
on every call,
thus a copy must be made of either the TupleEntry or the underlying Tuple
instance if they are to be
stored in a Collection.flowProcess
- of type FlowProcessinput
- of type Inputjava.io.IOException
- when the resource cannot be openedpublic TupleEntryIterator openForRead(FlowProcess<? extends Config> flowProcess) throws java.io.IOException
TupleEntry
on every call,
thus a copy must be made of either the TupleEntry or the underlying Tuple
instance if they are to be
stored in a Collection.flowProcess
- of type FlowProcessjava.io.IOException
- when the resource cannot be openedpublic abstract TupleEntryCollector openForWrite(FlowProcess<? extends Config> flowProcess, Output output) throws java.io.IOException
SinkMode
setting. If SinkMode is
SinkMode.REPLACE
, this call may fail. See openForWrite(cascading.flow.FlowProcess)
.
output
value may be null, if so, sub-classes must inquire with the underlying Scheme
via Scheme.sinkConfInit(cascading.flow.FlowProcess, Tap, Object)
to get the proper
output type and instantiate it before calling super.openForWrite()
.flowProcess
- of type FlowProcessoutput
- of type Outputjava.io.IOException
- when the resource cannot be openedpublic TupleEntryCollector openForWrite(FlowProcess<? extends Config> flowProcess) throws java.io.IOException
SinkMode.REPLACE
settings. That is, if
SinkMode is set to SinkMode.REPLACE
the underlying resource will be deleted.
Note if SinkMode.UPDATE
is set, the resource will not be deleted.flowProcess
- of type FlowProcessjava.io.IOException
- when the resource cannot be openedpublic Scope outgoingScopeFor(java.util.Set<Scope> incomingScopes)
ScopedElement
outgoingScopeFor
in interface ScopedElement
incomingScopes
- of type Setpublic Fields retrieveSourceFields(FlowProcess<? extends Config> flowProcess)
flowProcess
- of type FlowProcesspublic void presentSourceFields(FlowProcess<? extends Config> flowProcess, Fields fields)
public Fields retrieveSinkFields(FlowProcess<? extends Config> flowProcess)
flowProcess
- of type FlowProcesspublic void presentSinkFields(FlowProcess<? extends Config> flowProcess, Fields fields)
public Fields resolveIncomingOperationArgumentFields(Scope incomingScope)
ScopedElement
resolveIncomingOperationArgumentFields
in interface ScopedElement
incomingScope
- of type Scopepublic Fields resolveIncomingOperationPassThroughFields(Scope incomingScope)
ScopedElement
resolveIncomingOperationPassThroughFields
in interface ScopedElement
incomingScope
- of type Scopepublic java.lang.String getFullIdentifier(FlowProcess<? extends Config> flowProcess)
flowProcess
- of type FlowProcesspublic java.lang.String getFullIdentifier(Config conf)
conf
- of type Configpublic boolean createResource(FlowProcess<? extends Config> flowProcess) throws java.io.IOException
flowProcess
- of type FlowProcessjava.io.IOException
- when there is an error making directoriespublic abstract boolean createResource(Config conf) throws java.io.IOException
conf
- of type Configjava.io.IOException
- when there is an error making directoriespublic boolean deleteResource(FlowProcess<? extends Config> flowProcess) throws java.io.IOException
flowProcess
- of type FlowProcessjava.io.IOException
- when the resource cannot be deletedpublic abstract boolean deleteResource(Config conf) throws java.io.IOException
conf
- of type Configjava.io.IOException
- when the resource cannot be deletedpublic boolean prepareResourceForRead(Config conf) throws java.io.IOException
false
, an exception will be thrown halting the current Flow.
In most cases, resource initialization should happen in the openForRead(FlowProcess, Object)
method.
This allows for initialization of cluster side resources, like a JDBC driver used to read data from a database,
that cannot be passed client to cluster.conf
- of type Configjava.io.IOException
public boolean prepareResourceForWrite(Config conf) throws java.io.IOException
false
, an exception will be thrown halting the current Flow.
In most cases, resource initialization should happen in the openForWrite(FlowProcess, Object)
method.
This allows for initialization of cluster side resources, like a JDBC driver used to write data to a database,
that cannot be passed client to cluster.
In the above JDBC example, overriding this method will allow for testing for the existence of and/or creating
a remote table used by all individual cluster side tasks.conf
- of type Configjava.io.IOException
public boolean commitResource(Config conf) throws java.io.IOException
rollbackResource(Object)
to handle cleanup in the face of failures.
This method is invoked once client side and not in the cluster, if any.
If other sink Tap instance in a given Flow fail on commitResource after called on this instance,
rollbackResource will not be called.conf
- of type Configjava.io.IOException
public boolean rollbackResource(Config conf) throws java.io.IOException
commitResource(Object)
to handle cleanup when the write has successfully completed.
This method is invoked once client side and not in the cluster, if any.conf
- of type Configjava.io.IOException
public boolean resourceExists(FlowProcess<? extends Config> flowProcess) throws java.io.IOException
flowProcess
- of type FlowProcessjava.io.IOException
- when the status cannot be determinedpublic abstract boolean resourceExists(Config conf) throws java.io.IOException
conf
- of type Configjava.io.IOException
- when the status cannot be determinedpublic long getModifiedTime(FlowProcess<? extends Config> flowProcess) throws java.io.IOException
flowProcess
- of type FlowProcessjava.io.IOException
public abstract long getModifiedTime(Config conf) throws java.io.IOException
conf
- of type Configjava.io.IOException
public SinkMode getSinkMode()
SinkMode
}of this Tap object.public boolean isKeep()
public boolean isReplace()
public boolean isUpdate()
createResource(Object)
, when the Flow is started.public boolean isSink()
public boolean isSource()
public boolean isTemporary()
public ConfigDef getConfigDef()
ConfigDef
instance that allows for local properties to be set and made available via
a resulting FlowProcess
instance when the tap is invoked.
Any properties set on the configDef will not show up in any Flow
or FlowStep
process
level configuration, but will override any of those values as seen by the current Tap instance method call where a
FlowProcess is provided except for the sourceConfInit(cascading.flow.FlowProcess, Object)
and
sinkConfInit(cascading.flow.FlowProcess, Object)
methods.
That is, the *confInit
methods are called before any ConfigDef is applied, so any values placed into
a ConfigDef instance will not be visible to them.getConfigDef
in interface ScopedElement
public boolean hasConfigDef()
true
if there are properties in the configDef instance.hasConfigDef
in interface ScopedElement
public ConfigDef getNodeConfigDef()
ConfigDef
instance that allows for process level properties to be set and made available via
a resulting FlowProcess
instance when the tap is invoked.
Any properties set on the nodeConfigDef will not show up in any Flow configuration, but will show up in
the current process FlowNode
(in Apache Tez the Vertex configuration). Any value set in the
nodeConfigDef will be overridden by the pipe local #getConfigDef
instance.
Use this method to tweak properties in the process node this tap instance is planned into.getNodeConfigDef
in interface ScopedElement
public boolean hasNodeConfigDef()
true
if there are properties in the nodeConfigDef instance.hasNodeConfigDef
in interface ScopedElement
public ConfigDef getStepConfigDef()
ConfigDef
instance that allows for process level properties to be set and made available via
a resulting FlowProcess
instance when the tap is invoked.
Any properties set on the stepConfigDef will not show up in any Flow configuration, but will show up in
the current process FlowStep
(in Hadoop the MapReduce jobconf). Any value set in the
stepConfigDef will be overridden by the tap local #getConfigDef
instance.
Use this method to tweak properties in the process step this tap instance is planned into.
Note the *confInit
methods are called before any ConfigDef is applied, so any values placed into
a ConfigDef instance will not be visible to them.getStepConfigDef
in interface ScopedElement
public boolean hasStepConfigDef()
true
if there are properties in the stepConfigDef instance.hasStepConfigDef
in interface ScopedElement
public boolean equals(java.lang.Object object)
equals
in class java.lang.Object
public int hashCode()
hashCode
in class java.lang.Object
public java.lang.String toString()
toString
in class java.lang.Object
Copyright © 2007-2015 Xplenty, Inc. All Rights Reserved.