cascading.pipe.assembly
Class Unique.FilterPartialDuplicates

java.lang.Object
  extended by cascading.operation.BaseOperation<CascadingCache<Tuple,Object>>
      extended by cascading.pipe.assembly.Unique.FilterPartialDuplicates
All Implemented Interfaces:
DeclaresResults, Filter<CascadingCache<Tuple,Object>>, Operation<CascadingCache<Tuple,Object>>, Traceable, Serializable
Enclosing class:
Unique

public static class Unique.FilterPartialDuplicates
extends BaseOperation<CascadingCache<Tuple,Object>>
implements Filter<CascadingCache<Tuple,Object>>

Class FilterPartialDuplicates is a Filter that is used to remove observed duplicates from the tuple stream.

This class is typically used in tandem with a First Aggregator. It improves de-duping performance by removing as many duplicate values as possible before the intermediate GroupBy operator.

The capacity value sets the size of a fixed-size LRU cache of seen values. If more than capacity unique values are seen, the oldest cached values are removed from the cache, so a value that was evicted and then appears again is not filtered out.
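
The effect of the capacity bound can be illustrated with a minimal plain-Java sketch. It is not the Cascading implementation: the class PartialDuplicateFilterSketch, its methods, and the use of String values in place of Tuple instances are all hypothetical, and the access-ordered LinkedHashMap is only an assumed stand-in for CascadingCache. The exactUnique method stands in for the later GroupBy plus First step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the Cascading class: it only illustrates
// best-effort duplicate removal with a bounded LRU cache.
class PartialDuplicateFilterSketch {
    private final Map<String, Boolean> seen;

    PartialDuplicateFilterSketch(final int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently used
        // (assumption: the real cache may order or evict differently).
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity; // evict once the bound is exceeded
            }
        };
    }

    // Same convention as Filter.isRemove: true means "drop this value".
    boolean isRemove(String value) {
        if (seen.get(value) != null) {
            return true; // still cached, so it is a known duplicate
        }
        seen.put(value, Boolean.TRUE); // first sighting, or seen again after eviction
        return false;
    }

    // Runs the bounded filter over a list and returns the values that pass.
    static List<String> filter(List<String> input, int capacity) {
        PartialDuplicateFilterSketch f = new PartialDuplicateFilterSketch(capacity);
        List<String> kept = new ArrayList<>();
        for (String value : input) {
            if (!f.isRemove(value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    // Stand-in for the exact de-duplication done after grouping.
    static List<String> exactUnique(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "a", "b");
        List<String> partial = filter(input, 2);
        System.out.println(partial);              // [a, b, c, b]
        System.out.println(exactUnique(partial)); // [a, b, c]
    }
}
```

With a capacity of 2, the value "b" is evicted when "c" arrives and therefore passes the filter a second time. This is why the filter only reduces the volume reaching the grouping step and a later exact de-duplication is still needed.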

See Also:
Unique, Serialized Form

Field Summary
 
Fields inherited from class cascading.operation.BaseOperation
fieldDeclaration, numArgs, trace
 
Fields inherited from interface cascading.operation.Operation
ANY
 
Constructor Summary
Unique.FilterPartialDuplicates()
          Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance.
Unique.FilterPartialDuplicates(int capacity)
          Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance.
Unique.FilterPartialDuplicates(Unique.Include include, int capacity)
          Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance.
Unique.FilterPartialDuplicates(Unique.Include include, int capacity, TupleHasher tupleHasher)
          Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance.
 
Method Summary
 void cleanup(FlowProcess flowProcess, OperationCall<CascadingCache<Tuple,Object>> operationCall)
          Method cleanup does nothing, and may safely be overridden.
 boolean equals(Object object)
           
 int hashCode()
           
 boolean isRemove(FlowProcess flowProcess, FilterCall<CascadingCache<Tuple,Object>> filterCall)
          Method isRemove returns true if input should be removed from the tuple stream.
 void prepare(FlowProcess flowProcess, OperationCall<CascadingCache<Tuple,Object>> operationCall)
          Method prepare does nothing, and may safely be overridden.
 
Methods inherited from class cascading.operation.BaseOperation
flush, getFieldDeclaration, getNumArgs, getTrace, isSafe, printOperationInternal, toString, toStringInternal
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface cascading.operation.Operation
flush, getFieldDeclaration, getNumArgs, isSafe
 

Constructor Detail

Unique.FilterPartialDuplicates

public Unique.FilterPartialDuplicates()
Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance.


Unique.FilterPartialDuplicates

@ConstructorProperties(value="capacity")
public Unique.FilterPartialDuplicates(int capacity)
Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance.

Parameters:
capacity - of type int, the maximum number of unique values kept in the LRU cache

Unique.FilterPartialDuplicates

@ConstructorProperties(value={"include","capacity"})
public Unique.FilterPartialDuplicates(Unique.Include include,
                                                                 int capacity)
Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance.

Parameters:
include - of type Include
capacity - of type int, the maximum number of unique values kept in the LRU cache

Unique.FilterPartialDuplicates

@ConstructorProperties(value={"include","capacity","tupleHasher"})
public Unique.FilterPartialDuplicates(Unique.Include include,
                                                                 int capacity,
                                                                 TupleHasher tupleHasher)
Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance.

Parameters:
include - of type Include
capacity - of type int
tupleHasher - of type TupleHasher

Method Detail

prepare

public void prepare(FlowProcess flowProcess,
                    OperationCall<CascadingCache<Tuple,Object>> operationCall)
Description copied from class: BaseOperation
Method prepare does nothing, and may safely be overridden.

Specified by:
prepare in interface Operation<CascadingCache<Tuple,Object>>
Overrides:
prepare in class BaseOperation<CascadingCache<Tuple,Object>>

isRemove

public boolean isRemove(FlowProcess flowProcess,
                        FilterCall<CascadingCache<Tuple,Object>> filterCall)
Description copied from interface: Filter
Method isRemove returns true if input should be removed from the tuple stream.

Specified by:
isRemove in interface Filter<CascadingCache<Tuple,Object>>
Parameters:
flowProcess - of type FlowProcess
filterCall - of type FilterCall
Returns:
boolean

cleanup

public void cleanup(FlowProcess flowProcess,
                    OperationCall<CascadingCache<Tuple,Object>> operationCall)
Description copied from class: BaseOperation
Method cleanup does nothing, and may safely be overridden.

Specified by:
cleanup in interface Operation<CascadingCache<Tuple,Object>>
Overrides:
cleanup in class BaseOperation<CascadingCache<Tuple,Object>>

equals

public boolean equals(Object object)
Overrides:
equals in class BaseOperation<CascadingCache<Tuple,Object>>

hashCode

public int hashCode()
Overrides:
hashCode in class BaseOperation<CascadingCache<Tuple,Object>>


Copyright © 2007-2015 Concurrent, Inc. All Rights Reserved.