cascading.pipe.assembly
Class Unique

java.lang.Object
  extended by cascading.pipe.Pipe
      extended by cascading.pipe.SubAssembly
          extended by cascading.pipe.assembly.Unique
All Implemented Interfaces:
FlowElement, Traceable, Serializable

public class Unique
extends SubAssembly

Class Unique is a SubAssembly used to filter all duplicates out of a tuple stream.

Typically, finding unique values in a tuple stream relies on a GroupBy and a FirstNBuffer Buffer operation.

If the include value is set to Unique.Include.NO_NULLS, any tuple consisting of only null values will be removed from the stream.
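The NO_NULLS rule above can be illustrated with a minimal sketch in plain Java. The class NoNullsSketch and its isAllNull method are hypothetical names used only for illustration; they are not part of Cascading, and the sketch models a tuple as a simple List.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only; not part of the Cascading API.
class NoNullsSketch {
  // Returns true when every value in the tuple is null,
  // i.e. the case that NO_NULLS is described as removing.
  static boolean isAllNull(List<?> tupleValues) {
    return tupleValues.stream().allMatch(Objects::isNull);
  }

  public static void main(String[] args) {
    // A tuple of only nulls would be dropped.
    System.out.println(isAllNull(Arrays.asList((Object) null, null))); // prints true
    // A tuple with at least one non-null value would be kept.
    System.out.println(isAllNull(Arrays.asList((Object) "a", null))); // prints false
  }
}
```

Note that a tuple containing a mix of null and non-null values is not affected by this rule; only tuples whose values are all null qualify.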

This SubAssembly uses the Unique.FilterPartialDuplicates Filter to remove as many observed duplicates as possible before the GroupBy operator, reducing IO over the network.

This strategy is similar to using combiners, except that no sorting or serialization is invoked, which results in a much simpler mechanism.

Unique uses a CascadingCache (an LRU cache by default) to do the filtering. To tune the cache, set the capacity value high enough to utilize available memory, or set a default value via the UniqueProps.UNIQUE_CACHE_CAPACITY property. The current default is 10,000 unique keys.

The LRU cache is pluggable and defaults to LRUHashMapCache. It can be changed by setting the UniqueProps.UNIQUE_CACHE_FACTORY property to the name of a sub-class of BaseCacheFactory.

The capacity value tells the underlying FilterPartialDuplicates how many values to cache for duplicate comparison before dropping values from the LRU cache.
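To make the role of capacity concrete, here is a minimal sketch of LRU-based partial duplicate filtering in plain Java. It is not Cascading's FilterPartialDuplicates or LRUHashMapCache; the class PartialDuplicateFilterSketch, its nested LruKeys class, and the filter method are hypothetical names, and the cache is assumed to be an access-ordered java.util.LinkedHashMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Cascading's implementation.
class PartialDuplicateFilterSketch {
  // Access-ordered map that evicts the least recently used key
  // once more than 'capacity' keys are cached.
  static final class LruKeys<K> extends LinkedHashMap<K, Boolean> {
    private final int capacity;

    LruKeys(int capacity) {
      super(16, 0.75f, true); // true = order entries by access
      this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
      return size() > capacity;
    }
  }

  // Returns the keys that pass the filter, in arrival order.
  // A key passes only if it is not currently in the cache.
  static <K> List<K> filter(List<K> keys, int capacity) {
    LruKeys<K> seen = new LruKeys<>(capacity);
    List<K> passed = new ArrayList<>();
    for (K key : keys) {
      // put returns null when the key was not already cached
      if (seen.put(key, Boolean.TRUE) == null) {
        passed.add(key);
      }
    }
    return passed;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("a", "b", "a", "c", "a", "b");
    // Small cache: "b" is evicted before it repeats, so one duplicate leaks through.
    System.out.println(filter(keys, 2));  // prints [a, b, c, b]
    // Cache large enough to hold every key: all duplicates are removed.
    System.out.println(filter(keys, 10)); // prints [a, b, c]
  }
}
```

With a capacity of 2 the second "b" survives the filter, which shows why such a filter only removes part of the duplicates and why a later grouping step is still needed to guarantee uniqueness. A larger capacity trades memory for fewer duplicates sent downstream.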

See Also:
LRUHashMapCacheFactory, DirectMappedCacheFactory, LRUHashMapCache, DirectMappedCache, Serialized Form

Nested Class Summary
static class Unique.Cache
           
static class Unique.FilterPartialDuplicates
          Class FilterPartialDuplicates is a Filter that is used to remove observed duplicates from the tuple stream.
static class Unique.Include
           
 
Field Summary
 
Fields inherited from class cascading.pipe.Pipe
configDef, name, parent, stepConfigDef
 
Constructor Summary
Unique(Pipe[] pipes, Fields uniqueFields)
          Constructor Unique creates a new Unique instance.
Unique(Pipe[] pipes, Fields uniqueFields, int capacity)
          Constructor Unique creates a new Unique instance.
Unique(Pipe[] pipes, Fields uniqueFields, Unique.Include include)
          Constructor Unique creates a new Unique instance.
Unique(Pipe[] pipes, Fields uniqueFields, Unique.Include include, int capacity)
          Constructor Unique creates a new Unique instance.
Unique(Pipe pipe, Fields uniqueFields)
          Constructor Unique creates a new Unique instance.
Unique(Pipe pipe, Fields uniqueFields, int capacity)
          Constructor Unique creates a new Unique instance.
Unique(Pipe pipe, Fields uniqueFields, Unique.Include include)
          Constructor Unique creates a new Unique instance.
Unique(Pipe pipe, Fields uniqueFields, Unique.Include include, int capacity)
          Constructor Unique creates a new Unique instance.
Unique(String name, Pipe[] pipes, Fields uniqueFields)
          Constructor Unique creates a new Unique instance.
Unique(String name, Pipe[] pipes, Fields uniqueFields, int capacity)
          Constructor Unique creates a new Unique instance.
Unique(String name, Pipe[] pipes, Fields uniqueFields, Unique.Include include)
          Constructor Unique creates a new Unique instance.
Unique(String name, Pipe[] pipes, Fields uniqueFields, Unique.Include include, int capacity)
          Constructor Unique creates a new Unique instance.
Unique(String name, Pipe pipe, Fields uniqueFields)
          Constructor Unique creates a new Unique instance.
Unique(String name, Pipe pipe, Fields uniqueFields, int capacity)
          Constructor Unique creates a new Unique instance.
Unique(String name, Pipe pipe, Fields uniqueFields, Unique.Include include)
          Constructor Unique creates a new Unique instance.
Unique(String name, Pipe pipe, Fields uniqueFields, Unique.Include include, int capacity)
          Constructor Unique creates a new Unique instance.
 
Method Summary
 
Methods inherited from class cascading.pipe.SubAssembly
getName, getPrevious, getTailNames, getTails, setPrevious, setTails, unwind
 
Methods inherited from class cascading.pipe.Pipe
equals, getConfigDef, getHeads, getParent, getStepConfigDef, getTrace, hasConfigDef, hashCode, hasStepConfigDef, id, isEquivalentTo, named, names, outgoingScopeFor, pipes, print, printInternal, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields, setParent, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Unique

@ConstructorProperties(value={"pipe","uniqueFields"})
public Unique(Pipe pipe,
              Fields uniqueFields)
Constructor Unique creates a new Unique instance.

Parameters:
pipe - of type Pipe
uniqueFields - of type Fields

Unique

@ConstructorProperties(value={"pipe","uniqueFields","include"})
public Unique(Pipe pipe,
              Fields uniqueFields,
              Unique.Include include)
Constructor Unique creates a new Unique instance.

Parameters:
pipe - of type Pipe
uniqueFields - of type Fields
include - of type Include

Unique

@ConstructorProperties(value={"pipe","uniqueFields","capacity"})
public Unique(Pipe pipe,
              Fields uniqueFields,
              int capacity)
Constructor Unique creates a new Unique instance.

Parameters:
pipe - of type Pipe
uniqueFields - of type Fields
capacity - of type int

Unique

@ConstructorProperties(value={"pipe","uniqueFields","include","capacity"})
public Unique(Pipe pipe,
              Fields uniqueFields,
              Unique.Include include,
              int capacity)
Constructor Unique creates a new Unique instance.

Parameters:
pipe - of type Pipe
uniqueFields - of type Fields
include - of type Include
capacity - of type int

Unique

@ConstructorProperties(value={"name","pipe","uniqueFields"})
public Unique(String name,
              Pipe pipe,
              Fields uniqueFields)
Constructor Unique creates a new Unique instance.

Parameters:
name - of type String
pipe - of type Pipe
uniqueFields - of type Fields

Unique

@ConstructorProperties(value={"name","pipe","uniqueFields","include"})
public Unique(String name,
              Pipe pipe,
              Fields uniqueFields,
              Unique.Include include)
Constructor Unique creates a new Unique instance.

Parameters:
name - of type String
pipe - of type Pipe
uniqueFields - of type Fields
include - of type Include

Unique

@ConstructorProperties(value={"name","pipe","uniqueFields","capacity"})
public Unique(String name,
              Pipe pipe,
              Fields uniqueFields,
              int capacity)
Constructor Unique creates a new Unique instance.

Parameters:
name - of type String
pipe - of type Pipe
uniqueFields - of type Fields
capacity - of type int

Unique

@ConstructorProperties(value={"name","pipe","uniqueFields","include","capacity"})
public Unique(String name,
              Pipe pipe,
              Fields uniqueFields,
              Unique.Include include,
              int capacity)
Constructor Unique creates a new Unique instance.

Parameters:
name - of type String
pipe - of type Pipe
uniqueFields - of type Fields
include - of type Include
capacity - of type int

Unique

@ConstructorProperties(value={"pipes","uniqueFields"})
public Unique(Pipe[] pipes,
              Fields uniqueFields)
Constructor Unique creates a new Unique instance.

Parameters:
pipes - of type Pipe[]
uniqueFields - of type Fields

Unique

@ConstructorProperties(value={"pipes","uniqueFields","include"})
public Unique(Pipe[] pipes,
              Fields uniqueFields,
              Unique.Include include)
Constructor Unique creates a new Unique instance.

Parameters:
pipes - of type Pipe[]
uniqueFields - of type Fields
include - of type Include

Unique

@ConstructorProperties(value={"pipes","uniqueFields","capacity"})
public Unique(Pipe[] pipes,
              Fields uniqueFields,
              int capacity)
Constructor Unique creates a new Unique instance.

Parameters:
pipes - of type Pipe[]
uniqueFields - of type Fields
capacity - of type int

Unique

@ConstructorProperties(value={"pipes","uniqueFields","include","capacity"})
public Unique(Pipe[] pipes,
              Fields uniqueFields,
              Unique.Include include,
              int capacity)
Constructor Unique creates a new Unique instance.

Parameters:
pipes - of type Pipe[]
uniqueFields - of type Fields
include - of type Include
capacity - of type int

Unique

@ConstructorProperties(value={"name","pipes","uniqueFields"})
public Unique(String name,
              Pipe[] pipes,
              Fields uniqueFields)
Constructor Unique creates a new Unique instance.

Parameters:
name - of type String
pipes - of type Pipe[]
uniqueFields - of type Fields

Unique

@ConstructorProperties(value={"name","pipes","uniqueFields","include"})
public Unique(String name,
              Pipe[] pipes,
              Fields uniqueFields,
              Unique.Include include)
Constructor Unique creates a new Unique instance.

Parameters:
name - of type String
pipes - of type Pipe[]
uniqueFields - of type Fields
include - of type Include

Unique

@ConstructorProperties(value={"name","pipes","uniqueFields","capacity"})
public Unique(String name,
              Pipe[] pipes,
              Fields uniqueFields,
              int capacity)
Constructor Unique creates a new Unique instance.

Parameters:
name - of type String
pipes - of type Pipe[]
uniqueFields - of type Fields
capacity - of type int

Unique

@ConstructorProperties(value={"name","pipes","uniqueFields","include","capacity"})
public Unique(String name,
              Pipe[] pipes,
              Fields uniqueFields,
              Unique.Include include,
              int capacity)
Constructor Unique creates a new Unique instance.

Parameters:
name - of type String
pipes - of type Pipe[]
uniqueFields - of type Fields
include - of type Include
capacity - of type int


Copyright © 2007-2015 Concurrent, Inc. All Rights Reserved.