cascading.pipe.assembly
Class CountBy

java.lang.Object
  extended by cascading.pipe.Pipe
      extended by cascading.pipe.SubAssembly
          extended by cascading.pipe.assembly.AggregateBy
              extended by cascading.pipe.assembly.CountBy
All Implemented Interfaces:
FlowElement, Traceable, Serializable

public class CountBy
extends AggregateBy

Class CountBy is used to count duplicates in a tuple stream, where "duplicates" means all tuples with the same values for the groupingFields fields. The resulting count is output as a long value in the specified countField.

Typically, finding the count of a field in a tuple stream requires a GroupBy followed by a Count Aggregator operation.
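As a rough illustration of these semantics, the plain-Java sketch below models the GroupBy-then-Count approach. It is not Cascading code, and the class name `GroupThenCount` is hypothetical. Every record is grouped by its key and each group is counted, with counts held as `long` values, matching the long output described above. For brevity, each record is reduced to a single String grouping key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GroupThenCount {
    // Groups every record by its key, then counts each group.
    // Every raw record reaches the grouping step, as with GroupBy + Count.
    public static Map<String, Long> countByKey(List<String> keys) {
        return keys.stream()
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "a", "b");
        System.out.println(countByKey(words)); // {a=3, b=2, c=1}
    }
}
```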

The CountBy SubAssembly is a (typically) more efficient replacement for these two steps because it performs map-side pre-reduce counting (via the CountBy.CountPartials AggregateBy.Functor) before the GroupBy operator; this reduces network I/O between the map and reduce phases.
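The sketch below models the map-side pre-reduce idea in plain Java. It assumes, for simplicity, that each input split fits in memory and that each record is a single String key; `PartialCounts`, `mapSide`, and `reduceSide` are hypothetical names, not part of Cascading. Each split is collapsed into (key, partial count) pairs, and the reduce side sums those partials, giving the same totals as grouping every raw record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartialCounts {
    // Map side: collapse one split's records into (key, partialCount) pairs.
    // Only these pairs, not the raw records, would cross the network.
    public static Map<String, Long> mapSide(List<String> split) {
        Map<String, Long> partials = new HashMap<>();
        for (String key : split) {
            partials.merge(key, 1L, Long::sum);
        }
        return partials;
    }

    // Reduce side: after grouping, sum the partial counts for each key.
    public static Map<String, Long> reduceSide(List<Map<String, Long>> allPartials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partials : allPartials) {
            partials.forEach((key, count) -> totals.merge(key, count, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("a", "b", "a", "a"),
            Arrays.asList("b", "a", "c"));
        List<Map<String, Long>> shuffled = new ArrayList<>();
        int recordsSent = 0;
        for (List<String> split : splits) {
            Map<String, Long> partials = mapSide(split);
            recordsSent += partials.size();
            shuffled.add(partials);
        }
        System.out.println(reduceSide(shuffled)); // {a=4, b=2, c=1}
        System.out.println(recordsSent + " partial records instead of 7 raw records"); // 5 partial records ...
    }
}
```

With these two splits, 5 partial records stand in for 7 raw records; the saving grows as keys repeat more often within a split.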

This strategy is similar to using combiners, except that no sorting or serialization is invoked, which results in a much simpler mechanism.

The threshold value tells the underlying CountPartials functions how many unique key counts to accumulate in the LRU cache before emitting the least recently used entry. This accumulation happens map-side, so it is bounded by the memory available to your map task JVM and the typical size of each group key.
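The LRU behavior can be sketched with `java.util.LinkedHashMap` in access order, whose `removeEldestEntry` hook runs after each insertion. This is an assumed model of what CountPartials does, not its actual source: `LruPartialCounter`, `add`, and `flush` are hypothetical names, and the threshold here is the number of distinct keys held in the cache.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LruPartialCounter {
    // Partial counts emitted so far, in emission order.
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();
    private final LinkedHashMap<String, Long> cache;

    public LruPartialCounter(final int threshold) {
        // accessOrder = true: iteration runs from least to most recently used key.
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                if (size() > threshold) {
                    // Emit the least recently used partial count, then let the map evict it.
                    emitted.add(new AbstractMap.SimpleEntry<>(eldest.getKey(), eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    // Increment the partial count for one incoming key, marking it most recently used.
    public void add(String key) {
        Long current = cache.get(key);
        cache.put(key, current == null ? 1L : current + 1L);
    }

    // At end of input, emit every partial count still held in the cache.
    public List<Map.Entry<String, Long>> flush() {
        for (Map.Entry<String, Long> entry : cache.entrySet()) {
            emitted.add(new AbstractMap.SimpleEntry<>(entry.getKey(), entry.getValue()));
        }
        cache.clear();
        return emitted;
    }

    public static void main(String[] args) {
        LruPartialCounter counter = new LruPartialCounter(2);
        for (String key : Arrays.asList("a", "b", "a", "c", "a")) {
            counter.add(key);
        }
        System.out.println(counter.flush()); // [b=1, c=1, a=3]
    }
}
```

Because `b` was evicted early, a later `b` would start a fresh partial count and be emitted a second time; this is why the reduce side must still sum partials per key after the GroupBy. A smaller threshold saves map-side memory at the cost of emitting more partial records.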

By default, either the value of the AggregateByProps.AGGREGATE_BY_CAPACITY System property or, if that property is not set, AggregateByProps.AGGREGATE_BY_DEFAULT_CAPACITY will be used.
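The fallback described above follows a common JVM pattern, sketched here with `Integer.getInteger`, which reads a system property and returns a default when the property is missing or not a valid integer. The property key and default value below are placeholders chosen for illustration; they are not the actual values of `AggregateByProps.AGGREGATE_BY_CAPACITY` or `AggregateByProps.AGGREGATE_BY_DEFAULT_CAPACITY`, and `CapacityLookup` is a hypothetical class.

```java
public class CapacityLookup {
    // Placeholder key and default for illustration only; NOT the real values of
    // AggregateByProps.AGGREGATE_BY_CAPACITY or AGGREGATE_BY_DEFAULT_CAPACITY.
    public static final String CAPACITY_PROPERTY = "example.aggregateby.capacity";
    public static final int DEFAULT_CAPACITY = 10000;

    // Reads the system property; falls back to the default when it is unset or not an integer.
    public static int resolveCapacity() {
        return Integer.getInteger(CAPACITY_PROPERTY, DEFAULT_CAPACITY);
    }

    public static void main(String[] args) {
        System.setProperty(CAPACITY_PROPERTY, "5000");
        System.out.println(resolveCapacity()); // 5000
        System.clearProperty(CAPACITY_PROPERTY);
        System.out.println(resolveCapacity()); // 10000
    }
}
```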

If include is CountBy.Include.NO_NULLS, argument tuples with all null values will be ignored. If include is CountBy.Include.ONLY_NULLS, only argument tuples with all null values will be counted.

The values in the argument Tuple are normally all the remaining fields not used for grouping, but this can be narrowed using the valueFields parameter. When counting the occurrence of a single field (when valueFields is set on the constructor), this is the same behavior as select count(foo) ... in SQL.
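To make the counting modes concrete, the plain-Java model below counts argument tuples (the valueFields of each record, represented here as `Object[]`). Its `Include` enum mirrors the NO_NULLS and ONLY_NULLS behaviors described above, plus an assumed ALL mode that counts every tuple; it is a hypothetical stand-in, not `CountBy.Include` itself, and `IncludeModel` is likewise a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

public class IncludeModel {
    // Hypothetical stand-in for the counting modes; not Cascading's CountBy.Include.
    public enum Include { ALL, NO_NULLS, ONLY_NULLS }

    // True when every value in the argument tuple is null.
    static boolean allNull(Object[] tuple) {
        for (Object value : tuple) {
            if (value != null) {
                return false;
            }
        }
        return true;
    }

    // Counts argument tuples under the given mode.
    public static long count(List<Object[]> argumentTuples, Include include) {
        long n = 0;
        for (Object[] tuple : argumentTuples) {
            boolean nulls = allNull(tuple);
            if (include == Include.NO_NULLS && nulls) {
                continue; // NO_NULLS: skip tuples whose values are all null
            }
            if (include == Include.ONLY_NULLS && !nulls) {
                continue; // ONLY_NULLS: skip tuples with any non-null value
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // A single value field "foo"; SQL's count(foo) skips NULLs, which NO_NULLS reproduces here.
        List<Object[]> foo = Arrays.asList(new Object[] {"x"}, new Object[] {null}, new Object[] {"y"});
        System.out.println(count(foo, Include.ALL));        // 3
        System.out.println(count(foo, Include.NO_NULLS));   // 2
        System.out.println(count(foo, Include.ONLY_NULLS)); // 1
    }
}
```

With more than one value field, a tuple such as `("x", null)` is not all-null, so NO_NULLS still counts it.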

See Also:
AggregateBy, Serialized Form

Nested Class Summary
static class CountBy.CountPartials
          Class CountPartials is an AggregateBy.Functor that is used to count observed duplicates from the tuple stream.
static class CountBy.Include
           
 
Nested classes/interfaces inherited from class cascading.pipe.assembly.AggregateBy
AggregateBy.Cache, AggregateBy.CompositeFunction, AggregateBy.Flush, AggregateBy.Functor
 
Field Summary
static int DEFAULT_THRESHOLD
          Deprecated. 
 
Fields inherited from class cascading.pipe.assembly.AggregateBy
AGGREGATE_BY_THRESHOLD, USE_DEFAULT_THRESHOLD
 
Fields inherited from class cascading.pipe.Pipe
configDef, parent, stepConfigDef
 
Constructor Summary
CountBy(Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(Fields valueFields, Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(Fields valueFields, Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe[] pipes, Fields groupingFields, Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe[] pipes, Fields groupingFields, Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe[] pipes, Fields groupingFields, Fields countField, CountBy.Include include, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe[] pipes, Fields groupingFields, Fields valueFields, Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe[] pipes, Fields groupingFields, Fields valueFields, Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe[] pipes, Fields groupingFields, Fields valueFields, Fields countField, CountBy.Include include, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe[] pipes, Fields groupingFields, Fields valueFields, Fields countField, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe[] pipes, Fields groupingFields, Fields countField, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe pipe, Fields groupingFields, Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe pipe, Fields groupingFields, Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe pipe, Fields groupingFields, Fields countField, CountBy.Include include, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe pipe, Fields groupingFields, Fields valueFields, Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe pipe, Fields groupingFields, Fields valueFields, Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe pipe, Fields groupingFields, Fields valueFields, Fields countField, CountBy.Include include, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe pipe, Fields groupingFields, Fields valueFields, Fields countField, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(Pipe pipe, Fields groupingFields, Fields countField, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe[] pipes, Fields groupingFields, Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe[] pipes, Fields groupingFields, Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe[] pipes, Fields groupingFields, Fields countField, CountBy.Include include, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe[] pipes, Fields groupingFields, Fields valueFields, Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe[] pipes, Fields groupingFields, Fields valueFields, Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe[] pipes, Fields groupingFields, Fields valueFields, Fields countField, CountBy.Include include, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe[] pipes, Fields groupingFields, Fields valueFields, Fields countField, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe[] pipes, Fields groupingFields, Fields countField, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe pipe, Fields groupingFields, Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe pipe, Fields groupingFields, Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe pipe, Fields groupingFields, Fields countField, CountBy.Include include, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe pipe, Fields groupingFields, Fields valueFields, Fields countField)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe pipe, Fields groupingFields, Fields valueFields, Fields countField, CountBy.Include include)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe pipe, Fields groupingFields, Fields valueFields, Fields countField, CountBy.Include include, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe pipe, Fields groupingFields, Fields valueFields, Fields countField, int threshold)
          Constructor CountBy creates a new CountBy instance.
CountBy(String name, Pipe pipe, Fields groupingFields, Fields countField, int threshold)
          Constructor CountBy creates a new CountBy instance.
 
Method Summary
 
Methods inherited from class cascading.pipe.assembly.AggregateBy
getAggregators, getArgumentFields, getCapacity, getFieldDeclarations, getFunctors, getGroupBy, getGroupingFields, getThreshold, initialize, initialize, verify
 
Methods inherited from class cascading.pipe.SubAssembly
getName, getPrevious, getTailNames, getTails, setPrevious, setTails, unwind
 
Methods inherited from class cascading.pipe.Pipe
equals, getConfigDef, getHeads, getParent, getStepConfigDef, getTrace, hasConfigDef, hashCode, hasStepConfigDef, id, isEquivalentTo, named, names, outgoingScopeFor, pipes, print, printInternal, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields, setParent, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_THRESHOLD

@Deprecated
public static final int DEFAULT_THRESHOLD
Deprecated. 
DEFAULT_THRESHOLD

See Also:
Constant Field Values

Constructor Detail

CountBy

@ConstructorProperties(value="countField")
public CountBy(Fields countField)
Constructor CountBy creates a new CountBy instance. Use this constructor when the CountBy will be passed to an AggregateBy instance.

Parameters:
countField - of type Fields

CountBy

@ConstructorProperties(value={"countField","include"})
public CountBy(Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance. Use this constructor when the CountBy will be passed to an AggregateBy instance.

Parameters:
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"valueFields","countField"})
public CountBy(Fields valueFields,
                                          Fields countField)
Constructor CountBy creates a new CountBy instance. Use this constructor when the CountBy will be passed to an AggregateBy instance.

Parameters:
valueFields - of type Fields
countField - of type Fields

CountBy

@ConstructorProperties(value={"valueFields","countField","include"})
public CountBy(Fields valueFields,
                                          Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance. Use this constructor when the CountBy will be passed to an AggregateBy instance.

Parameters:
valueFields - of type Fields
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"pipe","groupingFields","countField"})
public CountBy(Pipe pipe,
                                          Fields groupingFields,
                                          Fields countField)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipe - of type Pipe
groupingFields - of type Fields
countField - of type Fields

CountBy

@ConstructorProperties(value={"pipe","groupingFields","countField","threshold"})
public CountBy(Pipe pipe,
                                          Fields groupingFields,
                                          Fields countField,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipe - of type Pipe
groupingFields - of type Fields
countField - of type Fields
threshold - of type int

CountBy

@ConstructorProperties(value={"name","pipe","groupingFields","countField"})
public CountBy(String name,
                                          Pipe pipe,
                                          Fields groupingFields,
                                          Fields countField)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipe - of type Pipe
groupingFields - of type Fields
countField - of type Fields

CountBy

@ConstructorProperties(value={"name","pipe","groupingFields","countField","threshold"})
public CountBy(String name,
                                          Pipe pipe,
                                          Fields groupingFields,
                                          Fields countField,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipe - of type Pipe
groupingFields - of type Fields
countField - of type Fields
threshold - of type int

CountBy

@ConstructorProperties(value={"pipes","groupingFields","countField"})
public CountBy(Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields countField)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipes - of type Pipe[]
groupingFields - of type Fields
countField - of type Fields

CountBy

@ConstructorProperties(value={"pipes","groupingFields","countField","threshold"})
public CountBy(Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields countField,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipes - of type Pipe[]
groupingFields - of type Fields
countField - of type Fields
threshold - of type int

CountBy

@ConstructorProperties(value={"name","pipes","groupingFields","countField"})
public CountBy(String name,
                                          Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields countField)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipes - of type Pipe[]
groupingFields - of type Fields
countField - of type Fields

CountBy

@ConstructorProperties(value={"name","pipes","groupingFields","countField","threshold"})
public CountBy(String name,
                                          Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields countField,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipes - of type Pipe[]
groupingFields - of type Fields
countField - of type Fields
threshold - of type int

CountBy

@ConstructorProperties(value={"pipe","groupingFields","countField","include"})
public CountBy(Pipe pipe,
                                          Fields groupingFields,
                                          Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipe - of type Pipe
groupingFields - of type Fields
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"pipe","groupingFields","countField","include","threshold"})
public CountBy(Pipe pipe,
                                          Fields groupingFields,
                                          Fields countField,
                                          CountBy.Include include,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipe - of type Pipe
groupingFields - of type Fields
countField - of type Fields
include - of type Include
threshold - of type int

CountBy

@ConstructorProperties(value={"name","pipe","groupingFields","countField","include"})
public CountBy(String name,
                                          Pipe pipe,
                                          Fields groupingFields,
                                          Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipe - of type Pipe
groupingFields - of type Fields
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"name","pipe","groupingFields","countField","include","threshold"})
public CountBy(String name,
                                          Pipe pipe,
                                          Fields groupingFields,
                                          Fields countField,
                                          CountBy.Include include,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipe - of type Pipe
groupingFields - of type Fields
countField - of type Fields
include - of type Include
threshold - of type int

CountBy

@ConstructorProperties(value={"pipes","groupingFields","countField","include"})
public CountBy(Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipes - of type Pipe[]
groupingFields - of type Fields
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"pipes","groupingFields","countField","include","threshold"})
public CountBy(Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields countField,
                                          CountBy.Include include,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipes - of type Pipe[]
groupingFields - of type Fields
countField - of type Fields
include - of type Include
threshold - of type int

CountBy

@ConstructorProperties(value={"name","pipes","groupingFields","countField","include"})
public CountBy(String name,
                                          Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipes - of type Pipe[]
groupingFields - of type Fields
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"name","pipes","groupingFields","countField","include","threshold"})
public CountBy(String name,
                                          Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields countField,
                                          CountBy.Include include,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipes - of type Pipe[]
groupingFields - of type Fields
countField - of type Fields
include - of type Include
threshold - of type int

CountBy

@ConstructorProperties(value={"pipe","groupingFields","valueFields","countField"})
public CountBy(Pipe pipe,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipe - of type Pipe
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields

CountBy

@ConstructorProperties(value={"pipe","groupingFields","valueFields","countField","threshold"})
public CountBy(Pipe pipe,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipe - of type Pipe
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
threshold - of type int

CountBy

@ConstructorProperties(value={"name","pipe","groupingFields","valueFields","countField"})
public CountBy(String name,
                                          Pipe pipe,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipe - of type Pipe
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields

CountBy

@ConstructorProperties(value={"name","pipe","groupingFields","valueFields","countField","threshold"})
public CountBy(String name,
                                          Pipe pipe,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipe - of type Pipe
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
threshold - of type int

CountBy

@ConstructorProperties(value={"pipes","groupingFields","valueFields","countField"})
public CountBy(Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipes - of type Pipe[]
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields

CountBy

@ConstructorProperties(value={"pipes","groupingFields","valueFields","countField","threshold"})
public CountBy(Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipes - of type Pipe[]
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
threshold - of type int

CountBy

@ConstructorProperties(value={"name","pipes","groupingFields","valueFields","countField"})
public CountBy(String name,
                                          Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipes - of type Pipe[]
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields

CountBy

@ConstructorProperties(value={"name","pipes","groupingFields","valueFields","countField","threshold"})
public CountBy(String name,
                                          Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipes - of type Pipe[]
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
threshold - of type int

CountBy

@ConstructorProperties(value={"pipe","groupingFields","valueFields","countField","include"})
public CountBy(Pipe pipe,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipe - of type Pipe
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"pipe","groupingFields","valueFields","countField","include","threshold"})
public CountBy(Pipe pipe,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          CountBy.Include include,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipe - of type Pipe
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
include - of type Include
threshold - of type int

CountBy

@ConstructorProperties(value={"name","pipe","groupingFields","valueFields","countField","include"})
public CountBy(String name,
                                          Pipe pipe,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipe - of type Pipe
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"name","pipe","groupingFields","valueFields","countField","include","threshold"})
public CountBy(String name,
                                          Pipe pipe,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          CountBy.Include include,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipe - of type Pipe
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
include - of type Include
threshold - of type int

CountBy

@ConstructorProperties(value={"pipes","groupingFields","valueFields","countField","include"})
public CountBy(Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipes - of type Pipe[]
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"pipes","groupingFields","valueFields","countField","include","threshold"})
public CountBy(Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          CountBy.Include include,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
pipes - of type Pipe[]
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
include - of type Include
threshold - of type int

CountBy

@ConstructorProperties(value={"name","pipes","groupingFields","valueFields","countField","include"})
public CountBy(String name,
                                          Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          CountBy.Include include)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipes - of type Pipe[]
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
include - of type Include

CountBy

@ConstructorProperties(value={"name","pipes","groupingFields","valueFields","countField","include","threshold"})
public CountBy(String name,
                                          Pipe[] pipes,
                                          Fields groupingFields,
                                          Fields valueFields,
                                          Fields countField,
                                          CountBy.Include include,
                                          int threshold)
Constructor CountBy creates a new CountBy instance.

Parameters:
name - of type String
pipes - of type Pipe[]
groupingFields - of type Fields
valueFields - of type Fields
countField - of type Fields
include - of type Include
threshold - of type int


Copyright © 2007-2015 Concurrent, Inc. All Rights Reserved.