cascading.pipe
Class GroupBy

java.lang.Object
  extended by cascading.pipe.Pipe
      extended by cascading.pipe.Splice
          extended by cascading.pipe.GroupBy
All Implemented Interfaces:
FlowElement, Group, Serializable

public class GroupBy
extends Splice
implements Group

The GroupBy pipe groups the Tuple stream by the given groupFields.

If more than one Pipe instance is passed to the constructor, all branches will be merged. All Pipe instances must output the same field names; otherwise the FlowConnector will fail to create a Flow instance. Note that the Pipe instances are merged into a single Tuple stream, not joined. See CoGroup for joining by common fields.
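The difference between merging and joining can be illustrated without the Cascading API. The following is a minimal plain-Java sketch, not Cascading's implementation: the class name MergeThenGroup and the {key, value} tuple layout are assumptions made for illustration. Both streams share the same layout, are concatenated into one stream, and are then bucketed by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MergeThenGroup {
    // Each tuple is {key, value}; both streams must share this layout,
    // mirroring the "same field names" requirement described above.
    static Map<String, List<String>> mergeAndGroup(List<String[]> lhs, List<String[]> rhs) {
        // Merge: one stream appended to the other, no matching on keys.
        List<String[]> merged = new ArrayList<>(lhs);
        merged.addAll(rhs);

        // Group: a TreeMap keeps the grouping keys in sorted order.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] tuple : merged) {
            groups.computeIfAbsent(tuple[0], k -> new ArrayList<>()).add(tuple[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> lhs = Arrays.asList(new String[]{"a", "1"}, new String[]{"b", "2"});
        List<String[]> rhs = Arrays.asList(new String[]{"a", "3"}, new String[]{"c", "4"});
        System.out.println(mergeAndGroup(lhs, rhs)); // prints {a=[1, 3], b=[2], c=[4]}
    }
}
```

A join would instead pair each "a" tuple from one stream with the "a" tuples of the other; here the values simply land in the same grouping.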

Typically an Every follows GroupBy to apply an Aggregator function to every grouping. The Each operator may also follow GroupBy to apply a Function or Filter to the resulting stream. However, an Each cannot come immediately before an Every.

Optionally, a stream can be further sorted by providing sortFields. This allows an Aggregator to receive values in the order of the sortFields.

Note that local sorting always happens on the groupFields; sortFields provide a secondary sort on the grouped values within the current grouping. sortFields is particularly useful if the Aggregators following the GroupBy need to see their arguments in order.
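The two sorting levels can be sketched in plain Java. This is only an illustration of the resulting order, not of how Cascading performs the sort: the Visit type and its user and time fields are hypothetical stand-ins for groupFields and sortFields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SecondarySortSketch {
    // Hypothetical tuple: "user" plays the role of groupFields,
    // "time" plays the role of sortFields.
    static final class Visit {
        final String user;
        final long time;

        Visit(String user, long time) {
            this.user = user;
            this.time = time;
        }
    }

    static Map<String, List<Long>> groupThenSort(List<Visit> visits) {
        // Primary sort: groupings are ordered by the grouping key.
        Map<String, List<Long>> groups = new TreeMap<>();
        for (Visit visit : visits) {
            groups.computeIfAbsent(visit.user, k -> new ArrayList<>()).add(visit.time);
        }
        // Secondary sort: values are ordered within each grouping.
        for (List<Long> times : groups.values()) {
            Collections.sort(times);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Visit> visits = Arrays.asList(
            new Visit("bob", 30), new Visit("alice", 20),
            new Visit("bob", 10), new Visit("alice", 5));
        System.out.println(groupThenSort(visits)); // prints {alice=[5, 20], bob=[10, 30]}
    }
}
```

An aggregator that walks each list in turn would see the times for a given user in ascending order, which is the effect the secondary sort is meant to provide.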

For more control over sorting at the group or secondary sort level, use Fields containing Comparator instances for the appropriate fields when setting the groupFields or sortFields values. Fields allows you to set a custom Comparator instance for each field name or position. Each Comparator class must also be Serializable.
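The Serializable requirement is easy to overlook, since a plain lambda or anonymous Comparator usually does not satisfy it. Below is a minimal sketch of a comparator that does; the class name DescendingLongComparator and the roundTrip helper are hypothetical and exist only to show that the instance survives Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

// Sorts Long values in descending order and can be serialized.
class DescendingLongComparator implements Comparator<Long>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(Long lhs, Long rhs) {
        return Long.compare(rhs, lhs); // arguments swapped for descending order
    }

    // Hypothetical helper: serializes and deserializes a value in memory.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("value is not serializable", e);
        }
    }

    public static void main(String[] args) {
        Comparator<Long> copy = roundTrip(new DescendingLongComparator());
        System.out.println(copy.compare(1L, 2L) > 0); // prints true
    }
}
```

A comparator written this way could then be attached to the relevant field of the Fields instance passed as groupFields or sortFields.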

Note that on MapReduce systems, distributed group sorting is not 'total'. That is, groups are sorted as seen by each Reducer, but they are not sorted across Reducers. See the MapReduce algorithm for details.
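Why the order is not total can be seen in a small simulation. The sketch below assumes keys are assigned to reducers by hash code modulo the reducer count, which is a simplification chosen for illustration; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PartitionedSortSketch {
    // Assigns each key to a simulated reducer by hash, then sorts each
    // reducer's keys locally, as each reducer would see them.
    static List<List<String>> partitionAndSort(List<String> keys, int numReducers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(Math.floorMod(key.hashCode(), numReducers)).add(key);
        }
        for (List<String> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Each inner list is sorted, but reading them back to back is not.
        System.out.println(partitionAndSort(Arrays.asList("d", "a", "c", "b"), 2));
        // prints [[b, d], [a, c]]
    }
}
```

Each simulated reducer sees its own keys in order, yet concatenating the outputs yields b, d, a, c rather than a, b, c, d.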

See the Hasher interface when providing a custom Comparator on the grouping keys that treats two values with differing hashCode values as equal. For example, new BigDecimal( 100.0D ) and new Double( 100.0D ) are equal under a custom Comparator, but their Object.hashCode() values differ, forcing each value into a different partition.
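The mismatch described above can be reproduced with the JDK alone. In this sketch, BY_VALUE and normalizedHash are hypothetical names: the first stands in for a custom Comparator, the second for the kind of consistent hash a Hasher implementation would need to supply. It does not use the Cascading Hasher interface itself.

```java
import java.math.BigDecimal;
import java.util.Comparator;

class NumericHashSketch {
    // Stand-in for a custom Comparator that treats numbers of different
    // types as equal when their double values match. A real grouping
    // Comparator would also need to be Serializable.
    static final Comparator<Number> BY_VALUE =
        (lhs, rhs) -> Double.compare(lhs.doubleValue(), rhs.doubleValue());

    // Hypothetical normalizing hash: values the comparator considers
    // equal produce the same hash code.
    static int normalizedHash(Number value) {
        return Double.hashCode(value.doubleValue());
    }

    public static void main(String[] args) {
        Number decimal = new BigDecimal(100.0D);
        Number dbl = Double.valueOf(100.0D);

        System.out.println(BY_VALUE.compare(decimal, dbl) == 0);                // prints true
        System.out.println(decimal.hashCode() == dbl.hashCode());              // prints false
        System.out.println(normalizedHash(decimal) == normalizedHash(dbl));    // prints true
    }
}
```

Without a consistent hash, the two values can be routed to different partitions and never meet in the same grouping, even though the comparator would consider them equal.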

Note that grouping a String key with a lowercase value together with a String key with an uppercase value using a "case insensitive" Comparator will not produce consistent key values. The grouping will execute and be correct, but the actual values in the key columns may be replaced with "equivalent" values from other streams.

That is, if two streams are merged and then grouped on a key, where the key values are uppercase in one stream and lowercase in the other, the resulting key value for the grouping may arbitrarily be either upper or lower case.

If the original key values must be retained, consider normalizing the keys with a Function and then grouping on the resulting field.
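The effect can be mimicked with a JDK TreeMap, which likewise keeps whichever "equivalent" key it saw first. This is an analogy only, not Cascading's grouping code, and the class and method names are hypothetical. The second method shows the normalizing approach: the key is lowercased before grouping, so the grouping key no longer depends on arrival order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class CaseInsensitiveGroupingSketch {
    // Groups with a case-insensitive comparator: the surviving key is
    // whichever spelling happened to arrive first.
    static Map<String, Integer> countIgnoringCase(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Normalizes each key first, then groups with the default comparator.
    static Map<String, Integer> countNormalized(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countIgnoringCase(Arrays.asList("Apple", "APPLE"))); // prints {Apple=2}
        System.out.println(countIgnoringCase(Arrays.asList("APPLE", "Apple"))); // prints {APPLE=2}
        System.out.println(countNormalized(Arrays.asList("APPLE", "Apple")));   // prints {apple=2}
    }
}
```

In both cases the count is correct; only the spelling of the reported key changes with input order unless the keys are normalized beforehand.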

See Also:
Serialized Form

Field Summary
 
Fields inherited from class cascading.pipe.Splice
declaredFields, keyFieldsMap, resultGroupFields, sortFieldsMap
 
Fields inherited from class cascading.pipe.Pipe
configDef, name, parent, previous, stepConfigDef
 
Constructor Summary
GroupBy(Pipe pipe)
          Creates a new GroupBy instance that will group on Fields.ALL fields.
GroupBy(Pipe[] pipes)
          Creates a new GroupBy instance that will first merge the given pipes, then group on Fields.FIRST.
GroupBy(Pipe[] pipes, Fields groupFields)
          Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names.
GroupBy(Pipe[] pipes, Fields groupFields, Fields sortFields)
          Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names and sort the grouped values on the given sortFields field names.
GroupBy(Pipe[] pipes, Fields groupFields, Fields sortFields, boolean reverseOrder)
          Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names and sort the grouped values on the given sortFields field names.
GroupBy(Pipe pipe, Fields groupFields)
          Creates a new GroupBy instance that will group on the given groupFields field names.
GroupBy(Pipe pipe, Fields groupFields, boolean reverseOrder)
          Creates a new GroupBy instance that will group on the given groupFields field names.
GroupBy(Pipe pipe, Fields groupFields, Fields sortFields)
          Creates a new GroupBy instance that will group on the given groupFields field names and sort the grouped values on the given sortFields field names.
GroupBy(Pipe pipe, Fields groupFields, Fields sortFields, boolean reverseOrder)
          Creates a new GroupBy instance that will group on the given groupFields field names and sort the grouped values on the given sortFields field names.
GroupBy(Pipe lhsPipe, Pipe rhsPipe, Fields groupFields)
          Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names.
GroupBy(String groupName, Pipe[] pipes, Fields groupFields)
          Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names.
GroupBy(String groupName, Pipe[] pipes, Fields groupFields, Fields sortFields)
          Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names and sort the grouped values on the given sortFields field names.
GroupBy(String groupName, Pipe[] pipes, Fields groupFields, Fields sortFields, boolean reverseOrder)
          Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names and sort the grouped values on the given sortFields field names.
GroupBy(String groupName, Pipe pipe, Fields groupFields)
          Creates a new GroupBy instance that will group on the given groupFields field names.
GroupBy(String groupName, Pipe pipe, Fields groupFields, boolean reverseOrder)
          Creates a new GroupBy instance that will group on the given groupFields field names.
GroupBy(String groupName, Pipe pipe, Fields groupFields, Fields sortFields)
          Creates a new GroupBy instance that will group on the given groupFields field names and sort the grouped values on the given sortFields field names.
GroupBy(String groupName, Pipe pipe, Fields groupFields, Fields sortFields, boolean reverseOrder)
          Creates a new GroupBy instance that will group on the given groupFields field names and sort the grouped values on the given sortFields field names.
GroupBy(String groupName, Pipe lhsPipe, Pipe rhsPipe, Fields groupFields)
          Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names.
 
Method Summary
 
Methods inherited from class cascading.pipe.Splice
equals, getDeclaredFields, getJoinDeclaredFields, getJoiner, getKeySelectors, getName, getNumSelfJoins, getPipePos, getPrevious, getSortingSelectors, hashCode, isCoGroup, isEquivalentTo, isGroupBy, isJoin, isMerge, isSorted, isSortReversed, outgoingScopeFor, printInternal, resolveIncomingOperationPassThroughFields, toString
 
Methods inherited from class cascading.pipe.Pipe
getConfigDef, getHeads, getParent, getStepConfigDef, getTrace, hasConfigDef, hasStepConfigDef, id, named, names, pipes, print, resolveIncomingOperationArgumentFields, setParent
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface cascading.pipe.Group
getKeySelectors, getName, getSortingSelectors, isCoGroup, isGroupBy, isSorted, isSortReversed
 
Methods inherited from interface cascading.flow.FlowElement
getConfigDef, getStepConfigDef, hasConfigDef, hasStepConfigDef, isEquivalentTo, outgoingScopeFor, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields
 

Constructor Detail

GroupBy

@ConstructorProperties(value="pipe")
public GroupBy(Pipe pipe)
Creates a new GroupBy instance that will group on Fields.ALL fields.

Parameters:
pipe - of type Pipe

GroupBy

@ConstructorProperties(value={"pipe","groupFields"})
public GroupBy(Pipe pipe,
               Fields groupFields)
Creates a new GroupBy instance that will group on the given groupFields field names.

Parameters:
pipe - of type Pipe
groupFields - of type Fields

GroupBy

@ConstructorProperties(value={"pipe","groupFields","reverseOrder"})
public GroupBy(Pipe pipe,
               Fields groupFields,
               boolean reverseOrder)
Creates a new GroupBy instance that will group on the given groupFields field names.

Parameters:
pipe - of type Pipe
groupFields - of type Fields
reverseOrder - of type boolean

GroupBy

@ConstructorProperties(value={"groupName","pipe","groupFields"})
public GroupBy(String groupName,
               Pipe pipe,
               Fields groupFields)
Creates a new GroupBy instance that will group on the given groupFields field names.

Parameters:
groupName - of type String
pipe - of type Pipe
groupFields - of type Fields

GroupBy

@ConstructorProperties(value={"groupName","pipe","groupFields","reverseOrder"})
public GroupBy(String groupName,
               Pipe pipe,
               Fields groupFields,
               boolean reverseOrder)
Creates a new GroupBy instance that will group on the given groupFields field names.

Parameters:
groupName - of type String
pipe - of type Pipe
groupFields - of type Fields
reverseOrder - of type boolean

GroupBy

@ConstructorProperties(value={"pipe","groupFields","sortFields"})
public GroupBy(Pipe pipe,
               Fields groupFields,
               Fields sortFields)
Creates a new GroupBy instance that will group on the given groupFields field names and sort the grouped values on the given sortFields field names.

Parameters:
pipe - of type Pipe
groupFields - of type Fields
sortFields - of type Fields

GroupBy

@ConstructorProperties(value={"groupName","pipe","groupFields","sortFields"})
public GroupBy(String groupName,
               Pipe pipe,
               Fields groupFields,
               Fields sortFields)
Creates a new GroupBy instance that will group on the given groupFields field names and sort the grouped values on the given sortFields field names.

Parameters:
groupName - of type String
pipe - of type Pipe
groupFields - of type Fields
sortFields - of type Fields

GroupBy

@ConstructorProperties(value={"pipe","groupFields","sortFields","reverseOrder"})
public GroupBy(Pipe pipe,
               Fields groupFields,
               Fields sortFields,
               boolean reverseOrder)
Creates a new GroupBy instance that will group on the given groupFields field names and sort the grouped values on the given sortFields field names.

Parameters:
pipe - of type Pipe
groupFields - of type Fields
sortFields - of type Fields
reverseOrder - of type boolean

GroupBy

@ConstructorProperties(value={"groupName","pipe","groupFields","sortFields","reverseOrder"})
public GroupBy(String groupName,
               Pipe pipe,
               Fields groupFields,
               Fields sortFields,
               boolean reverseOrder)
Creates a new GroupBy instance that will group on the given groupFields field names and sort the grouped values on the given sortFields field names.

Parameters:
groupName - of type String
pipe - of type Pipe
groupFields - of type Fields
sortFields - of type Fields
reverseOrder - of type boolean

GroupBy

@ConstructorProperties(value="pipes")
public GroupBy(Pipe[] pipes)
Creates a new GroupBy instance that will first merge the given pipes, then group on Fields.FIRST.

The assumption is that the first field in every stream is logically the same field, which should be true because merging assumes all incoming streams have the same fields in the same order.

For the best performance, use the constructor that takes a groupFields argument and choose one or more fields with many unique values. If the first field has few unique values, data will be sent to at most that number of reducers in the cluster, making the reduce phase a larger bottleneck.
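The bottleneck can be estimated with a short simulation. This sketch assumes keys are routed to reducers by hash code modulo the reducer count, a simplification for illustration; busyReducers is a hypothetical helper, not part of Cascading.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReducerSkewSketch {
    // Counts how many simulated reducers receive at least one key.
    static int busyReducers(List<String> keys, int numReducers) {
        Set<Integer> used = new HashSet<>();
        for (String key : keys) {
            used.add(Math.floorMod(key.hashCode(), numReducers));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // Two distinct values can occupy at most two of the eight reducers.
        List<String> countries = Arrays.asList("us", "uk", "us", "us");
        System.out.println(busyReducers(countries, 8) <= 2); // prints true

        // Many distinct values spread the work across all reducers.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            ids.add("id" + i);
        }
        System.out.println(busyReducers(ids, 8)); // prints 8
    }
}
```

The number of busy reducers can never exceed the number of distinct grouping key values, which is why a high-cardinality grouping field tends to use the cluster more evenly.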

Parameters:
pipes - of type Pipe[]

GroupBy

@ConstructorProperties(value={"pipes","groupFields"})
public GroupBy(Pipe[] pipes,
               Fields groupFields)
Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names.

Parameters:
pipes - of type Pipe[]
groupFields - of type Fields

GroupBy

public GroupBy(Pipe lhsPipe,
               Pipe rhsPipe,
               Fields groupFields)
Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names.

Parameters:
lhsPipe - of type Pipe
rhsPipe - of type Pipe
groupFields - of type Fields

GroupBy

@ConstructorProperties(value={"groupName","pipes","groupFields"})
public GroupBy(String groupName,
               Pipe[] pipes,
               Fields groupFields)
Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names.

Parameters:
groupName - of type String
pipes - of type Pipe[]
groupFields - of type Fields

GroupBy

public GroupBy(String groupName,
               Pipe lhsPipe,
               Pipe rhsPipe,
               Fields groupFields)
Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names.

Parameters:
groupName - of type String
lhsPipe - of type Pipe
rhsPipe - of type Pipe
groupFields - of type Fields

GroupBy

@ConstructorProperties(value={"pipes","groupFields","sortFields"})
public GroupBy(Pipe[] pipes,
               Fields groupFields,
               Fields sortFields)
Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names and sort the grouped values on the given sortFields field names.

Parameters:
pipes - of type Pipe[]
groupFields - of type Fields
sortFields - of type Fields

GroupBy

@ConstructorProperties(value={"groupName","pipes","groupFields","sortFields"})
public GroupBy(String groupName,
               Pipe[] pipes,
               Fields groupFields,
               Fields sortFields)
Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names and sort the grouped values on the given sortFields field names.

Parameters:
groupName - of type String
pipes - of type Pipe[]
groupFields - of type Fields
sortFields - of type Fields

GroupBy

@ConstructorProperties(value={"pipes","groupFields","sortFields","reverseOrder"})
public GroupBy(Pipe[] pipes,
               Fields groupFields,
               Fields sortFields,
               boolean reverseOrder)
Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names and sort the grouped values on the given sortFields field names.

Parameters:
pipes - of type Pipe[]
groupFields - of type Fields
sortFields - of type Fields
reverseOrder - of type boolean

GroupBy

@ConstructorProperties(value={"groupName","pipes","groupFields","sortFields","reverseOrder"})
public GroupBy(String groupName,
               Pipe[] pipes,
               Fields groupFields,
               Fields sortFields,
               boolean reverseOrder)
Creates a new GroupBy instance that will first merge the given pipes, then group on the given groupFields field names and sort the grouped values on the given sortFields field names.

Parameters:
groupName - of type String
pipes - of type Pipe[]
groupFields - of type Fields
sortFields - of type Fields
reverseOrder - of type boolean


Copyright © 2007-2014 Concurrent, Inc. All Rights Reserved.