cascading.flow.hadoop.planner
Class HadoopPlanner

java.lang.Object
  extended by cascading.flow.planner.FlowPlanner<HadoopFlow,JobConf>
      extended by cascading.flow.hadoop.planner.HadoopPlanner
Direct Known Subclasses:
Hadoop2MR1Planner

public class HadoopPlanner
extends FlowPlanner<HadoopFlow,JobConf>

Class HadoopPlanner is the core Hadoop MapReduce planner.

Notes:

Custom JobConf properties
The values of a custom JobConf instance can be passed to this planner by calling copyJobConf(java.util.Map, org.apache.hadoop.mapred.JobConf) with a properties map before constructing a new HadoopFlowConnector.

A better practice is to set Hadoop properties directly on the properties map handed to the FlowConnector. All values in the map are passed to a new default JobConf instance and serve as defaults for all resulting Flow instances.

For example, properties.put("mapred.child.java.opts","-Xmx512m"); tells Hadoop to spawn all child JVMs with a maximum heap of 512MB.
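
The recommended practice above can be sketched with plain JDK classes. This is a minimal illustration, not code from the Cascading distribution: the class name HadoopPropertiesExample and the helper buildProperties are hypothetical, and the hand-off to HadoopFlowConnector is left as a comment so the snippet compiles without Cascading or Hadoop on the classpath.

```java
import java.util.Properties;

class HadoopPropertiesExample {
    // Hypothetical helper: builds the properties object that would later be
    // handed to the FlowConnector.
    static Properties buildProperties() {
        Properties properties = new Properties();
        // Hadoop property named in the notes above: child JVM options.
        properties.put("mapred.child.java.opts", "-Xmx512m");
        return properties;
    }

    public static void main(String[] args) {
        Properties properties = buildProperties();
        System.out.println(properties.getProperty("mapred.child.java.opts"));
        // With Cascading on the classpath, the next step would be, roughly:
        // FlowConnector flowConnector = new HadoopFlowConnector(properties);
    }
}
```

Because java.util.Properties is itself a Map<Object,Object>, the same object can be passed wherever this class expects a properties map.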


Field Summary
 
Fields inherited from class cascading.flow.planner.FlowPlanner
assertionLevel, checkpointRootPath, debugLevel, properties
 
Constructor Summary
HadoopPlanner()
           
 
Method Summary
 HadoopFlow buildFlow(FlowDef flowDef)
           
static void copyJobConf(Map<Object,Object> properties, JobConf jobConf)
          Method copyJobConf adds the given JobConf values to the given properties object.
static void copyProperties(JobConf jobConf, Map<Object,Object> properties)
          Method copyProperties adds the given Map values to the given JobConf object.
protected  HadoopFlow createFlow(FlowDef flowDef)
           
static JobConf createJobConf(Map<Object,Object> properties)
          Method createJobConf returns a new JobConf instance using the values in the given properties argument.
static boolean getCollapseAdjacentTaps(Map<Object,Object> properties)
           
 JobConf getConfig()
           
static boolean getNormalizeHeterogeneousSources(Map<Object,Object> properties)
          Deprecated. 
 PlatformInfo getPlatformInfo()
           
 void initialize(FlowConnector flowConnector, Map<Object,Object> properties)
           
protected  Tap makeTempTap(String prefix, String name)
           
static void setCollapseAdjacentTaps(Map<Object,Object> properties, boolean collapseAdjacent)
          Method setCollapseAdjacentTaps enables or disables an optimization that identifies when a sink tap and an intermediate tap are equivalent field-wise and discards the intermediate tap in favor of the sink tap, minimizing the number of MR jobs.
static void setNormalizeHeterogeneousSources(Map<Object,Object> properties, boolean doNormalize)
          Deprecated. 
 
Methods inherited from class cascading.flow.planner.FlowPlanner
createElementGraph, failOnGroupEverySplit, failOnLoneGroupAssertion, failOnMissingGroup, failOnMisusedBuffer, getProperties, handleExceptionDuringPlanning, handleJobPartitioning, handleJoins, handleNonSafeOperations, insertTempTapAfter, makeTempTap, resolveAssemblyPlanners, resolveTails, verifyAllTaps, verifyAssembly, verifyCheckpoints, verifyPipeAssemblyEndPoints, verifySourceNotSinks, verifyTaps, verifyTraps
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HadoopPlanner

public HadoopPlanner()
Method Detail

copyJobConf

public static void copyJobConf(Map<Object,Object> properties,
                               JobConf jobConf)
Method copyJobConf adds the given JobConf values to the given properties object. Use this method to pass custom default Hadoop JobConf properties to Hadoop.

Parameters:
properties - of type Map
jobConf - of type JobConf
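
The effect described above can be illustrated without Hadoop on the classpath. The sketch below is a stand-in under stated assumptions, not the library's implementation: the class CopyConfSketch and the methods copyConf and demo are hypothetical, and the JobConf is modeled as an Iterable of String entries (a Hadoop Configuration can be iterated that way). The key "app.name" is likewise invented for the demonstration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class CopyConfSketch {
    // Hypothetical stand-in for copyJobConf: adds every key/value pair of
    // the configuration to the properties map. Map.put replaces any value
    // already stored under the same key (an assumption of this sketch).
    static void copyConf(Map<Object, Object> properties,
                         Iterable<Map.Entry<String, String>> conf) {
        for (Map.Entry<String, String> entry : conf) {
            properties.put(entry.getKey(), entry.getValue());
        }
    }

    // Builds a small stand-in configuration and copies it into a
    // properties map that already holds one unrelated entry.
    static Map<Object, Object> demo() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapred.child.java.opts", "-Xmx512m");

        Map<Object, Object> properties = new HashMap<>();
        properties.put("app.name", "example"); // hypothetical existing entry

        copyConf(properties, conf.entrySet());
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In real code the call would be HadoopPlanner.copyJobConf(properties, jobConf), made before the properties map is given to a new HadoopFlowConnector.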

createJobConf

public static JobConf createJobConf(Map<Object,Object> properties)
Method createJobConf returns a new JobConf instance using the values in the given properties argument.

Parameters:
properties - of type Map
Returns:
a JobConf instance

copyProperties

public static void copyProperties(JobConf jobConf,
                                  Map<Object,Object> properties)
Method copyProperties adds the given Map values to the given JobConf object.

Parameters:
jobConf - of type JobConf
properties - of type Map

setNormalizeHeterogeneousSources

@Deprecated
public static void setNormalizeHeterogeneousSources(Map<Object,Object> properties,
                                                               boolean doNormalize)
Deprecated. 

Method setNormalizeHeterogeneousSources adds the given doNormalize boolean to the given properties object. Use this method if additional jobs should be planned in to handle incompatible InputFormat classes.

Normalization is off by default and should only be enabled by advanced users. Enabling it typically decreases application performance.

Parameters:
properties - of type Map
doNormalize - of type boolean

getNormalizeHeterogeneousSources

@Deprecated
public static boolean getNormalizeHeterogeneousSources(Map<Object,Object> properties)
Deprecated. 

Method getNormalizeHeterogeneousSources returns whether this planner will normalize heterogeneous input sources.

Parameters:
properties - of type Map
Returns:
a boolean

setCollapseAdjacentTaps

public static void setCollapseAdjacentTaps(Map<Object,Object> properties,
                                           boolean collapseAdjacent)
Method setCollapseAdjacentTaps enables or disables an optimization that identifies when a sink tap and an intermediate tap are equivalent field-wise and discards the intermediate tap in favor of the sink tap, minimizing the number of MR jobs.

Note that some Scheme types may lose type information if the planner cannot detect field types. This could result in type mismatch errors during joins.

Parameters:
properties - of type Map
collapseAdjacent - of type boolean

getCollapseAdjacentTaps

public static boolean getCollapseAdjacentTaps(Map<Object,Object> properties)

getConfig

public JobConf getConfig()
Specified by:
getConfig in class FlowPlanner<HadoopFlow,JobConf>

getPlatformInfo

public PlatformInfo getPlatformInfo()
Specified by:
getPlatformInfo in class FlowPlanner<HadoopFlow,JobConf>

initialize

public void initialize(FlowConnector flowConnector,
                       Map<Object,Object> properties)
Overrides:
initialize in class FlowPlanner<HadoopFlow,JobConf>

createFlow

protected HadoopFlow createFlow(FlowDef flowDef)
Specified by:
createFlow in class FlowPlanner<HadoopFlow,JobConf>

buildFlow

public HadoopFlow buildFlow(FlowDef flowDef)
Specified by:
buildFlow in class FlowPlanner<HadoopFlow,JobConf>

makeTempTap

protected Tap makeTempTap(String prefix,
                          String name)
Specified by:
makeTempTap in class FlowPlanner<HadoopFlow,JobConf>


Copyright © 2007-2014 Concurrent, Inc. All Rights Reserved.