|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectcascading.tap.Tap<JobConf,RecordReader,OutputCollector>
cascading.tap.hadoop.Hfs
public class Hfs
Class Hfs is the base class for all Hadoop file system access. Hfs may only be used with the
HadoopFlowConnector
when creating Hadoop executable Flow
instances.
GlobHfs
and less so by Hfs.
Hfs will accept /*
(wildcard) paths, but not all convenience methods like
getSize(org.apache.hadoop.mapred.JobConf)
will behave properly or reliably. Nor can the Hfs instance
with a wildcard path be used as a sink to write data.
In those cases use GlobHfs since it is a sub-class of MultiSourceTap
.
Optionally use Dfs
or Lfs
for resources specific to Hadoop Distributed file system or
the Local file system, respectively. Using Hfs is the best practice when possible, Lfs and Dfs are conveniences.
Use the Hfs class if the 'kind' of resource is unknown at design time. To use, prefix a scheme to the 'stringPath'. Where
hdfs://...
will denote Dfs, and file://...
will denote Lfs.
Call setTemporaryDirectory(java.util.Map, String)
to use a different temporary file directory path
other than the current Hadoop default path.
By default Cascading on Hadoop will assume any source or sink Tap using the file://
URI scheme
intends to read files from the local client filesystem (for example when using the Lfs
Tap) where the Hadoop
job jar is started, Tap so will force any MapReduce jobs reading or writing to file://
resources to run in
Hadoop "standalone mode" so that the file can be read.
To change this behavior, HfsProps.setLocalModeScheme(java.util.Map, String)
to set a different scheme value,
or to "none" to disable entirely for the case the file to be read is available on every Hadoop processing node
in the exact same path.
Hfs can optionally combine multiple small files (or a series of small "blocks") into larger "splits". This reduces
the number of resulting map tasks created by Hadoop and can improve application performance.
This is enabled by calling HfsProps.setUseCombinedInput(boolean)
to true
. By default, merging
or combining splits into large ones is disabled.
Field Summary | |
---|---|
protected String |
stringPath
Field stringPath |
static String |
TEMPORARY_DIRECTORY
Deprecated. see HfsProps.TEMPORARY_DIRECTORY |
Constructor Summary | |
---|---|
protected |
Hfs()
|
|
Hfs(Fields fields,
String stringPath)
Deprecated. |
|
Hfs(Fields fields,
String stringPath,
boolean replace)
Deprecated. |
|
Hfs(Fields fields,
String stringPath,
SinkMode sinkMode)
Deprecated. |
protected |
Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme)
|
|
Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme,
String stringPath)
Constructor Hfs creates a new Hfs instance. |
|
Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme,
String stringPath,
boolean replace)
Deprecated. |
|
Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme,
String stringPath,
SinkMode sinkMode)
Constructor Hfs creates a new Hfs instance. |
Method Summary | |
---|---|
protected void |
applySourceConfInitIdentifiers(FlowProcess<JobConf> process,
JobConf conf,
String... fullIdentifiers)
|
boolean |
createResource(JobConf conf)
|
boolean |
deleteChildResource(JobConf conf,
String childIdentifier)
|
boolean |
deleteResource(JobConf conf)
|
long |
getBlockSize(JobConf conf)
Method getBlockSize returns the blocksize specified by the underlying file system for this resource. |
String[] |
getChildIdentifiers(JobConf conf)
|
String[] |
getChildIdentifiers(JobConf conf,
int depth,
boolean fullyQualified)
|
protected static boolean |
getCombinedInputSafeMode(JobConf conf)
|
protected FileSystem |
getDefaultFileSystem(JobConf jobConf)
|
URI |
getDefaultFileSystemURIScheme(JobConf jobConf)
Method getDefaultFileSystemURIScheme returns the URI scheme for the default Hadoop FileSystem. |
protected FileSystem |
getFileSystem(JobConf jobConf)
|
String |
getFullIdentifier(JobConf conf)
|
String |
getIdentifier()
|
protected static String |
getLocalModeScheme(JobConf conf,
String defaultValue)
|
long |
getModifiedTime(JobConf conf)
|
Path |
getPath()
|
int |
getReplication(JobConf conf)
Method getReplication returns the replication specified by the underlying file system for
this resource. |
long |
getSize(JobConf conf)
|
static String |
getTemporaryDirectory(Map<Object,Object> properties)
Deprecated. see HfsProps |
static Path |
getTempPath(JobConf conf)
|
URI |
getURIScheme(JobConf jobConf)
|
protected static boolean |
getUseCombinedInput(JobConf conf)
|
boolean |
isDirectory(JobConf conf)
|
protected String |
makeTemporaryPathDirString(String name)
|
protected URI |
makeURIScheme(JobConf jobConf)
|
TupleEntryIterator |
openForRead(FlowProcess<JobConf> flowProcess,
RecordReader input)
|
TupleEntryCollector |
openForWrite(FlowProcess<JobConf> flowProcess,
OutputCollector output)
|
boolean |
resourceExists(JobConf conf)
|
protected void |
setStringPath(String stringPath)
|
static void |
setTemporaryDirectory(Map<Object,Object> properties,
String tempDir)
Deprecated. see HfsProps |
protected void |
setUriScheme(URI uriScheme)
|
void |
sinkConfInit(FlowProcess<JobConf> process,
JobConf conf)
|
void |
sourceConfInit(FlowProcess<JobConf> process,
JobConf conf)
|
protected void |
sourceConfInitAddInputPath(JobConf conf,
Path qualifiedPath)
|
protected void |
sourceConfInitComplete(FlowProcess<JobConf> process,
JobConf conf)
|
protected static void |
verifyNoDuplicates(JobConf conf)
|
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
@Deprecated public static final String TEMPORARY_DIRECTORY
HfsProps.TEMPORARY_DIRECTORY
protected String stringPath
Constructor Detail |
---|
protected Hfs()
@ConstructorProperties(value="scheme") protected Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme)
@Deprecated @ConstructorProperties(value={"fields","stringPath"}) public Hfs(Fields fields, String stringPath)
fields
- of type FieldsstringPath
- of type String@Deprecated @ConstructorProperties(value={"fields","stringPath","replace"}) public Hfs(Fields fields, String stringPath, boolean replace)
fields
- of type FieldsstringPath
- of type Stringreplace
- of type boolean@Deprecated @ConstructorProperties(value={"fields","stringPath","sinkMode"}) public Hfs(Fields fields, String stringPath, SinkMode sinkMode)
fields
- of type FieldsstringPath
- of type StringsinkMode
- of type SinkMode@ConstructorProperties(value={"scheme","stringPath"}) public Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme, String stringPath)
scheme
- of type SchemestringPath
- of type String@Deprecated @ConstructorProperties(value={"scheme","stringPath","replace"}) public Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme, String stringPath, boolean replace)
scheme
- of type SchemestringPath
- of type Stringreplace
- of type boolean@ConstructorProperties(value={"scheme","stringPath","sinkMode"}) public Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme, String stringPath, SinkMode sinkMode)
scheme
- of type SchemestringPath
- of type StringsinkMode
- of type SinkModeMethod Detail |
---|
@Deprecated public static void setTemporaryDirectory(Map<Object,Object> properties, String tempDir)
HfsProps
properties
- of type Map@Deprecated public static String getTemporaryDirectory(Map<Object,Object> properties)
HfsProps
properties
- of type Mapprotected static String getLocalModeScheme(JobConf conf, String defaultValue)
protected static boolean getUseCombinedInput(JobConf conf)
protected static boolean getCombinedInputSafeMode(JobConf conf)
protected void setStringPath(String stringPath)
protected void setUriScheme(URI uriScheme)
public URI getURIScheme(JobConf jobConf)
protected URI makeURIScheme(JobConf jobConf)
public URI getDefaultFileSystemURIScheme(JobConf jobConf)
jobConf
- of type JobConf
protected FileSystem getDefaultFileSystem(JobConf jobConf)
protected FileSystem getFileSystem(JobConf jobConf)
public String getIdentifier()
getIdentifier
in class Tap<JobConf,RecordReader,OutputCollector>
public Path getPath()
public String getFullIdentifier(JobConf conf)
getFullIdentifier
in class Tap<JobConf,RecordReader,OutputCollector>
public void sourceConfInit(FlowProcess<JobConf> process, JobConf conf)
sourceConfInit
in class Tap<JobConf,RecordReader,OutputCollector>
protected static void verifyNoDuplicates(JobConf conf)
protected void applySourceConfInitIdentifiers(FlowProcess<JobConf> process, JobConf conf, String... fullIdentifiers)
protected void sourceConfInitAddInputPath(JobConf conf, Path qualifiedPath)
protected void sourceConfInitComplete(FlowProcess<JobConf> process, JobConf conf)
public void sinkConfInit(FlowProcess<JobConf> process, JobConf conf)
sinkConfInit
in class Tap<JobConf,RecordReader,OutputCollector>
public TupleEntryIterator openForRead(FlowProcess<JobConf> flowProcess, RecordReader input) throws IOException
openForRead
in class Tap<JobConf,RecordReader,OutputCollector>
IOException
public TupleEntryCollector openForWrite(FlowProcess<JobConf> flowProcess, OutputCollector output) throws IOException
openForWrite
in class Tap<JobConf,RecordReader,OutputCollector>
IOException
public boolean createResource(JobConf conf) throws IOException
createResource
in class Tap<JobConf,RecordReader,OutputCollector>
IOException
public boolean deleteResource(JobConf conf) throws IOException
deleteResource
in class Tap<JobConf,RecordReader,OutputCollector>
IOException
public boolean deleteChildResource(JobConf conf, String childIdentifier) throws IOException
IOException
public boolean resourceExists(JobConf conf) throws IOException
resourceExists
in class Tap<JobConf,RecordReader,OutputCollector>
IOException
public boolean isDirectory(JobConf conf) throws IOException
isDirectory
in interface FileType<JobConf>
IOException
public long getSize(JobConf conf) throws IOException
getSize
in interface FileType<JobConf>
IOException
public long getBlockSize(JobConf conf) throws IOException
blocksize
specified by the underlying file system for this resource.
conf
- of JobConf
IOException
- whenpublic int getReplication(JobConf conf) throws IOException
replication
specified by the underlying file system for
this resource.
conf
- of JobConf
IOException
- whenpublic String[] getChildIdentifiers(JobConf conf) throws IOException
getChildIdentifiers
in interface FileType<JobConf>
IOException
public String[] getChildIdentifiers(JobConf conf, int depth, boolean fullyQualified) throws IOException
getChildIdentifiers
in interface FileType<JobConf>
IOException
public long getModifiedTime(JobConf conf) throws IOException
getModifiedTime
in class Tap<JobConf,RecordReader,OutputCollector>
IOException
public static Path getTempPath(JobConf conf)
protected String makeTemporaryPathDirString(String name)
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |