cascading.scheme.hadoop
Class TextDelimited

java.lang.Object
  extended by cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
      extended by cascading.scheme.hadoop.TextLine
          extended by cascading.scheme.hadoop.TextDelimited
All Implemented Interfaces:
Serializable

public class TextDelimited
extends TextLine

Class TextDelimited is a sub-class of TextLine. It provides direct support for delimited text files, like TAB (\t) or COMMA (,) delimited files. It also optionally allows for quoted values.

TextDelimited may also be used to skip the "header" in a file, where the header is defined as the very first line in every input file. That is, if the byte offset of the current line from the input is zero (0), that line will be skipped.
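The offset rule above can be illustrated in plain Java. This is a minimal sketch of the idea only, not Cascading's implementation: the class and method names (HeaderSkipSketch, readBody) are hypothetical, and it assumes '\n' line terminators and UTF-8 input.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Cascading API.
class HeaderSkipSketch {
    // Returns the lines of one input file, dropping the line that starts at
    // byte offset 0 when skipHeader is true.
    static List<String> readBody(String fileContents, boolean skipHeader) {
        List<String> body = new ArrayList<>();
        long offset = 0; // byte offset of the current line within the file
        for (String line : fileContents.split("\n")) {
            boolean isHeader = offset == 0;
            if (!(skipHeader && isHeader)) {
                body.add(line);
            }
            // advance past this line and its '\n' terminator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return body;
    }
}
```

Because the check is made per input file, every file in a multi-file source is expected to carry its own header line.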

If the sink/source fields are set to either Fields.ALL or Fields.UNKNOWN and skipHeader or hasHeader is true, the field names will be retrieved from the header of the file and used during planning. The header will be parsed with the same rules as the body of the file.

By default headers are not skipped.

TextDelimited may also be used to write a "header" in a file. The field names for the header are taken directly from the declared fields or, if the declared fields are Fields.ALL or Fields.UNKNOWN, from the resolved field names, if any.

By default headers are not written.
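As a rough picture of what writing a header means, the sketch below emits the declared field names as the first line before any rows. It is an assumption-laden illustration in plain Java, not the Cascading sink code: HeaderWriteSketch and write are hypothetical names, and rows are modeled as lists of strings.

```java
import java.util.List;

// Hypothetical illustration; not part of the Cascading API.
class HeaderWriteSketch {
    // Joins each row with the delimiter; when writeHeader is true, the field
    // names are written first using the same delimiter.
    static String write(List<String> fieldNames, List<List<String>> rows,
                        String delimiter, boolean writeHeader) {
        StringBuilder out = new StringBuilder();
        if (writeHeader) {
            out.append(String.join(delimiter, fieldNames)).append('\n');
        }
        for (List<String> row : rows) {
            out.append(String.join(delimiter, row)).append('\n');
        }
        return out.toString();
    }
}
```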

If hasHeader is set to true on a constructor, both skipHeader and writeHeader will be set to true.

By default this Scheme is both strict and safe.

Strict means that if a line of text does not parse into the expected number of fields, this class will throw a TapException. If strict is false, a Tuple will be returned with null values for the missing fields.
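The strict rule can be sketched in plain Java as follows. This is only an illustration of the described behavior, not the DelimitedParser code: StrictSketch and parse are hypothetical names, IllegalStateException stands in for TapException, and lines with too many fields are returned unchanged when strict is false because the text above does not say how they are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of the Cascading API.
class StrictSketch {
    static List<String> parse(String line, String delimiter,
                              int expectedFields, boolean strict) {
        // limit -1 keeps trailing empty fields instead of discarding them
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        if (strict && parts.length != expectedFields) {
            // stand-in for TapException
            throw new IllegalStateException("expected " + expectedFields
                + " fields but found " + parts.length);
        }
        List<String> values = new ArrayList<>(Arrays.asList(parts));
        while (values.size() < expectedFields) {
            values.add(null); // pad the missing fields with null
        }
        return values;
    }
}
```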

Safe means that if a field cannot be coerced into an expected type, a null will be used for the value. If safe is false, a TapException will be thrown.
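A minimal plain-Java sketch of the safe rule, for a single Integer field. The names SafeSketch and toInteger are hypothetical, IllegalStateException stands in for TapException, and Cascading's actual coercion logic is not reproduced here.

```java
// Hypothetical illustration; not part of the Cascading API.
class SafeSketch {
    // Coerces a text value to Integer. When safe is true a value that cannot
    // be coerced becomes null; when safe is false an exception is thrown.
    static Integer toInteger(String value, boolean safe) {
        if (value == null) {
            return null;
        }
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            if (safe) {
                return null;
            }
            // stand-in for TapException
            throw new IllegalStateException(
                "could not coerce '" + value + "' to Integer", e);
        }
    }
}
```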

Also by default, quote strings are not searched for, which improves processing speed. If a file is COMMA delimited but may have COMMAs in a value, the whole value should be surrounded by the quote string, typically double quotes (").
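To see why quoting matters, the sketch below splits a line while treating delimiters inside a quoted value as ordinary characters. It is a simplified illustration, not the DelimitedParser algorithm (which, per the notes further down, is based on Java regular expressions): QuoteSketch and split are hypothetical names, the delimiter and quote are single characters, and escaped quotes inside a value are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Cascading API.
class QuoteSketch {
    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes; // toggle state and drop the quote itself
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString()); // an unquoted delimiter ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing delimiter
        return fields;
    }
}
```

For the line 1,"Smith, Ann",NY a plain split on COMMA yields four pieces, while the quote-aware split yields the three intended values.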

Note all empty fields in a line will be returned as null unless coerced into a new type.

This Scheme may source/sink Fields.ALL. When Fields.ALL is given on the constructor, the new instance will automatically default to strict == false, as the number of fields parsed is arbitrary or unknown. A type array may not be given either, so all values will be returned as Strings.

By default, all text is encoded/decoded as UTF-8. This can be changed via the charsetName constructor argument.
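The effect of the charset name can be shown with the standard java.nio.charset API. This sketch only demonstrates why the reader and writer must agree on a charset; CharsetSketch and its methods are hypothetical helpers and do not reflect how TextDelimited applies charsetName internally.

```java
import java.nio.charset.Charset;

// Hypothetical illustration; not part of the Cascading API.
class CharsetSketch {
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }
}
```

Text written as UTF-8 and read back as UTF-8 round-trips, whereas decoding the same bytes as ISO-8859-1 changes any non-ASCII characters.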

To override field and line parsing behaviors, sub-class DelimitedParser or provide a FieldTypeResolver implementation.

Note that there should be no expectation that TextDelimited, or specifically DelimitedParser, can handle all delimited and quoted combinations reliably. Attempting to do so would impair its performance and maintainability.

Further, corrupted files are not supported. They may result in exceptions or could trigger edge cases in the underlying Java regular expression engine.

A large part of Cascading was designed to help users cleanse data. Thus the recommendation is to create Flows that are responsible for cleansing large data-sets when faced with this problem.

DelimitedParser may be sub-classed and extended if necessary.

See Also:
TextLine, Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class cascading.scheme.hadoop.TextLine
TextLine.Compress
 
Field Summary
static String DEFAULT_CHARSET
           
protected  DelimitedParser delimitedParser
          Field delimitedParser
 
Fields inherited from class cascading.scheme.hadoop.TextLine
DEFAULT_SOURCE_FIELDS
 
Constructor Summary
TextDelimited()
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.
TextDelimited(boolean hasHeader, DelimitedParser delimitedParser)
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
TextDelimited(boolean hasHeader, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter.
TextDelimited(boolean hasHeader, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote.
TextDelimited(DelimitedParser delimitedParser)
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
TextDelimited(Fields fields)
          Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, DelimitedParser delimitedParser)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, DelimitedParser delimitedParser)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe, String charsetName)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter, String quote, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, DelimitedParser delimitedParser)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe, String charsetName)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, String delimiter, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, String delimiter, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, String charsetName, DelimitedParser delimitedParser)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, String delimiter, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, String delimiter, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, String delimiter, Class[] types, boolean safe, String charsetName)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, String delimiter, String quote, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, String delimiter, String quote, String charsetName)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(TextLine.Compress sinkCompression, boolean hasHeader, DelimitedParser delimitedParser)
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
TextDelimited(TextLine.Compress sinkCompression, boolean hasHeader, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote.
TextDelimited(TextLine.Compress sinkCompression, DelimitedParser delimitedParser)
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
 
Method Summary
 String getDelimiter()
          Method getDelimiter returns the delimiter used to parse fields from the current line of text.
 String getQuote()
          Method getQuote returns the quote string, if any, used to encapsulate each field in a line of delimited text.
 boolean isSymmetrical()
           
 void presentSinkFields(FlowProcess<JobConf> flowProcess, Tap tap, Fields fields)
           
 void presentSourceFields(FlowProcess<JobConf> flowProcess, Tap tap, Fields fields)
           
 Fields retrieveSourceFields(FlowProcess<JobConf> flowProcess, Tap tap)
           
 void setSinkFields(Fields sinkFields)
           
 void setSourceFields(Fields sourceFields)
           
 void sink(FlowProcess<JobConf> flowProcess, SinkCall<Object[],OutputCollector> sinkCall)
           
 void sinkPrepare(FlowProcess<JobConf> flowProcess, SinkCall<Object[],OutputCollector> sinkCall)
           
 boolean source(FlowProcess<JobConf> flowProcess, SourceCall<Object[],RecordReader> sourceCall)
           
 void sourcePrepare(FlowProcess<JobConf> flowProcess, SourceCall<Object[],RecordReader> sourceCall)
           
protected  void writeHeader(SinkCall<Object[],OutputCollector> sinkCall)
           
 
Methods inherited from class cascading.scheme.hadoop.TextLine
getSinkCompression, makeEncodedString, setCharsetName, setSinkCompression, sinkConfInit, sourceCleanup, sourceConfInit, sourceHandleInput, verify
 
Methods inherited from class cascading.scheme.Scheme
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, setNumSinkParts, sinkCleanup, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_CHARSET

public static final String DEFAULT_CHARSET
See Also:
Constant Field Values

delimitedParser

protected final DelimitedParser delimitedParser
Field delimitedParser

Constructor Detail

TextDelimited

public TextDelimited()
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.


TextDelimited

@ConstructorProperties(value={"hasHeader","delimiter"})
public TextDelimited(boolean hasHeader,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.

Parameters:
hasHeader - of type boolean
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"hasHeader","delimiter","quote"})
public TextDelimited(boolean hasHeader,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.

Parameters:
hasHeader - of type boolean
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value={"hasHeader","delimitedParser"})
public TextDelimited(boolean hasHeader,
                                                DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.

Parameters:
hasHeader - of type boolean
delimitedParser - of type DelimitedParser

TextDelimited

@ConstructorProperties(value="delimitedParser")
public TextDelimited(DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.

This constructor will set skipHeader and writeHeader values to true.

Parameters:
delimitedParser - of type DelimitedParser

TextDelimited

@ConstructorProperties(value={"sinkCompression","hasHeader","delimitedParser"})
public TextDelimited(TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.

Parameters:
sinkCompression - of type Compress
hasHeader - of type boolean
delimitedParser - of type DelimitedParser

TextDelimited

@ConstructorProperties(value={"sinkCompression","delimitedParser"})
public TextDelimited(TextLine.Compress sinkCompression,
                                                DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.

This constructor will set skipHeader and writeHeader values to true.

Parameters:
sinkCompression - of type Compress
delimitedParser - of type DelimitedParser

TextDelimited

@ConstructorProperties(value={"sinkCompression","hasHeader","delimiter","quote"})
public TextDelimited(TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.

Parameters:
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value="fields")
public TextDelimited(Fields fields)
Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter.

Parameters:
fields - of type Fields

TextDelimited

@ConstructorProperties(value={"fields","delimiter"})
public TextDelimited(Fields fields,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"fields","delimiter","types"})
public TextDelimited(Fields fields,
                                                String delimiter,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter","types"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","types"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","delimiter","quote","types"})
public TextDelimited(Fields fields,
                                                String delimiter,
                                                String quote,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe","charsetName"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe,
                                                String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","delimiter"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","delimiter","types"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                String delimiter,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","types"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                String delimiter,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","types"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","delimiter","types","safe"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                String delimiter,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","types","safe"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                String delimiter,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","types","safe","charsetName"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                String delimiter,
                                                Class[] types,
                                                boolean safe,
                                                String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","types","safe"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","delimiter","quote"})
public TextDelimited(Fields fields,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter","quote"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","quote"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","quote","charsetName"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote,
                                                String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String
charsetName - of type String

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","quote"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote","types"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                String delimiter,
                                                String quote,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
quote - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","quote","types"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","quote","types"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","strict","quote","types","safe"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                boolean strict,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
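
The strict and safe flags taken by this constructor are described in the class overview. The following is a minimal sketch of those two rules in plain Java, not the Cascading implementation. The class StrictSafeSketch and its methods are hypothetical, IllegalStateException stands in for TapException, only String and Integer types are handled, and dropping surplus values when strict is false is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class StrictSafeSketch {

    /** Splits a line on the delimiter and coerces each value to the declared type. */
    static List<Object> parse(String line, String delimiter, Class<?>[] types,
                              boolean strict, boolean safe) {
        // limit -1 keeps trailing empty values so the field count is accurate
        String[] raw = line.split(Pattern.quote(delimiter), -1);

        // strict: a wrong number of values is an error
        if (strict && raw.length != types.length) {
            throw new IllegalStateException(
                "expected " + types.length + " fields, found " + raw.length);
        }

        List<Object> result = new ArrayList<>();
        for (int i = 0; i < types.length; i++) {
            if (i >= raw.length) {
                result.add(null); // not strict: missing fields become null
            } else {
                result.add(coerce(raw[i], types[i], safe));
            }
        }
        return result; // assumption: surplus values are silently dropped
    }

    /** Coerces one value; on failure returns null when safe, otherwise throws. */
    static Object coerce(String value, Class<?> type, boolean safe) {
        if (type == String.class) {
            return value;
        }
        if (type == Integer.class) {
            try {
                return Integer.valueOf(value.trim());
            } catch (NumberFormatException e) {
                if (safe) {
                    return null;
                }
                throw new IllegalStateException(
                    "cannot coerce '" + value + "' to Integer", e);
            }
        }
        throw new IllegalArgumentException("unsupported type in this sketch: " + type);
    }

    public static void main(String[] args) {
        Class<?>[] types = {String.class, Integer.class};
        System.out.println(parse("a\t1", "\t", types, true, true));
        System.out.println(parse("a", "\t", types, false, true));
        System.out.println(parse("a\tx", "\t", types, true, true));
    }
}
```

With strict set to false the short line "a" yields a second value of null, and with safe set to true the uncoercible "x" also yields null. Setting either flag the other way turns the same input into an exception.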

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","strict","quote","types","safe","charsetName"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                boolean strict,
                                                String quote,
                                                Class[] types,
                                                boolean safe,
                                                String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimitedParser"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimitedParser - of type DelimitedParser

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimitedParser"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimitedParser - of type DelimitedParser

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimitedParser"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimitedParser - of type DelimitedParser

TextDelimited

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","charsetName","delimitedParser"})
public TextDelimited(Fields fields,
                                                TextLine.Compress sinkCompression,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String charsetName,
                                                DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
charsetName - of type String
delimitedParser - of type DelimitedParser

Method Detail

getDelimiter

public String getDelimiter()
Method getDelimiter returns the delimiter used to parse fields from the current line of text.

Returns:
a String

getQuote

public String getQuote()
Method getQuote returns the quote string, if any, used to encapsulate each field in a line of delimited text.

Returns:
a String
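
To illustrate how a delimiter and a quote string interact, here is a minimal sketch of quote-aware splitting in plain Java. It is not the parser Cascading uses. The class QuotedSplitSketch and its split method are hypothetical, and the sketch assumes quotes are never escaped inside a value.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedSplitSketch {

    /** Splits a line on the delimiter, ignoring delimiters inside quoted values. */
    static List<String> split(String line, String delimiter, String quote) {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        boolean useQuote = quote != null && !quote.isEmpty();

        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;

        while (i < line.length()) {
            if (useQuote && line.startsWith(quote, i)) {
                // entering or leaving a quoted value; the quote itself is dropped
                inQuotes = !inQuotes;
                i += quote.length();
            } else if (!inQuotes && line.startsWith(delimiter, i)) {
                // an unquoted delimiter ends the current value
                values.add(current.toString());
                current.setLength(0);
                i += delimiter.length();
            } else {
                current.append(line.charAt(i));
                i++;
            }
        }
        values.add(current.toString()); // the last value has no trailing delimiter
        return values;
    }

    public static void main(String[] args) {
        List<String> values = split("1,\"Smith, Jane\",NY", ",", "\"");
        System.out.println(values.size()); // prints 3
    }
}
```

Passing null for the quote string disables the quote check entirely, which mirrors the idea that quote handling is optional and costs extra work per character.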

isSymmetrical

public boolean isSymmetrical()
Overrides:
isSymmetrical in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

setSinkFields

public void setSinkFields(Fields sinkFields)
Overrides:
setSinkFields in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

setSourceFields

public void setSourceFields(Fields sourceFields)
Overrides:
setSourceFields in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

retrieveSourceFields

public Fields retrieveSourceFields(FlowProcess<JobConf> flowProcess,
                                   Tap tap)
Overrides:
retrieveSourceFields in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

presentSourceFields

public void presentSourceFields(FlowProcess<JobConf> flowProcess,
                                Tap tap,
                                Fields fields)
Overrides:
presentSourceFields in class TextLine

presentSinkFields

public void presentSinkFields(FlowProcess<JobConf> flowProcess,
                              Tap tap,
                              Fields fields)
Overrides:
presentSinkFields in class TextLine

sourcePrepare

public void sourcePrepare(FlowProcess<JobConf> flowProcess,
                          SourceCall<Object[],RecordReader> sourceCall)
Overrides:
sourcePrepare in class TextLine

source

public boolean source(FlowProcess<JobConf> flowProcess,
                      SourceCall<Object[],RecordReader> sourceCall)
               throws IOException
Overrides:
source in class TextLine
Throws:
IOException
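
The class overview defines the header as the line whose byte offset in the input is zero. The following plain-Java sketch shows that rule in isolation. It is not how source is implemented: the class SkipHeaderSketch and its readBody method are hypothetical, and the sketch assumes UTF-8 text with "\n" line endings held in memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SkipHeaderSketch {

    /** Returns the lines of the input, dropping the line at byte offset zero if requested. */
    static List<String> readBody(String fileContents, boolean skipHeader) {
        List<String> body = new ArrayList<>();
        long offset = 0; // byte offset of the line currently being examined

        for (String line : fileContents.split("\n")) {
            boolean isHeader = skipHeader && offset == 0;
            if (!isHeader) {
                body.add(line);
            }
            // advance past this line's bytes plus the newline that ended it
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return body;
    }

    public static void main(String[] args) {
        String contents = "id,name\n1,a\n2,b\n";
        System.out.println(readBody(contents, true).size());  // prints 2
        System.out.println(readBody(contents, false).size()); // prints 3
    }
}
```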

sinkPrepare

public void sinkPrepare(FlowProcess<JobConf> flowProcess,
                        SinkCall<Object[],OutputCollector> sinkCall)
                 throws IOException
Overrides:
sinkPrepare in class TextLine
Throws:
IOException

writeHeader

protected void writeHeader(SinkCall<Object[],OutputCollector> sinkCall)
                    throws IOException
Throws:
IOException
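
As a rough picture of what writing a header means for the output, the plain-Java sketch below emits the field names as the first line, joined with the delimiter, before any records. It is an illustration only and does not reproduce this method. The class WriteHeaderSketch and its toLines method are hypothetical, and quoting of values is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteHeaderSketch {

    /** Formats rows as delimited lines, optionally preceded by a header of field names. */
    static List<String> toLines(String[] fieldNames, List<String[]> rows,
                                String delimiter, boolean writeHeader) {
        List<String> lines = new ArrayList<>();
        if (writeHeader) {
            // the header uses the same delimiter as the body
            lines.add(String.join(delimiter, fieldNames));
        }
        for (String[] row : rows) {
            lines.add(String.join(delimiter, row));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "a"});
        List<String> lines = toLines(new String[] {"id", "name"}, rows, ",", true);
        System.out.println(lines.get(0)); // prints id,name
    }
}
```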

sink

public void sink(FlowProcess<JobConf> flowProcess,
                 SinkCall<Object[],OutputCollector> sinkCall)
          throws IOException
Overrides:
sink in class TextLine
Throws:
IOException


Copyright © 2007-2013 Concurrent, Inc. All Rights Reserved.