T - the type of the materialized records

public class ParquetInputFormat<T> extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,T>

The input format to read a Parquet file. It requires an implementation of ReadSupport to materialize the records. The requestedSchema will control how the original records get projected by the loader. It must be a subset of the original schema: only the columns needed to reconstruct the records with the requestedSchema will be scanned.

Modifier and Type | Field and Description
---|---
static String | READ_SUPPORT_CLASS: key to configure the ReadSupport implementation
static String | UNBOUND_RECORD_FILTER: key to configure the filter
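The requestedSchema projection described above can be sketched in plain Java. This is a hedged, self-contained stand-in (the class name `ProjectionSketch` and the map-based record representation are hypothetical, not Parquet's internals, where projection happens column-by-column inside the reader):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ProjectionSketch {
    // Projects each record (a column-name -> value map) onto the requested
    // subset of columns, analogous to how the requestedSchema controls which
    // columns are scanned and reconstructed.
    public static List<Map<String, Object>> project(List<Map<String, Object>> records,
                                                    List<String> requestedColumns) {
        return records.stream().map(rec -> {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String col : requestedColumns) {
                if (rec.containsKey(col)) projected.put(col, rec.get(col));
            }
            return projected;
        }).collect(Collectors.toList());
    }
}
```

Columns absent from the requested list are never copied, mirroring the claim that only the needed columns are scanned.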
Constructor and Description
---
ParquetInputFormat(): Hadoop will instantiate using this constructor
ParquetInputFormat(Class<S> readSupportClass): constructor used when this InputFormat is wrapped in another one (in Pig, for example)
Modifier and Type | Method and Description
---|---
org.apache.hadoop.mapreduce.RecordReader<Void,T> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
List<Footer> | getFooters(org.apache.hadoop.conf.Configuration configuration, List<org.apache.hadoop.fs.FileStatus> statuses): the footers for the files
List<Footer> | getFooters(org.apache.hadoop.mapreduce.JobContext jobContext)
GlobalMetaData | getGlobalMetaData(org.apache.hadoop.mapreduce.JobContext jobContext)
ReadSupport<T> | getReadSupport(org.apache.hadoop.conf.Configuration configuration)
static Class<?> | getReadSupportClass(org.apache.hadoop.conf.Configuration configuration)
List<ParquetInputSplit> | getSplits(org.apache.hadoop.conf.Configuration configuration, List<Footer> footers)
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
static Class<?> | getUnboundRecordFilter(org.apache.hadoop.conf.Configuration configuration)
protected List<org.apache.hadoop.fs.FileStatus> | listStatus(org.apache.hadoop.mapreduce.JobContext jobContext)
static void | setReadSupportClass(org.apache.hadoop.mapreduce.Job job, Class<?> readSupportClass)
static void | setReadSupportClass(org.apache.hadoop.mapred.JobConf conf, Class<?> readSupportClass)
static void | setUnboundRecordFilter(org.apache.hadoop.mapreduce.Job job, Class<? extends UnboundRecordFilter> filterClass)
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat: addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
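A typical way to wire this input format into a MapReduce job is sketched below. This is a hedged example, not lifted from this documentation: the imports assume the pre-Apache `parquet.hadoop` packages (newer releases use `org.apache.parquet.hadoop`), and `GroupReadSupport` is assumed to be the example ReadSupport shipped with parquet-hadoop; substitute your own implementation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import parquet.hadoop.ParquetInputFormat;
import parquet.hadoop.example.GroupReadSupport;

public class ParquetJobSetup {
    public static Job configure(String inputDir) throws Exception {
        Job job = Job.getInstance(new Configuration(), "read-parquet");
        // Read input records through ParquetInputFormat.
        job.setInputFormatClass(ParquetInputFormat.class);
        // Register the ReadSupport that will materialize each record.
        ParquetInputFormat.setReadSupportClass(job, GroupReadSupport.class);
        // addInputPath is inherited from FileInputFormat.
        ParquetInputFormat.addInputPath(job, new Path(inputDir));
        return job;
    }
}
```

`setReadSupportClass` only records the class name in the job configuration under READ_SUPPORT_CLASS; the instance is created later, when the record reader is built.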
public static final String READ_SUPPORT_CLASS
public static final String UNBOUND_RECORD_FILTER
public ParquetInputFormat()
public ParquetInputFormat(Class<S> readSupportClass)
readSupportClass - the class to materialize records

public static void setReadSupportClass(org.apache.hadoop.mapreduce.Job job, Class<?> readSupportClass)
public static void setUnboundRecordFilter(org.apache.hadoop.mapreduce.Job job, Class<? extends UnboundRecordFilter> filterClass)
public static Class<?> getUnboundRecordFilter(org.apache.hadoop.conf.Configuration configuration)
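The unbound record filter registered via setUnboundRecordFilter decides which records survive the scan. A minimal, self-contained illustration of that record-filtering idea follows; the names (`RecordFilterSketch`, `Row`, `scan`) are hypothetical and this is not Parquet's actual UnboundRecordFilter API, which operates on column readers rather than materialized rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class RecordFilterSketch {
    // Hypothetical record type for the sketch.
    public record Row(String name, int value) {}

    // Applies a filter while "reading": only matching rows are kept,
    // analogous to a record filter pruning rows during a Parquet scan.
    public static List<Row> scan(List<Row> source, Predicate<Row> keep) {
        List<Row> out = new ArrayList<>();
        for (Row r : source) {
            if (keep.test(r)) out.add(r);
        }
        return out;
    }
}
```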
public static void setReadSupportClass(org.apache.hadoop.mapred.JobConf conf, Class<?> readSupportClass)
public static Class<?> getReadSupportClass(org.apache.hadoop.conf.Configuration configuration)
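The set/get pairs above follow a common Hadoop pattern: the configuration stores a class name as a string, and the class is looked up and instantiated reflectively when needed. A self-contained sketch of that pattern (the `ReadSupportLookup` class, the map-backed stand-in for Hadoop's Configuration, and the key string are all hypothetical, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class ReadSupportLookup {
    // Stand-in for Hadoop's Configuration: a plain key -> value map.
    static final Map<String, String> CONF = new HashMap<>();
    // Hypothetical configuration key, standing in for READ_SUPPORT_CLASS.
    static final String KEY = "example.read.support.class";

    // A trivial read-support interface for the sketch.
    public interface ReadSupport { String materialize(String raw); }

    public static class UpperCaseReadSupport implements ReadSupport {
        public String materialize(String raw) { return raw.toUpperCase(); }
    }

    // Store only the class name at configuration time...
    public static void setReadSupportClass(Class<? extends ReadSupport> cls) {
        CONF.put(KEY, cls.getName());
    }

    // ...and instantiate it reflectively at read time.
    public static ReadSupport getReadSupport() throws Exception {
        return (ReadSupport) Class.forName(CONF.get(KEY))
                .getDeclaredConstructor().newInstance();
    }
}
```

Storing the name rather than the instance is what lets the choice of ReadSupport travel through a serialized job configuration to remote tasks.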
public org.apache.hadoop.mapreduce.RecordReader<Void,T> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException
Specified by: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<Void,T>
Throws: IOException, InterruptedException
public ReadSupport<T> getReadSupport(org.apache.hadoop.conf.Configuration configuration)
configuration - to find the configuration for the read support

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException
Overrides: getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,T>
Throws: IOException
public List<ParquetInputSplit> getSplits(org.apache.hadoop.conf.Configuration configuration, List<Footer> footers) throws IOException
configuration - the configuration to connect to the file system
footers - the footers of the files to read
Throws: IOException
protected List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException
Overrides: listStatus in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,T>
Throws: IOException
public List<Footer> getFooters(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException
jobContext - the current job context
Throws: IOException
public List<Footer> getFooters(org.apache.hadoop.conf.Configuration configuration, List<org.apache.hadoop.fs.FileStatus> statuses) throws IOException
configuration - to connect to the file system
statuses - the files to open
Throws: IOException
public GlobalMetaData getGlobalMetaData(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException
jobContext - the current job context
Throws: IOException
Copyright © 2015. All rights reserved.