Class SortedSSTableWriter
- java.lang.Object
  - org.apache.cassandra.spark.bulkwriter.SortedSSTableWriter
public class SortedSSTableWriter
extends java.lang.Object
SSTableWriter that expects sorted data.
Note for implementors: the bulk writer always sorts the data in the entire Spark partition before writing. One benefit is that the output SSTables are sorted and non-overlapping. This allows Cassandra to perform optimizations when importing those SSTables, because technically they can be treated as a single large SSTable. You might want to introduce an SSTableWriter for unsorted data, say UnsortedSSTableWriter, and stop sorting the entire partition, i.e. drop the repartitionAndSortWithinPartitions step. Doing so eliminates the useful property that the output SSTables are globally sorted and non-overlapping. Unless you can think of a better use case, we should stick with this SortedSSTableWriter.
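To make the sorted, non-overlapping property concrete, here is a small sketch that is not taken from the bulk writer. Each inner list stands in for one output SSTable, and `SortedChunksDemo`, `split`, and `sortAndSplit` are hypothetical names. It assumes distinct partition tokens and a fixed number of tokens per SSTable, which is a simplification of how a real writer rolls over files. Sorting before splitting yields chunks whose [min, max] token ranges never intersect; splitting the same tokens unsorted does not.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: each inner list stands in for one output SSTable,
// and the fixed-size chunking rule is a simplification (hypothetical).
class SortedChunksDemo {

    // Cuts tokens, in their given order, into consecutive chunks of at most chunkSize.
    static List<List<BigInteger>> split(List<BigInteger> tokens, int chunkSize) {
        List<List<BigInteger>> chunks = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i += chunkSize) {
            chunks.add(new ArrayList<>(tokens.subList(i, Math.min(i + chunkSize, tokens.size()))));
        }
        return chunks;
    }

    // Sorts a copy of the tokens first, loosely mimicking the sort step before writing.
    static List<List<BigInteger>> sortAndSplit(List<BigInteger> tokens, int chunkSize) {
        List<BigInteger> sorted = new ArrayList<>(tokens);
        Collections.sort(sorted);
        return split(sorted, chunkSize);
    }

    // True if no two chunks have intersecting [min, max] token ranges.
    static boolean nonOverlapping(List<List<BigInteger>> chunks) {
        for (int i = 0; i < chunks.size(); i++) {
            for (int j = i + 1; j < chunks.size(); j++) {
                BigInteger minI = Collections.min(chunks.get(i));
                BigInteger maxI = Collections.max(chunks.get(i));
                BigInteger minJ = Collections.min(chunks.get(j));
                BigInteger maxJ = Collections.max(chunks.get(j));
                if (minI.compareTo(maxJ) <= 0 && minJ.compareTo(maxI) <= 0) {
                    return false; // the two token ranges intersect
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<BigInteger> tokens = Arrays.asList(
                BigInteger.valueOf(42), BigInteger.valueOf(-7), BigInteger.valueOf(100),
                BigInteger.valueOf(3), BigInteger.valueOf(-50), BigInteger.valueOf(77));
        System.out.println("sorted first:   " + nonOverlapping(sortAndSplit(tokens, 2)));
        System.out.println("unsorted split: " + nonOverlapping(split(tokens, 2)));
    }
}
```

Running `main` prints `true` for the sorted case and `false` for the unsorted one, which is the property the note above asks implementors to preserve.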
-
Field Summary
Fields
- static java.lang.String CASSANDRA_VERSION_PREFIX
-
Constructor Summary
Constructors
- SortedSSTableWriter(org.apache.cassandra.bridge.SSTableWriter tableWriter, java.nio.file.Path outDir, DigestAlgorithm digestAlgorithm, int partitionId)
- SortedSSTableWriter(BulkWriterContext writerContext, java.nio.file.Path outDir, DigestAlgorithm digestAlgorithm, int partitionId)
-
Method Summary
Methods
- void addRow(java.math.BigInteger token, java.util.Map<java.lang.String,java.lang.Object> boundValues): Add a row to be written.
- long bytesWritten()
- void close(BulkWriterContext writerContext)
- java.util.Map<java.nio.file.Path,Digest> fileDigestMap()
- java.nio.file.Path getOutDir()
- java.lang.String getPackageVersion(java.lang.String lowestCassandraVersion)
- com.google.common.collect.Range<java.math.BigInteger> getTokenRange()
- java.util.Map<java.nio.file.Path,Digest> prepareSStablesToSend(BulkWriterContext writerContext, java.util.Set<org.apache.cassandra.bridge.SSTableDescriptor> sstables)
- long rowCount()
- void setSSTablesProducedListener(java.util.function.Consumer<java.util.Set<org.apache.cassandra.bridge.SSTableDescriptor>> listener)
- int sstableCount()
- void validateSSTables(BulkWriterContext writerContext)
- void validateSSTables(BulkWriterContext writerContext, java.nio.file.Path outputDirectory, java.util.Set<java.nio.file.Path> dataFilePaths): Validate SSTables.
-
Field Detail
-
CASSANDRA_VERSION_PREFIX
public static final java.lang.String CASSANDRA_VERSION_PREFIX
- See Also:
- Constant Field Values
-
Constructor Detail
-
SortedSSTableWriter
public SortedSSTableWriter(org.apache.cassandra.bridge.SSTableWriter tableWriter, java.nio.file.Path outDir, DigestAlgorithm digestAlgorithm, int partitionId)
-
SortedSSTableWriter
public SortedSSTableWriter(BulkWriterContext writerContext, java.nio.file.Path outDir, DigestAlgorithm digestAlgorithm, int partitionId)
-
Method Detail
-
getPackageVersion
@NotNull public java.lang.String getPackageVersion(java.lang.String lowestCassandraVersion)
-
addRow
public void addRow(java.math.BigInteger token, java.util.Map<java.lang.String,java.lang.Object> boundValues) throws java.io.IOException
Add a row to be written.
- Parameters:
token - the hashed token of the row's partition key. The value must be monotonically increasing across subsequent calls.
boundValues - bound values of the columns in the row
- Throws:
java.io.IOException - I/O exception when adding the row
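The ordering requirement on `token` can be pictured with a hypothetical guard. `MonotonicTokenGuard` is not a class in the bulk writer, and it says nothing about whether `SortedSSTableWriter` checks the ordering or simply trusts it. The sketch accepts a repeated token, on the assumption that several rows of one partition share the same token, and rejects any token lower than its predecessor.

```java
import java.math.BigInteger;
import java.util.Map;

// Hypothetical helper, not part of the bulk writer: it only illustrates the
// addRow contract that tokens must not go backwards between calls.
class MonotonicTokenGuard {
    private BigInteger lastToken; // null until the first row is accepted
    private long rowCount;

    // Accepts equal tokens (assumed: rows of one partition share a token) but rejects decreases.
    void addRow(BigInteger token, Map<String, Object> boundValues) {
        if (lastToken != null && token.compareTo(lastToken) < 0) {
            throw new IllegalArgumentException(
                    "token " + token + " is lower than the previous token " + lastToken);
        }
        lastToken = token;
        rowCount++;
        // A real writer would serialize boundValues here; this sketch ignores them.
    }

    long rowCount() {
        return rowCount;
    }

    public static void main(String[] args) {
        MonotonicTokenGuard guard = new MonotonicTokenGuard();
        guard.addRow(BigInteger.valueOf(-10), Map.of("id", 1));
        guard.addRow(BigInteger.valueOf(-10), Map.of("id", 1, "ck", 2));
        guard.addRow(BigInteger.valueOf(25), Map.of("id", 7));
        try {
            guard.addRow(BigInteger.ONE, Map.of("id", 3)); // out of order, rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("rows accepted: " + guard.rowCount());
    }
}
```

Failing fast on an out-of-order token is one design choice; it surfaces a broken upstream sort immediately instead of producing SSTables with overlapping ranges.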
-
setSSTablesProducedListener
public void setSSTablesProducedListener(java.util.function.Consumer<java.util.Set<org.apache.cassandra.bridge.SSTableDescriptor>> listener)
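This page does not say when the listener fires or what the writer does after notifying it. The sketch below only illustrates the shape of a `Consumer<Set<...>>` callback: plain strings stand in for `org.apache.cassandra.bridge.SSTableDescriptor`, and `ListenerDemo` and `simulateProduced` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch: strings stand in for SSTableDescriptor, and this class is not
// the real writer; it only shows how a produced-SSTables callback can be wired.
class ListenerDemo {
    private Consumer<Set<String>> listener = produced -> { }; // no-op until one is set

    void setSSTablesProducedListener(Consumer<Set<String>> listener) {
        this.listener = listener;
    }

    // Simulates the writer finishing some SSTables and handing them to the listener.
    void simulateProduced(Set<String> sstables) {
        listener.accept(Collections.unmodifiableSet(new LinkedHashSet<>(sstables)));
    }

    public static void main(String[] args) {
        ListenerDemo writer = new ListenerDemo();
        List<String> seen = new ArrayList<>();
        writer.setSSTablesProducedListener(seen::addAll);
        writer.simulateProduced(Set.of("nb-1-big-Data.db"));
        System.out.println(seen);
    }
}
```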
-
rowCount
public long rowCount()
- Returns:
- the total number of rows written
-
bytesWritten
public long bytesWritten()
- Returns:
- the total number of bytes written
-
sstableCount
public int sstableCount()
- Returns:
- the total number of sstables written
-
prepareSStablesToSend
public java.util.Map<java.nio.file.Path,Digest> prepareSStablesToSend(@NotNull BulkWriterContext writerContext, java.util.Set<org.apache.cassandra.bridge.SSTableDescriptor> sstables) throws java.io.IOException
- Throws:
java.io.IOException
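Judging only from its return type, `prepareSStablesToSend` yields a digest per file. As an assumption-laden sketch, the code below computes a hex digest for each file in a set with the JDK's `java.security.MessageDigest`. MD5 is chosen arbitrarily and may not match the `DigestAlgorithm` the writer is configured with; `DigestSketch` and `digestFiles` are hypothetical names, and a hex string stands in for the library's `Digest` type.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: MD5 and the hex-string result are assumptions,
// not the bulk writer's DigestAlgorithm or Digest types.
class DigestSketch {

    // Reads each file fully and maps its path to a lowercase hex MD5 digest.
    static Map<Path, String> digestFiles(Set<Path> files) throws IOException, NoSuchAlgorithmException {
        Map<Path, String> digests = new HashMap<>();
        for (Path file : files) {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            digests.put(file, hex.toString());
        }
        return digests;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("sstables");
        Path data = dir.resolve("nb-1-big-Data.db");
        Files.write(data, new byte[] {1, 2, 3});
        System.out.println(digestFiles(Set.of(data)));
    }
}
```

Reading whole files into memory keeps the sketch short; for large SSTable components, streaming through `java.security.DigestInputStream` would be the more realistic choice.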
-
close
public void close(BulkWriterContext writerContext) throws java.io.IOException
- Throws:
java.io.IOException
-
validateSSTables
public void validateSSTables(@NotNull BulkWriterContext writerContext)
-
validateSSTables
public void validateSSTables(@NotNull BulkWriterContext writerContext, @NotNull java.nio.file.Path outputDirectory, @Nullable java.util.Set<java.nio.file.Path> dataFilePaths)
Validate SSTables. If dataFilePaths is null, it finds all SSTables under the output directory of the writer and validates them.
- Parameters:
writerContext - bulk writer context
outputDirectory - output directory of the SSTable writer
dataFilePaths - paths of the SSTables (data files) to validate. The argument is nullable; when it is null, all SSTables under the output directory are validated.
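The null-handling rule for `dataFilePaths` can be sketched as follows. `DataFileResolver` is hypothetical, and the sketch assumes that data files are the regular files whose names end in `-Data.db` directly under the output directory, without recursing into subdirectories; the actual validation logic is not shown on this page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of choosing which data files to validate.
class DataFileResolver {

    // Returns dataFilePaths unchanged when given; when null, lists every "*-Data.db"
    // regular file directly under outputDirectory (assumed naming, non-recursive).
    static Set<Path> resolveDataFiles(Path outputDirectory, Set<Path> dataFilePaths) throws IOException {
        if (dataFilePaths != null) {
            return dataFilePaths;
        }
        try (Stream<Path> files = Files.list(outputDirectory)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith("-Data.db"))
                    .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sstables");
        Files.createFile(dir.resolve("nb-1-big-Data.db"));
        Files.createFile(dir.resolve("nb-1-big-Index.db"));
        System.out.println(resolveDataFiles(dir, null));
    }
}
```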
-
getTokenRange
public com.google.common.collect.Range<java.math.BigInteger> getTokenRange()
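This page does not describe `getTokenRange`. Assuming it reports the span of tokens passed to `addRow`, sorted input makes that span simply the first and the last token. The hypothetical `TokenRangeTracker` below keeps the two endpoints in plain fields rather than a Guava `Range` so that it has no dependencies; treating the span as a closed range is also an assumption.

```java
import java.math.BigInteger;

// Hypothetical sketch: endpoints in plain fields instead of a Guava Range,
// interpreted as a closed range [lower, upper] (assumption).
class TokenRangeTracker {
    private BigInteger lower; // smallest token seen so far
    private BigInteger upper; // largest token seen so far

    // With sorted input the first token becomes lower and the latest becomes upper;
    // the comparisons keep the result correct even if input is not sorted.
    void onToken(BigInteger token) {
        if (lower == null || token.compareTo(lower) < 0) {
            lower = token;
        }
        if (upper == null || token.compareTo(upper) > 0) {
            upper = token;
        }
    }

    BigInteger lower() {
        if (lower == null) {
            throw new IllegalStateException("no tokens seen yet");
        }
        return lower;
    }

    BigInteger upper() {
        if (upper == null) {
            throw new IllegalStateException("no tokens seen yet");
        }
        return upper;
    }

    @Override
    public String toString() {
        return "[" + lower + ", " + upper + "]";
    }

    public static void main(String[] args) {
        TokenRangeTracker tracker = new TokenRangeTracker();
        for (long t : new long[] {-5, 0, 10}) {
            tracker.onToken(BigInteger.valueOf(t));
        }
        System.out.println(tracker);
    }
}
```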
-
getOutDir
public java.nio.file.Path getOutDir()
-
fileDigestMap
public java.util.Map<java.nio.file.Path,Digest> fileDigestMap()
- Returns:
- a view of the file digest map
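Assuming "a view" means a read-only wrapper over the writer's internal map, `java.util.Collections.unmodifiableMap` gives that behavior: callers cannot modify the returned map, yet later updates to the backing map remain visible through it. `DigestMapView` and `record` are hypothetical names, and a hex string stands in for the library's `Digest` type.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of returning a read-only view of an internal digest map.
class DigestMapView {
    private final Map<Path, String> fileDigests = new HashMap<>();

    void record(Path file, String digest) {
        fileDigests.put(file, digest);
    }

    // Read-only view: writes through it throw, but later record() calls show up in it.
    Map<Path, String> fileDigestMap() {
        return Collections.unmodifiableMap(fileDigests);
    }

    public static void main(String[] args) {
        DigestMapView writer = new DigestMapView();
        Map<Path, String> view = writer.fileDigestMap();
        writer.record(Paths.get("nb-1-big-Data.db"), "d41d8cd98f00b204e9800998ecf8427e");
        System.out.println(view.size());
    }
}
```

Returning a view rather than a copy avoids copying on every call, at the cost that the caller sees the map change as the writer records more files.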