package archive
Provides abstractions and tools for compressing and archiving time series data. The compression used is Gorilla TSC encoding, implemented by the Java library fi.iki.yak.ts.compression.gorilla.
These are the main types:
- io.sqooba.oss.timeseries.archive.GorillaArray: a simple byte array that represents a compressed/encoded map of (timestamp, value) tuples.
- io.sqooba.oss.timeseries.archive.GorillaBlock: the representation of a compressed/encoded TimeSeries as defined in this library.
- io.sqooba.oss.timeseries.archive.GorillaSuperBlock: groups many GorillaBlocks of the same TimeSeries into a larger binary format. This allows compression of a timeseries that spans a very long range of time.
- io.sqooba.oss.timeseries.archive.MultiSeriesBlock: a large block of many indexed GorillaSuperBlocks. This format can be used to store all the data of a certain time period into one single binary blob.
- Note
The only value type currently supported for TSEntries is Double. This can lead to precision loss if you convert very large Long values (beyond 2^53 in magnitude) to Double before passing them to the compression.
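The precision issue mentioned in the note can be checked before compressing. The following is a minimal sketch, not part of this library: the object and method names are hypothetical, and it only relies on the fact that a Double has a 53-bit significand.

```scala
object DoublePrecisionDemo {
  // True if the Long is exactly representable as a Double.
  // new java.math.BigDecimal(double) gives the exact decimal value of the double,
  // so the comparison is not affected by rounding or saturation.
  def isExactAsDouble(value: Long): Boolean =
    new java.math.BigDecimal(value.toDouble)
      .compareTo(java.math.BigDecimal.valueOf(value)) == 0

  def main(args: Array[String]): Unit = {
    val exact = 1L << 53         // 2^53 is exactly representable
    val inexact = (1L << 53) + 1 // 2^53 + 1 is rounded to a neighbouring double
    println(s"$exact exact as Double: ${isExactAsDouble(exact)}")
    println(s"$inexact exact as Double: ${isExactAsDouble(inexact)}")
  }
}
```

A check like this could be run on Long inputs before converting them, to decide whether the loss is acceptable for the series at hand.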
Type Members
- type GorillaArray = Array[Byte]
Represents a gorilla encoded series of (timestamp, value) tuples, without validities. It is just an array of bytes.
- trait GorillaBlock extends AnyRef
A GorillaBlock represents a compressed/encoded TimeSeries as defined in this library. It is the unit of compression/decompression for series data.
- case class GorillaSuperBlock(channel: SliceableByteChannel) extends Product with Serializable
The GorillaSuperBlock class lazily wraps a channel that it reads from. This facilitates the reading of the binary format. The block does not perform any reading on the channel unless a method explicitly requires it.
- channel: the channel to read from upon method invocation
- case class MultiSeriesBlock(channel: SliceableByteChannel) extends Product with Serializable
The MultiSeriesBlock class just lazily wraps a channel that it reads from. This facilitates the reading of the binary format. The block does not perform any reading on the channel unless a method explicitly requires it.
- channel: the channel to read from upon method invocation
- case class SampledGorillaBlock extends GorillaBlock with Product with Serializable
GorillaBlock for series that have mostly similar validities. Such a series can be stored more efficiently with a single GorillaArray. After decompression, all entries will have the sample rate as their validity. This rate may include a bit of margin for jitter, because individual validities will be trimmed if they overlap once they are put into a 'TimeSeries'.
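The trimming behaviour described above can be illustrated with a small sketch. It does not use the library's own types: `Entry` is a hypothetical stand-in for a series entry, and `withSampleRate` is an illustrative helper that assigns every point the sample rate as validity and then trims it wherever it would overlap the next point.

```scala
object SampledValiditySketch {
  // Hypothetical stand-in for a series entry; not the library's TSEntry.
  final case class Entry(timestamp: Long, value: Double, validity: Long)

  // Assign the sample rate as validity to every point, trimming each entry
  // so that it ends no later than the start of the following one.
  def withSampleRate(points: Seq[(Long, Double)], sampleRate: Long): Seq[Entry] = {
    val sorted = points.sortBy(_._1)
    // Start timestamp of the next point, or None for the last point.
    val nextStarts = sorted.drop(1).map(p => Option(p._1)) :+ None
    sorted.zip(nextStarts).map { case ((ts, value), next) =>
      val validity = next.fold(sampleRate)(n => math.min(sampleRate, n - ts))
      Entry(ts, value, validity)
    }
  }
}
```

With a sample rate of 12 and points at timestamps 0, 10 and 25, the first entry is trimmed to a validity of 10 while the other two keep the full 12. This is why a sample rate with some margin for jitter does not produce overlapping entries.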
- case class TupleGorillaBlock extends GorillaBlock with Product with Serializable
Standard implementation of the GorillaBlock that has one GorillaArray for the values and one for the validities of the timeseries.
Value Members
- object GorillaArray
Provides methods to construct and parse a GorillaArray.
- object GorillaBlock
- object GorillaSuperBlock extends Serializable
A GorillaSuperBlock is a binary format for storing a long io.sqooba.oss.timeseries.TimeSeries composed of many GorillaBlocks. It uses a sequential layout with an index to permit quick look-up of individual blocks by timestamp.
GorillaSuperBlock Format

  +-------+-------+ ... +-------+-------+----+----------+----+
  | block | block |     | block | Index | LI |  Thrift  | LT |
  +-------+-------+ ... +-------+-------+----+----------+----+

  LT: length of Thrift block, LI: length of index
The Gorilla blocks are lined up one after another, ordered by timestamp. They all have different lengths that can be recovered from the offsets stored in the index.
At the end of the stream there is a footer composed of an additional Gorilla array containing the index and a block of thrift managed metadata. The index maps the start timestamp of each GorillaBlock to its offset from the beginning of the blob (in bytes). The last entry in the index points to the start of the footer.
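To illustrate how such an index permits look-ups, here is a minimal sketch under stated assumptions. The decoded index is modelled as a plain sorted `Vector` of (start timestamp, byte offset) pairs whose last entry is the one pointing to the footer. This representation and the name `blockRange` are hypothetical and not the library's API.

```scala
object IndexLookupSketch {
  // index: (block start timestamp, byte offset), sorted by timestamp.
  // Assumption: the last entry only marks the start of the footer.
  // Returns the byte range [from, until) of the block whose start timestamp
  // is the largest one not after `ts`, if any.
  def blockRange(index: Vector[(Long, Long)], ts: Long): Option[(Long, Long)] =
    index
      .zip(index.drop(1)) // pair every block entry with its successor
      .takeWhile { case ((start, _), _) => start <= ts }
      .lastOption
      .map { case ((_, from), (_, until)) => (from, until) }
}
```

The linear scan keeps the sketch short. Since the index is sorted, a binary search or a sorted map would do the same look-up faster.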
The lengths of the index and of the thrift block are written as 4 byte integers just after the respective blocks. In that manner a user can first read the footer, decode the index and then do fast look-ups for blocks by timestamp.
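Reading the footer back to front could look like the sketch below. It works on an in-memory byte array rather than on a channel, and it assumes the 4-byte lengths are big-endian, which the description above does not state. `Footer` and `readFooter` are hypothetical names, and the returned byte arrays are left undecoded.

```scala
import java.nio.ByteBuffer

object SuperBlockFooterSketch {
  // Raw, still encoded footer parts.
  final case class Footer(index: Array[Byte], thrift: Array[Byte])

  // Layout assumed from the description: ... | Index | LI | Thrift | LT |
  // Assumption: lengths are big-endian, which is the ByteBuffer default.
  def readFooter(blob: Array[Byte]): Footer = {
    val buf = ByteBuffer.wrap(blob)
    val ltPos = blob.length - 4  // LT is the last 4 bytes
    val thriftStart = ltPos - buf.getInt(ltPos)
    val liPos = thriftStart - 4  // LI sits just before the Thrift block
    val indexStart = liPos - buf.getInt(liPos)
    Footer(blob.slice(indexStart, liPos), blob.slice(thriftStart, ltPos))
  }
}
```

The sketch performs no bounds checks. A reader of untrusted data would need to validate both lengths against the size of the blob before slicing.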
- object MultiSeriesBlock extends Serializable
A MultiSeriesBlock groups multiple GorillaSuperBlocks in an indexed format. All the SuperBlocks are concatenated and there is a footer managed by thrift that contains the index and optionally names/string-keys for each SuperBlock.
The start and the end of the blob are marked with the 4-byte magic number 'STS\n' (hex 53 54 53 0a), which stands for "super time series".
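A quick sanity check for the magic number might look as follows. This is a sketch on an in-memory byte array, and the object and method names are hypothetical.

```scala
import java.nio.charset.StandardCharsets

object MagicNumberSketch {
  // 'STS\n' encoded as ASCII: 53 54 53 0a
  val Magic: Array[Byte] = "STS\n".getBytes(StandardCharsets.US_ASCII)

  // True if the blob both starts and ends with the magic number.
  def hasMagicMarkers(blob: Array[Byte]): Boolean =
    blob.length >= 2 * Magic.length &&
      blob.take(Magic.length).sameElements(Magic) &&
      blob.takeRight(Magic.length).sameElements(Magic)
}
```

Checking both ends makes it possible to detect a truncated blob before attempting to read the footer.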
MultiSeriesBlock Format

  +-----+---------+---------+ ... +---------+----------+----+-----+
  | STS | super 0 | super 1 |     | super N |  Thrift  | LF | STS |
  +-----+---------+---------+ ... +---------+----------+----+-----+

  LF: length of Thrift footer, STS: magic number
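Given this layout, the Thrift footer can be located from the end of the blob. The sketch below assumes that LF is a 4-byte big-endian integer, by analogy with the GorillaSuperBlock format, since neither the size nor the byte order is stated here. The names are hypothetical and the footer bytes are returned undecoded.

```scala
import java.nio.ByteBuffer

object MultiSeriesFooterSketch {
  private val MagicLength = 4  // 'STS\n'
  private val LengthField = 4  // assumed size of LF

  // Layout: STS | super 0 | ... | super N | Thrift | LF | STS
  def thriftFooter(blob: Array[Byte]): Array[Byte] = {
    val lfPos = blob.length - MagicLength - LengthField
    val lf = ByteBuffer.wrap(blob).getInt(lfPos) // big-endian by default
    blob.slice(lfPos - lf, lfPos)
  }
}
```

Once the footer is decoded, the index it contains gives access to the individual SuperBlocks, which lie between the leading magic number and the start of the footer.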