Replies: 5 comments 28 replies
cc @PragmaTwice
Some comments:
I have a few questions
Sorry for the late reply; I've been a bit busy these days. What would agg do when …
(Sorry again for the late reply. I'll take a careful pass when I have time; for now I can only read this outside of work...) I took a quick first scan of the spec, and here are some thoughts:
Thanks again for the proposal
This is an OSPP 2025 project. I sincerely appreciate and welcome feedback and suggestions from all community members to help me improve!
Introduction
RedisTimeSeries is a Redis module for storing and querying time series data, giving Redis basic time series database capabilities.
Since Apache Kvrocks aims to be compatible with the Redis protocol and commands, we also want to provide time series processing capabilities that are compatible with RedisTimeSeries.
Data Structure Design
Most of the design is inspired by beihao's proposal and PR #2573, but I've added many details and made significant changes to the core data storage approach.
Key Type (1-byte enum)
- CHUNK
- LABEL
- DOWNSTREAM

1. Datameta (Metadata CF)
- `retentionTime`: Maximum age (in milliseconds) for samples compared to the latest timestamp. A value of `0` disables retention.
- `chunkSize`: The preferred number of samples per data chunk.
- `chunkType`: The storage format of the chunk (compressed or uncompressed).
- `duplicatePolicy`: An enum representing the policy for handling samples with duplicate timestamps (e.g., BLOCK, FIRST, LAST).
- `sourceKey`: If this series is a downstream target for compaction, this field stores the key of the source series.
- `lastTimestamp`: An approximate latest timestamp in the time series.
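To make the field list concrete, here is a minimal C++ sketch of what this metadata record could look like. The struct and enum names are illustrative assumptions; the real implementation would reuse Kvrocks' existing metadata encoding helpers rather than a standalone type like this.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch of the per-series metadata described above.
// Field names mirror the list; the on-disk encoding is not shown.
enum class ChunkType : uint8_t { kUncompressed = 0, kCompressed = 1 };
enum class DuplicatePolicy : uint8_t { kBlock = 0, kFirst = 1, kLast = 2, kMin = 3, kMax = 4, kSum = 5 };

struct TimeSeriesMetadata {
  uint64_t retention_time_ms = 0;   // 0 disables retention
  uint32_t chunk_size = 128;        // preferred samples per chunk (default is an assumption)
  ChunkType chunk_type = ChunkType::kCompressed;
  DuplicatePolicy duplicate_policy = DuplicatePolicy::kBlock;
  std::string source_key;           // non-empty only for downstream (compaction) targets
  uint64_t last_timestamp = 0;      // approximate latest sample timestamp
};
```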
2. Primary Data Storage (Time Chunks)
This section explains how timestamp and value data are stored. In RedisTimeSeries, both compressed and uncompressed storage methods are supported. Below is the corresponding design in Kvrocks:
Uncompressed Chunk Type
Data storage (default CF):
Uncompressed chunks store raw timestamps and values as `uint64` and `double` types, respectively.

Compressed Chunk Type
Data storage (default CF):
- timestamp: delta-of-delta encoding.
- value: Facebook Gorilla encoding.
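As an illustration of what a compressed chunk has to remember between appends, here is a simplified, hypothetical sketch of the append path. A real chunk would bit-pack the delta-of-delta and XOR words; the fixed-width layout here exists only to keep the sketch short.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified sketch of the "append" path for a compressed chunk: timestamps use
// delta-of-delta, values use a Gorilla-style XOR against the previous value.
struct CompressedChunkState {
  uint64_t prev_timestamp = 0;
  int64_t prev_timestamp_delta = 0;
  uint64_t prev_value_bits = 0;
  std::vector<uint8_t> payload;
  size_t count = 0;
};

inline void AppendWord(std::vector<uint8_t> *buf, uint64_t word) {
  uint8_t bytes[8];
  std::memcpy(bytes, &word, sizeof(word));
  buf->insert(buf->end(), bytes, bytes + 8);
}

void AppendSample(CompressedChunkState *chunk, uint64_t ts, double value) {
  uint64_t value_bits;
  std::memcpy(&value_bits, &value, sizeof(value));
  if (chunk->count == 0) {
    AppendWord(&chunk->payload, ts);           // first timestamp stored raw
    AppendWord(&chunk->payload, value_bits);   // first value stored raw
  } else {
    int64_t delta = static_cast<int64_t>(ts - chunk->prev_timestamp);
    int64_t dod = delta - chunk->prev_timestamp_delta;      // delta-of-delta, usually near zero
    uint64_t xored = value_bits ^ chunk->prev_value_bits;   // Gorilla-style XOR
    AppendWord(&chunk->payload, static_cast<uint64_t>(dod));
    AppendWord(&chunk->payload, xored);
    chunk->prev_timestamp_delta = delta;
  }
  chunk->prev_timestamp = ts;
  chunk->prev_value_bits = value_bits;
  chunk->count++;
}
```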
3. Secondary Indexes
Label Index (default CF)
Related commands: TS.INFO, TS.ALTER.
Reverse Index (TimeSeries CF)
Since querying multiple time series via the label index is a very common operation, designing an efficient reverse index is crucial. Because the key encoding format of the reverse index differs significantly from the other key types, it is necessary to introduce a new Column Family (`TimeSeries`).
Related command: TS.MRANGE.
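One possible, purely illustrative key layout for the reverse index: one entry per (label name, label value, series key), so that TS.MRANGE can prefix-scan everything carrying a given label pair. The separator and field order below are assumptions, not the final encoding.

```cpp
#include <string>

// Hypothetical reverse-index key in the TimeSeries column family.
// Prefix-scanning "ns|label|value|" yields all candidate series keys.
std::string ReverseIndexKey(const std::string &ns, const std::string &label,
                            const std::string &value, const std::string &ts_key) {
  std::string key;
  key.reserve(ns.size() + label.size() + value.size() + ts_key.size() + 3);
  key += ns;     key += '|';
  key += label;  key += '|';
  key += value;  key += '|';
  key += ts_key;
  return key;
}
```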
Downstream Key Datameta (default CF)
- The `rules` field in TS.INFO.
- `aggregator`: an enum for the aggregation methods supported by TS.CREATERULE. The meanings of `aggregator`, `bucket_duration`, and `alignment` are as defined by the Redis command.
- `latest_bucket_index`: helps quickly determine whether the write targets the latest bucket when adding data downstream.
- `auxInfo`: supports the various aggregation types in Redis and helps the downstream series update quickly with lower CPU overhead (see Quickly Update Latest Bucket).

Key Operation Workflows
Writing
1. Read the `datameta` and its `lastTimestamp` field.
2. Compute the `chunk_id` from `timestamp` using: `chunk_id = timestamp // chunk_size`.
3. In-order write (`lastTimestamp` < `timestamp`): append the data to the corresponding chunk (`chunk_id`). For compressed types, the compressed value can be computed quickly from the `chunk datameta` and appended to the chunk end, reducing CPU usage; uncompressed types insert directly.
4. Out-of-order write (`lastTimestamp` > `timestamp`): for uncompressed types, use binary search to locate the insertion position; compressed types need to reconstruct the entire chunk.
5. Update the `datameta` and `chunk datameta`.
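A minimal sketch of the branch decision in this write path, assuming the field names above; the real code would operate on the `datameta` and `chunk datameta` through RocksDB write batches.

```cpp
#include <cstdint>

// Hypothetical helper: decide which chunk a sample lands in and which path to take.
struct WritePlan {
  uint64_t chunk_id;
  bool append_path;  // true: in-order append; false: out-of-order insert/rewrite
};

WritePlan PlanWrite(uint64_t last_timestamp, uint64_t chunk_size, uint64_t timestamp) {
  WritePlan plan;
  plan.chunk_id = timestamp / chunk_size;          // chunk_id = timestamp // chunk_size
  plan.append_path = timestamp > last_timestamp;   // otherwise: binary search (uncompressed)
                                                   // or rebuild the chunk (compressed)
  return plan;
}
```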
Optimizations for Compressed Data Writes
There are two distinct time series write patterns that require different handling: for source series, new data points are typically "appended" to chunks, while downstream series primarily involve "updating" the latest data point. The compression algorithms (delta-of-delta encoding and Facebook Gorilla) are both difference-based, so we can implement different approaches tailored to these two scenarios.
"Append" mode: Updates
prev_timestamp,prev_timestampDelta,prev_valueduring writes."Update" mode:
prev_timestampandprev_valuepoint to the second-newest data, allowing direct updates of the latest data.last_offsetfield in Downstream Key Datameta helps quickly locate the latest data point for "update" mode optimizations.Time Range Query (
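A hypothetical sketch of the "update" mode: because `prev_timestamp`/`prev_value` track the second-newest sample, the newest sample can be re-encoded in place at `last_offset` instead of rebuilding the chunk. The flat fixed-width layout (one delta-of-delta word plus one XOR word per sample) is only for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative tail state for a downstream chunk in "update" mode.
struct DownstreamChunkTail {
  uint64_t prev_timestamp = 0;     // second-newest timestamp
  int64_t prev_timestamp_delta = 0;  // delta that produced the second-newest timestamp
  uint64_t prev_value_bits = 0;    // second-newest value, as raw bits
  size_t last_offset = 0;          // byte offset of the newest (encoded) sample
  std::vector<uint8_t> payload;    // must already hold 16 bytes at last_offset
};

void OverwriteLatest(DownstreamChunkTail *tail, uint64_t ts, double value) {
  uint64_t value_bits;
  std::memcpy(&value_bits, &value, sizeof(value));
  int64_t delta = static_cast<int64_t>(ts - tail->prev_timestamp);
  uint64_t dod = static_cast<uint64_t>(delta - tail->prev_timestamp_delta);
  uint64_t xored = value_bits ^ tail->prev_value_bits;
  // Re-encode only the newest sample in place; earlier samples are untouched.
  std::memcpy(tail->payload.data() + tail->last_offset, &dod, sizeof(dod));
  std::memcpy(tail->payload.data() + tail->last_offset + sizeof(dod), &xored, sizeof(xored));
}
```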
Time Range Query (TS.RANGE)
1. Check `start_ts` to `end_ts` against the retention window to ensure the query range hasn't expired.
2. Compute the `chunk_id` range from `start_ts` to `end_ts`.
3. Scan the chunk keys of the form `ns_key|version|*`.
Multi-Series Query (TS.MRANGE)
1. Find `ts_key` candidates via the label index (see Reverse Index).
2. Run TS.RANGE on each candidate.
Passing Data Downstream (TS.CREATERULE)
When a new data point is added to the source series (e.g., (`ts`, `value`)):
ts,value)):Check
downstream_keyand getaggregator,bucket_duration,alignment,latest_bucket_indexvia [Downstream Key Datameta](#Downstream Key Datameta).Calculate
bucket_index:(ts-align)//bucket_duration, and aligned timets':ts' = bucket_index*bucket_duration+alignbucket_index>latest_bucket_index: Directly add a new data point (ts',value) downstream.bucket_index=latest_bucket_index: Most frequent case, see Quickly Update Latest Bucket.bucket_index<latest_bucket_index:MINand new value > current value.AVG.Write result to
downstream_keyatomically, consistent with Writing.Quickly Update Latest Bucket
This is the most common case for downstream time series data writes, corresponding to the "update" mode in Optimizations for Compressed Data Writes. The auxiliary information in Downstream Key Datameta helps determine updated values without fetching from source, reducing I/O and CPU since
bucket_durationmay be much larger thanchunk_size.Specifically, we store current bucket statistics for different aggregation types:
AVGupdate_avg=(cur_avg_value*count+new_value)/(count+1)SUMnew_sum=cur_sum+new_valueMINnew_min_value=new_value<cur_min_value?new_value:cur_min_valueMAXMINRANGEcur_min_value和cur_max_value,thennew_range = cur_max_value - cur_min_valueFIRSTnew_timestamp<first_timestampLASTFIRSTstd.pstd.svar.pvar.sRetention (
[RETENTION retentionPeriod])Use
RocketDBcompact filters to delete expired data. When all data in achunkexpires, thechunkcan be deleted (whenchunk_last_timestamp+retention<lastTimestamp). First delete expiredchunk datameta, then the correspondingchunk.Queries automatically filter data outside the retention window.
Command Support Status
This section outlines the core commands.
RedisTimeSeries Commands
- TS.ADD / TS.MADD: [DUPLICATE_POLICY policy]
- TS.CREATE: [ON_DUPLICATE policy_ovr] [LABELS [label value ...]] [RETENTION retentionPeriod] [ENCODING] [CHUNK_SIZE size] [LABELS [label value ...]]
- TS.DEL
- TS.MRANGE / TS.RANGE: [FILTER_BY_TS ts...] [FILTER_BY_VALUE min max] [COUNT count] [[ALIGN align] AGGREGATION aggregator bucketDuration [BUCKETTIMESTAMP bt] [EMPTY]] FILTER filterExpr...
- TS.CREATERULE: AGGREGATION aggregator bucketDuration [alignTimestamp]
- TS.INCRBY / TS.DECRBY
- TS.INFO: memoryUsage, firstTimestamp, chunkSize, other...

❓ indicates fields that may be incompatible or inconsistent with the Redis implementation.
TS.INFO Field Compatibility
Some fields may not align with our design:
- `chunkSize`: In Redis, this is a fixed byte size specified at creation.
- `memoryUsage`: This metric might be challenging to calculate and maintain accurately. As an alternative, we could potentially use `diskUsage` instead, with support from the `chunk datameta` structure?