Affected Version
35.0.0
Description
Please include as much detailed information about the problem as possible.
- Cluster size: Single server small
- Any debugging that you have already done
I'm ingesting data using the local input source, in JSON format, with segmentGranularity=hour.
Each batch results in a segment partition for the hour: about 10-30 partitions in each hour, with about 60k rows in each partition.
I created an auto-compaction config to consolidate these into segments with a 4-hour interval.
Here is the config JSON:
{
  "dataSource": "data",
  "skipOffsetFromLatest": "PT4H",
  "tuningConfig": {
    "partitionsSpec": {
      "type": "single_dim",
      "partitionDimension": "hostHeader",
      "targetRowsPerSegment": 3000000,
      "assumeGrouped": false
    },
    "type": "index_parallel"
  },
  "granularitySpec": {
    "segmentGranularity": {
      "type": "period",
      "period": "PT4H"
    },
    "rollup": false
  },
  "ioConfig": {
    "dropExisting": true
  }
}
What I found is that skipOffsetFromLatest seems to be ignored. The compaction task always starts right after each 4-hour boundary, for the previous 4-hour interval. For example, the new segment 04:00:00/08:00:00 is generated at 08:06:00, with no wait time. I tried setting skipOffsetFromLatest to different values, but it made no difference. I'm very sure that there are no rogue timestamps in the data. At 08:06:00, the latest segment has an end time of 09:00:00.
However, the datasource tab of the console correctly shows: "Fully compacted (except the last PT4H of data, 12 segments skipped)"
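To make the expectation concrete, here is a minimal sketch of the timing, assuming the offset is measured back from the end time of the latest segment. That is only my reading of the setting, not a statement of how Druid computes it internally. The function name and the calendar date are made up for illustration; only the times of day come from the example above.

```python
from datetime import datetime, timedelta


def overlaps_skip_window(interval_end, latest_end, skip_offset):
    """Return True if an interval reaches into the last `skip_offset` of data.

    Assumption: the skip window is [latest_end - skip_offset, latest_end].
    """
    cutoff = latest_end - skip_offset
    return interval_end > cutoff


day = datetime(2025, 1, 1)            # hypothetical date, not from the report
latest_end = day.replace(hour=9)      # latest segment ends at 09:00:00
skip_offset = timedelta(hours=4)      # skipOffsetFromLatest = PT4H

# The 04:00:00/08:00:00 interval ends at 08:00, after the 05:00 cutoff.
print(overlaps_skip_window(day.replace(hour=8), latest_end, skip_offset))  # prints True
```

Under that assumption the 04:00:00/08:00:00 interval falls inside the skip window at 08:06:00, so I would not expect it to be compacted yet.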
Any idea why?