Merged
Changes from 141 commits
Commits
155 commits
73f0d69
Barebone
haiqi96 Jun 17, 2025
bc4c464
backup of progress
haiqi96 Jun 18, 2025
e6238a2
Backup for initial handling for streams
haiqi96 Jun 18, 2025
653f3dc
Backup for initial handling for streams
haiqi96 Jun 18, 2025
a269a97
fix
haiqi96 Jun 18, 2025
1742ff0
small update to use fancier syntax
haiqi96 Jun 18, 2025
efbb47b
fixes
haiqi96 Jun 18, 2025
5714945
linter yeah
haiqi96 Jun 18, 2025
b1fe0d4
Adding simple handler for reusing
haiqi96 Jun 19, 2025
a5f5a3b
commit to propogate change
haiqi96 Jun 19, 2025
53d7417
Add handler
haiqi96 Jun 19, 2025
ccd23dc
Fix mistakes in the handler logic
haiqi96 Jun 19, 2025
5bb3685
Update scheduler to handle logs of time.
haiqi96 Jun 19, 2025
bc688b9
Merge branch 'main' into retension_period
haiqi96 Jun 19, 2025
0c8c6f6
Refactor dataset related code
haiqi96 Jun 20, 2025
0d186e6
Refactor dataset related code
haiqi96 Jun 20, 2025
75ac0ff
further refactor
haiqi96 Jun 20, 2025
bb1e5f4
Linter
haiqi96 Jun 20, 2025
ba7cfe1
A few more fixes
haiqi96 Jun 20, 2025
68454c6
Linter fixes
haiqi96 Jun 20, 2025
5eccaaf
Merge branch 'DatasetRefactor' into retension_period
haiqi96 Jun 20, 2025
c1de746
missing fixes
haiqi96 Jun 20, 2025
f08802b
Merge remote-tracking branch 'origin/DatasetRefactor' into retension_…
haiqi96 Jun 20, 2025
d797198
Fix mistake
haiqi96 Jun 20, 2025
c5dc9b9
Merge remote-tracking branch 'origin/DatasetRefactor' into retension_…
haiqi96 Jun 20, 2025
8c39e77
actually fixing
haiqi96 Jun 20, 2025
ea4318e
Merge remote-tracking branch 'origin/DatasetRefactor' into retension_…
haiqi96 Jun 20, 2025
2c97441
Intermediate backup for archive retention
haiqi96 Jun 20, 2025
2eff448
Update
haiqi96 Jun 20, 2025
d570ab6
Linter again
haiqi96 Jun 20, 2025
06332f4
Merge remote-tracking branch 'origin/DatasetRefactor' into retension_…
haiqi96 Jun 20, 2025
d5e8e28
some renaming
haiqi96 Jun 20, 2025
8a79b9b
adding reminder for myself
haiqi96 Jun 23, 2025
8c77119
Fixing permissions
haiqi96 Jun 23, 2025
3a1afb2
Add batch deletion support
haiqi96 Jun 24, 2025
73d76ac
Linter + code clean up
haiqi96 Jun 24, 2025
5745e65
More refactor
haiqi96 Jun 24, 2025
f3ba8b0
renaming
haiqi96 Jun 24, 2025
e566e74
Prepare for rearrangement
haiqi96 Jun 24, 2025
db9a508
Optimize logger
haiqi96 Jun 24, 2025
2f0f95a
Further refactor
haiqi96 Jun 25, 2025
e310102
Use asyncio
haiqi96 Jun 25, 2025
a332799
Refactoring
haiqi96 Jun 25, 2025
cb53857
Refactoring
haiqi96 Jun 25, 2025
85b7823
Update clp-config
haiqi96 Jun 25, 2025
398ab5e
Merge branch 'main' into DatasetRefactor
haiqi96 Jun 25, 2025
3209ddd
Merge branch 'DatasetRefactor' into retension_period
haiqi96 Jun 25, 2025
3c5b0e4
New line at eof
haiqi96 Jun 25, 2025
fb41607
Refactor retention cleaner name
haiqi96 Jun 25, 2025
386453b
Clean up
haiqi96 Jun 25, 2025
945c97b
linter
haiqi96 Jun 25, 2025
1845462
Adding more docstrings
haiqi96 Jun 25, 2025
bed13df
Temporarily remove stream retention
haiqi96 Jun 25, 2025
0d8d679
Linter
haiqi96 Jun 25, 2025
d40e773
Revert change for stream
haiqi96 Jun 25, 2025
7759a7a
Merge remote-tracking branch 'origin/main' into DatasetRefactor
haiqi96 Jun 27, 2025
271b8b3
Merge branch 'DatasetRefactor' into retension_period
haiqi96 Jun 27, 2025
e6b8cc7
Linter
haiqi96 Jun 27, 2025
c0b8563
Merge branch 'DatasetRefactor' into retension_period
haiqi96 Jun 27, 2025
7a468c3
Merge branch 'main' into DatasetRefactor
Bill-hbrhbr Jun 29, 2025
1dd1cea
Move default dataset metadata table creation to start_clp
Bill-hbrhbr Jun 29, 2025
a0c3c29
Remove unused import
Bill-hbrhbr Jun 29, 2025
a9bf615
Address review comments
Bill-hbrhbr Jun 30, 2025
fe05f5f
Replace the missing SUFFIX
Bill-hbrhbr Jun 30, 2025
39a9278
Move suffix constants from clp_config to clp_metadata_db_utils local …
Bill-hbrhbr Jun 30, 2025
7124828
Refactor archive_manager.py.
kirkrodrigues Jun 30, 2025
eb80992
Refactor s3_utils.py.
kirkrodrigues Jun 30, 2025
5ed44e7
compression_task.py: Fix typing errors and minor refactoring.
kirkrodrigues Jun 30, 2025
af6b508
compression_scheduler.py: Remove exception swallow which will hide un…
kirkrodrigues Jun 30, 2025
67fb01f
Refactor query_scheduler.py.
kirkrodrigues Jun 30, 2025
d6ad4de
clp_metadata_db_utils.py: Minor refactoring.
kirkrodrigues Jun 30, 2025
ff7d700
clp_metadata_db_utils.py: Rename _generic_get_table_name -> _get_tabl…
kirkrodrigues Jun 30, 2025
7ffc77c
clp_metadata_db_utils.py: Alphabetize new public functions.
kirkrodrigues Jun 30, 2025
0255cbd
clp_metadata_db_utils.py: Reorder public and private functions for co…
kirkrodrigues Jun 30, 2025
1076a3f
initialize-clp-metadata-db.py: Remove changes unrelated to PR.
kirkrodrigues Jun 30, 2025
71c4d82
Move default dataset creation into compression_scheduler so that it r…
kirkrodrigues Jun 30, 2025
6bd9372
Apply suggestions from code review
kirkrodrigues Jul 1, 2025
84df2e2
Merge branch 'main' into DatasetRefactor
kirkrodrigues Jul 1, 2025
983bea1
Remove bug fix that's no longer necessary.
kirkrodrigues Jul 1, 2025
bdb7817
Fix bug where dataset has a default value instead of None when using …
Bill-hbrhbr Jul 1, 2025
a82a267
Correctly feed in the input config dataset names
Bill-hbrhbr Jul 1, 2025
f699496
Remove unnecessary changes
Bill-hbrhbr Jul 1, 2025
94e8ca1
Merge branch 'DatasetRefactor' into retension_period
haiqi96 Jul 2, 2025
90ce0a4
Update the webui to pass the dataset name in the clp-json code path (…
kirkrodrigues Jul 2, 2025
d6f9e5a
Move dataset into the user function
haiqi96 Jul 2, 2025
dc6a706
Merge branch 'DatasetRefactor' of https://github.com/haiqi96/clp_fork…
haiqi96 Jul 2, 2025
76bcb4a
Remove unnecessary f string specifier
haiqi96 Jul 2, 2025
a4e6f83
Apply suggestions from code review
haiqi96 Jul 2, 2025
3c53cb0
Merge branch 'DatasetRefactor' into retension_period
haiqi96 Jul 2, 2025
66eba87
Polishing
haiqi96 Jul 2, 2025
7b42568
Add import type.
kirkrodrigues Jul 2, 2025
097e47c
Polishing more
haiqi96 Jul 3, 2025
8dc8e26
try adding query job handling
haiqi96 Jul 3, 2025
afe43ce
Merge branch 'main' into DatasetRefactor
haiqi96 Jul 3, 2025
85a3164
Merge remote-tracking branch 'origin/DatasetRefactor' into retension_…
haiqi96 Jul 3, 2025
af75118
Merge remote-tracking branch 'origin/main' into retension_period
haiqi96 Jul 3, 2025
e5e90f7
Fix wrong order
haiqi96 Jul 3, 2025
bac6767
Linter
haiqi96 Jul 3, 2025
de1c334
submit not-fully-tested-code
haiqi96 Jul 3, 2025
9fdb3d5
Apply suggestions from code review
haiqi96 Jul 4, 2025
2245244
Update components/job-orchestration/job_orchestration/retention/archi…
haiqi96 Jul 4, 2025
b1e5a2c
Apply suggestions from code review
haiqi96 Jul 4, 2025
4e93a30
Fix
haiqi96 Jul 4, 2025
6719872
Merge remote-tracking branch 'origin/main' into retension_period
haiqi96 Jul 4, 2025
f9fa626
nit fixes
haiqi96 Jul 4, 2025
450e16a
Update the logic to consider all running query jobs
haiqi96 Jul 17, 2025
ade2e27
Merge remote-tracking branch 'origin/main' into retension_period
haiqi96 Jul 17, 2025
f1584ff
linter
haiqi96 Jul 30, 2025
2c57dd6
Apply suggestions from code review
haiqi96 Aug 1, 2025
5f479c5
address code review concern
haiqi96 Aug 1, 2025
8c5fb89
Batch renaming
haiqi96 Aug 1, 2025
11e695f
Linter
haiqi96 Aug 1, 2025
1291c3f
Further refactor
haiqi96 Aug 1, 2025
f8c7369
Linter
haiqi96 Aug 1, 2025
9b48c9b
Apply suggestions from code review
haiqi96 Aug 4, 2025
6cff24d
Merge remote-tracking branch 'origin/main' into retension_period
haiqi96 Aug 4, 2025
c367c15
address review concern
haiqi96 Aug 4, 2025
e282020
Update logging
haiqi96 Aug 4, 2025
b93bb4b
Update components/job-orchestration/job_orchestration/garbage_collect…
haiqi96 Aug 4, 2025
390333f
Address review comments
haiqi96 Aug 5, 2025
a4546cf
Fix timezone
haiqi96 Aug 5, 2025
2c4821a
Apply suggestions from code review
haiqi96 Aug 7, 2025
9d5d087
Address code review comments and slight improved logging.
haiqi96 Aug 7, 2025
74af600
Linter
haiqi96 Aug 7, 2025
5f4f1e3
Add docs
haiqi96 Aug 8, 2025
c8a919c
Apply suggestions from code review
haiqi96 Aug 10, 2025
c911ccc
Apply suggestions from code review
haiqi96 Aug 10, 2025
a02ed22
Apply suggestions from code review
haiqi96 Aug 10, 2025
17defe5
Update
haiqi96 Aug 10, 2025
f66b378
slight update
haiqi96 Aug 10, 2025
34a52f3
Add empty line at eof
haiqi96 Aug 11, 2025
d9a3d09
Merge remote-tracking branch 'origin/main' into retention_readme
haiqi96 Aug 12, 2025
c8b2500
Merge branch 'main' into retention_readme
haiqi96 Aug 13, 2025
e3ff836
Update multi-node doc
haiqi96 Aug 15, 2025
8b2631a
Add section for non UTC timestamp
haiqi96 Aug 15, 2025
3b4f74f
Merge remote-tracking branch 'origin/main' into retention_readme
haiqi96 Aug 15, 2025
434a5ae
Apply suggestions from code review
haiqi96 Aug 16, 2025
66bfe4b
Reordering
haiqi96 Aug 16, 2025
d86f8fd
Apply suggestions from code review
haiqi96 Aug 16, 2025
562bffe
Merge branch 'retention_readme' of https://github.com/haiqi96/clp_for…
haiqi96 Aug 16, 2025
91d468e
Address code review comments
haiqi96 Aug 16, 2025
7f0c92e
Apply markdown lint configs.
kirkrodrigues Aug 19, 2025
0931335
Merge branch 'main' into retention_readme
kirkrodrigues Aug 19, 2025
e993809
Revise docs.
kirkrodrigues Aug 20, 2025
d2800cf
Apply suggestions from code review
kirkrodrigues Aug 20, 2025
9f85fd3
Add line to card
quinntaylormitchell Aug 20, 2025
9654d87
Properly format one of the times, and remove endline blankspace
quinntaylormitchell Aug 20, 2025
edc2b35
Apply suggestions from code review
kirkrodrigues Aug 20, 2025
21e2066
Rephrase expiry criteria formua.
kirkrodrigues Aug 20, 2025
02eb9aa
Minor edits and add details about search results retention.
kirkrodrigues Aug 20, 2025
15df283
Fix the example and make it more readable.
kirkrodrigues Aug 20, 2025
fd6f121
Haiqi suggestion.
kirkrodrigues Aug 20, 2025
1596cc6
Merge branch 'main' into retention_readme
haiqi96 Aug 20, 2025
0d8a48e
Address review comments and also fix the order presto card.
haiqi96 Aug 20, 2025
15ba738
Apply suggestions from code review
haiqi96 Aug 20, 2025
4 changes: 4 additions & 0 deletions docs/src/user-guide/guides-multi-node.md
@@ -31,6 +31,7 @@ worker components. The tables below list the components and their functions.
| query_scheduler | Scheduler for search/aggregation jobs |
| results_cache | Storage for the workers to return search results to the UI |
| webui | Web server for the UI |
| garbage_collector | Background process for retention control |
:::
Comment on lines +34 to 35
Contributor

🧹 Nitpick (assertive)

Add cross-link for discoverability to the new component row

Point readers from the components table directly to the new guide.

-| garbage_collector     | Background process for retention control                        |
+| garbage_collector     | Background process for retention control; see [Retention control](guides-retention) |


:::{table} Worker components
@@ -71,6 +72,8 @@ Running additional workers increases the parallelism of compression and search/aggregation.
4. Set `archive_output.directory` to a directory on the distributed filesystem.
* Ideally, the directory should be empty or should not yet exist (CLP will create it) since
CLP will write several files and directories directly to the given directory.
5. (Optional) Configure retention periods for archives and search results. See
[retention control](guides-retention) for details.
Member


This feels a bit out of scope of this doc, right?


5. Download and extract the package on all nodes.
6. Copy the `credentials.yml` and `clp-config.yml` files that you created above and paste them
@@ -93,6 +96,7 @@ but all components in a group must be started before starting a component in the

* `compression_scheduler`
* `query_scheduler`
* `garbage_collector`

**Group 3 components:**

6 changes: 6 additions & 0 deletions docs/src/user-guide/guides-overview.md
@@ -25,4 +25,10 @@ Multi-node deployment
^^^
How to deploy CLP across multiple nodes.
:::

:::{grid-item-card}
:link: guides-retention
Retention control
^^^
How to configure retention control for CLP.
:::
::::
167 changes: 167 additions & 0 deletions docs/src/user-guide/guides-retention.md
@@ -0,0 +1,167 @@
# Retention control in CLP

CLP supports retention control to free up storage space by periodically deleting outdated archives
and search results. Retention applies to both the local filesystem and object storage.

This process is managed by background **garbage collector** jobs, which scan for and delete expired
data based on configured retention settings in `etc/clp-config.yml`.

:::{note}
By default, retention control is disabled, and CLP retains data indefinitely.
:::

---

## Definitions
This section explains the terms and criteria CLP uses to decide when data should be deleted.

At a high level, CLP compares a data item's timestamp with the current time to determine whether
it has expired. The criteria used to assess expiration differ slightly between archives and
search results.

### Terms
- **Current Time (`T`):** The current time (UTC) when a garbage collector job evaluates data
expiration.
- **Retention Period (`TTL`):** The configured duration for which CLP retains data before it is
considered expired.
- **Archive timestamp (`archive.T`):** The most recent timestamp among all log messages
  contained in the archive. It is not related to the time at which the logs were compressed.

  Note that logs with outdated timestamps may be deleted immediately, depending on your retention
  settings.
- **Search result timestamp (`search_result.T`):** The timestamp when a search result is inserted
into the results_cache.

:::{note}
Archives whose log messages do not contain timestamps are not subject to retention.
:::

### Expiry criteria

- **Archive Expiry:**
An archive is considered expired if its retention period has elapsed since the archive's
timestamp, i.e., the difference between `T` and `archive.T` has surpassed `TTL`.
```text
if (T - archive.T > TTL) then EXPIRED
```

For example, if a compressed archive has `archive.T = 16:00` (for simplicity, we omit dates and
seconds from the timestamp) and `TTL = 1 hour`, it will be considered expired for any
`T > 17:00`, since `T - 16:00 > 1:00` holds for all such `T`.

:::{caution}
Retention control assumes that archive timestamps are given in **UTC** time. Using retention
control on archives with local (i.e., non-UTC) timestamps can lead to an effective `TTL` that is
different from the intended value.

In the example above, if the package operates on a system in EDT (UTC-4) and `archive.T = 16:00`
is a local timestamp, then a garbage collection job operating at 16:30 local time will convert
`16:30 EDT` to `20:30 UTC`, and the expiry calculation will be `20:30 - 16:00 > 1:00`. In this
case, the archive would be considered expired, and would be deleted, even though it wouldn't have
actually reached its intended retention period.

To avoid this issue, either generate logs with UTC timestamps or adjust the retention period to
account for the offset:

`adjusted_retention_period = retention_period - signed_UTC_offset`
:::

- **Search Result Expiry:**
A search result is considered expired if its retention period has elapsed since the search
completed, i.e., the difference between `T` and `search_result.T` has surpassed `TTL`.
```text
if (T - search_result.T > TTL) then EXPIRED
```
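The two expiry rules above are the same comparison applied to different timestamps. A minimal
Python sketch of that comparison (illustrative only; the function and variable names are not
CLP's actual implementation):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(item_ts: datetime, ttl: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True if the item's retention period (TTL) has elapsed.

    The same rule covers archives (item_ts = archive.T, the latest log-message
    timestamp) and search results (item_ts = search_result.T, the insertion
    time). All timestamps are assumed to be in UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - item_ts > ttl


# The example from the text: archive.T = 16:00 and TTL = 1 hour.
archive_ts = datetime(2025, 8, 20, 16, 0, tzinfo=timezone.utc)
ttl = timedelta(hours=1)
print(is_expired(archive_ts, ttl, now=datetime(2025, 8, 20, 17, 30, tzinfo=timezone.utc)))  # True
print(is_expired(archive_ts, ttl, now=datetime(2025, 8, 20, 16, 30, tzinfo=timezone.utc)))  # False
```

Note that the comparison is strictly greater-than, so at exactly `T = 17:00` the archive is not
yet expired, matching the "for all `T > 17:00`" wording above.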

---

## Configuration
CLP allows users to specify a different **retention_period** for each type of data.
Additionally, the frequency at which garbage collection jobs run for each type of data can be
customized via a **sweep_interval**. Both settings are configured in `etc/clp-config.yml`.

### Configure retention period
To configure a retention period, update the appropriate `.retention_period` key in
`etc/clp-config.yml` with the desired retention period in minutes.

For example, to configure an archive retention period of 30 days (43,200 minutes):
```yaml
archive_output:
  # Other archive_output settings

  # Retention period for archives, in minutes.
  # Set to null to disable automatic deletion.
  retention_period: 43200
```
Similarly, to configure a search result retention period of 1 day (1440 minutes):
```yaml
results_cache:
  # Other results_cache settings

  # Retention period for search results, in minutes.
  # Set to null to disable automatic deletion.
  retention_period: 1440
```
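As a quick sanity check on the minute values used in the two snippets above (plain arithmetic,
not CLP code):

```python
MINUTES_PER_DAY = 24 * 60  # 1440

# 30 days for archives and 1 day for search results, as in the YAML above.
print(30 * MINUTES_PER_DAY)  # 43200
print(1 * MINUTES_PER_DAY)   # 1440
```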
### Configure sweep interval
The **`garbage_collector.sweep_interval`** parameter specifies the time interval at which garbage
collector jobs run to collect and delete expired data.

To configure a custom sweep frequency for different retention targets, you can set the subfields
under `garbage_collector.sweep_interval` individually in `etc/clp-config.yml`. For example, to
configure a sweep interval of 15 minutes for search results and 3 hours (180 minutes) for archives,
enter the following:

```yaml
garbage_collector:
  logging_level: "INFO"
  # Interval (in minutes) at which garbage collector jobs run
  sweep_interval:
    archive: 180
    search_result: 15
```

:::{note}
If the `.retention_period` for a data type is set to `null`, the corresponding garbage collection
task will not run even if `garbage_collector.sweep_interval.<datatype>` is configured.
:::
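Conceptually, each configured target gets its own periodic loop. A minimal asyncio sketch of the
idea (the function names and structure are illustrative assumptions, not CLP's actual code):

```python
import asyncio

# Hypothetical sweep intervals (in minutes), mirroring the YAML example above.
SWEEP_INTERVALS_MINS = {"archive": 180, "search_result": 15}


async def sweep_loop(target: str, interval_mins: int, collect) -> None:
    """Run the garbage-collection callback for one target at a fixed interval."""
    while True:
        await collect(target)  # scan for and delete expired data of this type
        await asyncio.sleep(interval_mins * 60)


async def run_garbage_collector(collect) -> None:
    # One independent loop per retention target, so a long archive sweep
    # does not delay the more frequent search-result sweep.
    await asyncio.gather(*(
        sweep_loop(target, mins, collect)
        for target, mins in SWEEP_INTERVALS_MINS.items()
    ))
```

Running the loops independently is one plausible reason the config exposes a separate interval
per data type rather than a single global value.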

---

## Internals

This section documents some of CLP’s internal behavior for retention and garbage collection.

### Handling data race conditions
CLP's retention system is designed to avoid race conditions that could arise from deleting
archives or search results that are still in use by active jobs. CLP employs the following
mechanisms to avoid these conditions:

- If any query job is running, CLP conservatively calculates a **safe expiry timestamp** based on
the earliest active search job. This ensures no archive that could be searched is deleted.

- CLP will **not** search an archive once it is considered expired, even if it has not yet been
deleted by the garbage collector.
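As a hedged sketch of the "safe expiry timestamp" idea (names are illustrative, not CLP's actual
code), expiry can be evaluated relative to the earliest active query job rather than the current
time:

```python
from datetime import datetime, timedelta
from typing import List


def safe_deletion_cutoff(
    now: datetime, ttl: timedelta, active_job_starts: List[datetime]
) -> datetime:
    """Latest timestamp at which an archive may be safely deleted.

    With no active query jobs, anything older than now - ttl is deletable.
    With active jobs, expiry is evaluated as of the earliest job's start
    time, so no archive that a running search might still read is deleted.
    """
    reference = min([now, *active_job_starts])
    return reference - ttl


# Illustration: TTL = 1 hour, one search job has been running since 09:30.
now = datetime(2025, 8, 20, 10, 0)
cutoff = safe_deletion_cutoff(now, timedelta(hours=1), [datetime(2025, 8, 20, 9, 30)])
print(cutoff)  # 2025-08-20 08:30:00
```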
Member

Shall we add a "warning" for this?
Alternatively, can we mark the job with a non-successful status so that users can be aware that some archives are skipped? Otherwise, it would appear odd that we list a certain number of events on the WebUI ingestion page but none of them show up in the search results. (Or maybe we should also exclude the stats of those archives on the ingestion page?)

Contributor Author

I feel we need to discuss the actual behavior so that we are on the same page, but I do think that "excluding the stats of those archives on the ingestion page" sounds like a valid suggestion.

Member

@davemarco @hoophalab How hard would it be to add this today?

Contributor Author

From a high level, here is what we need to implement:

1. The webui should take in `archive_output.retention_period` as an optional int (can be null).
2. When fetching ingested log info, if the `retention_period` is not null:
   - get the current UTC timestamp;
   - reverse-calculate the `lower_bound = UTC_ts - retention_period * 60` (converting minutes to seconds);
   - append a filter to the SQL query: `select <columns> from <existing conditions> and (end_timestamp >= {archive_end_ts_lower_bound} OR end_timestamp = 0)`.

Similar code is implemented here, under query_scheduler.py.
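The filter described in this comment could be sketched as follows (the `end_timestamp` column
and the table layout are assumptions taken from the comment itself, not a confirmed schema):

```python
import time
from typing import Optional


def build_unexpired_filter(retention_period_mins: Optional[int]) -> Optional[str]:
    """Build a SQL condition that excludes expired archives from stats queries.

    Returns None when retention is disabled (retention_period is null).
    """
    if retention_period_mins is None:
        return None
    # Reverse-calculate the lower bound, converting minutes to seconds.
    lower_bound = int(time.time()) - retention_period_mins * 60
    # end_timestamp = 0 covers archives whose logs carry no timestamps;
    # those archives are not subject to retention.
    return f"(end_timestamp >= {lower_bound} OR end_timestamp = 0)"
```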

Contributor

@davemarco davemarco Aug 18, 2025

is it possible to add an expired bool to the metadata tables, then we can just query for expired?

Contributor Author

maybe in the future release. I don't think it's at the level where we can finish planning, implementing and testing in one day though.

Contributor

@davemarco davemarco Aug 18, 2025

I think we can get code in today, but someone will need to review and test. I guess it's up to @kirk whether to put it in this release, or fix it after.

Member

Can you put up the PR and then we can evaluate? Either way, we'll need this feature unless we switch to some new API server.

Contributor

@haiqi96 @kirkrodrigues it is actually slightly more complicated for the files table since multiple files may be part of the same archive and have different end timestamps. Is there an easy way around this?

Contributor Author

Hmm, what if you do a join statement? Like:

`SELECT * FROM File JOIN Archive_table ON File.archive_id = Archive_table.id WHERE Archive_table.end_ts > lower_bound`

Alternatively, you can execute two queries:

1. `SELECT archive_ids FROM Archive_table WHERE Archive_table.end_ts > lower_bound`
2. `SELECT files FROM Files_table WHERE Files_table.archive_id IN archive_ids`

The first option could be easier to write, though.


:::{warning}
A hanging search job will prevent CLP from deleting expired archives.
Restarting the query scheduler will mark such jobs as failed and allow garbage collection to resume.
:::

### Fault tolerance
The garbage collector can resume execution from where it left off if a previous run fails.
This design ensures that CLP does not fall into an inconsistent state due to partial deletions.

If the CLP package stops unexpectedly while a garbage collection task is running (for example, due
to a host machine shutdown), simply restart the package and the garbage collector will continue from
the point of failure.

:::{note}
During failure recovery, there may be a temporary period during which an archive no longer exists in
the database, but still exists on disk or in object storage. Once recovery is complete, the physical
archive will also be deleted.
:::
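The recovery behavior described in the note above is consistent with a metadata-first, two-phase
deletion scheme. A hedged sketch of that idea (not CLP's actual code; `db` and `storage` are
hypothetical interfaces):

```python
def delete_expired_archives(db, storage, expired_ids):
    """Delete expired archives in two phases: database metadata first,
    then the physical files on disk or in object storage.

    If the process dies between the phases, the database never references
    an archive whose files are already gone; a restarted sweep only has to
    finish removing the orphaned physical files.
    """
    db.remove_archive_metadata(expired_ids)  # phase 1: database metadata
    for archive_id in expired_ids:
        storage.delete(archive_id)           # phase 2: physical data
```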
1 change: 1 addition & 0 deletions docs/src/user-guide/index.md
@@ -63,6 +63,7 @@ guides-overview
guides-using-object-storage/index
guides-multi-node
guides-using-presto
guides-retention
Member

Can we move this above guides-multi-node? Same for the card in the overview.

:::

:::{toctree}