
feat: Add archive & retention policy w/ storage growth projection #535

Open
KaliszS wants to merge 4 commits into main from scale

Conversation

@KaliszS KaliszS (Collaborator) commented Feb 27, 2026

You can find the new settings in the "Data Lifecycle" tab. Using it, an admin can now set up automatic archiving of data point series (data is re-aggregated into daily summaries instead of batch-size chunks and moved into a new table) and/or automatic retention (data is removed from both the live and archive tables). Both options allow setting the number of days after which the policy is triggered (checked once per day).
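The archiving step described above — collapsing batch-size chunks into one value per day — can be sketched in plain Python. The sample data, field layout, and `aggregate_daily` helper below are illustrative stand-ins, not the actual repository code:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical samples for one series from one source: (timestamp, value).
samples = [
    (datetime(2026, 2, 1, 8, 0), 60.0),
    (datetime(2026, 2, 1, 20, 0), 80.0),
    (datetime(2026, 2, 2, 9, 0), 70.0),
]

def aggregate_daily(points, method="avg"):
    """Collapse per-sample points into one (value, sample_count) row per day."""
    by_day = defaultdict(list)
    for ts, value in points:
        by_day[ts.date()].append(value)
    agg = sum if method == "sum" else mean
    return {day: (agg(values), len(values)) for day, values in by_day.items()}

daily = aggregate_daily(samples)  # {date: (aggregated value, sample count)}
```

The aggregation method would depend on the series type (e.g. sum for step counts, average for heart rate), which is what the `AggregationMethod` mapping in this PR encodes.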

This tab also includes a projection of how storage size will grow given the currently existing data.

Closes #489


Summary by CodeRabbit

  • New Features
    • Data Lifecycle management: configurable archival and retention policies to optimize storage.
    • Storage monitoring with detailed estimates and growth projections, including visual charting.
    • Automatic daily archival job with manual "Run Now" trigger from UI.
    • New Data Lifecycle settings tab to view storage, edit policies, validate conflicts, and run archival on demand.

@KaliszS KaliszS self-assigned this Feb 27, 2026
@KaliszS KaliszS added backend data model Tickets related to data model labels Feb 27, 2026

coderabbitai bot commented Feb 27, 2026

📝 Walkthrough

Walkthrough

Adds a data lifecycle system: new archival models, repositories, service, API endpoints, Celery task and schedule, migration and seed, frontend UI and hooks, and repository/service integrations to aggregate, archive, and retain historical time-series data as daily summaries.

Changes

Cohort / File(s) / Summary

API Routes
backend/app/api/routes/v1/archival.py, backend/app/api/routes/v1/__init__.py
New developer-only router exposing GET/PUT archival settings (with storage estimates) and a POST trigger endpoint; mounted on the v1 router.

Celery tasks & schedule
backend/app/integrations/celery/tasks/archival_task.py, backend/app/integrations/celery/tasks/__init__.py, backend/app/integrations/celery/core.py
New shared_task run_daily_archival added and exported; Celery timezone set to UTC, and beat_schedule includes a daily run at 03:00 UTC.

DB models & migration
backend/app/models/archival_setting.py, backend/app/models/data_point_series_archive.py, backend/app/models/__init__.py, backend/migrations/versions/..._archive.py
Adds singleton archival_settings and data_point_series_archive tables with constraints and exports; the migration creates/drops these tables and removes/restores a previous index.

Model adjustments
backend/app/models/data_point_series.py, backend/app/models/refresh_token.py, backend/app/models/user_invitation_code.py
Removed an explicit index from data_point_series table args; moved Indexed wrappers to table-level Indexes for refresh_token and user_invitation_code; added revoked_at/created_at fields.

Repositories
backend/app/repositories/archival_repository.py, backend/app/repositories/__init__.py
New ArchivalSettingRepository and DataPointSeriesArchiveRepository with storage estimation, batch archiving/upsert, and delete routines; exported via the repositories package.

Services
backend/app/services/archival_service.py, backend/app/services/__init__.py, backend/app/services/summaries_service.py
New ArchivalService orchestrating settings, storage computation, and the run_daily_archival flow; the summaries service now merges archived activity into results; archival_service exported.

Schemas & enums
backend/app/schemas/archival.py, backend/app/schemas/series_types.py, backend/app/database.py
New Pydantic schemas for settings and storage estimates; AggregationMethod enum and mapping added; DB type mapping for AggregationMethod registered.

DB helpers & scripts
backend/scripts/init/seed_archival_settings.py, backend/scripts/start/app.sh
Seed script to create singleton archival settings; the start script runs it on startup.

Frontend: service, hooks, keys
frontend/src/lib/api/services/archival.service.ts, frontend/src/hooks/api/use-archival.ts, frontend/src/lib/query/keys.ts
New archival API client with types, React Query hooks for get/update/trigger, and query keys.

Frontend: UI
frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx, frontend/src/routes/_authenticated/settings.tsx
Adds a Data Lifecycle settings tab with storage overview, growth projection chart, toggles/inputs for archival & retention, validation, and manual trigger UI.

Repository utils
backend/app/repositories/repositories.py
Refactored exists_any to use getattr(self.model, "id") for primary key detection.

Tests
backend/tests/conftest.py
Extended Garmin webhook mocks to simulate a completed backfill and avoid progressing pending types.
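The daily 03:00 UTC schedule from the Celery cohort above can be sketched as follows. In real Celery config the "schedule" value would be celery.schedules.crontab(hour=3, minute=0); it is shown here as a plain mapping to keep the sketch dependency-free, and the task path is assumed from the file layout above:

```python
# Dependency-free sketch of the beat entry (names assumed from the walkthrough).
timezone = "UTC"

beat_schedule = {
    "run-daily-archival": {
        "task": "app.integrations.celery.tasks.archival_task.run_daily_archival",
        # Stand-in for celery.schedules.crontab(hour=3, minute=0): daily at 03:00 UTC.
        "schedule": {"hour": 3, "minute": 0},
    },
}
```

Pinning the Celery timezone to UTC matters here: without it, the 03:00 trigger would drift with the worker host's local time.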

Sequence Diagram(s)

sequenceDiagram
    participant Client as Frontend
    participant API as FastAPI
    participant Service as ArchivalService
    participant RepoA as ArchivalSettingRepo
    participant RepoD as DataPointArchiveRepo
    participant DB as Database

    rect rgb(100,200,150,0.5)
        Client->>API: GET /api/v1/settings/archival
        API->>Service: get_settings(db)
        Service->>RepoA: get(db)
        RepoA->>DB: SELECT archival_settings
        DB-->>RepoA: setting
        Service->>RepoD: get_storage_estimate(db)
        RepoD->>DB: pg_catalog queries
        DB-->>RepoD: size metrics
        RepoD-->>Service: StorageEstimate
        Service-->>API: ArchivalSettingWithEstimate
        API-->>Client: JSON
    end

    rect rgb(150,150,200,0.5)
        Client->>API: POST /api/v1/settings/archival/run
        API->>Service: run_daily_archival(db)
        Service->>RepoD: archive_data_before(cutoff)
        RepoD->>DB: SELECT/INSERT/DELETE (batched)
        DB-->>RepoD: affected counts
        RepoD-->>Service: archived_rows
        Service->>RepoD: delete_archive_before(cutoff_retention)
        RepoD->>DB: DELETE old archives
        RepoD-->>Service: deleted_rows
        Service-->>API: summary
        API-->>Client: JSON (summary)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Linked issues

  • #489: Data aggregation level & Open Wearables usage at scale — This PR implements daily aggregation into an archive table and retention controls, addressing the objective to reduce per-sample storage by creating daily aggregates server-side.

Poem

🐇 I hopped through rows and rolled them tight,

Daily buckets humming through the night,
Old samples folded into one neat day,
The warren shrank where bytes would play,
Archive set, retention tuned — hop, hooray!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 50.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Out of Scope Changes check — ❓ Inconclusive: changes to refresh_token.py, user_invitation_code.py, data_point_series.py, repositories.py, summaries_service.py, and conftest.py appear tangential to the archival/retention feature but provide supporting infrastructure and schema improvements. Resolution: clarify whether the index/schema changes to refresh_token, user_invitation_code, and data_point_series are prerequisites for the archival feature or unrelated optimizations that could be separated.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title accurately summarizes the main change: adding archive and retention policies with a storage growth projection feature for the data lifecycle.
  • Linked Issues check — ✅ Passed: the PR implements archival and retention infrastructure addressing issue #489's storage optimization concerns by enabling daily aggregation, automatic archival, and data deletion policies [#489].


@railway-app railway-app bot temporarily deployed to Open Wearables / open-wearables-pr-535 February 27, 2026 14:12 Destroyed

railway-app bot commented Feb 27, 2026

🚅 Deployed to the open-wearables-pr-535 environment in Open Wearables

Service | Status | Updated (UTC)
frontend | ✅ Success | Mar 5, 2026, 12:03 pm
celery-beat | ✅ Success | Mar 5, 2026, 12:02 pm
celery-flower | ✅ Success | Mar 5, 2026, 12:02 pm
backend | ✅ Success | Mar 5, 2026, 12:02 pm
celery-worker | ✅ Success | Mar 5, 2026, 12:02 pm


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (8)
backend/tests/conftest.py (1)

218-218: Align variable naming with repository rule.

On Line [218], rename webhook_module to camelCase to match project convention for variables.

Proposed fix
-    webhook_module = "app.api.routes.v1.garmin_webhooks"
+    webhookModule = "app.api.routes.v1.garmin_webhooks"
...
-        patch(f"{webhook_module}.get_trace_id", return_value=None),
-        patch(f"{webhook_module}.mark_type_success", return_value=False),
+        patch(f"{webhookModule}.get_trace_id", return_value=None),
+        patch(f"{webhookModule}.mark_type_success", return_value=False),
...
-            f"{webhook_module}.get_backfill_status",
+            f"{webhookModule}.get_backfill_status",
...
-        patch(f"{webhook_module}.trigger_next_pending_type", return_value=None),
+        patch(f"{webhookModule}.trigger_next_pending_type", return_value=None),

As per coding guidelines, "Follow PEP 8 naming conventions: camelCase for variables, UPPER_SNAKE_CASE for constants, PascalCase for classes".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/conftest.py` at line 218, The variable webhook_module should be
renamed to camelCase (webhookModule) to follow project conventions; rename the
identifier webhook_module to webhookModule in backend/tests/conftest.py and
update every reference/usage in that file (fixtures, tests, or helper calls) to
the new name so references like webhook_module -> webhookModule remain
consistent and tests import/use the updated symbol.
backend/app/schemas/series_types.py (1)

434-439: Fail fast when a series type has no aggregation mapping.

Defaulting to AVG can silently corrupt archival semantics for future cumulative metrics. Prefer explicit failure so mapping gaps are caught during development.

Proposed hardening
 def get_aggregation_method(series_type: SeriesType) -> AggregationMethod:
     """Get the aggregation method for a series type.
 
-    Falls back to AVG for unknown types (safe default for rate-like metrics).
+    Raises when no mapping exists to avoid silent mis-aggregation.
     """
-    return AGGREGATION_METHOD_BY_TYPE.get(series_type, AggregationMethod.AVG)
+    return AGGREGATION_METHOD_BY_TYPE[series_type]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/schemas/series_types.py` around lines 434 - 439, Replace the
silent default in get_aggregation_method with an explicit failure: check
AGGREGATION_METHOD_BY_TYPE for the provided series_type inside
get_aggregation_method (rather than using .get(..., AggregationMethod.AVG)), and
raise a clear exception (e.g., ValueError or KeyError) that includes the
SeriesType value when no mapping exists; keep the function signature and return
type but remove the fallback to AggregationMethod.AVG so missing mappings fail
fast during development.
frontend/src/lib/api/services/archival.service.ts (1)

49-84: Optional cleanup: remove duplicated error wrapping around apiClient calls.

apiClient already normalizes non-ApiError failures into ApiError.networkError, so these try/catch blocks repeat central behavior. Flattening this service reduces noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/api/services/archival.service.ts` around lines 49 - 84,
Remove the duplicated try/catch wrappers around apiClient calls in the methods
getSettings, updateSettings, and triggerArchival and let apiClient propagate its
normalized ApiError; specifically, replace each async method body so it directly
returns await apiClient.get/put/post with the same generic types and endpoints
(keep method names and signatures: getSettings, updateSettings, triggerArchival)
rather than catching and re-wrapping errors, thereby flattening the service code
and relying on apiClient's centralized error normalization.
backend/app/schemas/archival.py (1)

59-59: Constrain growth_class to a literal union instead of free-form str.

Line 59 currently allows any string. Tightening this field improves validation and keeps backend/frontend contracts aligned.

🔒 Suggested schema tightening
+from typing import Literal
@@
-    growth_class: str = Field(description="Growth complexity class: 'bounded' | 'linear_efficient' | 'linear'")
+    growth_class: Literal["bounded", "linear_efficient", "linear"] = Field(
+        description="Growth complexity class"
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/schemas/archival.py` at line 59, The growth_class field on the
Pydantic model (growth_class: str = Field(...)) is too permissive—change its
type to a Literal union to enforce allowed values (Literal['bounded',
'linear_efficient', 'linear']), import Literal from typing (or typing_extensions
if needed), and keep the existing Field(description=...) on growth_class so
validation and schema generation reflect the constrained enum-like choices.
frontend/src/hooks/api/use-archival.ts (1)

21-33: Add onMutate + rollback to useUpdateArchivalSettings for immediate UI feedback.

Lines 21-33 currently wait for server success before updating local state. This hook can follow the project’s optimistic mutation pattern to avoid a stale-feeling settings form.

💡 Suggested refactor
 export function useUpdateArchivalSettings() {
   const queryClient = useQueryClient();

   return useMutation({
     mutationFn: (data: ArchivalSettingsUpdate) =>
       archivalService.updateSettings(data),
+    onMutate: async (next) => {
+      await queryClient.cancelQueries({
+        queryKey: queryKeys.archival.settings(),
+      });
+      const previous = queryClient.getQueryData(queryKeys.archival.settings());
+      queryClient.setQueryData(queryKeys.archival.settings(), (current: any) =>
+        current
+          ? {
+              ...current,
+              settings: {
+                ...current.settings,
+                ...next,
+              },
+            }
+          : current
+      );
+      return { previous };
+    },
     onSuccess: async () => {
       await queryClient.invalidateQueries({
         queryKey: queryKeys.archival.all,
       });
       toast.success('Data lifecycle settings updated');
     },
-    onError: (error) => {
+    onError: (error, _vars, context) => {
+      if (context?.previous) {
+        queryClient.setQueryData(
+          queryKeys.archival.settings(),
+          context.previous
+        );
+      }
       toast.error(`Failed to update settings: ${getErrorMessage(error)}`);
     },
   });
 }

Based on learnings: Applies to frontend/src/hooks/api/**/*.ts : Use optimistic updates in mutations with onMutate for immediate UI feedback and onError rollback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/hooks/api/use-archival.ts` around lines 21 - 33, The mutation in
useUpdateArchivalSettings (useMutation calling archivalService.updateSettings)
lacks an onMutate optimistic update and rollback; add onMutate that cancels
queries for queryKeys.archival.all, snapshots the previous cache state, applies
the updated ArchivalSettings to the query cache for immediate UI feedback,
return the snapshot as context; modify onError to restore the previous snapshot
from context to rollback if the server call fails; keep invalidateQueries in
onSuccess or onSettled to refetch authoritative data.
backend/app/services/archival_service.py (2)

27-31: Add explicit return type annotation to __init__.

Per coding guidelines, all functions must have explicit return type annotations.

Proposed fix
-    def __init__(self, log: Logger):
+    def __init__(self, log: Logger) -> None:
         self.logger = log
         self.settings_repo = ArchivalSettingRepository()
         self.archive_repo = DataPointSeriesArchiveRepository()

As per coding guidelines, "Type hints are required on all functions with explicit parameter and return type annotations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/archival_service.py` around lines 27 - 31, The __init__
method in ArchivalService is missing an explicit return type; update the
constructor signature of def __init__(self, log: Logger) to include an explicit
return type annotation (-> None) so it reads def __init__(self, log: Logger) ->
None:, leaving body unchanged (self.logger, self.settings_repo =
ArchivalSettingRepository(), self.archive_repo =
DataPointSeriesArchiveRepository()); ensure any required imports for typing are
present if your linter/enforcer requires them.

56-70: Consider adding @handle_exceptions for consistency or using a TypedDict return type.

The run_daily_archival method lacks the @handle_exceptions decorator that other public methods use. While the Celery task has its own try/except, this creates inconsistency. The return type dict could also be more specific.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/archival_service.py` around lines 56 - 70, The public
method run_daily_archival should be decorated with the existing
`@handle_exceptions` decorator for consistency with other service methods and to
centralize error handling; add `@handle_exceptions` above the run_daily_archival
method definition and ensure imports/reference to handle_exceptions are present;
also replace the broad return type dict with a specific TypedDict (e.g.,
ArchivalSummary with keys archived_rows, deleted_rows, deleted_live_rows) and
use that type for the method signature and the summary variable to make the
return shape explicit (update any callers if needed).
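A sketch of the TypedDict suggested in this comment, using the key names proposed by the review; the exact fields are assumptions about the service's summary shape:

```python
from typing import TypedDict

class ArchivalSummary(TypedDict):
    """Explicit return shape for run_daily_archival (illustrative keys)."""
    archived_rows: int
    deleted_rows: int
    deleted_live_rows: int

def make_summary(archived: int, deleted: int, deleted_live: int) -> ArchivalSummary:
    # A TypedDict is a plain dict at runtime; the shape is enforced statically.
    return {
        "archived_rows": archived,
        "deleted_rows": deleted,
        "deleted_live_rows": deleted_live,
    }
```

Compared with a bare `dict` return type, callers (and the API route serializing the result) get a checked contract instead of a guessed one.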
backend/app/repositories/archival_repository.py (1)

345-360: Inconsistent else_ clauses in case statements.

The distance_sum and flights_climbed_sum case statements are missing else_=0 or else_=None unlike the other aggregations. While the post-processing handles None, this inconsistency reduces readability.

Proposed fix for consistency
                 func.sum(
                     case(
                         (
                             DataPointSeriesArchive.series_type_definition_id == distance_id,
                             DataPointSeriesArchive.value,
                         ),
+                        else_=None,
                     )
                 ).label("distance_sum"),
                 func.sum(
                     case(
                         (
                             DataPointSeriesArchive.series_type_definition_id == flights_id,
                             DataPointSeriesArchive.value,
                         ),
+                        else_=None,
                     )
                 ).label("flights_climbed_sum"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/repositories/archival_repository.py` around lines 345 - 360, The
two SUM+CASE aggregations for distance_sum and flights_climbed_sum in
archival_repository.py are missing explicit else_ clauses; update the case(...)
expressions used inside func.sum for
DataPointSeriesArchive.series_type_definition_id == distance_id and ==
flights_id (the ones labeled "distance_sum" and "flights_climbed_sum") to
include else_=0 (or else_=None to match desired behavior) so they are consistent
with the other aggregations—locate the case() calls wrapped by func.sum(...) and
add the else_ parameter to each.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/routes/v1/archival.py`:
- Around line 21-25: Add explicit response_model and status_code kwargs to the
three `@router` decorators for the archival endpoints: the GET
"/settings/archival" route, the route in the 33-41 block, and the route in the
50-54 block; set status_code=200 for normal GET responses and status_code=202
for the "run" endpoint (the long-running/run handler referenced in the review),
and set response_model to the Pydantic model that the handler actually returns
(use the handler's return type/model defined for each route). Ensure you add
these arguments directly on the `@router.get/`@router.post decorators for those
endpoints so OpenAPI and response contracts are explicit.

In `@backend/app/models/data_point_series_archive.py`:
- Around line 25-33: The model's __table_args__ currently only defines a
composite UniqueConstraint ("data_source_id","series_type_definition_id","date",
name="uq_archive_source_type_date"); add a separate Index on the "date" column
to that __table_args__ tuple to support efficient retention/cleanup queries
(e.g., an Index("ix_data_point_series_archive_date", "date")), ensuring the
index is declared alongside the existing UniqueConstraint.

In `@backend/app/models/user_invitation_code.py`:
- Line 3: Replace the raw sqlalchemy.Index import/usages with the project's
mapping wrapper: import and use Indexed from app.mappings instead of Index, and
update any model column/type annotations that currently rely on plain Index to
use the Indexed constraint (alongside Mapped/PrimaryKey/Unique patterns) in the
UserInvitationCode model (references: the import line and the places where Index
is applied around the fields at the spots corresponding to the Index usages on
the model, e.g., the definitions around lines where Index is referenced). Ensure
the file uses Mapped typing with the app.mappings constraints (Indexed) rather
than sqlalchemy.Index so model-layer contracts are consistent.

In `@backend/app/repositories/archival_repository.py`:
- Around line 232-243: The deletion block can run even when values_list is empty
due to unknown series_type_definition_id values, risking data loss; modify the
logic around the DataPointSeries delete (references: values_list,
series_type_definition_id, source_id_list, cutoff_date, deleted_count,
total_deleted, db.commit()) to skip the delete and db.commit() when values_list
(or the computed source_id_list for the batch) is empty—either continue to next
batch or log and raise—so no live rows are removed unless there are archived
values to back them up.
- Around line 278-283: The deletion uses Query.delete() with .limit(), which
SQLAlchemy forbids; modify the batch delete in archival_repository.py to first
select a limited set of primary keys and then delete by those IDs: use
DataPointSeries.with_entities(DataPointSeries.id).filter(cast(DataPointSeries.recorded_at,
Date) < cutoff_date).limit(ARCHIVE_BATCH_SIZE) to produce a subquery or list of
ids, then call
db.query(DataPointSeries).filter(DataPointSeries.id.in_(...)).delete(synchronize_session=False)
(or fetch ids and delete by id list) so the deletion is performed safely in
batches; keep references to DataPointSeries, cutoff_date and ARCHIVE_BATCH_SIZE
when implementing the change.
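The select-ids-then-delete pattern described above can be demonstrated with stdlib sqlite3; the table, column names, and batch size are simplified stand-ins for the real models:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_point_series (id INTEGER PRIMARY KEY, recorded_at TEXT)")
conn.executemany(
    "INSERT INTO data_point_series (recorded_at) VALUES (?)",
    [("2025-01-01",)] * 5 + [("2026-06-01",)] * 2,  # 5 old rows, 2 recent rows
)

BATCH_SIZE = 2
cutoff = "2026-01-01"
deleted = 0
while True:
    # Step 1: fetch a bounded batch of primary keys (LIMIT is legal on SELECT).
    ids = [r[0] for r in conn.execute(
        "SELECT id FROM data_point_series WHERE recorded_at < ? LIMIT ?",
        (cutoff, BATCH_SIZE),
    )]
    if not ids:
        break
    # Step 2: delete strictly by those ids, never by LIMIT on the DELETE itself.
    placeholders = ",".join("?" * len(ids))
    cur = conn.execute(
        f"DELETE FROM data_point_series WHERE id IN ({placeholders})", ids
    )
    deleted += cur.rowcount
    conn.commit()  # commit per batch keeps transactions short
```

In SQLAlchemy terms this corresponds to selecting `DataPointSeries.id` with `.limit(ARCHIVE_BATCH_SIZE)` and deleting with `.filter(DataPointSeries.id.in_(...))`, as the comment above describes.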

In `@backend/app/services/summaries_service.py`:
- Around line 202-206: The current code in summaries_service.py returns
live_results when archival_settings_repo.get(db_session) yields
settings.archive_after_days is None, which hides previously archived data;
remove that early return and always merge archive reads with live results.
Specifically, in the method using archival_settings_repo.get and returning
live_results, stop gating archive reads on settings.archive_after_days being
set; instead call the archive read path unconditionally (or detect whether any
archived data exists) and merge its results with live_results, while keeping the
existing exception handling (catch errors from archival reads but do not return
only live_results unless the archive read actually fails). Use the existing
symbols archival_settings_repo.get, live_results, and the archive read/merge
logic in this file to implement the change.

In
`@backend/migrations/versions/2026_02_27_1248-0421f956a084_archive_data_point_series.py`:
- Around line 29-41: Add a standalone index on the archive table's date column
to speed retention deletes: in the migration where
op.create_table("data_point_series_archive", ...) is defined (the block that
declares columns id, data_source_id, series_type_definition_id, date, value,
sample_count and the UniqueConstraint "uq_archive_source_type_date"), add an
op.create_index call that creates an index on the "date" column (choose a clear
name like "ix_data_point_series_archive_date") so DELETE ... WHERE date <
cutoff_date can use the index instead of scanning the unique composite index.
- Around line 22-28: The migration creates "archival_settings" but does not
enforce the singleton row (id=1) or insert it; update the migration to (1) add a
CheckConstraint on the table creation (e.g., sa.CheckConstraint("id = 1",
name="ck_archival_settings_singleton")) to enforce singleton semantics and (2)
after op.create_table("archival_settings", ...) insert the required row with
id=1 via op.execute or op.bulk_insert (provide sensible defaults or NULLs for
archive_after_days/delete_after_days). Also update the downgrade to delete the
row where id=1 (or rely on DROP TABLE) before dropping the table so the
migration cleanly reverses.
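The two migration fixes above — a singleton CHECK on archival_settings plus a seeded row, and a standalone index on the archive's date column — can be sketched as DDL. This uses stdlib sqlite3 for illustration; the real migration would go through Alembic's op.create_table/op.create_index, and the column types are simplified:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Singleton guard: the settings row must always have id = 1.
conn.execute("""
    CREATE TABLE archival_settings (
        id INTEGER PRIMARY KEY CHECK (id = 1),
        archive_after_days INTEGER,
        delete_after_days INTEGER
    )
""")
conn.execute("INSERT INTO archival_settings (id) VALUES (1)")  # seed up front

conn.execute("""
    CREATE TABLE data_point_series_archive (
        id INTEGER PRIMARY KEY,
        data_source_id INTEGER,
        series_type_definition_id INTEGER,
        date TEXT,
        value REAL,
        sample_count INTEGER,
        UNIQUE (data_source_id, series_type_definition_id, date)
    )
""")
# Standalone index so retention deletes (WHERE date < cutoff) can seek
# instead of scanning the composite unique index.
conn.execute(
    "CREATE INDEX ix_data_point_series_archive_date "
    "ON data_point_series_archive (date)"
)
conn.commit()
```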

In `@backend/scripts/init/seed_archival_settings.py`:
- Around line 11-20: The seed logic for ArchivalSetting uses a check-then-insert
(db.query(ArchivalSetting).filter(ArchivalSetting.id == 1).first() then
db.add/commit) which is racy; update the seed to handle a concurrent insert by
attempting the insert and catching the DB uniqueness error (e.g.,
sqlalchemy.exc.IntegrityError) on commit, performing db.rollback() and treating
that as “already initialized”; alternatively use an upsert/ON CONFLICT DO
NOTHING or a get_or_create-style operation for ArchivalSetting(id=1) so two
simultaneous instances won’t crash on PK conflict.
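The race-safe seeding suggested above (let the database arbitrate via an upsert instead of check-then-insert) sketched with stdlib sqlite3; the real script would do the equivalent through SQLAlchemy against Postgres (INSERT ... ON CONFLICT DO NOTHING):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archival_settings "
    "(id INTEGER PRIMARY KEY, archive_after_days INTEGER)"
)

def seed(conn: sqlite3.Connection) -> None:
    # Idempotent and race-safe: a concurrent insert of id=1 is silently skipped
    # rather than crashing on a primary-key conflict.
    conn.execute(
        "INSERT INTO archival_settings (id) VALUES (1) ON CONFLICT (id) DO NOTHING"
    )
    conn.commit()

seed(conn)
seed(conn)  # second caller (or a concurrent instance) is a no-op
```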

In `@backend/tests/conftest.py`:
- Line 234: Rename the snake_case variable webhook_module to camelCase
webhookModule where it's declared and referenced (ensure all uses in this file,
including in the patch call, are updated), and update the patch call for
trigger_next_pending_type to include an explicit return_value matching its
contract (e.g., return_value={} or return_value={"status": "ok"}) so the patched
function returns a dict[str, Any] like the surrounding patches.

---

Nitpick comments:
In `@backend/app/repositories/archival_repository.py`:
- Around line 345-360: The two SUM+CASE aggregations for distance_sum and
flights_climbed_sum in archival_repository.py are missing explicit else_
clauses; update the case(...) expressions used inside func.sum for
DataPointSeriesArchive.series_type_definition_id == distance_id and ==
flights_id (the ones labeled "distance_sum" and "flights_climbed_sum") to
include else_=0 (or else_=None to match desired behavior) so they are consistent
with the other aggregations—locate the case() calls wrapped by func.sum(...) and
add the else_ parameter to each.

In `@backend/app/schemas/archival.py`:
- Line 59: The growth_class field on the Pydantic model (growth_class: str =
Field(...)) is too permissive—change its type to a Literal union to enforce
allowed values (Literal['bounded', 'linear_efficient', 'linear']), import
Literal from typing (or typing_extensions if needed), and keep the existing
Field(description=...) on growth_class so validation and schema generation
reflect the constrained enum-like choices.

In `@backend/app/schemas/series_types.py`:
- Around line 434-439: Replace the silent default in get_aggregation_method with
an explicit failure: check AGGREGATION_METHOD_BY_TYPE for the provided
series_type inside get_aggregation_method (rather than using .get(...,
AggregationMethod.AVG)), and raise a clear exception (e.g., ValueError or
KeyError) that includes the SeriesType value when no mapping exists; keep the
function signature and return type but remove the fallback to
AggregationMethod.AVG so missing mappings fail fast during development.

In `@backend/app/services/archival_service.py`:
- Around line 27-31: The __init__ method in ArchivalService is missing an
explicit return type; update the constructor signature of def __init__(self,
log: Logger) to include an explicit return type annotation (-> None) so it reads
def __init__(self, log: Logger) -> None:, leaving body unchanged (self.logger,
self.settings_repo = ArchivalSettingRepository(), self.archive_repo =
DataPointSeriesArchiveRepository()); ensure any required imports for typing are
present if your linter/enforcer requires them.
- Around line 56-70: The public method run_daily_archival should be decorated
with the existing `@handle_exceptions` decorator for consistency with other
service methods and to centralize error handling; add `@handle_exceptions` above
the run_daily_archival method definition and ensure imports/reference to
handle_exceptions are present; also replace the broad return type dict with a
specific TypedDict (e.g., ArchivalSummary with keys archived_rows, deleted_rows,
deleted_live_rows) and use that type for the method signature and the summary
variable to make the return shape explicit (update any callers if needed).

In `@backend/tests/conftest.py`:
- Line 218: The variable webhook_module should be renamed to camelCase
(webhookModule) to follow project conventions; rename the identifier
webhook_module to webhookModule in backend/tests/conftest.py and update every
reference/usage in that file (fixtures, tests, or helper calls) to the new name
so references like webhook_module -> webhookModule remain consistent and tests
import/use the updated symbol.

In `@frontend/src/hooks/api/use-archival.ts`:
- Around line 21-33: The mutation in useUpdateArchivalSettings (useMutation
calling archivalService.updateSettings) lacks an onMutate optimistic update and
rollback; add onMutate that cancels queries for queryKeys.archival.all,
snapshots the previous cache state, applies the updated ArchivalSettings to the
query cache for immediate UI feedback, return the snapshot as context; modify
onError to restore the previous snapshot from context to rollback if the server
call fails; keep invalidateQueries in onSuccess or onSettled to refetch
authoritative data.

In `@frontend/src/lib/api/services/archival.service.ts`:
- Around line 49-84: Remove the duplicated try/catch wrappers around apiClient
calls in the methods getSettings, updateSettings, and triggerArchival and let
apiClient propagate its normalized ApiError; specifically, replace each async
method body so it directly returns await apiClient.get/put/post with the same
generic types and endpoints (keep method names and signatures: getSettings,
updateSettings, triggerArchival) rather than catching and re-wrapping errors,
thereby flattening the service code and relying on apiClient's centralized error
normalization.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b364f1d and 2c46c24.

📒 Files selected for processing (28)
  • backend/app/api/routes/v1/__init__.py
  • backend/app/api/routes/v1/archival.py
  • backend/app/integrations/celery/core.py
  • backend/app/integrations/celery/tasks/__init__.py
  • backend/app/integrations/celery/tasks/archival_task.py
  • backend/app/models/__init__.py
  • backend/app/models/archival_setting.py
  • backend/app/models/data_point_series.py
  • backend/app/models/data_point_series_archive.py
  • backend/app/models/refresh_token.py
  • backend/app/models/user_invitation_code.py
  • backend/app/repositories/__init__.py
  • backend/app/repositories/archival_repository.py
  • backend/app/repositories/repositories.py
  • backend/app/schemas/archival.py
  • backend/app/schemas/series_types.py
  • backend/app/services/__init__.py
  • backend/app/services/archival_service.py
  • backend/app/services/summaries_service.py
  • backend/migrations/versions/2026_02_27_1248-0421f956a084_archive_data_point_series.py
  • backend/scripts/init/seed_archival_settings.py
  • backend/scripts/start/app.sh
  • backend/tests/conftest.py
  • frontend/src/hooks/api/use-archival.ts
  • frontend/src/lib/api/services/archival.service.ts
  • frontend/src/lib/query/keys.ts
  • frontend/src/routes/_authenticated/settings.tsx
  • frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx

@railway-app railway-app bot temporarily deployed to Open Wearables / open-wearables-pr-535 February 27, 2026 16:11 Destroyed
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
backend/app/services/summaries_service.py (1)

201-204: Narrow the exception handling to expected failure cases.

Using bare except Exception could mask unexpected errors (e.g., database connectivity issues) and silently skip archive merging. Consider catching the specific exception for a missing settings row.

Proposed narrower exception handling
+from sqlalchemy.orm.exc import NoResultFound
+
 def _merge_archive_activity(
     ...
 ):
     try:
         self.archival_settings_repo.get(db_session)
-    except Exception:
+    except NoResultFound:
+        # Settings row not seeded yet — archival not available
         return live_results
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/summaries_service.py` around lines 201 - 204, The code
currently swallows all exceptions when calling
archival_settings_repo.get(db_session); replace the bare except with a narrow
catch for the specific "missing row" error (e.g.,
sqlalchemy.orm.exc.NoResultFound or your repository's NotFound/EntityNotFound)
so only the expected "no settings" case returns live_results, and let other
exceptions propagate (or re-raise) so database/connection errors are not hidden;
update the try/except around archival_settings_repo.get to import and catch that
specific exception and keep the existing return live_results behavior for that
case.
backend/app/repositories/archival_repository.py (1)

266-277: Consider adding batching to delete_archive_before for consistency.

Unlike delete_live_before (which batches with safety limits), this method performs an unbounded DELETE. While archive data is expected to be smaller (~1/500 of raw), large retention windows could still accumulate significant data. An unbounded delete may cause long table locks.

Optional: Add batching similar to delete_live_before
 def delete_archive_before(self, db: DbSession, cutoff_date: date) -> int:
-    deleted = (
-        db.query(DataPointSeriesArchive)
-        .filter(DataPointSeriesArchive.date < cutoff_date)
-        .delete(synchronize_session=False)
-    )
-    db.commit()
-    return deleted
+    """Permanently delete archive rows older than *cutoff_date*.
+
+    Uses batched deletes to avoid long-running transactions.
+    """
+    total_deleted = 0
+    while True:
+        ids_to_delete = (
+            db.query(DataPointSeriesArchive.id)
+            .filter(DataPointSeriesArchive.date < cutoff_date)
+            .limit(ARCHIVE_BATCH_SIZE)
+            .subquery()
+        )
+        deleted = (
+            db.query(DataPointSeriesArchive)
+            .filter(DataPointSeriesArchive.id.in_(ids_to_delete))
+            .delete(synchronize_session=False)
+        )
+        if deleted == 0:
+            break
+        total_deleted += deleted
+        db.commit()
+    return total_deleted
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/repositories/archival_repository.py` around lines 266 - 277, The
current delete_archive_before in function delete_archive_before deletes all
DataPointSeriesArchive rows older than cutoff_date in one unbounded operation;
change it to perform batched deletes (mirror delete_live_before behavior) by
looping delete(...) with a batch_size limit (e.g., max_rows_per_pass) using
.filter(DataPointSeriesArchive.date <
cutoff_date).delete(synchronize_session=False), committing after each batch,
accumulating a total_deleted count, and breaking the loop when a pass deletes
fewer than the batch size or when a safety max_passes/max_rows limit is reached;
return the accumulated total_deleted to preserve the original return contract.
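Independent of the SQLAlchemy specifics, the batching pattern described above can be sketched generically. `delete_in_batches` and `delete_batch` are hypothetical names for illustration; `delete_batch(batch_size)` stands in for one bounded DELETE pass that commits its own transaction and returns the number of rows removed:

```python
def delete_in_batches(delete_batch, batch_size: int = 10_000, max_passes: int = 1_000) -> int:
    """Run bounded delete passes until one removes fewer rows than batch_size."""
    total_deleted = 0
    for _ in range(max_passes):  # safety cap against runaway loops
        deleted = delete_batch(batch_size)
        total_deleted += deleted
        if deleted < batch_size:
            break
    return total_deleted
```

Committing after each pass keeps individual transactions short and avoids long table locks, at the cost of the overall delete no longer being atomic.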
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/integrations/celery/core.py`:
- Around line 5-8: The import block is not isort-compliant; reorder the imports
so third-party packages come before first-party modules: import Celery, signals,
current_app as current_celery_app and crontab from celery first, then import
settings from app.config (i.e., ensure the symbols Celery, signals,
current_celery_app, crontab appear before from app.config import settings) so
isort will keep the file stable.
- Around line 89-92: The comment for the "run-daily-archival" Celery beat entry
is misleading: crontab(hour=3, minute=0) runs at 03:00 in the configured
timezone (timezone="Europe/Warsaw" set near line 50), not UTC; update the inline
comment next to the "run-daily-archival" schedule (task
"app.integrations.celery.tasks.archival_task.run_daily_archival", schedule
crontab(hour=3, minute=0)) to say "Daily at 03:00 Europe/Warsaw" (or change the
Celery timezone to UTC if you want it to run at 03:00 UTC instead).

In `@frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx`:
- Around line 195-198: The current parseInt(archiveDays) || 90 and
parseInt(deleteDays) || 365 silently mask invalid/zero inputs; change the
projection props to pass explicit numeric values or undefined/NaN instead of
falling back to defaults (e.g., compute const archiveDaysNum =
Number(archiveDays); const deleteDaysNum = Number(deleteDays); and pass those
values directly), and update the GrowthProjection component to treat non-finite
numbers and values <= 0 as “invalid / awaiting valid input” (render an invalid
state instead of showing a default projection) so the chart reflects form
validity.
- Around line 323-326: The "Run Now" Button currently calls
triggerMutation.mutate() directly (and is only disabled by
triggerMutation.isPending), which allows runs using stale persisted settings
when there are unsaved edits; implement a handleRunNow function that first
checks for unsaved changes (the form's dirty flag or the existing save function
for the retention policy), save those changes (call the existing
savePolicy/saveChanges function or submit the form) and only after that resolves
call triggerMutation.mutate(); replace onClick={() => triggerMutation.mutate()}
with onClick={handleRunNow} and also disable the button when the form is dirty
or save is in progress to prevent concurrent runs.
- Around line 43-52: The effect currently freezes form state to the first
payload by gating updates on hasInitialized; update the useEffect in this file
so it syncs whenever data.settings changes (remove or change the hasInitialized
check) and always call setArchiveEnabled, setArchiveDays, setDeleteEnabled,
setDeleteDays from that effect when data is present; also ensure the toggle and
input handlers for archive/delete call setIsDirty(true) when the user mutates
fields, and after the successful save handler (e.g., in your saveSettings or
onSaveSuccess function) reset setIsDirty(false) so the UI correctly reflects
dirty/clean state.

---

Nitpick comments:
In `@backend/app/repositories/archival_repository.py`:
- Around line 266-277: The current delete_archive_before in function
delete_archive_before deletes all DataPointSeriesArchive rows older than
cutoff_date in one unbounded operation; change it to perform batched deletes
(mirror delete_live_before behavior) by looping delete(...) with a batch_size
limit (e.g., max_rows_per_pass) using .filter(DataPointSeriesArchive.date <
cutoff_date).delete(synchronize_session=False), committing after each batch,
accumulating a total_deleted count, and breaking the loop when a pass deletes
fewer than the batch size or when a safety max_passes/max_rows limit is reached;
return the accumulated total_deleted to preserve the original return contract.

In `@backend/app/services/summaries_service.py`:
- Around line 201-204: The code currently swallows all exceptions when calling
archival_settings_repo.get(db_session); replace the bare except with a narrow
catch for the specific "missing row" error (e.g.,
sqlalchemy.orm.exc.NoResultFound or your repository's NotFound/EntityNotFound)
so only the expected "no settings" case returns live_results, and let other
exceptions propagate (or re-raise) so database/connection errors are not hidden;
update the try/except around archival_settings_repo.get to import and catch that
specific exception and keep the existing return live_results behavior for that
case.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c46c24 and 882b1c2.

📒 Files selected for processing (12)
  • backend/app/api/routes/v1/archival.py
  • backend/app/integrations/celery/core.py
  • backend/app/models/archival_setting.py
  • backend/app/models/data_point_series_archive.py
  • backend/app/repositories/archival_repository.py
  • backend/app/schemas/archival.py
  • backend/app/services/summaries_service.py
  • backend/migrations/versions/2026_02_27_1710-8f618017e710_archive_data_point_series.py
  • backend/scripts/init/seed_archival_settings.py
  • backend/tests/conftest.py
  • frontend/src/lib/api/services/archival.service.ts
  • frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/app/models/data_point_series_archive.py
  • frontend/src/lib/api/services/archival.service.ts
  • backend/app/api/routes/v1/archival.py

@railway-app railway-app bot temporarily deployed to Open Wearables / open-wearables-pr-535 March 3, 2026 21:29 Destroyed
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
backend/migrations/versions/2026_03_03_2227-cd18aa553afd_archive.py (1)

30-50: ⚠️ Potential issue | 🟠 Major

Add a standalone bucket_start_at index for archive retention queries.

At Lines 30-49, the archive table has only a primary key and a composite unique index. Retention and range reads are time-based, so they will degrade to full-table scans as the archive grows.

Proposed migration patch
     op.create_table(
         "data_point_series_archive",
         sa.Column("id", sa.UUID(), nullable=False),
         sa.Column("data_source_id", sa.UUID(), nullable=False),
         sa.Column("series_type_definition_id", sa.Integer(), nullable=False),
         sa.Column("bucket_start_at", sa.DateTime(timezone=True), nullable=False),
         sa.Column("aggregation_type", sa.String(length=32), nullable=False),
         sa.Column("value", sa.Numeric(precision=10, scale=3), nullable=False),
         sa.Column("sample_count", sa.Integer(), nullable=False),
         sa.ForeignKeyConstraint(["data_source_id"], ["data_source.id"], ondelete="CASCADE"),
         sa.ForeignKeyConstraint(["series_type_definition_id"], ["series_type_definition.id"], ondelete="RESTRICT"),
         sa.PrimaryKeyConstraint("id"),
         sa.UniqueConstraint(
             "data_source_id",
             "series_type_definition_id",
             "bucket_start_at",
             "aggregation_type",
             name="uq_archive_source_type_start_agg",
         ),
     )
+    op.create_index(
+        op.f("ix_data_point_series_archive_bucket_start_at"),
+        "data_point_series_archive",
+        ["bucket_start_at"],
+        unique=False,
+    )
@@
 def downgrade() -> None:
     # ### commands auto generated by Alembic - please adjust! ###
     op.create_index(
         op.f("idx_data_point_series_source_type_time"),
         "data_point_series",
         ["data_source_id", "series_type_definition_id", "recorded_at"],
         unique=False,
     )
+    op.drop_index(op.f("ix_data_point_series_archive_bucket_start_at"), table_name="data_point_series_archive")
     op.drop_table("data_point_series_archive")
     op.drop_table("archival_settings")

Based on learnings, for new tables and indexes in backend/app/models, use SQLAlchemy’s default ix_ prefix.

Also applies to: 56-63

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/migrations/versions/2026_03_03_2227-cd18aa553afd_archive.py` around
lines 30 - 50, Add a standalone time index on bucket_start_at for the archive
table to avoid full scans: create an index on
"data_point_series_archive"."bucket_start_at" using SQLAlchemy's default ix_
prefix (e.g. ix_data_point_series_archive_bucket_start_at) in the migration
where op.create_table(...) defines data_point_series_archive; also apply the
same pattern for the other archive-related table/section referenced around lines
56-63 so both migrations add standalone ix_... indexes for bucket_start_at
instead of relying solely on the composite unique constraint.
🧹 Nitpick comments (1)
backend/migrations/versions/2026_03_03_2227-cd18aa553afd_archive.py (1)

1-6: Use the migration title convention in the Alembic header.

At Line 1, """archive does not follow the project naming pattern (Add <feature> <object>), which makes migration history harder to scan.

As per coding guidelines, backend/migrations/versions/*.py: use Alembic migrations with naming convention 'Add <feature> <object>'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/migrations/versions/2026_03_03_2227-cd18aa553afd_archive.py` around
lines 1 - 6, The migration header docstring currently starts with """archive
which doesn't follow our Alembic title convention; update the top-level
docstring in the migration file with Revision ID cd18aa553afd to use the pattern
"Add <feature> <object>" (e.g., "Add archive support for X" or "Add archive
table"), keeping the existing Revision ID and Reviser lines intact so the
migration metadata isn't changed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/repositories/archival_repository.py`:
- Around line 163-167: The deletion step is too broad and removes rows of
unknown series types when mixed with known types; modify the archival flow so
deletes only target rows that were actually archived by adding a series-type
constraint (e.g., DataPointSeries.series_type.in_(processed_types) or
equivalent) to the queries used to build source_ids and to the final delete
call, or else collect the exact primary keys archived and delete by those ids;
update the logic around the parts that skip unknown types (the branch that skips
unknown types at the block referencing unknown types) and the delete block (the
scope currently using source+date) to include the processed/known series types
or archived ids so unknown types are never deleted.
- Around line 165-166: The predicate uses cast(DataPointSeries.recorded_at,
Date) which prevents index use; replace those casts with a timestamp cutoff
comparison so the DB can use indexes. Specifically, change
.filter(cast(DataPointSeries.recorded_at, Date) < cutoff_date) to compare the
raw column against a datetime timestamp (e.g.
.filter(DataPointSeries.recorded_at < cutoff_timestamp)), where cutoff_timestamp
is a datetime (timezone-aware as needed) representing the cutoff instant (e.g.
midnight of cutoff_date or end/start boundary you intend). Apply the same
replacement for every occurrence of cast(DataPointSeries.recorded_at, Date) in
this module so all queries use DataPointSeries.recorded_at < cutoff_timestamp
instead.

In `@frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx`:
- Around line 527-587: The early conditional return for invalid config
(archiveInvalid || deleteInvalid) must be moved after all hooks to avoid
breaking hook ordering; keep the validation logic but remove the early return at
the top and instead compute all hooks and derived values (archivalEffective,
growthClass useMemo, cfg, liveTotalBytes, spanDays, dailyRawEstimate useMemo,
chartData useMemo) first, then render the invalid-configuration JSX when
archiveInvalid || deleteInvalid (e.g., by returning that JSX after the hooks or
setting a variable like invalidView and returning it at the end) so hooks
(useMemo calls in growthClass, dailyRawEstimate, chartData) always run in the
same order across renders.

---

Duplicate comments:
In `@backend/migrations/versions/2026_03_03_2227-cd18aa553afd_archive.py`:
- Around line 30-50: Add a standalone time index on bucket_start_at for the
archive table to avoid full scans: create an index on
"data_point_series_archive"."bucket_start_at" using SQLAlchemy's default ix_
prefix (e.g. ix_data_point_series_archive_bucket_start_at) in the migration
where op.create_table(...) defines data_point_series_archive; also apply the
same pattern for the other archive-related table/section referenced around lines
56-63 so both migrations add standalone ix_... indexes for bucket_start_at
instead of relying solely on the composite unique constraint.

---

Nitpick comments:
In `@backend/migrations/versions/2026_03_03_2227-cd18aa553afd_archive.py`:
- Around line 1-6: The migration header docstring currently starts with
"""archive which doesn't follow our Alembic title convention; update the
top-level docstring in the migration file with Revision ID cd18aa553afd to use
the pattern "Add <feature> <object>" (e.g., "Add archive support for X" or "Add
archive table"), keeping the existing Revision ID and Reviser lines intact so
the migration metadata isn't changed.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 882b1c2 and 660d030.

📒 Files selected for processing (6)
  • backend/app/database.py
  • backend/app/integrations/celery/core.py
  • backend/app/models/data_point_series_archive.py
  • backend/app/repositories/archival_repository.py
  • backend/migrations/versions/2026_03_03_2227-cd18aa553afd_archive.py
  • frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/models/data_point_series_archive.py

Comment on lines +14 to +20
from app.schemas.series_types import (
    AGGREGATION_METHOD_BY_TYPE,
    AggregationMethod,
    SeriesType,
    get_series_type_from_id,
    get_series_type_id,
)

🛠️ Refactor suggestion | 🟠 Major

Repository layer is coupled to app.schemas; move aggregation policy mapping out of repository.

At Line 14-20, the repository depends on schema-layer enums/mappings. This breaks the repository boundary and makes DB code depend on API/schema concerns.

As per coding guidelines, backend/app/repositories/**/*.py: repository layer must handle only database operations using SQLAlchemy models as input/output (never Pydantic schemas).

Comment on lines +163 to +167
source_ids = (
    db.query(DataPointSeries.data_source_id)
    .filter(cast(DataPointSeries.recorded_at, Date) < cutoff_date)
    .distinct()
    .limit(10)

⚠️ Potential issue | 🔴 Critical

Unknown series types can still be deleted without archival in mixed batches.

At Line 229-232 unknown types are skipped, but at Line 253-259 delete is scoped only by source+date.
If a batch has both known and unknown types, unknown rows are deleted even though they were never archived.

Proposed safety fix
     def archive_data_before(self, db: DbSession, cutoff_date: date) -> int:
@@
-        while True:
+        known_series_type_ids = [get_series_type_id(series_type) for series_type in SeriesType]
+
+        while True:
@@
             source_ids = (
                 db.query(DataPointSeries.data_source_id)
-                .filter(cast(DataPointSeries.recorded_at, Date) < cutoff_date)
+                .filter(
+                    cast(DataPointSeries.recorded_at, Date) < cutoff_date,
+                    DataPointSeries.series_type_definition_id.in_(known_series_type_ids),
+                )
                 .distinct()
                 .limit(10)
                 .all()
             )
@@
                 .filter(
                     DataPointSeries.data_source_id.in_(source_id_list),
                     cast(DataPointSeries.recorded_at, Date) < cutoff_date,
+                    DataPointSeries.series_type_definition_id.in_(known_series_type_ids),
                 )
@@
             deleted_count = (
                 db.query(DataPointSeries)
                 .filter(
                     DataPointSeries.data_source_id.in_(source_id_list),
                     cast(DataPointSeries.recorded_at, Date) < cutoff_date,
+                    DataPointSeries.series_type_definition_id.in_(known_series_type_ids),
                 )
                 .delete(synchronize_session=False)
             )

Also applies to: 229-237, 253-259

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/repositories/archival_repository.py` around lines 163 - 167, The
deletion step is too broad and removes rows of unknown series types when mixed
with known types; modify the archival flow so deletes only target rows that were
actually archived by adding a series-type constraint (e.g.,
DataPointSeries.series_type.in_(processed_types) or equivalent) to the queries
used to build source_ids and to the final delete call, or else collect the exact
primary keys archived and delete by those ids; update the logic around the parts
that skip unknown types (the branch that skips unknown types at the block
referencing unknown types) and the delete block (the scope currently using
source+date) to include the processed/known series types or archived ids so
unknown types are never deleted.

Comment on lines +165 to +166
.filter(cast(DataPointSeries.recorded_at, Date) < cutoff_date)
.distinct()

⚠️ Potential issue | 🟠 Major

Avoid cast(recorded_at, Date) in predicates; use a timestamp cutoff to keep filters index-friendly.

At Line 165/191/257/307, casting the column side prevents efficient index use on recorded_at and will hurt at scale.

Proposed predicate change
     def archive_data_before(self, db: DbSession, cutoff_date: date) -> int:
@@
-        total_deleted = 0
+        total_deleted = 0
+        cutoff_ts = datetime.combine(cutoff_date, datetime.min.time(), tzinfo=timezone.utc)
@@
-                .filter(cast(DataPointSeries.recorded_at, Date) < cutoff_date)
+                .filter(DataPointSeries.recorded_at < cutoff_ts)
@@
                     DataPointSeries.data_source_id.in_(source_id_list),
-                    cast(DataPointSeries.recorded_at, Date) < cutoff_date,
+                    DataPointSeries.recorded_at < cutoff_ts,
@@
                     DataPointSeries.data_source_id.in_(source_id_list),
-                    cast(DataPointSeries.recorded_at, Date) < cutoff_date,
+                    DataPointSeries.recorded_at < cutoff_ts,
                 )

     def delete_live_before(self, db: DbSession, cutoff_date: date) -> int:
@@
-        total_deleted = 0
+        total_deleted = 0
+        cutoff_ts = datetime.combine(cutoff_date, datetime.min.time(), tzinfo=timezone.utc)
@@
-                .filter(cast(DataPointSeries.recorded_at, Date) < cutoff_date)
+                .filter(DataPointSeries.recorded_at < cutoff_ts)

Also applies to: 191-192, 257-258, 307-307

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/repositories/archival_repository.py` around lines 165 - 166, The
predicate uses cast(DataPointSeries.recorded_at, Date) which prevents index use;
replace those casts with a timestamp cutoff comparison so the DB can use
indexes. Specifically, change .filter(cast(DataPointSeries.recorded_at, Date) <
cutoff_date) to compare the raw column against a datetime timestamp (e.g.
.filter(DataPointSeries.recorded_at < cutoff_timestamp)), where cutoff_timestamp
is a datetime (timezone-aware as needed) representing the cutoff instant (e.g.
midnight of cutoff_date or end/start boundary you intend). Apply the same
replacement for every occurrence of cast(DataPointSeries.recorded_at, Date) in
this module so all queries use DataPointSeries.recorded_at < cutoff_timestamp
instead.
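The cutoff conversion proposed in the diff above can be verified in isolation. This assumes the intended boundary is midnight UTC of the cutoff date; `cutoff_timestamp` is an illustrative helper name, not the repository's actual code:

```python
from datetime import date, datetime, time, timezone


def cutoff_timestamp(cutoff_date: date) -> datetime:
    # Build a timezone-aware instant so the query can compare recorded_at
    # directly against a timestamp, leaving any index on the column usable.
    return datetime.combine(cutoff_date, time.min, tzinfo=timezone.utc)
```

Filtering with `DataPointSeries.recorded_at < cutoff_timestamp(cutoff_date)` then selects exactly the rows whose dates precede `cutoff_date`, without casting the column side of the predicate.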

Comment on lines +527 to +587
if (archiveInvalid || deleteInvalid) {
  return (
    <div className="bg-zinc-900/50 border border-zinc-800 rounded-xl overflow-hidden min-h-[200px] flex items-center justify-center">
      <div className="text-center text-zinc-500">
        <AlertTriangle className="h-8 w-8 mx-auto mb-2 opacity-50" />
        <p>Invalid configuration</p>
        <p className="text-xs mt-1">
          Please enter valid duration in days ({archiveInvalid && 'Archival'}
          {archiveInvalid && deleteInvalid && ' & '}
          {deleteInvalid && 'Retention'})
        </p>
      </div>
    </div>
  );
}

const archivalEffective =
  archiveEnabled &&
  archiveDays > 0 &&
  (!deleteEnabled || deleteDays > archiveDays);

const growthClass: GrowthClass = useMemo(() => {
  if (deleteEnabled) return 'bounded';
  if (archiveEnabled) return 'linear_efficient';
  return 'linear';
}, [archiveEnabled, deleteEnabled]);

const cfg = GROWTH_CONFIG[growthClass];

const liveTotalBytes = storage.live_data_bytes + storage.live_index_bytes;
const spanDays = Math.max(storage.live_data_span_days, 1);

const dailyRawEstimate = useMemo(() => {
  if (storage.live_row_count === 0) return 0;
  return liveTotalBytes / spanDays;
}, [storage.live_row_count, liveTotalBytes, spanDays]);

const chartData = useMemo(
  () =>
    computeProjection(
      archiveEnabled,
      archiveDays,
      deleteEnabled,
      deleteDays,
      liveTotalBytes,
      storage.live_row_count,
      storage.archive_data_bytes + storage.archive_index_bytes,
      storage.live_data_span_days
    ),
  [
    archiveEnabled,
    archiveDays,
    deleteEnabled,
    deleteDays,
    liveTotalBytes,
    storage.live_row_count,
    storage.archive_data_bytes,
    storage.archive_index_bytes,
    storage.live_data_span_days,
  ]
);

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx | head -600 | tail -100

Repository: the-momentum/open-wearables

Length of output: 3991


🏁 Script executed:

wc -l frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx

Repository: the-momentum/open-wearables

Length of output: 141


🏁 Script executed:

cat -n frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx | sed -n '500,600p'

Repository: the-momentum/open-wearables

Length of output: 3999


Move validation check after all hooks to fix React hook ordering violation.

The early return on line 527 breaks React's hook ordering rules by conditionally skipping the useMemo calls on lines 548, 559, and 564. Move the validation check after all hooks are called to ensure consistent execution order across renders.

Proposed fix
-  const archiveInvalid =
-    archiveEnabled && (!Number.isFinite(archiveDays) || archiveDays <= 0);
-  const deleteInvalid =
-    deleteEnabled && (!Number.isFinite(deleteDays) || deleteDays <= 0);
-
-  if (archiveInvalid || deleteInvalid) {
-    return (
-      <div className="bg-zinc-900/50 border border-zinc-800 rounded-xl overflow-hidden min-h-[200px] flex items-center justify-center">
-        <div className="text-center text-zinc-500">
-          <AlertTriangle className="h-8 w-8 mx-auto mb-2 opacity-50" />
-          <p>Invalid configuration</p>
-          <p className="text-xs mt-1">
-            Please enter valid duration in days ({archiveInvalid && 'Archival'}
-            {archiveInvalid && deleteInvalid && ' & '}
-            {deleteInvalid && 'Retention'})
-          </p>
-        </div>
-      </div>
-    );
-  }
-
   const archivalEffective =
     archiveEnabled &&
     archiveDays > 0 &&
     (!deleteEnabled || deleteDays > archiveDays);
@@
   const chartData = useMemo(
@@
   );
+
+  const archiveInvalid =
+    archiveEnabled && (!Number.isFinite(archiveDays) || archiveDays <= 0);
+  const deleteInvalid =
+    deleteEnabled && (!Number.isFinite(deleteDays) || deleteDays <= 0);
+
+  if (archiveInvalid || deleteInvalid) {
+    return (
+      <div className="bg-zinc-900/50 border border-zinc-800 rounded-xl overflow-hidden min-h-[200px] flex items-center justify-center">
+        <div className="text-center text-zinc-500">
+          <AlertTriangle className="h-8 w-8 mx-auto mb-2 opacity-50" />
+          <p>Invalid configuration</p>
+          <p className="text-xs mt-1">
+            Please enter valid duration in days ({archiveInvalid && 'Archival'}
+            {archiveInvalid && deleteInvalid && ' & '}
+            {deleteInvalid && 'Retention'})
+          </p>
+        </div>
+      </div>
+    );
+  }
🧰 Tools
🪛 Biome (2.4.4)

[error] 548-548: This hook is being called conditionally, but all hooks must be called in the exact same order in every component render.

(lint/correctness/useHookAtTopLevel)


[error] 559-559: This hook is being called conditionally, but all hooks must be called in the exact same order in every component render.

(lint/correctness/useHookAtTopLevel)


[error] 564-564: This hook is being called conditionally, but all hooks must be called in the exact same order in every component render.

(lint/correctness/useHookAtTopLevel)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/routes/_authenticated/settings/data-lifecycle-tab.tsx` around
lines 527 - 587, The early conditional return for invalid config (archiveInvalid
|| deleteInvalid) must be moved after all hooks to avoid breaking hook ordering;
keep the validation logic but remove the early return at the top and instead
compute all hooks and derived values (archivalEffective, growthClass useMemo,
cfg, liveTotalBytes, spanDays, dailyRawEstimate useMemo, chartData useMemo)
first, then render the invalid-configuration JSX when archiveInvalid ||
deleteInvalid (e.g., by returning that JSX after the hooks or setting a variable
like invalidView and returning it at the end) so hooks (useMemo calls in
growthClass, dailyRawEstimate, chartData) always run in the same order across
renders.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/schemas/series_types.py`:
- Around line 343-437: AGGREGATION_METHOD_BY_TYPE is missing
SeriesType.body_fat_mass, SeriesType.skeletal_muscle_mass, and
SeriesType.hydration (causing silent AVG fallback) — add these series types to
AGGREGATION_METHOD_BY_TYPE with the correct AggregationMethod (e.g., SUM for
cumulative metrics like hydration) and remove any code paths that use a
.get(..., AggregationMethod.AVG) fallback; instead, ensure lookups use direct
mapping access (AGGREGATION_METHOD_BY_TYPE[series_type]) or explicitly raise a
clear error (KeyError/ValueError) so unmapped SeriesType values fail fast.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7ea8d091-20eb-4e7f-8a0d-09dd27befe47

📥 Commits

Reviewing files that changed from the base of the PR and between 660d030 and b896744.

📒 Files selected for processing (4)
  • backend/app/integrations/celery/core.py
  • backend/app/integrations/celery/tasks/__init__.py
  • backend/app/schemas/series_types.py
  • backend/tests/conftest.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/tests/conftest.py
  • backend/app/integrations/celery/core.py

Comment on lines +343 to +437
AGGREGATION_METHOD_BY_TYPE: dict[SeriesType, AggregationMethod] = {
# ── Heart & Cardiovascular ──
SeriesType.heart_rate: AggregationMethod.AVG,
SeriesType.resting_heart_rate: AggregationMethod.AVG,
SeriesType.heart_rate_variability_sdnn: AggregationMethod.AVG,
SeriesType.heart_rate_recovery_one_minute: AggregationMethod.AVG,
SeriesType.walking_heart_rate_average: AggregationMethod.AVG,
SeriesType.recovery_score: AggregationMethod.AVG,
SeriesType.heart_rate_variability_rmssd: AggregationMethod.AVG,
# ── Blood & Respiratory ──
SeriesType.oxygen_saturation: AggregationMethod.AVG,
SeriesType.blood_glucose: AggregationMethod.AVG,
SeriesType.blood_pressure_systolic: AggregationMethod.AVG,
SeriesType.blood_pressure_diastolic: AggregationMethod.AVG,
SeriesType.respiratory_rate: AggregationMethod.AVG,
SeriesType.sleeping_breathing_disturbances: AggregationMethod.SUM,
SeriesType.blood_alcohol_content: AggregationMethod.AVG,
SeriesType.peripheral_perfusion_index: AggregationMethod.AVG,
SeriesType.forced_vital_capacity: AggregationMethod.AVG,
SeriesType.forced_expiratory_volume_1: AggregationMethod.AVG,
SeriesType.peak_expiratory_flow_rate: AggregationMethod.AVG,
# ── Body Composition ──
SeriesType.height: AggregationMethod.AVG,
SeriesType.weight: AggregationMethod.AVG,
SeriesType.body_fat_percentage: AggregationMethod.AVG,
SeriesType.body_mass_index: AggregationMethod.AVG,
SeriesType.lean_body_mass: AggregationMethod.AVG,
SeriesType.body_temperature: AggregationMethod.AVG,
SeriesType.skin_temperature: AggregationMethod.AVG,
SeriesType.waist_circumference: AggregationMethod.AVG,
# ── Fitness Metrics ──
SeriesType.vo2_max: AggregationMethod.AVG,
SeriesType.six_minute_walk_test_distance: AggregationMethod.MAX,
# ── Activity — Basic ──
SeriesType.steps: AggregationMethod.SUM,
SeriesType.energy: AggregationMethod.SUM,
SeriesType.basal_energy: AggregationMethod.SUM,
SeriesType.stand_time: AggregationMethod.SUM,
SeriesType.exercise_time: AggregationMethod.SUM,
SeriesType.physical_effort: AggregationMethod.AVG,
SeriesType.flights_climbed: AggregationMethod.SUM,
SeriesType.average_met: AggregationMethod.AVG,
# ── Activity — Distance ──
SeriesType.distance_walking_running: AggregationMethod.SUM,
SeriesType.distance_cycling: AggregationMethod.SUM,
SeriesType.distance_swimming: AggregationMethod.SUM,
SeriesType.distance_downhill_snow_sports: AggregationMethod.SUM,
SeriesType.distance_other: AggregationMethod.SUM,
# ── Activity — Walking Metrics ──
SeriesType.walking_step_length: AggregationMethod.AVG,
SeriesType.walking_speed: AggregationMethod.AVG,
SeriesType.walking_double_support_percentage: AggregationMethod.AVG,
SeriesType.walking_asymmetry_percentage: AggregationMethod.AVG,
SeriesType.walking_steadiness: AggregationMethod.AVG,
SeriesType.stair_descent_speed: AggregationMethod.AVG,
SeriesType.stair_ascent_speed: AggregationMethod.AVG,
# ── Activity — Running Metrics ──
SeriesType.running_power: AggregationMethod.AVG,
SeriesType.running_speed: AggregationMethod.AVG,
SeriesType.running_vertical_oscillation: AggregationMethod.AVG,
SeriesType.running_ground_contact_time: AggregationMethod.AVG,
SeriesType.running_stride_length: AggregationMethod.AVG,
# ── Activity — Swimming Metrics ──
SeriesType.swimming_stroke_count: AggregationMethod.SUM,
SeriesType.underwater_depth: AggregationMethod.AVG,
# ── Activity — Generic ──
SeriesType.cadence: AggregationMethod.AVG,
SeriesType.power: AggregationMethod.AVG,
SeriesType.speed: AggregationMethod.AVG,
SeriesType.workout_effort_score: AggregationMethod.AVG,
SeriesType.estimated_workout_effort_score: AggregationMethod.AVG,
# ── Environmental ──
SeriesType.environmental_audio_exposure: AggregationMethod.AVG,
SeriesType.headphone_audio_exposure: AggregationMethod.AVG,
SeriesType.environmental_sound_reduction: AggregationMethod.AVG,
SeriesType.time_in_daylight: AggregationMethod.SUM,
SeriesType.water_temperature: AggregationMethod.AVG,
SeriesType.uv_exposure: AggregationMethod.SUM,
SeriesType.inhaler_usage: AggregationMethod.SUM,
SeriesType.weather_temperature: AggregationMethod.AVG,
SeriesType.weather_humidity: AggregationMethod.AVG,
# ── Garmin-specific ──
SeriesType.garmin_stress_level: AggregationMethod.AVG,
SeriesType.garmin_skin_temperature: AggregationMethod.AVG,
SeriesType.garmin_fitness_age: AggregationMethod.AVG,
SeriesType.garmin_body_battery: AggregationMethod.AVG,
# ── Other ──
SeriesType.electrodermal_activity: AggregationMethod.AVG,
SeriesType.push_count: AggregationMethod.SUM,
SeriesType.atrial_fibrillation_burden: AggregationMethod.SUM,
SeriesType.insulin_delivery: AggregationMethod.SUM,
SeriesType.number_of_times_fallen: AggregationMethod.SUM,
SeriesType.number_of_alcoholic_beverages: AggregationMethod.SUM,
SeriesType.nike_fuel: AggregationMethod.SUM,
}
⚠️ Potential issue | 🟠 Major

Fail fast on unmapped series types instead of silently defaulting to AVG.

AGGREGATION_METHOD_BY_TYPE currently omits SeriesType.body_fat_mass, SeriesType.skeletal_muscle_mass, and SeriesType.hydration. With the current AVG fallback, omissions are silent and can produce incorrect canonical archive values (especially for cumulative metrics like hydration).

💡 Proposed fix
 AGGREGATION_METHOD_BY_TYPE: dict[SeriesType, AggregationMethod] = {
@@
     SeriesType.body_temperature: AggregationMethod.AVG,
     SeriesType.skin_temperature: AggregationMethod.AVG,
     SeriesType.waist_circumference: AggregationMethod.AVG,
+    SeriesType.body_fat_mass: AggregationMethod.AVG,
+    SeriesType.skeletal_muscle_mass: AggregationMethod.AVG,
@@
     SeriesType.number_of_alcoholic_beverages: AggregationMethod.SUM,
     SeriesType.nike_fuel: AggregationMethod.SUM,
+    SeriesType.hydration: AggregationMethod.SUM,
 }
 
+_UNMAPPED_SERIES_TYPES: set[SeriesType] = set(SeriesType) - set(AGGREGATION_METHOD_BY_TYPE)
+if _UNMAPPED_SERIES_TYPES:
+    raise RuntimeError(
+        "Missing aggregation mapping for series types: "
+        + ", ".join(sorted(series_type.value for series_type in _UNMAPPED_SERIES_TYPES))
+    )
+
 
 def get_aggregation_method(series_type: SeriesType) -> AggregationMethod:
-    """Get the aggregation method for a series type.
-
-    Falls back to AVG for unknown types (safe default for rate-like metrics).
-    """
-    return AGGREGATION_METHOD_BY_TYPE.get(series_type, AggregationMethod.AVG)
+    """Get the aggregation method for a series type."""
+    return AGGREGATION_METHOD_BY_TYPE[series_type]

Also applies to: 440-445

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/schemas/series_types.py` around lines 343 - 437,
AGGREGATION_METHOD_BY_TYPE is missing SeriesType.body_fat_mass,
SeriesType.skeletal_muscle_mass, and SeriesType.hydration (causing silent AVG
fallback) — add these series types to AGGREGATION_METHOD_BY_TYPE with the
correct AggregationMethod (e.g., SUM for cumulative metrics like hydration) and
remove any code paths that use a .get(..., AggregationMethod.AVG) fallback;
instead, ensure lookups use direct mapping access
(AGGREGATION_METHOD_BY_TYPE[series_type]) or explicitly raise a clear error
(KeyError/ValueError) so unmapped SeriesType values fail fast.

Labels

backend data model Tickets related to data model

Development

Successfully merging this pull request may close these issues.

Data aggregation level & Open Wearables usage at scale
