Conversation
| ) | ||
| from app.schemas.apple.apple_xml.stats import XMLParseStats | ||
| from app.schemas.apple.healthkit.sync_request import SourceInfo | ||
| from app.schemas.apple.healthkit.sync_request import ( |
There was a problem hiding this comment.
would be changed in the schema refactor pr (#560)
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughXML parsing now extracts Apple Health sleep records and returns a third SyncRequest; upload tasks unpack this and call handle_sleep_data(db, sync_request, user_id) when present; sleep counters and tests updated; handle_sleep_data schedules finalize_stale_sleeps. Changes
Sequence DiagramsequenceDiagram
autonumber
participant XMLService as XMLService (parser)
participant Task as Upload Task
participant DB as Database
participant SleepSvc as Sleep Service
participant Celery as Celery Worker
XMLService->>XMLService: parse_xml(file) -> (time_series, workouts, sync_request)
XMLService->>Task: yield (time_series, workouts, sync_request)
Task->>DB: persist time_series & workouts
alt sync_request present
Task->>SleepSvc: handle_sleep_data(db, sync_request, user_id)
SleepSvc->>DB: save/update sleep state
SleepSvc->>Celery: finalize_stale_sleeps.delay()
Celery->>DB: cleanup stale sleep state
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
🚅 Deployed to the open-wearables-pr-583 environment in Open Wearables
|
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/apple/apple_xml/xml_service.py (1)
393-393:⚠️ Potential issue | 🟡 MinorInconsistent chunk size calculation.
Line 345 includes
len(sleep_records)in the chunk size check, but line 393 does not. For consistency, the workout branch should also account for sleep records:Proposed fix
- if len(workouts) + len(time_series_records) >= self.chunk_size: + if len(workouts) + len(time_series_records) + len(sleep_records) >= self.chunk_size:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/apple/apple_xml/xml_service.py` at line 393, The chunk-size check for the workout branch is missing sleep_records, causing inconsistent batching; update the condition that currently reads "if len(workouts) + len(time_series_records) >= self.chunk_size" to include sleep_records so it becomes "if len(workouts) + len(time_series_records) + len(sleep_records) >= self.chunk_size", ensuring the same chunk logic used elsewhere (referencing variables workouts, time_series_records, sleep_records and attribute self.chunk_size) is applied; make the analogous change in the function/method in xml_service.py where workouts are processed.
🧹 Nitpick comments (1)
backend/app/services/apple/apple_xml/xml_service.py (1)
298-310: Consider using timezone-aware timestamp.Line 304 uses
datetime.now()which returns a naive (timezone-unaware) datetime. For consistency with other timestamp handling in the codebase (which typically uses UTC), consider usingdatetime.now(timezone.utc).Proposed fix
+from datetime import timezone + def _wrap_sleep_data(self, sleep_records: list[SleepRecord]) -> SyncRequest: """Wrap sleep data in a SyncRequest to be sent to the handle_sleep_data function.""" return SyncRequest( provider="apple_health_xml", sdkVersion="n/a", - syncTimestamp=datetime.now(), + syncTimestamp=datetime.now(timezone.utc), data=SyncRequestData( sleep=sleep_records, records=[], workouts=[], ), )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/apple/apple_xml/xml_service.py` around lines 298 - 310, The syncTimestamp in _wrap_sleep_data is currently naive because it uses datetime.now(); update the code in the _wrap_sleep_data function to produce a timezone-aware UTC timestamp (e.g., use datetime.now(timezone.utc) or an equivalent UTC-aware constructor) when constructing the SyncRequest so SyncRequest.syncTimestamp is timezone-aware and consistent with other timestamp handling in the codebase; ensure you import timezone from datetime if not already imported and adjust any type expectations if necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/integrations/celery/tasks/process_aws_upload_task.py`:
- Around line 114-115: The current truthy check uses the SyncRequest object
itself, but _wrap_sleep_data always returns a SyncRequest even when empty;
change the guard in process_aws_upload_task.py to check the SyncRequest's sleep
data explicitly (e.g., if sync_request.sleep_records or
len(sync_request.sleep_records) > 0) before calling handle_sleep_data(db,
sync_request, user_id) so finalize_stale_sleeps.delay() is only scheduled when
actual sleep records exist; alternatively use getattr(sync_request,
"sleep_records", None) to avoid attribute errors if needed.
In `@backend/app/integrations/celery/tasks/process_xml_upload_task.py`:
- Around line 126-128: The truthy check on sync_request in
process_xml_upload_task.py is incorrect because _wrap_sleep_data always returns
a SyncRequest object; change the condition to verify that the SyncRequest
actually contains sleep data before calling handle_sleep_data. Locate where
_wrap_sleep_data is called (producing sync_request) and replace the if
sync_request: guard with a check such as sync_request.sleep_data (or the
appropriate container/attribute name on the SyncRequest) or a length check
(e.g., if sync_request.sleep_data and len(sync_request.sleep_data) > 0) so
handle_sleep_data(db, sync_request, user_id) is only invoked when there is
non-empty sleep data.
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Around line 401-404: The workout-branch yields a sync_request created via
self._wrap_sleep_data(sleep_records) but fails to clear sleep_records afterward,
causing duplication; update the workout-triggered yield block (the code that
does "sync_request = self._wrap_sleep_data(sleep_records)" and "yield
time_series_records, workouts, sync_request") to set sleep_records = []
immediately after the yield (matching the Record-branch behavior), ensuring
sleep_records is cleared before subsequent iterations while leaving
time_series_records and workouts resets unchanged.
- Around line 152-155: The sleep-record parsing uses document.get("source", "")
and directly calls datetime.fromisoformat on startDate/endDate; change the
attribute access to document.get("device", "") so _extract_device_info receives
the same device string as _create_record/_create_workout and as defined in
RECORD_COLUMNS, and replace the direct datetime.fromisoformat calls with the
same explicit date validation flow used in _create_record by invoking
_parse_date_fields (and handling its specific errors) before parsing to ensure
missing or malformed startDate/endDate are logged with the same precise
date-format error handling.
---
Outside diff comments:
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Line 393: The chunk-size check for the workout branch is missing
sleep_records, causing inconsistent batching; update the condition that
currently reads "if len(workouts) + len(time_series_records) >= self.chunk_size"
to include sleep_records so it becomes "if len(workouts) +
len(time_series_records) + len(sleep_records) >= self.chunk_size", ensuring the
same chunk logic used elsewhere (referencing variables workouts,
time_series_records, sleep_records and attribute self.chunk_size) is applied;
make the analogous change in the function/method in xml_service.py where
workouts are processed.
---
Nitpick comments:
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Around line 298-310: The syncTimestamp in _wrap_sleep_data is currently naive
because it uses datetime.now(); update the code in the _wrap_sleep_data function
to produce a timezone-aware UTC timestamp (e.g., use datetime.now(timezone.utc)
or an equivalent UTC-aware constructor) when constructing the SyncRequest so
SyncRequest.syncTimestamp is timezone-aware and consistent with other timestamp
handling in the codebase; ensure you import timezone from datetime if not
already imported and adjust any type expectations if necessary.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cb8c0420-176d-46bd-ad79-703fe0e7a547
📒 Files selected for processing (6)
backend/app/integrations/celery/tasks/process_aws_upload_task.pybackend/app/integrations/celery/tasks/process_sdk_upload_task.pybackend/app/integrations/celery/tasks/process_xml_upload_task.pybackend/app/schemas/apple/apple_xml/stats.pybackend/app/services/apple/apple_xml/xml_service.pybackend/app/services/apple/healthkit/sleep_service.py
💤 Files with no reviewable changes (1)
- backend/app/integrations/celery/tasks/process_sdk_upload_task.py
backend/app/integrations/celery/tasks/process_aws_upload_task.py
Outdated
Show resolved
Hide resolved
backend/app/integrations/celery/tasks/process_xml_upload_task.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
🧹 Nitpick comments (1)
backend/tests/tasks/test_process_upload_task.py (1)
233-361: Add a test case for sleep data handling in_import_xml_data.All four test methods pass
Nonefor the third tuple element (sync_request), covering only the scenario where no sleep data is present. Since the production code inprocess_aws_upload_task.py(lines 114–115) callshandle_sleep_data(db, sync_request, user_id)whensync_requestis non-None, consider adding a test that verifies the function's behavior when sleep data is included in the XML response.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/tasks/test_process_upload_task.py` around lines 233 - 361, Add a new test in TestImportXmlData that covers the non-None sync_request path: patch process_aws_upload_task.handle_sleep_data (alongside XMLService, event_record_service, timeseries_service), have the mocked XMLService.parse_xml return a tuple whose third element is a non-None sync_request object, call _import_xml_data(db, xml_path, str(user.id)), and assert handle_sleep_data was called once with (db, that sync_request, str(user.id)); reference _import_xml_data and handle_sleep_data to locate where to add the test and which mocks to attach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/tests/tasks/test_process_upload_task.py`:
- Around line 233-361: Add a new test in TestImportXmlData that covers the
non-None sync_request path: patch process_aws_upload_task.handle_sleep_data
(alongside XMLService, event_record_service, timeseries_service), have the
mocked XMLService.parse_xml return a tuple whose third element is a non-None
sync_request object, call _import_xml_data(db, xml_path, str(user.id)), and
assert handle_sleep_data was called once with (db, that sync_request,
str(user.id)); reference _import_xml_data and handle_sleep_data to locate where
to add the test and which mocks to attach.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3473fbff-d5ac-4209-b007-833ba7692194
📒 Files selected for processing (2)
backend/tests/conftest.pybackend/tests/tasks/test_process_upload_task.py
💤 Files with no reviewable changes (1)
- backend/tests/conftest.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/app/services/apple/apple_xml/xml_service.py (1)
152-153:⚠️ Potential issue | 🔴 CriticalCritical runtime error: datetime parsing will fail with Apple XML date format.
Lines 152-153 use
datetime.fromisoformat()to parse Apple Health XML dates, but Python 3.13'sfromisoformat()does not support the actual format"2026-01-15 22:56:09 +0000"(space before offset, no colon). This will raiseValueErrorat runtime.
_create_recordand_create_workoutcorrectly use_parse_date_fields()which callsstrptime()with format"%Y-%m-%d %H:%M:%S %z"—this pattern matches Apple's XML format and must be applied here.♻️ Fix: Use consistent date parsing
def _normalize_sleep_record(self, document: dict[str, Any]) -> SleepRecord | None: """Normalize a sleep record.""" value_to_stage = { "HKCategoryValueSleepAnalysisAsleepCore": SleepPhase.ASLEEP_LIGHT, "HKCategoryValueSleepAnalysisAsleepDeep": SleepPhase.ASLEEP_DEEP, "HKCategoryValueSleepAnalysisAsleepREM": SleepPhase.ASLEEP_REM, "HKCategoryValueSleepAnalysisAwake": SleepPhase.AWAKE, "HKCategoryValueSleepAnalysisInBed": SleepPhase.IN_BED, } stage = value_to_stage.get(document.get("value")) if stage is None: return None - start_date = datetime.fromisoformat(str(document.get("startDate"))) - end_date = datetime.fromisoformat(str(document.get("endDate"))) + document = self._parse_date_fields(document) + start_date = document["startDate"] + end_date = document["endDate"] source_info = self._extract_device_info(document.get("device", ""))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/apple/apple_xml/xml_service.py` around lines 152 - 153, The code is parsing Apple XML dates with datetime.fromisoformat for start_date and end_date which will raise ValueError for strings like "2026-01-15 22:56:09 +0000"; update the parsing to use the same logic as _parse_date_fields (or datetime.strptime with format "%Y-%m-%d %H:%M:%S %z") so start_date and end_date are parsed correctly; locate the assignments to start_date and end_date in xml_service.py (the same area where _create_record and _create_workout call _parse_date_fields) and replace fromisoformat usage with _parse_date_fields(...) or datetime.strptime(..., "%Y-%m-%d %H:%M:%S %z").
🧹 Nitpick comments (2)
backend/app/services/apple/apple_xml/xml_service.py (2)
393-405: Inconsistent chunk boundary check.Line 345 includes
sleep_recordsin the chunk size calculation:if len(workouts) + len(time_series_records) + len(sleep_records) >= self.chunk_size:But line 393 does not:
if len(workouts) + len(time_series_records) >= self.chunk_size:This inconsistency could cause chunks to be larger than expected when sleep records accumulate alongside workouts.
♻️ Suggested fix for consistency
- if len(workouts) + len(time_series_records) >= self.chunk_size: + if len(workouts) + len(time_series_records) + len(sleep_records) >= self.chunk_size:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/apple/apple_xml/xml_service.py` around lines 393 - 405, The chunk boundary check is inconsistent: the earlier check includes len(sleep_records) but the later one at the yield only checks len(workouts) + len(time_series_records), which can let chunks exceed self.chunk_size when sleep_records accumulate; update the second condition to include len(sleep_records) (i.e., if len(workouts) + len(time_series_records) + len(sleep_records) >= self.chunk_size) so the yield that calls self._wrap_sleep_data(sleep_records) and yields time_series_records, workouts, sync_request obeys the same chunk_size rule, then continue to clear time_series_records, workouts, sleep_records as already done.
442-467: Parse summary doesn't include sleep record statistics.The
_log_parse_summarymethod logs records and workouts but omits sleep record statistics (sleep_records_processed,sleep_records_skipped). This makes it harder to verify sleep data was imported correctly.♻️ Suggested enhancement to include sleep stats
def _log_parse_summary(self) -> None: """Log a summary of the parsing results.""" total_records = self.stats.records_processed + self.stats.records_skipped total_workouts = self.stats.workouts_processed + self.stats.workouts_skipped + total_sleep = self.stats.sleep_records_processed + self.stats.sleep_records_skipped self.log.info( - "XML parsing complete: %s/%s records processed, %s/%s workouts processed", + "XML parsing complete: %s/%s records, %s/%s workouts, %s/%s sleep records processed", self.stats.records_processed, total_records, self.stats.workouts_processed, total_workouts, + self.stats.sleep_records_processed, + total_sleep, ) - if self.stats.records_skipped > 0 or self.stats.workouts_skipped > 0: + if self.stats.records_skipped > 0 or self.stats.workouts_skipped > 0 or self.stats.sleep_records_skipped > 0: self.log.warning( - "Skipped %s records and %s workouts during import", + "Skipped %s records, %s workouts, and %s sleep records during import", self.stats.records_skipped, self.stats.workouts_skipped, + self.stats.sleep_records_skipped, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/apple/apple_xml/xml_service.py` around lines 442 - 467, The _log_parse_summary function currently omits sleep record counts; update it to include sleep_records_processed and sleep_records_skipped in the summary and warning blocks. Add sleep counts to the top-level summary message (e.g., include "%s/%s sleep records processed" alongside records/workouts), compute any needed totals (use self.stats.sleep_records_processed and self.stats.sleep_records_skipped), and include sleep skip counts in the conditional warning and in the skip_reasons breakdown when applicable. Ensure you reference the existing method _log_parse_summary and attributes sleep_records_processed and sleep_records_skipped so the new log lines mirror the style and formatting of the existing record/workout logs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Around line 365-374: The XML element `elem` is only cleared when
`_normalize_sleep_record` returns None, causing a memory leak for successful
sleep records; update the sleep-record handling in the XML processing loop so
that `elem.clear()` is always called regardless of success: after obtaining
`sleep_record = self._normalize_sleep_record(record)`, call `elem.clear()`
before appending to `sleep_records` and incrementing
`self.stats.sleep_records_processed` (or restructure so `elem.clear()` runs in a
common finally/cleanup path) to ensure elements are freed even when the code
continues; reference `_normalize_sleep_record`,
`self.stats.sleep_records_processed`, `self.stats.sleep_record_skip`, and
`elem.clear()` when making the change.
---
Duplicate comments:
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Around line 152-153: The code is parsing Apple XML dates with
datetime.fromisoformat for start_date and end_date which will raise ValueError
for strings like "2026-01-15 22:56:09 +0000"; update the parsing to use the same
logic as _parse_date_fields (or datetime.strptime with format "%Y-%m-%d %H:%M:%S
%z") so start_date and end_date are parsed correctly; locate the assignments to
start_date and end_date in xml_service.py (the same area where _create_record
and _create_workout call _parse_date_fields) and replace fromisoformat usage
with _parse_date_fields(...) or datetime.strptime(..., "%Y-%m-%d %H:%M:%S %z").
---
Nitpick comments:
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Around line 393-405: The chunk boundary check is inconsistent: the earlier
check includes len(sleep_records) but the later one at the yield only checks
len(workouts) + len(time_series_records), which can let chunks exceed
self.chunk_size when sleep_records accumulate; update the second condition to
include len(sleep_records) (i.e., if len(workouts) + len(time_series_records) +
len(sleep_records) >= self.chunk_size) so the yield that calls
self._wrap_sleep_data(sleep_records) and yields time_series_records, workouts,
sync_request obeys the same chunk_size rule, then continue to clear
time_series_records, workouts, sleep_records as already done.
- Around line 442-467: The _log_parse_summary function currently omits sleep
record counts; update it to include sleep_records_processed and
sleep_records_skipped in the summary and warning blocks. Add sleep counts to the
top-level summary message (e.g., include "%s/%s sleep records processed"
alongside records/workouts), compute any needed totals (use
self.stats.sleep_records_processed and self.stats.sleep_records_skipped), and
include sleep skip counts in the conditional warning and in the skip_reasons
breakdown when applicable. Ensure you reference the existing method
_log_parse_summary and attributes sleep_records_processed and
sleep_records_skipped so the new log lines mirror the style and formatting of
the existing record/workout logs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4853c1b9-31fb-479b-b704-3f4dcb0fc09c
📒 Files selected for processing (3)
backend/app/integrations/celery/tasks/process_aws_upload_task.pybackend/app/integrations/celery/tasks/process_xml_upload_task.pybackend/app/services/apple/apple_xml/xml_service.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Around line 366-373: The branch handling HKCategoryTypeIdentifierSleepAnalysis
currently skips unknown sleep stages without logging; update that branch in the
method that calls _normalize_sleep_record so that when
_normalize_sleep_record(record) returns None you first emit a warning (including
the problematic record.get("value") and any useful identifiers like
record.get("startDate") or record.get("id")) and then call
self.stats.sleep_record_skip(...). This ensures the dropped sleep stage is
visible in logs and will be counted by _log_parse_summary.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3690d1f7-19b1-4b4b-bf1d-d8d9fa4b6c6e
📒 Files selected for processing (1)
backend/app/services/apple/apple_xml/xml_service.py
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/apple/apple_xml/xml_service.py (1)
397-409:⚠️ Potential issue | 🟠 MajorCount buffered sleep records in the workout-side flush.
Once
sleep_recordsreachesself.chunk_size, a run of<Workout>elements will not trigger a flush because this branch only counts workouts and time-series records. The record-side guard already includes sleep records; this one should match.Patch
- if len(workouts) + len(time_series_records) >= self.chunk_size: + if len(workouts) + len(time_series_records) + len(sleep_records) >= self.chunk_size: self.log.info( - "Yielding chunk: %s time series records, %s workouts (skipped so far: %s records, %s workouts)", + "Yielding chunk: %s time series records, %s workouts, %s sleep records " + "(skipped so far: %s records, %s workouts, %s sleep records)", len(time_series_records), len(workouts), + len(sleep_records), self.stats.records_skipped, self.stats.workouts_skipped, + self.stats.sleep_records_skipped, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/apple/apple_xml/xml_service.py` around lines 397 - 409, The flush condition in the workout-side branch only checks len(workouts) + len(time_series_records) against self.chunk_size and omits sleep_records, so a large buffer of sleep_records won't trigger a flush; update the condition to include len(sleep_records) (e.g., len(time_series_records) + len(workouts) + len(sleep_records) >= self.chunk_size), update the log.info call to report len(sleep_records) as well, and ensure the reset logic still clears sleep_records after yielding the wrapped sync_request created by self._wrap_sleep_data.
🧹 Nitpick comments (1)
backend/app/services/apple/apple_xml/xml_service.py (1)
298-304: Prefer a UTC-awaresyncTimestamp.
datetime.now()creates a naive local timestamp. The parsed sleep windows are offset-aware, so this payload becomes environment-dependent once it is serialized or compared downstream.Patch
-from datetime import datetime +from datetime import UTC, datetime @@ - syncTimestamp=datetime.now(), + syncTimestamp=datetime.now(UTC),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/apple/apple_xml/xml_service.py` around lines 298 - 304, The syncTimestamp in _wrap_sleep_data is currently naive (datetime.now()) which can cause timezone-dependent behavior; change it to a UTC-aware timestamp by using a timezone-aware now (e.g., datetime.now(timezone.utc) or datetime.utcnow().replace(tzinfo=timezone.utc)) when constructing the SyncRequest in _wrap_sleep_data so the SyncRequest.syncTimestamp is offset-aware and consistent with the parsed sleep windows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Around line 141-150: The current value_to_stage mapping in xml_service.py
omits generic asleep keys, causing "HKCategoryValueSleepAnalysisAsleep" and
"HKCategoryValueSleepAnalysisAsleepUnspecified" to be treated as unsupported;
update the dictionary used when resolving stage (the value_to_stage mapping
referenced before the stage = value_to_stage.get(document.get("value")) line) to
include entries mapping those two keys to SleepPhase.SLEEPING so that stage is
set correctly and the function returns a SleepPhase instead of None for generic
asleep values.
---
Outside diff comments:
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Around line 397-409: The flush condition in the workout-side branch only
checks len(workouts) + len(time_series_records) against self.chunk_size and
omits sleep_records, so a large buffer of sleep_records won't trigger a flush;
update the condition to include len(sleep_records) (e.g.,
len(time_series_records) + len(workouts) + len(sleep_records) >=
self.chunk_size), update the log.info call to report len(sleep_records) as well,
and ensure the reset logic still clears sleep_records after yielding the wrapped
sync_request created by self._wrap_sleep_data.
---
Nitpick comments:
In `@backend/app/services/apple/apple_xml/xml_service.py`:
- Around line 298-304: The syncTimestamp in _wrap_sleep_data is currently naive
(datetime.now()) which can cause timezone-dependent behavior; change it to a
UTC-aware timestamp by using a timezone-aware now (e.g.,
datetime.now(timezone.utc) or datetime.utcnow().replace(tzinfo=timezone.utc))
when constructing the SyncRequest in _wrap_sleep_data so the
SyncRequest.syncTimestamp is offset-aware and consistent with the parsed sleep
windows.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2452e2e9-ad90-4112-90be-2ba81304cf94
📒 Files selected for processing (1)
backend/app/services/apple/apple_xml/xml_service.py
| def sleep_record_skip(self, reason: str) -> None: | ||
| """Record a skipped sleep record with its reason.""" | ||
| self.sleep_records_skipped += 1 | ||
| self.skip_reasons[reason] = self.skip_reasons.get(reason, 0) + 1 |
There was a problem hiding this comment.
We've got a few issues here. First one - we break DRY rule, because we have 3 methods doing basically the same thing. The second one is how we handle categories of processes and skipped rows. Now everything (records, workouts and sleep) is handled in one container, while it's worth to consider taking everything into more abstract level to distinguish all entities.
My attempt to solve those issue:
from dataclasses import dataclass, field
from collections import Counter
@dataclass
class ParseMetric:
processed: int = 0
skipped: int = 0
reasons: Counter[str] = field(default_factory=Counter)
def process(self) -> None:
self.processed += 1
def skip(self, reason: str) -> None:
self.skipped += 1
self.reasons[reason] += 1
@dataclass
class XMLParseStats:
records: ParseMetric = field(default_factory=ParseMetric)
workouts: ParseMetric = field(default_factory=ParseMetric)
sleep: ParseMetric = field(default_factory=ParseMetric)
def record_skip(self, reason: str) -> None:
self.general.skip(reason)
def workout_skip(self, reason: str) -> None:
self.workouts.skip(reason)
def sleep_skip(self, reason: str) -> None:
self.sleep.skip(reason)I haven't tested that code, so if you want to use that, you have to test it first with real data. What has changed:
- imported
collections.Counter, because it handles potential missing data and start counting from 0 (so we can simplifyself.skip_reasons[reason] = self.skip_reasons.get(reason, 0) + 1toself.skip_reasons[reason] += 1) - separate
ParseMetricclass to handle processed and skipped rows like a pro ;) - now all entities have separated skip reasons, but if you prefer, we can go back to shared ones (or we can do something like
@property
all_skip_reasons(self) -> Counter:
return self.general.reasons + self.workouts.reasons + self.sleep.reasonsThere was a problem hiding this comment.
Also we had names:
workout_skipsleep_record_sleep
We should choose one convention. In the example above I hve normalised it to not use record inside both methods names.
There was a problem hiding this comment.
@coderabbitai ad to your knowledge lesson to use collections.Counter or collections.defaultdict if we need to handle missing keys. Also add another note to check whether methods break DRY rule, like it happened here.
There was a problem hiding this comment.
The PR description image shows a health/sleep tracking UI with a dark theme. It includes an "Average Sleep Stages" stacked bar chart (Deep 16%, REM 15%, Light 62%, Awake 7%) and a paginated "Sleep Sessions" list showing sessions from dates like Thu Mar 5 2026 and Wed Mar 4 2026, each tagged with apple_health_xml as the data source, along with duration, bedtime, and wake time details.
@KaliszS, noted! I've recorded both lessons for future reviews.
(◕ᴗ◕✿)
✏️ Learnings added
Learnt from: KaliszS
Repo: the-momentum/open-wearables PR: 583
File: backend/app/schemas/apple/apple_xml/stats.py:5-29
Timestamp: 2026-03-10T12:19:49.559Z
Learning: In Python, prefer `collections.Counter` or `collections.defaultdict` over manually checking for missing keys with `.get(key, 0) + 1`. For counting occurrences, `Counter` is idiomatic: `counter[key] += 1` works without initialization. Example seen in `backend/app/schemas/apple/apple_xml/stats.py` where `self.skip_reasons[reason] = self.skip_reasons.get(reason, 0) + 1` could be simplified to `self.skip_reasons[reason] += 1` with a `Counter`.
Learnt from: KaliszS
Repo: the-momentum/open-wearables PR: 583
File: backend/app/schemas/apple/apple_xml/stats.py:5-29
Timestamp: 2026-03-10T12:19:49.559Z
Learning: When reviewing classes with multiple methods that perform structurally identical operations (e.g., increment a counter and update a reasons dict), flag DRY violations and suggest extracting shared logic into a base method or a separate helper class. Example: `record_skip`, `workout_skip`, and `sleep_record_skip` in `backend/app/schemas/apple/apple_xml/stats.py` all do the same thing and should be refactored into a single shared `skip` mechanism, e.g., via a dedicated `ParseMetric` dataclass. Also flag inconsistent naming conventions across such methods (e.g., `workout_skip` vs `sleep_record_skip`).
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: czajkub
Repo: the-momentum/open-wearables PR: 557
File: backend/app/services/apple/healthkit/sleep_service.py:126-133
Timestamp: 2026-03-05T14:02:47.686Z
Learning: In `backend/app/services/apple/healthkit/sleep_service.py`, sleep stage records are guaranteed to be non-overlapping. This means a single incoming record cannot simultaneously extend both the session's `start_time` earlier and the session's `end_time` later. The `if/elif` pattern used to update session bounds in `_apply_transition` is intentional and correct given this domain constraint.
Learnt from: czajkub
Repo: the-momentum/open-wearables PR: 583
File: backend/app/services/apple/apple_xml/xml_service.py:0-0
Timestamp: 2026-03-09T15:19:53.533Z
Learning: In `backend/app/services/apple/apple_xml/xml_service.py`, `startDate` and `endDate` are required/guaranteed fields in Apple Health XML exports (HKCategoryTypeIdentifierSleepAnalysis records). They will always be present, so no additional None-check or fallback is needed before calling `datetime.fromisoformat` on them in `_normalize_sleep_record`.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
| def _normalize_sleep_record(self, document: dict[str, Any]) -> SleepRecord | None: | ||
| """Normalize a sleep record.""" | ||
| value_to_stage = { | ||
| "HKCategoryValueSleepAnalysisAsleepCore": SleepPhase.ASLEEP_LIGHT, | ||
| "HKCategoryValueSleepAnalysisAsleepDeep": SleepPhase.ASLEEP_DEEP, | ||
| "HKCategoryValueSleepAnalysisAsleepREM": SleepPhase.ASLEEP_REM, | ||
| "HKCategoryValueSleepAnalysisAwake": SleepPhase.AWAKE, | ||
| "HKCategoryValueSleepAnalysisInBed": SleepPhase.IN_BED, | ||
| "HKCategoryValueSleepAnalysisAsleep": SleepPhase.SLEEPING, | ||
| "HKCategoryValueSleepAnalysisAsleepUnspecified": SleepPhase.SLEEPING, | ||
| } |
There was a problem hiding this comment.
How many times this function is called? Because if we use that frequently, we are going to waste our resources on building that dictionary over and over. It might be better to move that dict outside function definition (as _SLEEP_VALUE_MAP for example).
There was a problem hiding this comment.
Its called for each sleep stage (so about 20 stages per night), so definitely moving it outside will help. moved it alongside other constants in this file
There was a problem hiding this comment.
@coderabbitai - another note to save. While doing review, always check how often function/method is called and whether we can make it process faster by moving steps like creating constant data structures out of the function scope.
There was a problem hiding this comment.
@KaliszS, noted and saved to long-term memory!
ʕ •ᴥ•ʔ づ♡
✏️ Learnings added
Learnt from: KaliszS
Repo: the-momentum/open-wearables PR: 583
File: backend/app/services/apple/apple_xml/xml_service.py:0-0
Timestamp: 2026-03-10T15:21:23.270Z
Learning: During code review, always check how frequently a function or method is called. If a function builds constant data structures (e.g., dicts, sets, lists) on every invocation, flag it as a performance issue and suggest moving such structures to class-level constants or module-level variables. Example: in `backend/app/services/apple/apple_xml/xml_service.py`, the `value_to_stage` dict in `_normalize_sleep_record` was rebuilt on every call (~20 times per sleep night) and was correctly moved to a class-level constant `SLEEP_VALUE_TO_STAGE`. Apply this pattern broadly across Python files in this repo.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: czajkub
Repo: the-momentum/open-wearables PR: 583
File: backend/app/services/apple/apple_xml/xml_service.py:0-0
Timestamp: 2026-03-09T15:19:53.533Z
Learning: In backend/app/services/apple/apple_xml/xml_service.py, startDate and endDate are guaranteed to be present for Apple Health XML exports (HKCategoryTypeIdentifierSleepAnalysis records). Do not perform None-checks or fallback handling before calling datetime.fromisoformat on these fields in _normalize_sleep_record. Rely on their guaranteed presence, and consider adding a comment documenting this assumption to prevent future regressions.
Learnt from: czajkub
Repo: the-momentum/open-wearables PR: 557
File: backend/app/services/apple/healthkit/sleep_service.py:126-133
Timestamp: 2026-03-05T14:02:47.686Z
Learning: In `backend/app/services/apple/healthkit/sleep_service.py`, sleep stage records are guaranteed to be non-overlapping. This means a single incoming record cannot simultaneously extend both the session's `start_time` earlier and the session's `end_time` later. The `if/elif` pattern used to update session bounds in `_apply_transition` is intentional and correct given this domain constraint.
Learnt from: czajkub
Repo: the-momentum/open-wearables PR: 582
File: backend/app/services/apple/apple_xml/xml_service.py:105-132
Timestamp: 2026-03-09T12:19:14.951Z
Learning: In backend/app/services/apple/apple_xml/xml_service.py, enforce that the device attribute in Apple Health XML exports must match the HKDevice format: <<HKDevice: 0x...>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch6,12, software:26.2, creation date:...>. Treat any string that does not match this format as a malformed or tampered XML file and return an empty SourceInfo() in _extract_device_info. This behavior is intentional and should be maintained for correctness.
Learnt from: KaliszS
Repo: the-momentum/open-wearables PR: 583
File: backend/app/schemas/apple/apple_xml/stats.py:5-29
Timestamp: 2026-03-10T12:19:49.559Z
Learning: When reviewing classes with multiple methods that perform structurally identical operations (e.g., increment a counter and update a reasons dict), flag DRY violations and suggest extracting shared logic into a base method or a separate helper class. Example: `record_skip`, `workout_skip`, and `sleep_record_skip` in `backend/app/schemas/apple/apple_xml/stats.py` all do the same thing and should be refactored into a single shared `skip` mechanism, e.g., via a dedicated `ParseMetric` dataclass. Also flag inconsistent naming conventions across such methods (e.g., `workout_skip` vs `sleep_record_skip`).
Learnt from: KaliszS
Repo: the-momentum/open-wearables PR: 583
File: backend/app/schemas/apple/apple_xml/stats.py:5-29
Timestamp: 2026-03-10T12:19:49.559Z
Learning: In Python, prefer using collections.Counter or collections.defaultdict over manual key initialization when counting occurrences. For counting, use a Counter (or defaultdict(int)) so you can increment with counter[key] += 1 without prior checks. If applicable, refactor to use Counter for the data structure (e.g., replace self.skip_reasons with Counter and update syntax accordingly). Example: from collections import Counter; self.skip_reasons = Counter(); self.skip_reasons[reason] += 1. This improves readability and correctness in counting logic across Python files.
| save_sleep_state(user_id, current_state) | ||
|
|
||
| # import not at module level in order to avoid circular import | ||
| from app.integrations.celery.tasks.finalize_stale_sleep_task import finalize_stale_sleeps |
There was a problem hiding this comment.
We can still have this at module level to avoid confusion. It's enough to add if TYPE_CHECKING from typing library.
There was a problem hiding this comment.
I mean:
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from app.integrations.celery.tasks.finalize_stale_sleep_task import finalize_stale_sleepsThere was a problem hiding this comment.
okay - i tried adding this, and pre-commit just changed it to
# also imported below to avoid circular import, adding it here to avoid confusion
if TYPE_CHECKING:
passit looks like it doesnt want to allow it this way
There was a problem hiding this comment.
probably because it was imported twice, and the other import (the one lower) shadowed the first import, so it showed it as unused and removed it
There was a problem hiding this comment.
nevermind - it worked, I just didnt remove the import in the middle

Description
Checklist
General
Backend Changes
You have to be in
backenddirectory to make it work:uv run pre-commit run --all-filespassesScreenshots
Additional Notes
The sleep records are passed to the
sleep_servicewhich normally accepts data from the mobile SDK for apple / android - the information passed is the same, so it's sent there to keep things stable across both ways of importing Apple Health data.Now, importing via the SDK and XML brings the same results in the database
Resolves #579
Resolves #573
Summary by CodeRabbit
New Features
Improvements
Tests