fix: convert gps_location from Python extractor to SQL path extraction #28

Merged
akkaouim merged 2 commits into jjackson:labs-main from akkaouim:labs-mbw-v3
Feb 28, 2026

Conversation

Collaborator

@akkaouim akkaouim commented Feb 28, 2026

The extract_gps_location() extractor forced the SQL query to load form_json for all ~48k visits into Python memory (~480MB), causing the Fargate container to OOM-crash during process_and_cache().

Replace it with COALESCE path-based extraction, trying two paths in order:

  • form.meta.location.#text (dict with #text key)
  • form.meta.location (string fallback)

This runs entirely in PostgreSQL, so form_json never has to be loaded into Python memory. All 13 pipeline fields are now path-based; no extractors remain.
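
For illustration, here is a minimal sketch of how two dotted paths could be turned into a single COALESCE expression. The PR does not show the pipeline's actual SQL generation, so `path_to_pg_array` and `build_coalesce_sql` are hypothetical helpers, and using `form_json` with PostgreSQL's `#>>` operator (extract the value at a path as text, NULL when the path is missing) is an assumption about how such an expression might look.

```python
# Hypothetical helpers: the real pipeline's SQL builder is not shown in this PR.

def path_to_pg_array(path: str) -> str:
    # Convert a dotted path such as "form.meta.location" into a PostgreSQL
    # text-array literal such as {"form","meta","location"}.
    parts = path.split(".")
    quoted = ['"' + p.replace("\\", "\\\\").replace('"', '\\"') + '"' for p in parts]
    return "{" + ",".join(quoted) + "}"


def build_coalesce_sql(column: str, paths: list[str]) -> str:
    # Build COALESCE(col #>> '{...}', col #>> '{...}'): the first path that
    # yields a non-NULL text value wins, so the order of paths matters.
    exprs = []
    for p in paths:
        literal = path_to_pg_array(p).replace("'", "''")  # escape for a SQL string literal
        exprs.append(f"{column} #>> '{literal}'")
    return "COALESCE(" + ", ".join(exprs) + ")"


sql = build_coalesce_sql(
    "form_json",  # assumed column name, taken from the PR description
    ["form.meta.location.#text", "form.meta.location"],
)
print(sql)
```

The `#text` path is listed first because, when location is a dict, the plain `form.meta.location` path would also be non-NULL (it would return the whole object as text); when location is a plain string, the `#text` lookup yields NULL and the second path supplies the value.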

Summary by CodeRabbit

  • Bug Fixes

    • Improved GPS extraction reliability in MBW monitoring so location is consistently read from form data across different formats, removing fragile extractor behavior.
  • Documentation

    • Clarified pipeline docs to reflect the new path-based GPS extraction approach and noted memory/processing considerations for large form sets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot commented Feb 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14c2c15 and 5ae3447.

📒 Files selected for processing (1)
  • commcare_connect/workflow/templates/mbw_monitoring/DOCUMENTATION.md

📝 Walkthrough

Consolidates GPS extraction in the MBW monitoring pipeline by removing the Python extractor and switching gps_location to a path-based FieldComputation that uses two paths with SQL COALESCE fallback to handle both dict and string location formats.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GPS Extraction Refactor: commcare_connect/workflow/templates/mbw_monitoring/pipeline_config.py, commcare_connect/workflow/templates/mbw_monitoring/DOCUMENTATION.md | Removed extract_gps_location helper and updated gps_location FieldComputation to use paths=[ "form.meta.location.#text", "form.meta.location" ]. Documentation updated to describe path-based extraction and COALESCE fallback; all 13 Visit form fields now path-extracted (no extractor use). |
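
The fallback behavior described above can be mimicked in plain Python to see which value each form shape resolves to. This is only a sketch under stated assumptions: the `FieldComputation` dataclass below is a hypothetical stand-in (the real class and its full signature are not shown in this PR), `resolve` emulates COALESCE semantics in Python rather than reproducing the pipeline's SQL, and the sample GPS string is made up.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FieldComputation:
    # Hypothetical stand-in: only the two attributes mentioned in the PR.
    name: str
    paths: list[str] = field(default_factory=list)


def get_path(doc: Any, path: str) -> Optional[Any]:
    # Walk a dotted path through nested dicts; return None if any step is missing
    # or if an intermediate value is not a dict (e.g. location is a plain string).
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def resolve(fc: FieldComputation, doc: dict) -> Optional[Any]:
    # COALESCE-like behavior: return the first non-None value among the paths.
    for path in fc.paths:
        value = get_path(doc, path)
        if value is not None:
            return value
    return None


gps_location = FieldComputation(
    name="gps_location",
    paths=["form.meta.location.#text", "form.meta.location"],
)

SAMPLE = "12.34 56.78 0.0 10.0"  # made-up sample value
dict_form = {"form": {"meta": {"location": {"#text": SAMPLE}}}}
string_form = {"form": {"meta": {"location": SAMPLE}}}
missing_form = {"form": {"meta": {}}}

print(resolve(gps_location, dict_form))
print(resolve(gps_location, string_form))
print(resolve(gps_location, missing_form))
```

Both the dict-shaped and string-shaped forms resolve to the same sample string, and a form with no location resolves to `None`, which corresponds to NULL in the SQL version.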

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Hopping through the pipeline bright,
Paths replace the tweaked old byte,
COALESCE finds the spot to roam,
GPS now finds its home,
Crunching data, carrot-sweet delight. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately reflects the main change: converting gps_location extraction from a Python extractor function to SQL path-based extraction, addressing a critical memory issue. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@commcare_connect/workflow/templates/mbw_monitoring/DOCUMENTATION.md`:
- Line 558: Update the documentation so that every mention of gps_location
describes it as a paths-based field (not extractor-based); find all
sections that currently describe gps_location as "extractor-based" and change
them to match the table row showing gps_location → paths with paths
`form.meta.location.#text`, `form.meta.location` and the COALESCE behavior (dict
`#text` key or string fallback); ensure any examples, headings, and migration
notes that mention gps_location/original extractor usage are rewritten to show
the paths-based field, with consistent wording about COALESCE and path
resolution.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84d802b and 14c2c15.

📒 Files selected for processing (2)
  • commcare_connect/workflow/templates/mbw_monitoring/DOCUMENTATION.md
  • commcare_connect/workflow/templates/mbw_monitoring/pipeline_config.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@akkaouim akkaouim merged commit 17c1ad6 into jjackson:labs-main Feb 28, 2026
2 checks passed