Merged
Changes from 43 commits
Commits
46 commits
c062561
docs: add Bulk Image Audit workflow template design
Feb 25, 2026
2f14616
docs: add Bulk Image Audit implementation plan
Feb 25, 2026
438d5ae
feat: rename Weekly Audit template to Weekly KMC Audit with AI Review
Feb 25, 2026
478c38e
feat: add Bulk Image Audit workflow template skeleton
Feb 26, 2026
a6a7948
feat: bulk image audit React shell — IMAGE_TYPES constant, phase stat…
Feb 26, 2026
bc90b91
feat: bulk image audit config form — opp selector, image type, visit …
Feb 26, 2026
44b1e10
fix: add handleCreate no-op guard until Task 5 implements the real ha…
Feb 26, 2026
c90221a
feat: bulk image audit config submit + creating phase with progress b…
Feb 26, 2026
224df07
fix: handleCreate null guard, handleCancel finally, reconnect else br…
Feb 26, 2026
234c363
feat: bulk image audit review phase — stats bar, photo list, AI revie…
Feb 26, 2026
0144e79
feat: bulk image audit — fix code review issues + FLW summary table
Feb 26, 2026
2bd64dd
fix: make Result column sortable in FLW summary table
Feb 26, 2026
6726d7d
fix: AI review uses per-assessment opportunity_id, CompletedFlwTable …
Feb 26, 2026
6216b53
merge: labs-main into labs-auditv2 — add mbw_monitoring_v2 to re-exports
Feb 26, 2026
57937ea
docs: add workflow template anatomy and audit API contracts to AGENTS.md
Feb 26, 2026
bf140a6
fix: squash broken admin_boundaries migrations and add labs requirements
Mar 2, 2026
cb5f4cf
fix: bulk image audit UX fixes, image parsing, and cache performance
Mar 3, 2026
0c7746a
fix: bulk_image_audit UX fixes and opportunity name lookup
Mar 4, 2026
99f0f69
feat: bulk_image_audit sessions table UI and workflow list improvements
Mar 4, 2026
0a4d90a
fix: bulk_image_audit workflow - skip sessions table, add FLW summary…
Mar 5, 2026
a6c1fcc
fix: bulk_image_audit - 8 UX tweaks (sort, FLW names, state wrap, red…
Mar 5, 2026
d7d8f01
fix: image links, visit timestamps, run status/period, and image type…
Mar 6, 2026
92816de
Merge branch 'labs-main' into labs-auditv2
theism Mar 9, 2026
e78ce8b
fix: address CodeRabbit review findings for labs-auditv2 merge
Mar 9, 2026
8a13533
Add Audit of Audits admin report
Mar 10, 2026
91c82a7
Audit of Audits: fix period normalization, session join, opp names, r…
Mar 10, 2026
3d136f3
Add debug script for inspecting image fields in audit visits
Mar 10, 2026
2a54e2a
Add celery-beat launch config, HQ image proxy URL, CLI token and thre…
Mar 10, 2026
d78c56b
Add design spec for Bulk Image Audit dynamic image types
Mar 10, 2026
6756bbc
Merge labs-auditv3 into labs-auditv2
Mar 10, 2026
35e51de
feat(audit): add hq_app_utils for image question extraction with alwa…
Mar 10, 2026
71a1d69
fix(audit): skip xform root in ancestor walk, add clarifying comment
Mar 10, 2026
d272ba7
feat(audit): add OpportunityImageQuestionsAPIView for dynamic image t…
Mar 10, 2026
567cbfb
fix(audit): clarify cc_domain/cc_app_id guard comment in image questi…
Mar 10, 2026
5b6507c
feat(workflow): bulk image audit — dynamic image types, remove opp se…
Mar 10, 2026
2ddc46c
fix(workflow): fix stale closure in checkbox handler, fix image type …
Mar 10, 2026
8aa6225
feat(audit): add image type filter dropdown to bulk assessment page
Mar 10, 2026
26643bf
style: apply black and isort formatting
Mar 10, 2026
446ccbc
style: fix unused import and E501 long lines in bulk_image_audit
Mar 10, 2026
3a40d8a
fix(audit): deduplicate image question IDs across forms using full pa…
Mar 10, 2026
3aec726
feat(audit-of-audits): add org-scoped config phase with multi-select …
Mar 10, 2026
29b5bfb
fix(audit): handle null calculate field in DataBindOnly questions
Mar 11, 2026
5abb95f
feat: bulk image audit improvements — HQ URL fallback, task tracking,…
Mar 11, 2026
76f7e87
Merge labs-main into labs-auditv2, resolve conflicts
Mar 11, 2026
b610a57
fix: address CodeRabbit PR review findings
Mar 11, 2026
30a427b
fix: server-side Dimagi guard on create_workflow view + boundary_id c…
Mar 11, 2026
80 changes: 79 additions & 1 deletion .claude/AGENTS.md
@@ -95,6 +95,40 @@ Structured audits of FLW visits with AI-powered reviews.
- **AI review:** `audit/ai_review.py` runs validation agents on individual visits
- **Uses:** `AnalysisPipeline` for visit data filtering

#### Audit API Contracts (used by workflow templates)

**Create async** `POST /audit/api/audit/create-async/`
```json
{ "opportunities": [{"id": 1, "name": "..."}], "criteria": {
"audit_type": "date_range|last_n_per_opp",
"start_date": "YYYY-MM-DD", "end_date": "YYYY-MM-DD",
"count_per_opp": 10, "sample_percentage": 100,
"related_fields": [{"image_path": "...", "filter_by_image": true}]
}, "workflow_run_id": 123 }
```
Response: `{"success": true, "task_id": "..."}`. Task result has `{"sessions": [{"id", "title", "visits", "images"}]}`.
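A minimal sketch of assembling the create-async request body for the `date_range` variant, following the contract above. The helper name `build_create_async_payload`, the sample opportunity, and the `form/photo` image path are hypothetical and used only for illustration; whether `count_per_opp` may be omitted for `date_range` audits is an assumption.

```python
import json
from datetime import date


def build_create_async_payload(opportunities, start, end, image_paths,
                               workflow_run_id, sample_percentage=100):
    """Assemble the JSON body for POST /audit/api/audit/create-async/ (hypothetical helper)."""
    return {
        "opportunities": [{"id": o["id"], "name": o["name"]} for o in opportunities],
        "criteria": {
            "audit_type": "date_range",
            # Dates are serialized as YYYY-MM-DD, as the contract requires.
            "start_date": start.isoformat(),
            "end_date": end.isoformat(),
            "sample_percentage": sample_percentage,
            # One image-only rule per image path; field_path is left out on purpose.
            "related_fields": [
                {"image_path": path, "filter_by_image": True} for path in image_paths
            ],
        },
        "workflow_run_id": workflow_run_id,
    }


payload = build_create_async_payload(
    [{"id": 1, "name": "Example opp"}],  # illustrative opportunity
    date(2026, 2, 1),
    date(2026, 2, 28),
    ["form/photo"],  # illustrative image path
    workflow_run_id=123,
)
print(json.dumps(payload, indent=2))
```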

**Bulk data** `GET /audit/api/<session_id>/bulk-data/`
Response: `{"assessments": [{id, visit_id, blob_id, question_id, opportunity_id, filename, result, notes, status, image_url, visit_date, entity_name, username, related_fields, ai_result, ai_notes}], ...}`
Note: `opportunity_id` = `session.opportunity_id` (same for all assessments in a session). `status` = `"pass"|"fail"|"pending"`.

**Save progress** `POST /audit/api/<session_id>/save/`
FormData: `visit_results` = JSON string of `{visit_id: {assessments: {blob_id: {question_id, result, notes, ai_result, ai_notes}}}}`

**Complete** `POST /audit/api/<session_id>/complete/`
FormData: `overall_result` (`"pass"|"fail"`), `notes`, `kpi_notes` (can be `""`), `visit_results` (same shape as save).
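The nested `visit_results` shape shared by save and complete can be built from a flat list of assessments. The sketch below assumes the assessments look like the bulk-data response items; `build_visit_results` and the sample values are hypothetical.

```python
import json


def build_visit_results(assessments):
    """Group flat assessments into {visit_id: {"assessments": {blob_id: {...}}}}."""
    visit_results = {}
    for a in assessments:
        # JSON object keys are always strings, so visit_id is stringified up front.
        visit = visit_results.setdefault(str(a["visit_id"]), {"assessments": {}})
        visit["assessments"][a["blob_id"]] = {
            "question_id": a["question_id"],
            "result": a.get("result"),
            "notes": a.get("notes", ""),
            "ai_result": a.get("ai_result"),
            "ai_notes": a.get("ai_notes", ""),
        }
    return visit_results


sample_assessments = [  # illustrative data only
    {"visit_id": 7, "blob_id": "b1", "question_id": "form/photo", "result": "pass"},
    {"visit_id": 7, "blob_id": "b2", "question_id": "form/scale", "result": "fail", "notes": "blurry"},
    {"visit_id": 9, "blob_id": "b3", "question_id": "form/photo"},
]
visit_results = build_visit_results(sample_assessments)

# The endpoints expect FormData, with visit_results sent as a JSON string.
form_data = {"visit_results": json.dumps(visit_results)}
```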

**AI Review** `POST /audit/api/<session_id>/ai-review/`
JSON body (NOT FormData): `{"assessments": [{"visit_id", "blob_id", "reading"}], "agent_id": "scale_validation", "opportunity_id": <int>}`
Response: `{"results": [{"visit_id", "blob_id", "ai_result": "match|no_match|error", "ai_notes": "..."}]}`
Note: `opportunity_id` is **required**. Use `a.opportunity_id` from the assessment object (not `selected_opps[0].id`).
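Because `opportunity_id` must come from each assessment, one way to honor the note above is to group assessments by their own `opportunity_id` and emit one request body per group. This is a sketch under assumptions: `build_ai_review_bodies` is a hypothetical helper, and the contract does not say where `reading` comes from, so the caller is expected to supply it on each assessment.

```python
from collections import defaultdict


def build_ai_review_bodies(assessments, agent_id="scale_validation"):
    """Return one AI-review JSON body per distinct opportunity_id (hypothetical helper)."""
    by_opp = defaultdict(list)
    for a in assessments:
        by_opp[a["opportunity_id"]].append(
            {
                "visit_id": a["visit_id"],
                "blob_id": a["blob_id"],
                # Assumption: the caller has already attached the reading to compare.
                "reading": a.get("reading", ""),
            }
        )
    return [
        {"assessments": items, "agent_id": agent_id, "opportunity_id": opp_id}
        for opp_id, items in by_opp.items()
    ]


bodies = build_ai_review_bodies(
    [  # illustrative data only
        {"visit_id": 1, "blob_id": "b1", "opportunity_id": 10, "reading": "3.2"},
        {"visit_id": 2, "blob_id": "b2", "opportunity_id": 11, "reading": "4.0"},
        {"visit_id": 3, "blob_id": "b3", "opportunity_id": 10, "reading": "2.9"},
    ]
)
```

Each body is sent as JSON, not FormData.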

**Opp search** `GET /audit/api/opportunities/search/?q=<query>`
Response: `{"opportunities": [{"id", "name"}]}`

**Workflow sessions** `GET /audit/api/workflow/<workflow_run_id>/sessions/`
Response: `{"sessions": [{"id", ...}]}` — fallback for session_id discovery after async creation.

### `tasks/` — Task Management

> See also: [`commcare_connect/tasks/README.md`](../commcare_connect/tasks/README.md) for data model details and testing guidance.
@@ -116,10 +150,54 @@ Data-driven workflows with custom React UIs and pipeline integration.
- **DataAccess:** `WorkflowDataAccess`, `PipelineDataAccess` (both extend `BaseDataAccess`) in `workflow/data_access.py`
- **Proxy models:** `WorkflowDefinitionRecord`, `WorkflowRenderCodeRecord`, `WorkflowRunRecord`, `WorkflowChatHistoryRecord`, `PipelineDefinitionRecord` (experiment=`"workflow"` / `"pipeline"`)
- **Key views:** Workflow list (`/workflow/`), definition view, run view
-- **Templates:** Predefined workflow templates in `workflow/templates/` (audit_with_ai_review, performance_review, ocs_outreach)
+- **Templates:** Predefined workflow templates in `workflow/templates/` (audit_with_ai_review, bulk_image_audit, mbw_monitoring_v2, performance_review, ocs_outreach)
- **Render code:** React components stored as LabsRecords, rendered dynamically in workflow runner
- **Cross-app:** Can create audit sessions and tasks from workflow actions

#### Workflow Template Anatomy

Each template is a Python file in `workflow/templates/` that exports three dicts:

```python
DEFINITION = {
    "name": str, "description": str, "version": 1,
    "templateType": str,  # must match TEMPLATE["key"]
    "statuses": [...],  # list of {id, label, color}
    "config": {...},  # e.g. {"showSummaryCards": True}
    "pipeline_sources": [],
}

RENDER_CODE = """function WorkflowUI({ definition, instance, workers,
                               pipelines, links, actions, onUpdateState }) {
    // Full React JSX component — Babel standalone transpiles in-browser, no build step
    // Inner components defined as const arrows INSIDE WorkflowUI to close over parent state
    // Phase router at bottom: {phase === 'foo' && <FooPhase />}
}"""

TEMPLATE = {
    "key": str,  # e.g. "bulk_image_audit" — unique, used for lookup
    "name": str,
    "description": str,
    "icon": str,  # Font Awesome class e.g. "fa-images"
    "color": str,  # Tailwind color e.g. "blue"
    "definition": DEFINITION,
    "render_code": RENDER_CODE,
    "pipeline_schema": None,  # or dict for single pipeline; use "pipeline_schemas" list for multi
}
```

**Registration:** `__init__.py` auto-discovers via `pkgutil.iter_modules`. Also has explicit re-exports at the bottom — **add new templates to both the `from . import` line and `__all__`**.

**JSX-in-Python rules:**
- Cannot use `"""` inside `RENDER_CODE` (Python string delimiter conflict)
- Inner components must be defined BEFORE they are used (no hoisting)
- State for child components is hoisted to outer `WorkflowUI` so it persists across re-renders
- `onUpdateState(patch)` PATCH-merges into `run.data.state` on the server
- Workflow props: `{ definition, instance, workers, pipelines, links, actions, onUpdateState }`
- `actions.createAudit(payload)` → `POST /audit/api/audit/create-async/`
- `actions.streamAuditProgress(task_id, onProgress, onComplete, onError)` → SSE stream
- `actions.cancelAudit(task_id)` → cancel endpoint
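Two of the rules above are easy to check mechanically: `templateType` must match `TEMPLATE["key"]`, and `RENDER_CODE` must not contain a triple double quote. The checker below is a hypothetical sketch, not part of the repository; `check_template` and the sample template are illustrative names.

```python
def check_template(template):
    """Return a list of problems found in a TEMPLATE dict (hypothetical lint helper)."""
    problems = []
    definition = template.get("definition", {})

    if definition.get("templateType") != template.get("key"):
        problems.append("definition['templateType'] must match template['key']")

    # A triple double quote inside the JSX would terminate the Python string early.
    if '"""' in template.get("render_code", ""):
        problems.append('render_code must not contain """')

    for index, status in enumerate(definition.get("statuses", [])):
        missing = {"id", "label", "color"} - set(status)
        if missing:
            problems.append(f"statuses[{index}] is missing {sorted(missing)}")

    return problems


example_template = {  # illustrative template, not one from the repository
    "key": "example_audit",
    "definition": {
        "templateType": "example_audit",
        "statuses": [{"id": "pending", "label": "Pending", "color": "gray"}],
    },
    "render_code": "function WorkflowUI({ definition }) { return null; }",
}
problems = check_template(example_template)
```

A check like this could run in a unit test over every discovered template module, so a mismatch surfaces before the template reaches the workflow runner.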

### `ai/` — AI Agent Integration

> See also: [`commcare_connect/ai/README.md`](../commcare_connect/ai/README.md) for data model details and testing guidance.
6 changes: 6 additions & 0 deletions .claude/launch.json
@@ -25,6 +25,12 @@
"runtimeExecutable": "celery",
"runtimeArgs": ["-A", "config.celery_app", "worker", "-l", "info"],
"port": 0
},
{
"name": "celery-beat",
"runtimeExecutable": "celery",
"runtimeArgs": ["-A", "config.celery_app", "beat"],
"port": 0
}
]
}
4 changes: 3 additions & 1 deletion commcare_connect/audit/analysis_config.py
@@ -68,11 +68,13 @@ def extract_images_with_question_ids(visit_data: dict) -> list[dict]:

# Extract visit-level metadata
username = visit_data.get("username") or ""
-visit_date = visit_data.get("visit_date") or ""
entity_name = visit_data.get("entity_name") or "No Entity"

# Build filename->path map in a SINGLE traversal (O(m) where m=tree size)
form_data = form_json.get("form", form_json)
+
+# Use form.meta.timeEnd for actual submission time; fall back to visit_date (date only)
+visit_date = form_data.get("meta", {}).get("timeEnd") or visit_data.get("visit_date") or ""
filename_map = _build_filename_map(form_data)

# Now each lookup is O(1) instead of O(m)
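The timestamp fallback in this hunk can be read in isolation as the sketch below. It assumes the form JSON lives under `visit_data["form_json"]`; `resolve_visit_timestamp` is a hypothetical name, and the extra `or {}` guard against a null `meta` goes beyond what the diff does.

```python
def resolve_visit_timestamp(visit_data):
    """Prefer form.meta.timeEnd (full timestamp); fall back to visit_date (date only)."""
    form_json = visit_data.get("form_json") or {}
    # Some payloads nest the form under "form"; others are already the form itself.
    form_data = form_json.get("form", form_json)
    meta = form_data.get("meta") or {}
    return meta.get("timeEnd") or visit_data.get("visit_date") or ""


with_time_end = resolve_visit_timestamp(
    {"visit_date": "2026-03-06", "form_json": {"form": {"meta": {"timeEnd": "2026-03-06T09:15:00Z"}}}}
)
date_only = resolve_visit_timestamp({"visit_date": "2026-03-06", "form_json": {"form": {}}})
```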
134 changes: 111 additions & 23 deletions commcare_connect/audit/data_access.py
@@ -98,12 +98,14 @@ def from_dict(cls, data: dict) -> "AuditCriteria":
{
"image_path": rf.get("image_path") or rf.get("imagePath", ""),
"field_path": rf.get("field_path") or rf.get("fieldPath", ""),
"hq_url_path": rf.get("hq_url_path") or rf.get("hqUrlPath", ""),
"label": rf.get("label", ""),
"filter_by_image": rf.get("filter_by_image") or rf.get("filterByImage", False),
"filter_by_field": rf.get("filter_by_field") or rf.get("filterByField", False),
}
for rf in related_fields_raw
-if (rf.get("image_path") or rf.get("imagePath")) and (rf.get("field_path") or rf.get("fieldPath"))
+# Require image_path; field_path is optional (image-only filter rules are valid)
+if rf.get("image_path") or rf.get("imagePath")
]

return cls(
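The rule normalization in this hunk accepts both snake_case and camelCase keys and now keeps image-only rules. A standalone sketch of that behavior follows; `normalize_related_fields` and the sample rules are hypothetical, and the `pick` helper only mirrors the `a or b` pattern used in the diff.

```python
def normalize_related_fields(related_fields_raw):
    """Normalize rule dicts to snake_case keys; keep rules that have an image path."""

    def pick(rf, snake, camel, default):
        # Mirrors the diff: a falsy snake_case value falls through to the camelCase key.
        return rf.get(snake) or rf.get(camel, default)

    return [
        {
            "image_path": pick(rf, "image_path", "imagePath", ""),
            "field_path": pick(rf, "field_path", "fieldPath", ""),
            "hq_url_path": pick(rf, "hq_url_path", "hqUrlPath", ""),
            "label": rf.get("label", ""),
            "filter_by_image": pick(rf, "filter_by_image", "filterByImage", False),
            "filter_by_field": pick(rf, "filter_by_field", "filterByField", False),
        }
        for rf in related_fields_raw
        # field_path is optional; only image_path is required.
        if rf.get("image_path") or rf.get("imagePath")
    ]


rules = normalize_related_fields(
    [  # illustrative rules only
        {"imagePath": "form/photo", "filterByImage": True},
        {"field_path": "form/weight"},  # dropped: no image path
    ]
)
```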
@@ -175,10 +177,17 @@ def filter_visits_for_audit(
     if criteria.selected_flw_user_ids and "username" in df.columns:
         df = df[df["username"].isin(criteria.selected_flw_user_ids)]

-    # Apply sample percentage
+    # Apply sample percentage — sample per FLW for equal representation, then shuffle
     if criteria.sample_percentage < 100 and len(df) > 0:
-        sample_size = max(1, int(len(df) * criteria.sample_percentage / 100))
-        df = df.sample(n=min(sample_size, len(df)), random_state=42)
+        if "username" in df.columns:
+            groups = []
+            for _, grp in df.groupby("username", dropna=False):
+                n = max(1, int(len(grp) * criteria.sample_percentage / 100))
+                groups.append(grp.sample(n=min(n, len(grp)), random_state=42))
+            df = pd.concat(groups).sample(frac=1, random_state=42)
+        else:
+            sample_size = max(1, int(len(df) * criteria.sample_percentage / 100))
+            df = df.sample(n=min(sample_size, len(df)), random_state=42)

     if return_visits:
         return df.to_dict("records")
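The per-FLW sampling above has one consequence worth spelling out: because each group contributes `max(1, ...)` visits, the overall sample can exceed the requested percentage when many FLWs have few visits. The sketch below reproduces the idea with the standard library instead of pandas so it runs without dependencies; `sample_per_flw` is a hypothetical name and its random draws will not match the pandas version.

```python
import random
from collections import defaultdict


def sample_per_flw(visits, sample_percentage, seed=42):
    """Sample the given percentage within each username group, then shuffle."""
    if sample_percentage >= 100 or not visits:
        return list(visits)

    rng = random.Random(seed)
    by_user = defaultdict(list)
    for visit in visits:
        by_user[visit.get("username")].append(visit)

    sampled = []
    for group in by_user.values():
        # At least one visit per FLW, never more than the group holds.
        n = max(1, int(len(group) * sample_percentage / 100))
        sampled.extend(rng.sample(group, min(n, len(group))))

    rng.shuffle(sampled)
    return sampled


visits = [{"id": i, "username": "flw_a"} for i in range(10)]
visits.append({"id": 10, "username": "flw_b"})
sampled = sample_per_flw(visits, sample_percentage=20)
```

Here 20% of 11 visits would be 2 under the old whole-frame sampling, but the per-FLW version returns 3: two from `flw_a` and the single `flw_b` visit.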
@@ -737,6 +746,87 @@ def extract_images_for_visits(
if str(vid) not in result:
result[str(vid)] = []

# Build visit lookup once — shared by enrichment and fallback sections below
visit_dict_by_id = {str(v.get("id", "")): v for v in visit_dicts}

# Fetch cc_domain for building CommCareHQ attachment URLs (cached, ~1 API call per hour)
cc_domain = None
try:
from commcare_connect.workflow.templates.mbw_monitoring.data_fetchers import fetch_opportunity_metadata

meta = fetch_opportunity_metadata(self.access_token, opp_id)
cc_domain = meta.get("cc_domain")
except Exception as e:
# Intentionally broad: cc_domain is optional for URL construction; any failure
# (network, missing key, unexpected format) should degrade gracefully, not block audit.
logger.debug(f"[ImageExtract] Could not fetch cc_domain for hq_url construction: {e}")

# Enrich Connect blob images with xform_id and build hq_url
hq_base = settings.COMMCARE_HQ_URL.rstrip("/")
for visit_id_str, images in result.items():
visit_data = visit_dict_by_id.get(visit_id_str, {})
form_json = visit_data.get("form_json", {})
xform_id = form_json.get("id") or ""
for img in images:
img["xform_id"] = xform_id
if cc_domain and xform_id and img.get("name") and not img.get("hq_url"):
img["hq_url"] = f"{hq_base}/a/{cc_domain}/api/form/attachment/{xform_id}/{img['name']}"

# Fallback: for visits with no Connect blobs, extract CommCareHQ URL images
# from form_json using related_fields rules.
# Strategy 1: use hq_url_path (pre-computed URL stored in form JSON)
# Strategy 2: extract filename from image_path, build HQ attachment URL
# (used when hq_url_path is empty — e.g. dynamic image type discovery
# can't resolve DataBindOnly XForm paths from the HQ app definition API)
if related_fields:
import hashlib

image_rules = [r for r in related_fields if r.get("image_path")]
if image_rules:
for visit_id_str, images in result.items():
if images:
continue # Already has Connect blob images
visit_data = visit_dict_by_id.get(visit_id_str, {})
form_json = visit_data.get("form_json", {})
form_data = form_json.get("form", form_json)
xform_id = form_json.get("id") or ""
username = visit_data.get("username") or ""
# Use form.meta.timeEnd for actual submission time; fall back to visit_date (date only)
visit_date = form_data.get("meta", {}).get("timeEnd") or visit_data.get("visit_date") or ""
entity_name = visit_data.get("entity_name") or "No Entity"
for rule in image_rules:
hq_url_path = rule.get("hq_url_path", "")
image_path = rule.get("image_path", "")

# Strategy 1: pre-computed URL field in form JSON
hq_url = None
if hq_url_path:
extracted = self._extract_field_value(form_data, hq_url_path)
if extracted and isinstance(extracted, str) and extracted.startswith("http"):
hq_url = extracted

# Strategy 2: build URL from filename stored at image_path
if not hq_url and cc_domain and xform_id and image_path:
filename = self._extract_field_value(form_data, image_path)
if filename and isinstance(filename, str) and not filename.startswith("http"):
hq_url = f"{hq_base}/a/{cc_domain}/api/form/attachment/{xform_id}/{filename}"

if hq_url:
blob_id = "hq_" + hashlib.sha256(hq_url.encode()).hexdigest()[:16]
name = hq_url_path.split("/")[-1] if hq_url_path else image_path.split("/")[-1]
images.append(
{
"blob_id": blob_id,
"hq_url": hq_url,
"xform_id": xform_id,
"name": name,
"question_id": image_path,
"username": username,
"visit_date": visit_date,
"entity_name": entity_name,
}
)

# Add related field values if rules provided
if related_fields:
if progress_callback:
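The two fallback strategies in this hunk, plus the derived `blob_id`, can be sketched on their own. The slash-separated `get_path` lookup stands in for `self._extract_field_value`, whose real path semantics are not shown in the diff, so treat it as an assumption; `resolve_hq_url`, `hq_blob_id`, and the example hosts are likewise hypothetical.

```python
import hashlib


def get_path(data, path):
    """Hypothetical stand-in for _extract_field_value: walk a slash-separated path."""
    node = data
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def resolve_hq_url(form_data, rule, hq_base, cc_domain, xform_id):
    """Strategy 1: pre-computed URL field. Strategy 2: build the attachment URL."""
    hq_url_path = rule.get("hq_url_path", "")
    image_path = rule.get("image_path", "")

    if hq_url_path:
        value = get_path(form_data, hq_url_path)
        if isinstance(value, str) and value.startswith("http"):
            return value

    if cc_domain and xform_id and image_path:
        filename = get_path(form_data, image_path)
        if isinstance(filename, str) and filename and not filename.startswith("http"):
            return f"{hq_base}/a/{cc_domain}/api/form/attachment/{xform_id}/{filename}"

    return None


def hq_blob_id(hq_url):
    """Stable synthetic blob id: 'hq_' plus the first 16 hex chars of SHA-256."""
    return "hq_" + hashlib.sha256(hq_url.encode()).hexdigest()[:16]


form = {"photos": {"scale": "scale.jpg", "scale_url": "https://example.invalid/x.jpg"}}
base = "https://hq.example.invalid"  # illustrative host
direct = resolve_hq_url(form, {"image_path": "photos/scale", "hq_url_path": "photos/scale_url"}, base, "demo", "abc123")
built = resolve_hq_url(form, {"image_path": "photos/scale"}, base, "demo", "abc123")
```

Hashing the URL keeps the id deterministic, so the same HQ image maps to the same `blob_id` across repeated extractions.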
@@ -773,25 +863,23 @@ def _filter_visits_by_related_fields(
         if not filter_rules:
             return visit_images

+        image_filter_paths = [r.get("image_path", "") for r in filter_rules if r.get("filter_by_image")]
+        field_filter_rules = [r for r in filter_rules if r.get("filter_by_field")]
+
         filtered_result = {}
         for visit_id, images in visit_images.items():
             include_visit = True

-            for rule in filter_rules:
-                image_path = rule.get("image_path", "")
-                field_path = rule.get("field_path", "")
-                filter_by_image = rule.get("filter_by_image", False)
-                filter_by_field = rule.get("filter_by_field", False)
+            # OR logic: include visit if it has ANY of the required image types
+            if image_filter_paths:
+                question_ids = {img.get("question_id") for img in images}
+                if not any(p in question_ids for p in image_filter_paths):
+                    include_visit = False

-                # Check if this visit has the required image
-                if filter_by_image:
-                    has_matching_image = any(img.get("question_id") == image_path for img in images)
-                    if not has_matching_image:
-                        include_visit = False
-                        break
-
-                # Check if this visit has the required field value
-                if filter_by_field:
+            # AND logic: visit must satisfy every field filter rule
+            if include_visit:
+                for rule in field_filter_rules:
+                    field_path = rule.get("field_path", "")
                     has_field_value = False
                     for img in images:
                         for rf in img.get("related_fields", []):
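The behavioral change in this hunk is that image rules switch from AND to OR, while field rules stay AND. The sketch below illustrates that split for a single visit. The hunk is cut off before the field comparison, so the `path` and `value` keys on each related field are assumptions, and `visit_passes_filters` is a hypothetical name.

```python
def visit_passes_filters(images, filter_rules):
    """OR across image rules, AND across field rules, for one visit's images."""
    image_paths = [r.get("image_path", "") for r in filter_rules if r.get("filter_by_image")]
    field_rules = [r for r in filter_rules if r.get("filter_by_field")]

    # OR: any one of the required image types is enough.
    if image_paths:
        question_ids = {img.get("question_id") for img in images}
        if not any(path in question_ids for path in image_paths):
            return False

    # AND: every field rule needs a non-empty value on some image.
    for rule in field_rules:
        field_path = rule.get("field_path", "")
        has_value = any(
            rf.get("path") == field_path and rf.get("value") not in (None, "")  # assumed keys
            for img in images
            for rf in img.get("related_fields", [])
        )
        if not has_value:
            return False

    return True


images = [{"question_id": "form/scale", "related_fields": [{"path": "form/weight", "value": "3.2"}]}]
image_rules = [
    {"image_path": "form/scale", "filter_by_image": True},
    {"image_path": "form/card", "filter_by_image": True},
]
passes_or = visit_passes_filters(images, image_rules)
```

Under the previous per-rule loop, the same visit would have been excluded because it lacks a `form/card` image.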
@@ -1136,7 +1224,7 @@ def create_audit_creation_job(
opportunities: list[dict],
) -> dict:
"""Create an audit creation job record for tracking async creation."""
-from datetime import datetime
+from datetime import datetime, timezone

data = {
"task_id": task_id,
@@ -1154,8 +1242,8 @@ def create_audit_creation_job(
},
"result": None,
"error": None,
-"created_at": datetime.now().isoformat(),
-"updated_at": datetime.now().isoformat(),
+"created_at": datetime.now(timezone.utc).isoformat(),
+"updated_at": datetime.now(timezone.utc).isoformat(),
}

record = self.labs_api.create_record(
@@ -1237,7 +1325,7 @@ def update_audit_creation_job(
error: str | None = None,
) -> dict | None:
"""Update an audit creation job record."""
-from datetime import datetime
+from datetime import datetime, timezone

from commcare_connect.labs.models import LocalLabsRecord

@@ -1267,7 +1355,7 @@ def update_audit_creation_job(
data["result"] = result
if error is not None:
data["error"] = error
-data["updated_at"] = datetime.now().isoformat()
+data["updated_at"] = datetime.now(timezone.utc).isoformat()

# Save
updated = self.labs_api.update_record(
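The switch to `datetime.now(timezone.utc)` in these hunks matters because a naive timestamp carries no offset, so consumers cannot tell which zone it was written in. A short illustration using only the standard library (`utc_timestamp` is a hypothetical helper name):

```python
from datetime import datetime, timezone


def utc_timestamp():
    """Timezone-aware ISO 8601 timestamp, as the updated job records store it."""
    return datetime.now(timezone.utc).isoformat()


stamp = utc_timestamp()
parsed = datetime.fromisoformat(stamp)

# The naive form has no tzinfo and serializes without an offset suffix.
naive = datetime.now()
```

The aware string ends in `+00:00` and round-trips through `datetime.fromisoformat` with its offset intact, which keeps `created_at` and `updated_at` comparable across workers in different time zones.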