Commit e4fa56f

Merge pull request #36 from jjackson/labs-auditv2
feat: bulk image audit v2 — dynamic image types, HQ URL fallback, task tracking, audit of audits
2 parents 0390261 + 30a427b commit e4fa56f

45 files changed, +6351 -411 lines changed

.claude/AGENTS.md

Lines changed: 79 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -95,6 +95,40 @@ Structured audits of FLW visits with AI-powered reviews.
9595
- **AI review:** `audit/ai_review.py` runs validation agents on individual visits
9696
- **Uses:** `AnalysisPipeline` for visit data filtering
9797

98+
#### Audit API Contracts (used by workflow templates)
99+
100+
**Create async** `POST /audit/api/audit/create-async/`
101+
```json
102+
{ "opportunities": [{"id": 1, "name": "..."}], "criteria": {
103+
"audit_type": "date_range|last_n_per_opp",
104+
"start_date": "YYYY-MM-DD", "end_date": "YYYY-MM-DD",
105+
"count_per_opp": 10, "sample_percentage": 100,
106+
"related_fields": [{"image_path": "...", "filter_by_image": true}]
107+
}, "workflow_run_id": 123 }
108+
```
109+
Response: `{"success": true, "task_id": "..."}`. Task result has `{"sessions": [{"id", "title", "visits", "images"}]}`.
110+
111+
**Bulk data** `GET /audit/api/<session_id>/bulk-data/`
112+
Response: `{"assessments": [{id, visit_id, blob_id, question_id, opportunity_id, filename, result, notes, status, image_url, visit_date, entity_name, username, related_fields, ai_result, ai_notes}], ...}`
113+
Note: `opportunity_id` = `session.opportunity_id` (same for all assessments in a session). `status` = `"pass"|"fail"|"pending"`.
114+
115+
**Save progress** `POST /audit/api/<session_id>/save/`
116+
FormData: `visit_results` = JSON string of `{visit_id: {assessments: {blob_id: {question_id, result, notes, ai_result, ai_notes}}}}`
117+
118+
**Complete** `POST /audit/api/<session_id>/complete/`
119+
FormData: `overall_result` (`"pass"|"fail"`), `notes`, `kpi_notes` (can be `""`), `visit_results` (same shape as save).
120+
121+
**AI Review** `POST /audit/api/<session_id>/ai-review/`
122+
JSON body (NOT FormData): `{"assessments": [{"visit_id", "blob_id", "reading"}], "agent_id": "scale_validation", "opportunity_id": <int>}`
123+
Response: `{"results": [{"visit_id", "blob_id", "ai_result": "match|no_match|error", "ai_notes": "..."}]}`
124+
Note: `opportunity_id` is **required**. Use `a.opportunity_id` from the assessment object (not `selected_opps[0].id`).
125+
126+
**Opp search** `GET /audit/api/opportunities/search/?q=<query>`
127+
Response: `{"opportunities": [{"id", "name"}]}`
128+
129+
**Workflow sessions** `GET /audit/api/workflow/<workflow_run_id>/sessions/`
130+
Response: `{"sessions": [{"id", ...}]}` — fallback for session_id discovery after async creation.
131+
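The `visit_results` shape described under **Save progress** and **Complete** can be assembled from the **Bulk data** response roughly as follows. This is a minimal sketch, not the project's implementation: the helper names `build_visit_results` and `build_save_form` are hypothetical, and stringifying `visit_id` and `blob_id` as dictionary keys is an assumption (JSON object keys are always strings once serialized).

```python
import json


def build_visit_results(assessments):
    """Group flat bulk-data assessments into {visit_id: {"assessments": {blob_id: {...}}}}."""
    visit_results = {}
    for a in assessments:
        # One entry per visit; each visit holds its assessments keyed by blob_id.
        visit = visit_results.setdefault(str(a["visit_id"]), {"assessments": {}})
        visit["assessments"][str(a["blob_id"])] = {
            "question_id": a.get("question_id"),
            "result": a.get("result"),
            "notes": a.get("notes", ""),
            "ai_result": a.get("ai_result"),
            "ai_notes": a.get("ai_notes", ""),
        }
    return visit_results


def build_save_form(assessments):
    """Return the form fields for the save endpoint: visit_results is a JSON *string*."""
    return {"visit_results": json.dumps(build_visit_results(assessments))}
```

The same `visit_results` string would be reused for the complete call, alongside `overall_result`, `notes`, and `kpi_notes`.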
98132
### `tasks/` — Task Management
99133

100134
> See also: [`commcare_connect/tasks/README.md`](../commcare_connect/tasks/README.md) for data model details and testing guidance.
@@ -116,10 +150,54 @@ Data-driven workflows with custom React UIs and pipeline integration.
116150
- **DataAccess:** `WorkflowDataAccess`, `PipelineDataAccess` (both extend `BaseDataAccess`) in `workflow/data_access.py`
117151
- **Proxy models:** `WorkflowDefinitionRecord`, `WorkflowRenderCodeRecord`, `WorkflowRunRecord`, `WorkflowChatHistoryRecord`, `PipelineDefinitionRecord` (experiment=`"workflow"` / `"pipeline"`)
118152
- **Key views:** Workflow list (`/workflow/`), definition view, run view
119-
- **Templates:** Predefined workflow templates in `workflow/templates/` (audit_with_ai_review, performance_review, ocs_outreach)
153+
- **Templates:** Predefined workflow templates in `workflow/templates/` (audit_with_ai_review, bulk_image_audit, mbw_monitoring_v2, performance_review, ocs_outreach)
120154
- **Render code:** React components stored as LabsRecords, rendered dynamically in workflow runner
121155
- **Cross-app:** Can create audit sessions and tasks from workflow actions
122156

157+
#### Workflow Template Anatomy
158+
159+
Each template is a Python file in `workflow/templates/` that exports three dicts:
160+
161+
```python
162+
DEFINITION = {
163+
"name": str, "description": str, "version": 1,
164+
"templateType": str, # must match TEMPLATE["key"]
165+
"statuses": [...], # list of {id, label, color}
166+
"config": {...}, # e.g. {"showSummaryCards": True}
167+
"pipeline_sources": [],
168+
}
169+
170+
RENDER_CODE = """function WorkflowUI({ definition, instance, workers,
171+
pipelines, links, actions, onUpdateState }) {
172+
// Full React JSX component — Babel standalone transpiles in-browser, no build step
173+
// Inner components defined as const arrows INSIDE WorkflowUI to close over parent state
174+
// Phase router at bottom: {phase === 'foo' && <FooPhase />}
175+
}"""
176+
177+
TEMPLATE = {
178+
"key": str, # e.g. "bulk_image_audit" — unique, used for lookup
179+
"name": str,
180+
"description": str,
181+
"icon": str, # Font Awesome class e.g. "fa-images"
182+
"color": str, # Tailwind color e.g. "blue"
183+
"definition": DEFINITION,
184+
"render_code": RENDER_CODE,
185+
"pipeline_schema": None, # or dict for single pipeline; use "pipeline_schemas" list for multi
186+
}
187+
```
188+
189+
**Registration:** `__init__.py` auto-discovers via `pkgutil.iter_modules`. Also has explicit re-exports at the bottom — **add new templates to both the `from . import` line and `__all__`**.
190+
191+
**JSX-in-Python rules:**
192+
- Cannot use `"""` inside `RENDER_CODE` (Python string delimiter conflict)
193+
- Inner components must be defined BEFORE they are used (no hoisting)
194+
- State for child components is hoisted to outer `WorkflowUI` so it persists across re-renders
195+
- `onUpdateState(patch)` PATCH-merges into `run.data.state` on the server
196+
- Workflow props: `{ definition, instance, workers, pipelines, links, actions, onUpdateState }`
197+
- `actions.createAudit(payload)` → `POST /audit/api/audit/create-async/`
198+
- `actions.streamAuditProgress(task_id, onProgress, onComplete, onError)` → SSE stream
199+
- `actions.cancelAudit(task_id)` → cancel endpoint
200+
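The template anatomy above implies a few invariants that are easy to break when copying an existing template, chiefly that `DEFINITION["templateType"]` matches `TEMPLATE["key"]`. A minimal sketch of a consistency check follows; `validate_template` and the example values are hypothetical and not part of the repository.

```python
REQUIRED_TEMPLATE_KEYS = ("key", "name", "description", "icon", "color", "definition", "render_code")


def validate_template(template: dict) -> list[str]:
    """Return a list of problems; an empty list means the template looks consistent."""
    problems = [f"missing TEMPLATE[{k!r}]" for k in REQUIRED_TEMPLATE_KEYS if k not in template]
    definition = template.get("definition") or {}
    # templateType is documented as "must match TEMPLATE['key']".
    if definition.get("templateType") != template.get("key"):
        problems.append("DEFINITION['templateType'] must match TEMPLATE['key']")
    # Each status is documented as {id, label, color}.
    for status in definition.get("statuses", []):
        missing = {"id", "label", "color"} - set(status)
        if missing:
            problems.append(f"status {status!r} is missing {sorted(missing)}")
    return problems


# Hypothetical example values, for illustration only.
EXAMPLE_DEFINITION = {
    "name": "Example", "description": "Example template", "version": 1,
    "templateType": "example_template",
    "statuses": [{"id": "pending", "label": "Pending", "color": "gray"}],
    "config": {"showSummaryCards": True},
    "pipeline_sources": [],
}
EXAMPLE_TEMPLATE = {
    "key": "example_template", "name": "Example", "description": "Example template",
    "icon": "fa-images", "color": "blue",
    "definition": EXAMPLE_DEFINITION,
    "render_code": "function WorkflowUI(props) { return null; }",
    "pipeline_schema": None,
}
```

Calling `validate_template(EXAMPLE_TEMPLATE)` returns an empty list; changing only `key` produces the `templateType` mismatch message.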
123201
### `ai/` — AI Agent Integration
124202

125203
> See also: [`commcare_connect/ai/README.md`](../commcare_connect/ai/README.md) for data model details and testing guidance.

.claude/launch.json

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,12 @@
2525
"runtimeExecutable": "celery",
2626
"runtimeArgs": ["-A", "config.celery_app", "worker", "-l", "info"],
2727
"port": 0
28+
},
29+
{
30+
"name": "celery-beat",
31+
"runtimeExecutable": "celery",
32+
"runtimeArgs": ["-A", "config.celery_app", "beat"],
33+
"port": 0
2834
}
2935
]
3036
}

commcare_connect/audit/analysis_config.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -68,11 +68,13 @@ def extract_images_with_question_ids(visit_data: dict) -> list[dict]:
6868

6969
# Extract visit-level metadata
7070
username = visit_data.get("username") or ""
71-
visit_date = visit_data.get("visit_date") or ""
7271
entity_name = visit_data.get("entity_name") or "No Entity"
7372

7473
# Build filename->path map in a SINGLE traversal (O(m) where m=tree size)
7574
form_data = form_json.get("form", form_json)
75+
76+
# Use form.meta.timeEnd for actual submission time; fall back to visit_date (date only)
77+
visit_date = form_data.get("meta", {}).get("timeEnd") or visit_data.get("visit_date") or ""
7678
filename_map = _build_filename_map(form_data)
7779

7880
# Now each lookup is O(1) instead of O(m)
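
The `visit_date` change in this hunk prefers the form's `meta.timeEnd` (a full timestamp) over the visit's date-only field. A standalone sketch of that fallback chain is below; `resolve_visit_date` is a hypothetical name, and the `or {}` guard against an explicit `null` in `meta` is a defensive variation that the committed line does not include.

```python
def resolve_visit_date(form_json: dict, visit_data: dict) -> str:
    """Prefer form.meta.timeEnd; fall back to visit_data['visit_date'], then ''."""
    # Some payloads nest the form under "form"; others are already the form body.
    form_data = form_json.get("form", form_json)
    # Assumption: treat a null "meta" like a missing one instead of raising.
    meta = form_data.get("meta") or {}
    return meta.get("timeEnd") or visit_data.get("visit_date") or ""
```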

commcare_connect/audit/data_access.py

Lines changed: 113 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -98,12 +98,14 @@ def from_dict(cls, data: dict) -> "AuditCriteria":
9898
{
9999
"image_path": rf.get("image_path") or rf.get("imagePath", ""),
100100
"field_path": rf.get("field_path") or rf.get("fieldPath", ""),
101+
"hq_url_path": rf.get("hq_url_path") or rf.get("hqUrlPath", ""),
101102
"label": rf.get("label", ""),
102103
"filter_by_image": rf.get("filter_by_image") or rf.get("filterByImage", False),
103104
"filter_by_field": rf.get("filter_by_field") or rf.get("filterByField", False),
104105
}
105106
for rf in related_fields_raw
106-
if (rf.get("image_path") or rf.get("imagePath")) and (rf.get("field_path") or rf.get("fieldPath"))
107+
# Require image_path; field_path is optional (image-only filter rules are valid)
108+
if rf.get("image_path") or rf.get("imagePath")
107109
]
108110

109111
return cls(
@@ -175,10 +177,17 @@ def filter_visits_for_audit(
175177
if criteria.selected_flw_user_ids and "username" in df.columns:
176178
df = df[df["username"].isin(criteria.selected_flw_user_ids)]
177179

178-
# Apply sample percentage
180+
# Apply sample percentage — sample per FLW for equal representation, then shuffle
179181
if criteria.sample_percentage < 100 and len(df) > 0:
180-
sample_size = max(1, int(len(df) * criteria.sample_percentage / 100))
181-
df = df.sample(n=min(sample_size, len(df)), random_state=42)
182+
if "username" in df.columns:
183+
groups = []
184+
for _, grp in df.groupby("username", dropna=False):
185+
n = max(1, int(len(grp) * criteria.sample_percentage / 100))
186+
groups.append(grp.sample(n=min(n, len(grp)), random_state=42))
187+
df = pd.concat(groups).sample(frac=1, random_state=42)
188+
else:
189+
sample_size = max(1, int(len(df) * criteria.sample_percentage / 100))
190+
df = df.sample(n=min(sample_size, len(df)), random_state=42)
182191

183192
if return_visits:
184193
return df.to_dict("records")
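
One consequence of the per-FLW sampling in this hunk is that `max(1, ...)` guarantees every FLW at least one visit, so the effective sample can exceed the nominal percentage when many FLWs have few visits. The dependency-free sketch below reproduces only the size arithmetic, not the pandas sampling itself; `per_flw_sample_sizes` is a hypothetical helper.

```python
def per_flw_sample_sizes(visit_counts: dict[str, int], sample_percentage: int) -> dict[str, int]:
    """Mirror n = min(len(grp), max(1, int(len(grp) * pct / 100))) for each FLW."""
    return {
        username: min(count, max(1, int(count * sample_percentage / 100)))
        for username, count in visit_counts.items()
    }


# Hypothetical visit counts: one busy FLW and two with very few visits.
counts = {"flw_a": 100, "flw_b": 3, "flw_c": 1}
sizes = per_flw_sample_sizes(counts, 10)
total_sampled = sum(sizes.values())
```

With these counts, the per-FLW rule selects 12 of 104 visits, whereas the previous global rule would have selected `max(1, int(104 * 10 / 100))`, that is, 10.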
@@ -737,6 +746,89 @@ def extract_images_for_visits(
737746
if str(vid) not in result:
738747
result[str(vid)] = []
739748

749+
# Build visit lookup once — shared by enrichment and fallback sections below
750+
visit_dict_by_id = {str(v.get("id", "")): v for v in visit_dicts}
751+
752+
# Fetch cc_domain for building CommCareHQ attachment URLs (cached, ~1 API call per hour)
753+
cc_domain = None
754+
try:
755+
from commcare_connect.workflow.templates.mbw_monitoring.data_fetchers import fetch_opportunity_metadata
756+
757+
meta = fetch_opportunity_metadata(self.access_token, opp_id)
758+
cc_domain = meta.get("cc_domain")
759+
except Exception as e:
760+
# Intentionally broad: cc_domain is optional for URL construction; any failure
761+
# (network, missing key, unexpected format) should degrade gracefully, not block audit.
762+
logger.debug(f"[ImageExtract] Could not fetch cc_domain for hq_url construction: {e}")
763+
764+
# Enrich Connect blob images with xform_id and build hq_url
765+
hq_base = settings.COMMCARE_HQ_URL.rstrip("/")
766+
for visit_id_str, images in result.items():
767+
visit_data = visit_dict_by_id.get(visit_id_str, {})
768+
form_json = visit_data.get("form_json", {})
769+
xform_id = form_json.get("id") or ""
770+
for img in images:
771+
img["xform_id"] = xform_id
772+
if cc_domain and xform_id and img.get("name") and not img.get("hq_url"):
773+
img["hq_url"] = f"{hq_base}/a/{cc_domain}/api/form/attachment/{xform_id}/{img['name']}"
774+
775+
# Fallback: for visits with no Connect blobs, extract CommCareHQ URL images
776+
# from form_json using related_fields rules.
777+
# Strategy 1: use hq_url_path (pre-computed URL stored in form JSON)
778+
# Strategy 2: extract filename from image_path, build HQ attachment URL
779+
# (used when hq_url_path is empty — e.g. dynamic image type discovery
780+
# can't resolve DataBindOnly XForm paths from the HQ app definition API)
781+
if related_fields:
782+
import hashlib
783+
784+
image_rules = [r for r in related_fields if r.get("image_path")]
785+
if image_rules:
786+
for visit_id_str, images in result.items():
787+
visit_data = visit_dict_by_id.get(visit_id_str, {})
788+
form_json = visit_data.get("form_json", {})
789+
form_data = form_json.get("form", form_json)
790+
xform_id = form_json.get("id") or ""
791+
username = visit_data.get("username") or ""
792+
# Use form.meta.timeEnd for actual submission time; fall back to visit_date (date only)
793+
visit_date = form_data.get("meta", {}).get("timeEnd") or visit_data.get("visit_date") or ""
794+
entity_name = visit_data.get("entity_name") or "No Entity"
795+
for rule in image_rules:
796+
hq_url_path = rule.get("hq_url_path", "")
797+
image_path = rule.get("image_path", "")
798+
799+
# Skip if this image type is already present (e.g. from Connect blob)
800+
if any(img.get("question_id") == image_path for img in images):
801+
continue
802+
803+
# Strategy 1: pre-computed URL field in form JSON
804+
hq_url = None
805+
if hq_url_path:
806+
extracted = self._extract_field_value(form_data, hq_url_path)
807+
if extracted and isinstance(extracted, str) and extracted.startswith("http"):
808+
hq_url = extracted
809+
810+
# Strategy 2: build URL from filename stored at image_path
811+
if not hq_url and cc_domain and xform_id and image_path:
812+
filename = self._extract_field_value(form_data, image_path)
813+
if filename and isinstance(filename, str) and not filename.startswith("http"):
814+
hq_url = f"{hq_base}/a/{cc_domain}/api/form/attachment/{xform_id}/{filename}"
815+
816+
if hq_url:
817+
blob_id = "hq_" + hashlib.sha256(hq_url.encode()).hexdigest()[:16]
818+
name = hq_url_path.split("/")[-1] if hq_url_path else image_path.split("/")[-1]
819+
images.append(
820+
{
821+
"blob_id": blob_id,
822+
"hq_url": hq_url,
823+
"xform_id": xform_id,
824+
"name": name,
825+
"question_id": image_path,
826+
"username": username,
827+
"visit_date": visit_date,
828+
"entity_name": entity_name,
829+
}
830+
)
831+
740832
# Add related field values if rules provided
741833
if related_fields:
742834
if progress_callback:
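
The fallback path above builds a CommCareHQ attachment URL and derives a synthetic `blob_id` from it. The sketch below isolates those two steps so the format is easy to see; the function names and the example host and domain are hypothetical, while the URL pattern and the `"hq_"` + 16-hex-character hash follow the hunk.

```python
import hashlib


def build_hq_attachment_url(hq_base: str, cc_domain: str, xform_id: str, filename: str) -> str:
    """Build {hq_base}/a/{cc_domain}/api/form/attachment/{xform_id}/{filename}."""
    return f"{hq_base.rstrip('/')}/a/{cc_domain}/api/form/attachment/{xform_id}/{filename}"


def synthetic_blob_id(hq_url: str) -> str:
    """Derive a stable id for an HQ-hosted image: 'hq_' + first 16 hex chars of SHA-256."""
    return "hq_" + hashlib.sha256(hq_url.encode()).hexdigest()[:16]


# Hypothetical values, for illustration only.
url = build_hq_attachment_url("https://hq.example.org/", "demo-domain", "abc123", "photo.jpg")
blob_id = synthetic_blob_id(url)
```

Because the id is a pure function of the URL, the same image yields the same `blob_id` across runs, which is what lets saved assessments be matched back to HQ-hosted images.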
@@ -773,25 +865,23 @@ def _filter_visits_by_related_fields(
773865
if not filter_rules:
774866
return visit_images
775867

868+
image_filter_paths = [r.get("image_path", "") for r in filter_rules if r.get("filter_by_image")]
869+
field_filter_rules = [r for r in filter_rules if r.get("filter_by_field")]
870+
776871
filtered_result = {}
777872
for visit_id, images in visit_images.items():
778873
include_visit = True
779874

780-
for rule in filter_rules:
781-
image_path = rule.get("image_path", "")
782-
field_path = rule.get("field_path", "")
783-
filter_by_image = rule.get("filter_by_image", False)
784-
filter_by_field = rule.get("filter_by_field", False)
875+
# OR logic: include visit if it has ANY of the required image types
876+
if image_filter_paths:
877+
question_ids = {img.get("question_id") for img in images}
878+
if not any(p in question_ids for p in image_filter_paths):
879+
include_visit = False
785880

786-
# Check if this visit has the required image
787-
if filter_by_image:
788-
has_matching_image = any(img.get("question_id") == image_path for img in images)
789-
if not has_matching_image:
790-
include_visit = False
791-
break
792-
793-
# Check if this visit has the required field value
794-
if filter_by_field:
881+
# AND logic: visit must satisfy every field filter rule
882+
if include_visit:
883+
for rule in field_filter_rules:
884+
field_path = rule.get("field_path", "")
795885
has_field_value = False
796886
for img in images:
797887
for rf in img.get("related_fields", []):
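
The rewritten filter combines image rules with OR and field rules with AND. A compact sketch of that decision for a single visit is below. The hunk is cut off before the field-value check, so that check is passed in as a hypothetical callable `has_field_value(images, field_path)` and should not be read as the real implementation.

```python
def visit_passes_filters(images, filter_rules, has_field_value) -> bool:
    """Return True if a visit's images satisfy the image (OR) and field (AND) filters."""
    image_paths = [r.get("image_path", "") for r in filter_rules if r.get("filter_by_image")]
    field_rules = [r for r in filter_rules if r.get("filter_by_field")]

    # OR logic: at least one of the required image types must be present.
    if image_paths:
        question_ids = {img.get("question_id") for img in images}
        if not any(path in question_ids for path in image_paths):
            return False

    # AND logic: every field rule must be satisfied (check supplied by the caller).
    return all(has_field_value(images, r.get("field_path", "")) for r in field_rules)
```

Under the previous loop, a visit had to contain every image type flagged `filter_by_image`; with the OR grouping, any one of them is enough.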
@@ -1136,7 +1226,7 @@ def create_audit_creation_job(
11361226
opportunities: list[dict],
11371227
) -> dict:
11381228
"""Create an audit creation job record for tracking async creation."""
1139-
from datetime import datetime
1229+
from datetime import datetime, timezone
11401230

11411231
data = {
11421232
"task_id": task_id,
@@ -1154,8 +1244,8 @@ def create_audit_creation_job(
11541244
},
11551245
"result": None,
11561246
"error": None,
1157-
"created_at": datetime.now().isoformat(),
1158-
"updated_at": datetime.now().isoformat(),
1247+
"created_at": datetime.now(timezone.utc).isoformat(),
1248+
"updated_at": datetime.now(timezone.utc).isoformat(),
11591249
}
11601250

11611251
record = self.labs_api.create_record(
@@ -1237,7 +1327,7 @@ def update_audit_creation_job(
12371327
error: str | None = None,
12381328
) -> dict | None:
12391329
"""Update an audit creation job record."""
1240-
from datetime import datetime
1330+
from datetime import datetime, timezone
12411331

12421332
from commcare_connect.labs.models import LocalLabsRecord
12431333

@@ -1267,7 +1357,7 @@ def update_audit_creation_job(
12671357
data["result"] = result
12681358
if error is not None:
12691359
data["error"] = error
1270-
data["updated_at"] = datetime.now().isoformat()
1360+
data["updated_at"] = datetime.now(timezone.utc).isoformat()
12711361

12721362
# Save
12731363
updated = self.labs_api.update_record(
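
The switch from `datetime.now()` to `datetime.now(timezone.utc)` makes the stored timestamps timezone-aware, so their ISO strings carry an explicit offset. A short sketch of the difference follows; `utc_now_iso` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def utc_now_iso() -> str:
    """Return the current time as an ISO 8601 string with an explicit +00:00 offset."""
    return datetime.now(timezone.utc).isoformat()


naive = datetime.now()               # no tzinfo: meaning depends on the server's local zone
aware = datetime.now(timezone.utc)   # tzinfo set: unambiguous across workers and hosts
```

Naive and aware datetimes cannot be compared or subtracted directly in Python, so records written before and after this change may need normalizing before their timestamps are compared.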
