Commit 6cc8e55

update task solver agent
1 parent b6dbff0 commit 6cc8e55

6 files changed: +571 -7 lines changed

.github/agents/solve.task.agent.md

Lines changed: 17 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -1357,7 +1357,18 @@ Document the independent check in `step2_analysis/notes.md` under a
13571357
you MUST update these strings to match the latest results. Where possible, let
13581358
conclusions come from `results.json["conclusions"]` instead of hardcoding.
13591359
1360-
17. **Run the report generator** to produce the engineering report (Word + HTML):
1360+
17. **Run consistency checker** (MANDATORY before report generation):
1361+
```
1362+
Run in terminal: python devtools/consistency_checker.py task_solve/YYYY-MM-DD_slug/
1363+
```
1364+
The consistency checker:
1365+
- Extracts numerical values from all notebooks and results.json
1366+
- Detects inconsistencies: numerical mismatches, scope mismatches (e.g., volumetric vs mass-based), contradictory claims
1367+
- Produces `consistency_report.json` in the task folder
1368+
- **Fix any CRITICAL issues before generating the report**
1369+
- Common issue: external study data (e.g., Gudrun paper) measuring different quantities than notebook calculations — these need clarification in the report, not "fixing"
1370+
1371+
18. **Run the report generator** to produce the engineering report (Word + HTML):
13611372
```
13621373
Run in terminal: python step3_report/generate_report.py
13631374
```
@@ -1376,19 +1387,19 @@ Document the independent check in `step2_analysis/notes.md` under a
13761387
- All formatting renders automatically when corresponding keys exist in
13771388
`results.json` — no custom rendering code needed per task
13781389
1379-
18. **Update the task README** (`README.md` in the task folder):
1390+
19. **Update the task README** (`README.md` in the task folder):
13801391
- Fill in the Problem Statement
13811392
- Check off completed steps
13821393
- Write the Key Results section
13831394
13841395
### Phase 4: Knowledge Capture & Contribution
13851396
1386-
19. **Identify reusable outputs**:
1397+
20. **Identify reusable outputs**:
13871398
- If the notebook is generally useful → mention it could go to `examples/notebooks/`
13881399
- If a NeqSim API gap was found → document it for future development
13891400
- If a new pattern was discovered → note it for `CODE_PATTERNS.md`
13901401
1391-
20. **Fix and improve documentation** encountered during the task:
1402+
21. **Fix and improve documentation** encountered during the task:
13921403
- If you found **errors** in existing docs (wrong API signatures, outdated
13931404
patterns, incorrect examples), fix them and include the fixes in the PR.
13941405
- If you discovered **missing documentation** (undocumented classes, missing
@@ -1399,7 +1410,7 @@ Document the independent check in `step2_analysis/notes.md` under a
13991410
when adding new doc pages.
14001411
- Documentation fixes go in the **same PR** as the task outputs.
14011412
1402-
21. **Draft a task log entry** (but don't write to the file directly):
1413+
22. **Draft a task log entry** (but don't write to the file directly):
14031414
```
14041415
### YYYY-MM-DD — Task Title
14051416
**Type:** X (TypeName)
@@ -1409,7 +1420,7 @@ Document the independent check in `step2_analysis/notes.md` under a
14091420
```
14101421
Show this to the user for them to add to `docs/development/TASK_LOG.md`.
14111422
1412-
22. **Create a Pull Request** (if the user asks, or if reusable outputs were produced):
1423+
23. **Create a Pull Request** (if the user asks, or if reusable outputs were produced):
14131424
14141425
When the task produces reusable code (tests, notebooks, docs, API extensions),
14151426
offer to create a PR. If the user confirms, execute these steps:

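A minimal sketch of how the "fix CRITICAL issues before generating the report" rule in step 17 could be enforced from a script. The commit does not show the layout of `consistency_report.json`, so the `issues` list and the `severity` field below are assumptions, and the helper names `critical_issues` and `report_may_proceed` are hypothetical.

```python
import json
from pathlib import Path


def critical_issues(report: dict) -> list:
    """Return the issues marked CRITICAL.

    Assumed schema (not confirmed by the commit):
    {"issues": [{"severity": "CRITICAL", ...}, ...]}
    """
    return [
        issue
        for issue in report.get("issues", [])
        if str(issue.get("severity", "")).upper() == "CRITICAL"
    ]


def report_may_proceed(task_dir: Path) -> bool:
    """True only if consistency_report.json exists and lists no CRITICAL issues."""
    report_path = task_dir / "consistency_report.json"
    if not report_path.exists():
        return False  # the checker has not been run for this task yet
    with open(report_path, "r", encoding="utf-8") as f:
        return not critical_issues(json.load(f))
```

Treating a missing report as a failure keeps the gate strict: the report generator is not reached unless the checker has actually produced output for the task folder.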
.github/copilot-instructions.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1365,7 +1365,13 @@ docs, or the workspace root.
13651365
9. **For cost estimation:** Use component-level NeqSim classes (e.g., `SURFCostEstimator`, `SubseaCostEstimator`) instead of flat lump-sum estimates. Break down CAPEX into verifiable subcategories.
13661366
10. **Self-review before delivering:** Re-read all formulas checking for sign errors, double-counting, wrong time indexing, and missing terms. Compare key outputs against industry benchmarks.
13671367
11. **Benchmark validation (MANDATORY):** Create a separate benchmark notebook (`XX_benchmark_validation.ipynb`) comparing NeqSim results against independent reference data (NIST, textbook examples, published cases, industry benchmarks). Include at least 3 data points, a parity/deviation plot, and save `benchmark_validation` results to `results.json`. Include benchmark comparison in the final report.
1368-
12. **Uncertainty analysis (MANDATORY):** Create a separate uncertainty notebook (`XX_uncertainty_risk_analysis.ipynb`) that:
1368+
12. **Consistency check (MANDATORY before report):** Run `python devtools/consistency_checker.py task_solve/YYYY-MM-DD_slug/` before generating reports. This tool:
1369+
- Extracts numerical values from all notebooks and results.json
1370+
- Detects inconsistencies: numerical mismatches, scope mismatches (volumetric vs mass-based), contradictory claims
1371+
- Produces `consistency_report.json` with issues to fix
1372+
- **Fix CRITICAL issues before generating the report**
1373+
- Common issue: external study data (e.g., Gudrun) measuring different quantities than notebook calculations
1374+
13. **Uncertainty analysis (MANDATORY):** Create a separate uncertainty notebook (`XX_uncertainty_risk_analysis.ipynb`) that:
13691375
- Identifies key uncertain input parameters with realistic ranges (low/base/high or probability distributions)
13701376
- **MUST use full NeqSim process simulations inside the Monte Carlo loop** — do NOT
13711377
use simplified Python correlations when NeqSim classes exist for the calculation

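To illustrate what a "numerical mismatch" between notebooks and `results.json` could look like, here is a hedged sketch that compares values sharing a key under a relative tolerance. It is not the logic of `consistency_checker.py`, which this commit does not show; `find_mismatches`, the example keys, and the 1 % default tolerance are hypothetical.

```python
import math


def find_mismatches(notebook_values: dict, results_values: dict, rel_tol: float = 0.01) -> list:
    """Return (key, notebook_value, results_value) for shared keys that disagree.

    Only keys present in both sources are compared; rel_tol is an assumed
    default, not a value taken from the real tool.
    """
    mismatches = []
    for key in sorted(notebook_values.keys() & results_values.keys()):
        a, b = notebook_values[key], results_values[key]
        if not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append((key, a, b))
    return mismatches


# Hypothetical values: the flow rates agree within 1 %, the temperatures do not.
issues = find_mismatches(
    {"gas_rate_kg_hr": 1000.0, "outlet_temperature_C": 40.0},
    {"gas_rate_kg_hr": 1005.0, "outlet_temperature_C": 45.0},
)
```

A tolerance comparison of this kind only covers numerical mismatches. Scope mismatches such as volumetric versus mass-based quantities compare different things, so they need clarification in the report rather than a tighter tolerance.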
AGENTS.md

Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,14 @@ workspace root.
103103
- Saves `uncertainty` and `risk_evaluation` results to `results.json`
104104
- **Save results.json** in the task root (see pattern below)
105105

106+
**Step 2.5 — Consistency Check (MANDATORY before report)**
107+
- Run `python devtools/consistency_checker.py task_solve/YYYY-MM-DD_slug/`
108+
- The tool extracts numerical values from all notebooks and results.json
109+
- Detects inconsistencies: numerical mismatches, scope mismatches (e.g., volumetric vs mass-based), contradictory claims
110+
- Produces `consistency_report.json` in the task folder
111+
- **Fix any CRITICAL issues before generating the report**
112+
- Common issues: external study data measuring different quantities than notebook calculations
113+
106114
**Step 3 — Report**
107115
- `generate_report.py` auto-reads `task_spec.md` and `results.json`
108116
- Run `python step3_report/generate_report.py` to produce a professional
@@ -294,6 +302,41 @@ if report.getWarningCount() > 0:
294302
assert report.isValid(), "results.json failed validation — fix errors above"
295303
```
296304
305+
### Iterative Updates to results.json
306+
307+
When working iteratively with continuous updates:
308+
309+
1. **Load Before Modifying** — Always read the existing `results.json` before adding new data:
310+
```python
311+
results_path = TASK_DIR / "results.json"
312+
if results_path.exists():
313+
    with open(results_path, "r") as f:
314+
        results = json.load(f)
315+
else:
316+
    results = {}
317+
```
318+
319+
2. **Merge New Data with Dict Unpacking** — Add new results without losing existing entries:
320+
```python
321+
results["key_results"] = {**results.get("key_results", {}), "new_result": 42.5}
322+
results["figure_captions"] = {**results.get("figure_captions", {}), "new_plot.png": "Caption"}
323+
```
324+
325+
3. **Append to Lists** — For discussion, tables, equations:
326+
```python
327+
results.setdefault("figure_discussion", []).append(new_discussion)
328+
results.setdefault("tables", []).append(new_table)
329+
```
330+
331+
4. **Run Consistency Check** before report generation:
332+
```bash
333+
python devtools/consistency_checker.py task_solve/YYYY-MM-DD_slug/
334+
```
335+
336+
5. **Regenerate Report** — The report generator dynamically includes sections based on
337+
what's present in results.json. Adding `uncertainty` or `risk_evaluation` automatically
338+
creates those sections in the report.
339+
297340
The report generator auto-reads this file to populate Results and Validation sections.
298341
- **key_results**: Rendered as styled table with auto-detected units (use suffixes like `_C`, `_bar`, `_kg`, `_hours`)
299342
- **validation**: Rendered as pass/fail table with color coding

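The load, merge, and append steps from the "Iterative Updates to results.json" section can be combined into one round trip. The following is a sketch under stated assumptions: the helper names `load_results`, `merge_results`, and `save_results` are hypothetical, and the write-back step is an addition that the section itself does not show.

```python
import json
from pathlib import Path


def load_results(results_path: Path) -> dict:
    """Return the existing results, or an empty dict on the first run."""
    if results_path.exists():
        with open(results_path, "r", encoding="utf-8") as f:
            return json.load(f)
    return {}


def merge_results(results: dict, key_results=None, tables=None) -> dict:
    """Merge new key results and append new tables without dropping old entries.

    The input dict is updated in place and also returned.
    """
    results["key_results"] = {**results.get("key_results", {}), **(key_results or {})}
    results.setdefault("tables", []).extend(tables or [])
    return results


def save_results(results: dict, results_path: Path) -> None:
    """Write the merged results back so the report generator can read them."""
    with open(results_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
```

A typical iteration would call `load_results`, then `merge_results` with the new values (for example a hypothetical `{"pressure_bar": 10.0}`), then `save_results`, and only afterwards run the consistency check and regenerate the report.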
devtools/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,7 @@ for a full explanation of the architecture and internals.
5353
| `neqsim_dev_setup.py` | JVM bootstrap, class imports, compile + kernel restart |
5454
| `pyproject.toml` | Makes it pip-installable (`pip install -e devtools/`) |
5555
| `new_task.py` | Create task-solving folders for the 4-step AI workflow |
56+
| `consistency_checker.py` | **Pre-report quality gate.** Extracts numerical values from notebooks and results.json, detects inconsistencies (numerical mismatches, scope mismatches, contradictory claims). Run before `generate_report.py`. Produces `consistency_report.json`. |
5657
| `unisim_reader.py` | UniSim/HYSYS .usc COM reader → NeqSim Python/notebook/EOT/JSON. 45+ op types, port-specific forward refs, auto-recycle wiring. |
5758
| `test_unisim_outputs.py` | 14 pytest tests for all UniSim converter output modes (no COM needed) |
5859
| `explore_unisim_com.py` | Diagnostic: dump UniSim COM object model from any .usc file |

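The README row says to run `consistency_checker.py` before `generate_report.py`. One way to keep that ordering is a small wrapper, sketched below. Two assumptions are not confirmed by the commit: that `step3_report/generate_report.py` lives inside the task folder, and that the checker signals failure through a non-zero exit code. The names `build_commands` and `run_in_order` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(task_dir: Path) -> list:
    """Checker first, report generator second (script paths as quoted in the docs)."""
    return [
        [sys.executable, "devtools/consistency_checker.py", str(task_dir)],
        # Assumption: generate_report.py is located inside the task folder.
        [sys.executable, str(task_dir / "step3_report" / "generate_report.py")],
    ]


def run_in_order(commands, runner=subprocess.run) -> int:
    """Run each command in turn; stop at the first non-zero exit code and return it."""
    for cmd in commands:
        code = runner(cmd).returncode
        if code != 0:
            return code
    return 0
```

The `runner` parameter exists only so the ordering can be exercised without launching real processes. If the checker always exits with 0, the exit-code gate has no effect and the contents of `consistency_report.json` would have to be inspected instead.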