
[Benchmark Output Submission]: Verificate HELIX V1.4 #9

@Verificate-Dev

Description


Agent Name

Verificate HELIX V1.4

Maintainer

Craig Atkinson / Verificate

Model(s) Used

Granite 4.0 Small Q4

Agent Description

HELIX v1.4 is a production-grade inference engine built by Verificate, optimized for CPU-only execution of large language models. This submission covers all 223 VAKRA benchmark questions across all 4 capabilities (222/223 successful, 99.6% completion rate).

Metadata (JSON)

{
  "submitter": "Verificate",
  "engine": "HELIX v1.4 CPU Inference Engine",
  "model": "IBM Granite 4.0 Small Q4 (32B parameters)",
  "infrastructure": "OpenShift CPU Pod — AMD EPYC 9254, 24 threads, NUMA node0 (HPC Fusion, llama_v14 profile)",
  "helix_config": {
    "HELIX_SLICE_EXECUTION": 1,
    "HELIX_ACTIVE_SLICE_RATIO": 0.50,
    "HELIX_SLICE_TOPK": 1,
    "HELIX_PROGRESSIVE_MODE": 0,
    "HELIX_GATE_BY_UTS": 1,
    "VERIFICATE_BATCH_SIZE": 1024
  },
  "speed_metrics": {
    "cap1_avg_duration_s": 47.3,
    "cap2_avg_duration_s": 31.4,
    "cap3_avg_duration_s": 42.1,
    "cap4_avg_duration_s": 45.0,
    "overall_avg_duration_s": 40.5,
    "per_llm_call_s": "8–11s (HELIX slicing active)"
  },
  "success_metrics": {
    "total_queries": 223,
    "successful_executions": 222,
    "error_executions": 1,
    "completion_rate": "99.6%"
  },
  "agent_fixes": [
    "JSON-safe tool result truncation (4000 chars, dict handles preserved)",
    "Context message truncation (3000 chars on prior-turn assistant messages)",
    "Raw tool call artifact re-execution (XML and JSON blob formats)",
    "Synthesis fallback (final no-tool LLM call for useless answers)",
    "Extended useless answer detection (dicts, lists, Python reprs, error strings)"
  ],
  "notes": "Two-pass tool pre-selection active for all 4 capabilities. HELIX slice execution confirmed active via per-call latency (8–11s vs ~30s baseline). 222/223 questions answered successfully. Cap 4 multi-turn achieved 0 timeouts. Official VAKRA schema validation passed all 4 capability files."
}
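
As an illustration of the first entry in `agent_fixes` (tool result truncation to 4000 characters with dict handles preserved), a minimal Python sketch of that behavior might look like the following. The function name and the `…[truncated]` marker are hypothetical; the actual HELIX implementation is not included in this submission.

```python
import json

MAX_TOOL_RESULT_CHARS = 4000  # the 4000-char limit named in agent_fixes

def truncate_tool_result(result):
    """Truncate a tool result for the context window while staying JSON-safe.

    Hypothetical sketch: dict results pass through untouched so downstream
    tool calls can still reference their keys ("dict handles preserved");
    only long string payloads are cut.
    """
    if isinstance(result, dict):
        return result  # preserve dict handles intact
    text = result if isinstance(result, str) else json.dumps(result, default=str)
    if len(text) <= MAX_TOOL_RESULT_CHARS:
        return text
    return text[:MAX_TOOL_RESULT_CHARS] + "…[truncated]"
```

Keeping dicts whole while cutting only string payloads is one way to reconcile context-window pressure with tools that need to re-reference earlier results by key.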

ZIP File Link

https://www.dropbox.com/scl/fi/zxc96xuf6ritnjcji4ssg/Verificate_HELIX_v1.4_HELIX_Sliced_Submission_20260409.zip?rlkey=pucw6vflj9crdqzrlxzmaf8hk&st=u0owc77m&dl=0

ZIP Contents Description

SUBMISSION_MANIFEST.json
capability_bi_apis/prediction/chicago_crime.json (79 records)
capability_dashboard_apis/prediction/chicago_crime.json (78 records)
capability_multihop_reasoning/prediction/chicago_crime.json (45 records)
capability_multiturn/prediction/chicago_crime.json (21 records)
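
The per-file record counts above can be cross-checked against the 223 total queries reported in the metadata; a quick sanity check (capability names taken from the listing above):

```python
# Record counts per capability prediction file, as listed in the ZIP contents
records = {
    "capability_bi_apis": 79,
    "capability_dashboard_apis": 78,
    "capability_multihop_reasoning": 45,
    "capability_multiturn": 21,
}

total = sum(records.values())
print(total)  # 223, matching total_queries in the metadata
```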

Validation Checklist

  • JSON files are valid and well-formed
  • ZIP file is accessible via the provided link
  • No sensitive or PII data included
  • Agent has been tested locally
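
A minimal sketch of the first checklist item (confirming every prediction JSON is valid and well-formed), assuming the directory layout shown in the ZIP contents listing. The helper name is hypothetical:

```python
import json
from pathlib import Path

def count_valid_prediction_files(root):
    """Parse every capability_*/prediction/*.json under root.

    Raises json.JSONDecodeError on the first malformed file;
    returns the number of files that parsed cleanly.
    """
    count = 0
    for path in sorted(Path(root).glob("capability_*/prediction/*.json")):
        json.loads(path.read_text())  # raises if the JSON is invalid
        count += 1
    return count
```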

Additional Notes

The standout result is Cap 4 (multi-turn, double-weighted): 0 errors and a 45s average per question. With HELIX slice execution, each LLM call completes in 8–11s, giving the agent a budget of 54+ iterations within the 600s limit instead of roughly 20 at the ~30s baseline.
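
The iteration-budget arithmetic works out as follows, using worst-case integer division and the figures reported in this submission:

```python
TIME_LIMIT_S = 600        # per-question limit
PER_CALL_SLICED_S = 11    # worst case with HELIX slicing active (8–11s)
PER_CALL_BASELINE_S = 30  # approximate unsliced baseline

sliced_budget = TIME_LIMIT_S // PER_CALL_SLICED_S      # 54 iterations
baseline_budget = TIME_LIMIT_S // PER_CALL_BASELINE_S  # 20 iterations
```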

The overall 40.5s average across all 223 questions, achieved on a CPU-only pod (no GPU) running a 32B-parameter model, demonstrates that HELIX sparse activation can deliver practical agentic inference speeds competitive with GPU-accelerated deployments. Two-pass tool isolation further reduces agent iteration counts in the 174-tool capability categories, keeping average durations below 50s even for the most complex multi-hop reasoning chains.
