Commit 7cc5d15
Add notebook demonstrating end-to-end interoperability between third-party agent frameworks and the NeMo Agent toolkit evaluation harness (#1799)
Adds `eval_harbor_atif_interop.ipynb`, a notebook demonstrating end-to-end interoperability between third-party agent frameworks and the NeMo Agent toolkit evaluation harness, using ATIF (Agent Trajectory Interchange Format) as the interchange format.
The notebook uses real ATIF trajectories generated by [Harbor](https://github.com/harbor-framework/harbor) running `mini-swe-agent` on BFCL (Berkeley Function-Calling Leaderboard) tasks with `nvidia_nim/meta/llama-3.3-70b-instruct`, then evaluates them through `nvidia-nat-eval` using both custom and RAGAS evaluators.
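As a rough illustration of the interchange idea, the sketch below pulls tool calls out of an ATIF-style trajectory held as a plain dict. The field names (`steps`, `source`, `tool_calls`, `function_name`, `arguments`) are assumptions about an ATIF-like layout and may not match the schema version used in the notebook. `extract_tool_calls` is a hypothetical helper, not part of Harbor or `nvidia-nat-eval`.

```python
from typing import Any


def extract_tool_calls(trajectory: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect tool calls from the agent steps of an ATIF-style trajectory.

    Assumed (hypothetical) layout:
    {"steps": [{"source": "agent", "tool_calls": [{"function_name": ..., "arguments": ...}]}]}
    """
    calls: list[dict[str, Any]] = []
    for step in trajectory.get("steps", []):
        if step.get("source") != "agent":
            continue  # skip user and system steps
        for call in step.get("tool_calls") or []:
            calls.append({
                "name": call.get("function_name"),
                "arguments": call.get("arguments", {}),
            })
    return calls


# Toy trajectory for illustration only; this is not real Harbor output.
example = {
    "steps": [
        {"source": "user", "message": "What is the weather in Paris?"},
        {
            "source": "agent",
            "tool_calls": [
                {"function_name": "get_weather", "arguments": {"city": "Paris"}}
            ],
        },
    ]
}

print(extract_tool_calls(example))
```

Normalizing to a flat list of `name`/`arguments` pairs keeps downstream evaluators independent of whichever agent framework produced the trajectory.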
Closes
## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.
## Summary by CodeRabbit
* **New Features**
  * Added a Jupyter notebook demonstrating end-to-end interoperability between Harbor ATIF trajectories and the NeMo evaluation flow, including utilities to extract agent BFCL outputs, two local evaluators (function-call validation and trajectory efficiency), per-sample scoring and detailed reasoning, and an optional LLM-based evaluation path gated by an API key.
* **Documentation**
  * Guided notebook walkthrough covering trajectory loading, sample construction, running evaluators, and comparing Harbor verifier results.
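The two local evaluators and the API-key gate described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the notebook's actual code: the function names, the exact-match rule, the step-budget formula, and the `NVIDIA_API_KEY` environment variable name are all hypothetical choices made for illustration.

```python
import os


def function_call_score(predicted: list[dict], expected: list[dict]) -> tuple[float, str]:
    """Return (score, reasoning): 1.0 only when the calls match exactly (assumed rule)."""
    if predicted == expected:
        return 1.0, "all function calls match"
    return 0.0, f"expected {expected}, got {predicted}"


def efficiency_score(num_steps: int, budget: int) -> float:
    """Score 1.0 at or under the step budget, then decay as budget / num_steps."""
    if num_steps <= 0:
        return 0.0  # an empty trajectory earns no efficiency credit
    return min(1.0, budget / num_steps)


def llm_eval_enabled(env_var: str = "NVIDIA_API_KEY") -> bool:
    """Gate the optional LLM-based path on an API key (variable name is an assumption)."""
    return bool(os.environ.get(env_var))


# Hypothetical per-sample usage.
predicted = [{"name": "get_weather", "arguments": {"city": "Paris"}}]
expected = [{"name": "get_weather", "arguments": {"city": "Paris"}}]
score, reasoning = function_call_score(predicted, expected)
print(score, reasoning, efficiency_score(num_steps=4, budget=2))

if llm_eval_enabled():
    print("API key found: the LLM-based evaluators could run here")
else:
    print("No API key: skipping the LLM-based evaluation path")
```

Returning a reasoning string alongside each score mirrors the "per-sample scoring and detailed reasoning" mentioned in the summary, and the gate lets the local evaluators run without any credentials.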
Authors:
- Yuchen Zhang (https://github.com/yczhang-nv)
Approvers:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)
URL: #1799
1 parent f51c41c commit 7cc5d15
1 file changed, +577 -0 lines changed