Commit 7cc5d15

Add notebook demonstrating end-to-end interoperability between third-party agent frameworks and the NeMo Agent toolkit evaluation harness (#1799)
Adds `eval_harbor_atif_interop.ipynb`, a notebook demonstrating end-to-end interoperability between third-party agent frameworks and the NeMo Agent toolkit evaluation harness using ATIF as the interchange format. The notebook uses real ATIF trajectories generated by [Harbor](https://github.com/harbor-framework/harbor) running `mini-swe-agent` on BFCL (Berkeley Function-Calling Leaderboard) tasks with `nvidia_nim/meta/llama-3.3-70b-instruct`, then evaluates them through `nvidia-nat-eval` using both custom and RAGAS evaluators.

Closes

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit

* **New Features**
  * Added a Jupyter notebook demonstrating end-to-end interoperability between Harbor ATIF trajectories and the NeMo evaluation flow, including utilities to extract agent BFCL outputs, two local evaluators (function-call validation and trajectory efficiency), per-sample scoring and detailed reasoning, and an optional LLM-based evaluation path gated by an API key.
* **Documentation**
  * Guided notebook walkthrough covering trajectory loading, sample construction, running evaluators, and comparing Harbor verifier results.

Authors:
- Yuchen Zhang (https://github.com/yczhang-nv)

Approvers:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: #1799
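The two local evaluators named in the summary (function-call validation and trajectory efficiency) could look roughly like the sketch below. This is an illustration only, not the notebook's code: the trajectory dictionary is a simplified, hypothetical stand-in for an ATIF document, and the field names (`steps`, `source`, `tool_calls`, `name`, `arguments`), the helper names, and the `step_budget` parameter are all assumptions made for this example.

```python
from typing import Any

# Hypothetical, simplified trajectory shape (NOT the real ATIF schema):
# a list of steps, each with a "source" and optional "tool_calls".
sample: dict[str, Any] = {
    "steps": [
        {"source": "user", "message": "What is the weather in Paris?"},
        {
            "source": "agent",
            "tool_calls": [
                {"name": "get_weather", "arguments": {"city": "Paris"}}
            ],
        },
        {"source": "agent", "message": "Done."},
    ]
}

# Hypothetical ground-truth call for a BFCL-style task.
expected: dict[str, Any] = {"name": "get_weather", "arguments": {"city": "Paris"}}


def extract_tool_calls(trajectory: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect every tool call emitted by agent steps, in order."""
    calls: list[dict[str, Any]] = []
    for step in trajectory.get("steps", []):
        if step.get("source") == "agent":
            calls.extend(step.get("tool_calls", []))
    return calls


def function_call_score(trajectory: dict[str, Any], expected_call: dict[str, Any]) -> float:
    """Return 1.0 if any agent tool call matches the expected name and arguments."""
    for call in extract_tool_calls(trajectory):
        if (
            call.get("name") == expected_call["name"]
            and call.get("arguments") == expected_call["arguments"]
        ):
            return 1.0
    return 0.0


def efficiency_score(trajectory: dict[str, Any], step_budget: int = 5) -> float:
    """Score 1.0 within the step budget, decaying as budget / agent_steps beyond it."""
    agent_steps = sum(
        1 for step in trajectory.get("steps", []) if step.get("source") == "agent"
    )
    if agent_steps == 0:
        return 0.0  # no agent activity to score
    return min(1.0, step_budget / agent_steps)
```

Under these assumptions, `function_call_score(sample, expected)` checks correctness of the emitted call, while `efficiency_score(sample)` penalizes only trajectories that exceed the step budget, so the two scores can be reported per sample side by side.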
1 parent f51c41c commit 7cc5d15

1 file changed: +577 -0 lines changed