Releases: NVIDIA-NeMo/Evaluator
NVIDIA NeMo Evaluator Launcher 0.1.14
feat(interceptors): remove params from payload recursively (#300) Signed-off-by: Piotr Januszewski <pjanuszewski@nvidia.com>
NVIDIA NeMo Evaluator 0.1.12
build(config-test): dry run yaml examples test (#292)
1. Add config dry-run sanity checks
Signed-off-by: Anna Warno <awarno@nvidia.com>
NVIDIA NeMo Evaluator Launcher 0.1.13
build(config-test): dry run yaml examples test (#292)
1. Add config dry-run sanity checks
Signed-off-by: Anna Warno <awarno@nvidia.com>
NVIDIA NeMo-Eval 0.1.0
- Evaluation for AutoModel with a vLLM OpenAI-compatible deployment and nvidia-lm-eval as the eval harness
- Support for Logprob benchmarks with Ray
- Use evaluation APIs from nvidia-eval-commons
Known Issues
- Very low flexible-extract scores on GSM8K when evaluating NeMo 2.0 models, due to missing stop-word support in MegatronLLMDeployableNemo2. This does not affect the strict-match score.
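The stop-word issue above concerns the `stop` field of the OpenAI-compatible completions API that a vLLM deployment exposes. A minimal sketch of building such a request payload follows; the model name, prompt, and endpoint URL are illustrative assumptions, not values from this repository.

```python
# Sketch: constructing an OpenAI-compatible /v1/completions payload, as an
# eval harness would when querying a vLLM deployment. The `stop` field is
# the stop-word list that MegatronLLMDeployableNemo2 did not honor.
import json
import urllib.request


def build_completion_request(prompt: str, model: str, stop=None) -> dict:
    """Build an OpenAI-compatible completions payload (names assumed)."""
    payload = {"model": model, "prompt": prompt, "max_tokens": 64}
    if stop:
        payload["stop"] = stop  # generation halts at any of these strings
    return payload


payload = build_completion_request("Q: 2+2=?\nA:", "my-model", stop=["\n\n"])

# Sending it to an assumed local vLLM server would look like:
# req = urllib.request.Request(
#     "http://localhost:8000/v1/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# resp = json.load(urllib.request.urlopen(req))
```

When the server ignores `stop`, completions run past the intended answer span, which is what degrades flexible-extract parsing while leaving strict-match intact.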
NVIDIA NeMo Evaluator Launcher 0.1.12
fix(local executor): use extra_docker_args instead of hard-coded --ne…
NVIDIA NeMo Evaluator 0.1.11
chore(docs): include NeMo FW in the README and populate index.md (#256) Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
NVIDIA NeMo Evaluator Launcher 0.1.11
chore(docs): include NeMo FW in the README and populate index.md (#256) Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
NVIDIA NeMo Evaluator 0.1.10
feat: add trtllm deployment config (#278) Signed-off-by: Piotr Januszewski <pjanuszewski@nvidia.com>
NVIDIA NeMo Evaluator Launcher 0.1.10
feat: add trtllm deployment config (#278) Signed-off-by: Piotr Januszewski <pjanuszewski@nvidia.com>
NVIDIA NeMo Evaluator 0.1.9
Fix adapter server readiness issue (#263)
- **Increased** server wait time from **10 s → 300 s**.
- **Removed** unused `_calculate_inference_time()` method from `ResponseStatsInterceptor`.
- **Optimized startup** by loading a single aggregated snapshot (instant, fast startup) instead of loading all individual reasoning request stats and aggregating them.
Signed-off-by: Anna Warno <awarno@nvidia.com>
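The readiness fix above amounts to polling a health check for up to 300 s instead of 10 s. A generic sketch of such a wait loop, with illustrative names rather than the nemo-evaluator implementation:

```python
# Sketch: poll a readiness check until it succeeds or a timeout elapses.
# Function and parameter names are assumptions for illustration only.
import time


def wait_for_server(check, timeout_s: float = 300.0, interval_s: float = 1.0) -> bool:
    """Return True once check() succeeds, False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False


# Usage: a fake check that becomes ready on the third attempt.
attempts = {"n": 0}


def fake_check() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3


ready = wait_for_server(fake_check, timeout_s=10.0, interval_s=0.01)
```

A longer timeout like this trades slower failure reporting for fewer spurious "server not ready" errors on slow container or model startup.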