
It only responds to 100 requests #638

@soramasa-k


I want to benchmark NVIDIA NIM (gpt-oss-120b) with the following script:

```bash
#!/bin/bash
set -e

MODEL="openai/gpt-oss-120b"
# Local tokenizer snapshot downloaded by NIM (parts of the path elided as xxxx)
TOKENIZER="xxxx/.cache/nim/ngc/hub/models--nim--openai--gpt-oss-120b/snapshots/xxxx"
URL="http://localhost:8000/v1"

# Concurrency levels to sweep
CONCURRENCIES=(100)

# "input_tokens output_tokens" pairs to sweep
INPUT_OUTPUT_PAIRS=(
  "500 2500"
  "500 10000"
  "10000 2500"
  "10000 5000"
  "10000 10000"
)

BASE_OUTDIR="artifacts_gpt_oss_120b"

mkdir -p "${BASE_OUTDIR}"

for CONCURRENCY in "${CONCURRENCIES[@]}"; do
  # Send 10 requests per concurrent slot
  REQUEST_COUNT=$((CONCURRENCY * 10))

  for PAIR in "${INPUT_OUTPUT_PAIRS[@]}"; do
    # Split the pair into input and output token counts
    read -r INPUT_TOKENS OUTPUT_TOKENS <<< "${PAIR}"

    RUN_NAME="c${CONCURRENCY}_in${INPUT_TOKENS}_out${OUTPUT_TOKENS}"
    OUTDIR="${BASE_OUTDIR}/${RUN_NAME}"
    PREFIX="profile_${RUN_NAME}"

    echo "========================================"
    echo "Running benchmark: ${RUN_NAME}"
    echo "Concurrency     : ${CONCURRENCY}"
    echo "Request count   : ${REQUEST_COUNT}"
    echo "========================================"

    mkdir -p "${OUTDIR}"

    aiperf profile \
      --model "${MODEL}" \
      --tokenizer "${TOKENIZER}" \
      --url "${URL}" \
      --endpoint-type chat \
      --streaming \
      --concurrency "${CONCURRENCY}" \
      --request-rate 200 \
      --prompt-input-tokens-mean "${INPUT_TOKENS}" \
      --prompt-output-tokens-mean "${OUTPUT_TOKENS}" \
      --request-count "${REQUEST_COUNT}" \
      --warmup-request-count 1 \
      --output-artifact-dir "${OUTDIR}" \
      --profile-export-prefix "${PREFIX}"
  done
done

echo "All benchmarks completed."
```
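To see exactly which runs the sweep submits, and how many tokens each one asks the server to handle, the same loops can be executed as a dry run that prints each configuration instead of calling `aiperf`. This is a minimal sketch that reuses the arrays and naming scheme from the script; `PER_REQUEST` and `IN_FLIGHT` are hypothetical helper values added only for illustration, computed from the requested mean input and output lengths rather than from measured token counts.

```bash
#!/bin/bash
# Dry run: prints the run matrix produced by the benchmark loops
# without contacting the server or calling aiperf.
set -euo pipefail

CONCURRENCIES=(100)
INPUT_OUTPUT_PAIRS=(
  "500 2500"
  "500 10000"
  "10000 2500"
  "10000 5000"
  "10000 10000"
)

for CONCURRENCY in "${CONCURRENCIES[@]}"; do
  REQUEST_COUNT=$((CONCURRENCY * 10))
  for PAIR in "${INPUT_OUTPUT_PAIRS[@]}"; do
    read -r INPUT_TOKENS OUTPUT_TOKENS <<< "${PAIR}"
    RUN_NAME="c${CONCURRENCY}_in${INPUT_TOKENS}_out${OUTPUT_TOKENS}"
    # Hypothetical helper: prompt + requested output tokens for one request (mean values)
    PER_REQUEST=$((INPUT_TOKENS + OUTPUT_TOKENS))
    # Hypothetical helper: approximate tokens held when all CONCURRENCY slots are busy
    IN_FLIGHT=$((CONCURRENCY * PER_REQUEST))
    printf '%s requests=%d tokens_per_request=%d tokens_in_flight=%d\n' \
      "${RUN_NAME}" "${REQUEST_COUNT}" "${PER_REQUEST}" "${IN_FLIGHT}"
  done
done
```

For the `c100_in10000_out10000` run, for example, each request asks for about 20,000 tokens (prompt plus output), so 100 concurrent requests add up to roughly 2,000,000 tokens. These totals can be compared against the context length and KV-cache capacity the NIM server is configured with, which the issue does not state.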
Most of the requests are failing; do you know what the cause might be?
