
JSON Serialization Error: NaN Values Causing Serialization Failure #1646

@kelvin715

Description

Problem Description

When running inference with the Llama-3.1-8B-Instruct model, a JSON serialization error occurs: the /v1/chat/completions/tokens endpoint fails to serialize its ChatCompletionResponse to JSON when returning the response.

Error Message

ValueError: Out of range float values are not JSON compliant: nan

Full Error Stack Trace

(ApiServer_0 pid=467463) ERROR:    Exception in ASGI application
(ApiServer_0 pid=467463) Traceback (most recent call last):
  ...
  File "/proj-vertical-llms-pvc/users/zhihan/webarena/WebAgent-R1/WebAgent-R1-prime/prime-rl/src/prime_rl/inference/vllm/server.py", line 119, in _chat_with_tokens
    return JSONResponse(content=generator.model_dump())
  ...
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
(ApiServer_0 pid=467463) ValueError: Out of range float values are not JSON compliant: nan
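
For context, the final ValueError comes from Python's built-in json encoder: Starlette's JSONResponse serializes with json.dumps(..., allow_nan=False), so any non-finite float anywhere in the dumped dict fails. A minimal standalone reproduction (the payload shape below is illustrative, not the real ChatCompletionResponse):

```python
import json

# Starlette's JSONResponse passes allow_nan=False to json.dumps, so a
# single NaN anywhere in the content raises the exact error seen above.
payload = {"choices": [{"logprobs": {"token_logprob": float("nan")}}]}

try:
    json.dumps(payload, allow_nan=False)
except ValueError as e:
    print(e)  # -> Out of range float values are not JSON compliant: nan
```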

Problem Analysis

  1. Error Location: The error occurs in the _chat_with_tokens function in src/prime_rl/inference/vllm/server.py, where the ChatCompletionResponse object is converted to a dictionary via model_dump() and then serialized to JSON by JSONResponse.

  2. Root Cause: The ChatCompletionResponse object contains nan (Not a Number) float values, which the JSON standard does not permit (the same applies to inf and other non-finite floats). A small helper for locating the offending field is sketched after this list.

  3. Model Differences:

    • Qwen models: with Qwen3-4B-Instruct-2507 and Qwen2.5-7B-Instruct, this error was not observed
    • Llama-3.1-8B-Instruct: after switching to this model, the error started occurring
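
As referenced in the analysis above, here is a small debugging helper (my own sketch, not part of prime-rl) that walks the dumped response and reports the JSON paths holding non-finite floats; generator stands for the response object at the failing line in server.py:

```python
import math
from typing import Any

def find_non_finite(obj: Any, path: str = "$") -> list[str]:
    """Walk a dumped response and report the paths holding NaN/Inf floats."""
    hits: list[str] = []
    if isinstance(obj, float) and not math.isfinite(obj):
        hits.append(f"{path} = {obj!r}")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            hits.extend(find_non_finite(value, f"{path}.{key}"))
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            hits.extend(find_non_finite(value, f"{path}[{i}]"))
    return hits

# Hypothetical use at the failing line in server.py:
#   print(find_non_finite(generator.model_dump()))
```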

Configuration

Full configuration file content:

inference_gpu_ids = [0,1]
trainer_gpu_ids = [2,3,4,5,6,7]

max_steps = 200          # Number of training steps
seq_len = 16384           # Sequence length for training

[model]
name = "Llama-3.1-8B-Instruct"   

[wandb]
project = "webarena-rl"
name = "webarena-debug"

[trainer.model]
impl = "liger_kernel"    # Training implementation
ac = { freq = 1 }        # freq=1 means full activation checkpointing (checkpoint every layer)

[trainer.optim]
lr = 1e-5

[orchestrator]
batch_size = 128              # Batch size (adjust based on GPU memory)
rollouts_per_example = 16     # Number of rollouts per example

[orchestrator.optim]
lr = 1e-5                  # Learning rate (should match trainer.optim.lr)

[orchestrator.sampling]
max_tokens = 512            # Max tokens per generation

# WebArena environment configuration
[[orchestrator.env]]
id = "webarena-env"          # Environment ID (must match the installed package name)
name = "webarena-env"        # Display name for logs

[inference]
# Inference server configuration
api_server_count = 2         # Number of inference servers
gpu_memory_utilization = 0.9

[inference.parallel]
dp = 2  

[inference.model]

Questions

  1. Why does this problem occur?
  2. How can it be fixed? (One candidate workaround is sketched below.)
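
In case it helps, one possible workaround, assuming the non-finite values carry no information the client needs and can safely be rendered as null (a sketch only, not a confirmed fix):

```python
import math
from typing import Any

def sanitize_floats(obj: Any) -> Any:
    """Recursively replace NaN/Inf with None so allow_nan=False serialization succeeds."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {key: sanitize_floats(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sanitize_floats(value) for value in obj]
    return obj

# Hypothetical patch at src/prime_rl/inference/vllm/server.py:119:
#   return JSONResponse(content=sanitize_floats(generator.model_dump()))
```

That said, the better long-term fix is probably to track down and correct whatever upstream computation produces the NaN in the first place.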

Related Code Locations

  • src/prime_rl/inference/vllm/server.py:119 - _chat_with_tokens function
  • src/prime_rl/inference/vllm/serving_chat_with_tokens.py - Chat completion with tokens implementation
