Closed
Commits
35 commits
84abe2c
fix: assert no duplicate starting bos (#835)
ZhiyuLi-Nvidia Aug 4, 2025
fa21d3c
fix: Added sequence packing keys to SFT and GRPO recipes (#805)
ahmadki Aug 4, 2025
5604024
fix: OOM with some GRPO configs (#709)
ahmadki Aug 4, 2025
9389dfb
chore: upgrade vllm to v0.10.0 (#766)
yuki-97 Aug 5, 2025
c249efc
docs: fix checkpointing command for megatron->hf export (#823)
abdalgader-a Aug 5, 2025
c784dd9
feat: add data shuffle and random seed option (#334)
ZhiyuLi-Nvidia Aug 5, 2025
b74c5d0
feat: save checkpoint before timeout to avoid 4-hour runtime limit (#…
wedu-nvidia Aug 5, 2025
b6269f7
feat: track policy training compute throughput (#632)
ybgao-nvidia Aug 5, 2025
9af0a52
fix: fix grpo + mcore checkpointing without validation (#844)
ashors1 Aug 6, 2025
03472a0
feat: dockerfile can build hermetically or from build context (#799)
terrykong Aug 6, 2025
0557402
chore: 0.3.0 -> 0.4.0rc0 (#840)
terrykong Aug 6, 2025
233cc07
fix: force use of eager (disabled cuda graphs) due to convergence iss…
parthchadha Aug 6, 2025
0988a7d
fix: Fix error message in VllmGenerationWorker. (#633)
ffrujeri Aug 7, 2025
5910abb
feat: support DTensor CP in DPO and SFT (#798)
ashors1 Aug 7, 2025
b8a89a9
feat: support non-colocated in mcore (#613)
yuki-97 Aug 8, 2025
88a399e
chore: remove old fsdp1 unit test (#871)
yuki-97 Aug 8, 2025
bbbb3d6
fix: fix non-colocated with cpu_offload enabled (#861)
yuki-97 Aug 8, 2025
e924d33
docs: Link uv's installation instructions to uv's website (#837)
wangshangsam Aug 8, 2025
d73c942
feat: qwen3 export to HF (#873)
ashors1 Aug 8, 2025
d45ff3f
test: add deepscaler tests + pipe-clean configs + fix eval for deepsc…
terrykong Aug 8, 2025
fecf71e
fix: remove tie weight check (#700)
RayenTian Aug 8, 2025
2b87def
fix: OOM in deepscaler1.5b with sequence length = 16/24k (#875)
soodoshll Aug 8, 2025
8fd8c96
feat: Fix and enhances for Nsight system profiling (#865)
guyueh1 Aug 11, 2025
18b9e2c
test: lower step count on gemma nightly test to finish within 4 hours…
terrykong Aug 11, 2025
223bfa8
feat: add nemotron5 sharding (#481)
gshennvm Aug 12, 2025
e1f56c4
feat: add diagnostic script for problematic embeddings (#896)
terrykong Aug 12, 2025
9f7825e
feat: Add TP to embed_tokens and lm_head for Gemma models (#879)
RayenTian Aug 14, 2025
83c6bfc
refactor: split sync/async vllm worker ([1/2] of refactor vllm worker…
yuki-97 Aug 14, 2025
df31c1b
feat: chunked logprob calculation with deferred fp32 cast to help wit…
pjin-nvidia Aug 15, 2025
70b9666
build: Add Dockerfile that uses NGC pytorch image (#897)
chtruong814 Aug 18, 2025
b721703
build: Fix pytorch image ref in Dockerfile.ngc_pytorch (#936)
chtruong814 Aug 18, 2025
d149a62
test: enable 8k/16k/24k deepscaler nightly tests (#934)
terrykong Aug 19, 2025
eb50202
feat: GRPO + SFT Dtensor support for multimodal training (#712)
rohitrango Aug 19, 2025
bde1a68
feat: Add recipe to reproduce Tulu-3 DPO model (#804)
mrm-196 Aug 20, 2025
f85a4f6
ci: Fix docker build context (#942)
chtruong814 Aug 20, 2025
4 changes: 3 additions & 1 deletion .dockerignore
@@ -1,6 +1,8 @@
# Adding to .gitignore helps reduce the size of your working_dir

.git
# Note: removing .git from .dockerignore since it is valuable to have the git history to
# know where this container was built
# .git
*.out
*.log
*.tar
4 changes: 3 additions & 1 deletion .github/workflows/cicd-main.yml
@@ -162,13 +162,15 @@ jobs:
build-container:
if: ${{ needs.pre-flight.outputs.test_level != 'none' }}
needs: [pre-flight]
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_build_container.yml@v0.30.0
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_build_container.yml@v0.52.0
with:
build-ref: ${{ github.sha }}
image-name: nemo_rl_container
dockerfile: docker/Dockerfile
image-label: nemo-rl
target: hermetic
build-contexts: |
nemo-rl=${{ github.run_id }}/
build-args: |
MAX_JOBS=32
NEMO_RL_COMMIT=${{ github.sha }}
1 change: 1 addition & 0 deletions .gitignore
@@ -34,6 +34,7 @@ hf_datasets_cache/
datasets/
docker/*
!docker/Dockerfile
!docker/Dockerfile.ngc_pytorch
!docker/README.md
wandb/
checkpoints/
2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,7 +1,7 @@
[submodule "3rdparty/NeMo"]
path = 3rdparty/NeMo-workspace/NeMo
url = https://github.com/NVIDIA/NeMo.git
branch = zhiyul/yukih/prepare-refit-info
branch = pjin/ashors/rl-qwen3-export
shallow = true
[submodule "3rdparty/Megatron-LM"]
path = 3rdparty/Megatron-LM-workspace/Megatron-LM
19 changes: 12 additions & 7 deletions .pre-commit-config.yaml
@@ -3,11 +3,9 @@ repos:
rev: v4.4.0
hooks:
- id: end-of-file-fixer
# only include python files
files: \.py$
types_or: [python, pyi] # Only include Python files.
- id: trailing-whitespace
# only include python files
files: \.py$
types_or: [python, pyi] # Only include Python files.

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: "v0.9.9" # Use the appropriate version
@@ -36,8 +34,15 @@ repos:
exclude: '^\.github/'
types: [file]

- repo: https://github.com/facebook/pyrefly
rev: 0.24.2
- repo: local
hooks:
- id: pyrefly-typecheck
files: \.py$
name: pyrefly check
entry: uv run --group dev pyrefly check
types_or: [python, pyi]
language: system
pass_filenames: false # Pyrefly reads config & project roots itself.
args: []
require_serial: true
additional_dependencies: []
minimum_pre_commit_version: "2.9.2"
2 changes: 1 addition & 1 deletion 3rdparty/NeMo-workspace/NeMo
Submodule NeMo updated from 8ddf43 to 5c4264
58 changes: 27 additions & 31 deletions README.md
@@ -105,41 +105,37 @@ sudo apt-get update
sudo apt-get install cudnn-cuda-12
```

Install `uv`.
```sh
# For faster setup and environment isolation, we use `uv`
pip install uv
For faster setup and environment isolation, we use [uv](https://docs.astral.sh/uv/).
Follow [these instructions](https://docs.astral.sh/uv/getting-started/installation/) to install uv.

# Initialize NeMo RL project virtual environment
# NOTE: Please do not use -p/--python and instead allow uv venv to read it from .python-version
# This ensures that the version of python used is always what we prescribe.
Then, initialize NeMo RL project virtual environment via:
```sh
uv venv
```
> [!NOTE]
> Please do not use `-p/--python` and instead allow `uv venv` to read it from `.python-version`.
> This ensures that the version of python used is always what we prescribe.

# If working outside a container, it can help to build flash-attn and warm the
# uv cache before your first run. The NeMo RL Dockerfile will warm the uv cache
# with flash-attn. See https://docs.nvidia.com/nemo/rl/latest/docker.html for
# instructions if you are looking for the NeMo RL container.
If working outside a container, it can help to build [flash-attn](https://github.com/Dao-AILab/flash-attention) and warm the uv cache before your first run.
```sh
bash tools/build-flash-attn-in-uv-cache.sh
# If successful, you should see "✅ flash-attn successfully added to uv cache"

# If you cannot install at the system level, you can install for your user with
# pip install --user uv

# Use `uv run` to launch all commands. It handles pip installing implicitly and
# ensures your environment is up to date with our lock file.

# Note that it is not recommended to activate the venv and instead use `uv run` since
# it ensures consistent environment usage across different shells and sessions.
# Example: uv run python examples/run_grpo_math.py
```
> [!NOTE]
> On the first install, `flash-attn` can take a while to install (~45min with 48 CPU hyperthreads). After it is built once, it is cached in your uv's cache dir making subsequent installs much quicker.

> [!TIP]
> The NeMo RL Dockerfile will warm the uv cache with flash-attn.
> See https://docs.nvidia.com/nemo/rl/latest/docker.html for instructions if you are looking for the NeMo RL container.

**Important Notes:**
If successful, you should see `✅ flash-attn successfully added to uv cache`.

- Use the `uv run <command>` to execute scripts within the managed environment. This helps maintain consistency across different shells and sessions.
- Ensure you have the necessary CUDA drivers and PyTorch installed compatible with your hardware.
- On the first install, `flash-attn` can take a while to install (~45min with 48 CPU hyperthreads). After it is built once, it is cached in your `uv`'s cache dir making subsequent installs much quicker.
- If you update your environment in `pyproject.toml`, it is necessary to force a rebuild of the virtual environments by setting `NRL_FORCE_REBUILD_VENVS=true` next time you launch a run.
- **Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
Use `uv run` to launch all commands. It handles pip installing implicitly and ensures your environment is up to date with our lock file.
> [!NOTE]
> - It is not recommended to activate the `venv`, and you should use `uv run <command>` instead to execute scripts within the managed environment.
> This ensures consistent environment usage across different shells and sessions. Example: `uv run python examples/run_grpo_math.py`
> - Ensure you have the necessary CUDA drivers and PyTorch installed compatible with your hardware.
> - If you update your environment in `pyproject.toml`, it is necessary to force a rebuild of the virtual environments by setting `NRL_FORCE_REBUILD_VENVS=true` next time you launch a run.
> - **Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
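
The reminder above can be turned into a quick pre-flight check before launching a run. The following is a minimal sketch, not part of NeMo RL: the variable names come from the README, but treating `HF_HOME` and `WANDB_API_KEY` as required and `HF_DATASETS_CACHE` as optional is an assumption, and `missing_env_vars` is a hypothetical helper.

```python
import os

# Variable names come from the README reminder. Which ones count as
# "required" versus "optional" is an assumption made for this sketch.
REQUIRED_VARS = ("HF_HOME", "WANDB_API_KEY")
OPTIONAL_VARS = ("HF_DATASETS_CACHE",)


def missing_env_vars(env=None, names=REQUIRED_VARS):
    """Return the names from `names` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing required variables:", ", ".join(missing))
    for name in missing_env_vars(names=OPTIONAL_VARS):
        print(f"Optional variable not set: {name}")
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.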

## Training Backends

@@ -413,13 +409,13 @@ uv run python examples/converters/convert_dcp_to_hf.py \
--hf-ckpt-path results/grpo/hf
```

If you have a model saved in Megatron format, you can use the following command to convert it to Hugging Face format prior to running evaluation:
If you have a model saved in Megatron format, you can use the following command to convert it to Hugging Face format prior to running evaluation. This script requires mcore, so make sure to launch with the mcore extra:

```sh
# Example for a GRPO checkpoint at step 170
uv run python examples/converters/convert_megatron_to_hf.py \
uv run --extra mcore python examples/converters/convert_megatron_to_hf.py \
--config results/grpo/step_170/config.yaml \
--dcp-ckpt-path results/grpo/step_170/policy/weights/iter_0000000 \
--megatron-ckpt-path results/grpo/step_170/policy/weights/iter_0000000 \
--hf-ckpt-path results/grpo/hf
```

22 changes: 18 additions & 4 deletions docker/Dockerfile
@@ -1,4 +1,14 @@
# Usage:
# Self-contained build (default: builds from main): docker buildx build -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .
# Self-contained build (specific git ref): docker buildx build -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push .
# Self-contained build (remote NeMo RL source; no need for a local clone of NeMo RL): docker buildx build -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push https://github.com/NVIDIA-NeMo/RL.git
# Local NeMo RL source override: docker buildx build --build-context nemo-rl=. -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .

ARG BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.05-cuda12.9-devel-ubuntu24.04
FROM scratch AS nemo-rl
ARG NRL_GIT_REF=main
ADD --keep-git-dir=true https://github.com/NVIDIA-NeMo/RL.git#${NRL_GIT_REF} /

FROM ${BASE_IMAGE} AS base

# It is more convenient for users to run as root
@@ -65,8 +75,8 @@ VIRTUAL_ENV=$UV_PROJECT_ENVIRONMENT uv pip install --link-mode symlink flash-att
EOF

# First copy only the dependency files
COPY pyproject.toml uv.lock ./
COPY --link 3rdparty/ ./3rdparty/
COPY --from=nemo-rl pyproject.toml uv.lock ./
COPY --from=nemo-rl --link 3rdparty/ ./3rdparty/

RUN <<"EOF" bash -exu
# uv sync has a more reliable resolver than simple uv pip install which can fail
@@ -100,7 +110,11 @@ LABEL com.nvidia.build.ref="${NVIDIA_BUILD_REF}"

ENV NEMO_RL_VENV_DIR=/opt/ray_venvs

# Copy in source and prefetch all virtual environments
COPY . /opt/nemo-rl
# Copy in source from build context (defaults to cloned repo, can be overridden)
COPY --from=nemo-rl . /opt/nemo-rl
# Unshallow the repo to get the full history (in the case it was from the scratch layer).
# Potentially not necessary if the repo is passed in as a complete repository (w/ full git history),
# so do a quick check before trying to unshallow.
RUN git rev-parse --is-shallow-repository | grep -q true && git fetch --unshallow || true
RUN UV_LINK_MODE=symlink uv run nemo_rl/utils/prefetch_venvs.py
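
The shallow-repository guard in the `RUN git rev-parse ...` line above reads more easily as explicit control flow. Below is a rough Python equivalent, offered only as an illustration of the logic: `unshallow_if_needed` and its injectable `run` parameter are hypothetical names introduced for this sketch, and the Dockerfile itself uses the shell one-liner.

```python
import subprocess


def unshallow_if_needed(repo_dir=".", run=subprocess.run):
    """Fetch full history when `repo_dir` is a shallow clone.

    Returns True if an unshallow fetch was attempted, False otherwise.
    """
    # `git rev-parse --is-shallow-repository` prints "true" or "false".
    probe = run(
        ["git", "rev-parse", "--is-shallow-repository"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=False,
    )
    if probe.returncode != 0 or probe.stdout.strip() != "true":
        return False
    # check=False mirrors the trailing `|| true`: a failed fetch is tolerated.
    run(["git", "fetch", "--unshallow"], cwd=repo_dir, check=False)
    return True
```

As in the shell version, a complete repository passed in through `--build-context` skips the fetch entirely, and a fetch failure does not abort the step.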

128 changes: 128 additions & 0 deletions docker/Dockerfile.ngc_pytorch
@@ -0,0 +1,128 @@
# This Dockerfile is used to build a Docker image for NeMo RL with the NGC PyTorch base image.
# However, it is still a work in progress and is not yet ready for production use.
#
# Usage:
# Self-contained build (default: builds from main): docker buildx build -f docker/Dockerfile.ngc_pytorch --tag <registry>/nemo-rl:latest --push .
# Self-contained build (specific git ref): docker buildx build -f docker/Dockerfile.ngc_pytorch --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push .
# Self-contained build (remote NeMo RL source; no need for a local clone of NeMo RL): docker buildx build -f docker/Dockerfile.ngc_pytorch --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push https://github.com/NVIDIA-NeMo/RL.git
# Local NeMo RL source override: docker buildx build --build-context nemo-rl=. -f docker/Dockerfile.ngc_pytorch --tag <registry>/nemo-rl:latest --push .
#
# If installing new dependencies in the container, then use "uv pip install new-dependency"
ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:25.06-py3
FROM scratch AS nemo-rl
ARG NRL_GIT_REF=main
ADD --keep-git-dir=true https://github.com/NVIDIA-NeMo/RL.git#${NRL_GIT_REF} /

FROM ${BASE_IMAGE} AS base

# It is more convenient for users to run as root
USER root

RUN <<"EOF" bash -exu -o pipefail
export DEBIAN_FRONTEND=noninteractive
export TZ=America/Los_Angeles

apt-get update
apt-get install -y --no-install-recommends \
jq \
curl \
git \
rsync \
wget \
less \
vim \


apt-get clean
rm -rf /var/lib/apt/lists/*
EOF

# Install uv at /usr/local/bin in case the root home directory is bind mounted
ARG UV_VERSION=0.7.2
RUN curl -LsSf https://astral.sh/uv/${UV_VERSION}/install.sh | XDG_BIN_HOME=/usr/local/bin sh

# Disable usage stats by default for users who are sensitive to sharing usage.
# Users are encouraged to enable if they wish.
ENV RAY_USAGE_STATS_ENABLED=0
ENV NEMO_RL_VENV_DIR=/opt/ray_venvs

# Build vLLM from source to use with the NVIDIA PyTorch base image
FROM base AS build_vllm

ARG MAX_JOBS=32
WORKDIR /opt
COPY --from=nemo-rl uv.lock /tmp/uv.lock

RUN <<"EOF" bash -exu
echo "Building vLLM from source for PyTorch base image"
VLLM_VERSION=$(grep -A 1 'name = "vllm"' /tmp/uv.lock | grep 'version =' | sed 's/version = "\(.*\)"/\1/') && \
echo "Building vLLM version: $VLLM_VERSION"
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout v$VLLM_VERSION
python use_existing_torch.py
pip install -r requirements/build.txt
pip wheel --no-deps --no-build-isolation -v .
EOF

FROM base AS hermetic

WORKDIR /opt/nemo-rl

# Variables to control the build of TE. If there are issues with parallelization, consider
# setting these to 1.
ARG MAX_JOBS
ARG NVTE_BUILD_THREADS_PER_JOB

ENV UV_PROJECT_ENVIRONMENT=/opt/nemo_rl_venv
ENV UV_CACHE_DIR=/opt/uv_cache
ENV UV_LINK_MODE=copy

# Define the no-install-package arguments for PyTorch base images
ARG BASE_IMAGE
ARG UV_NO_INSTALL_PACKAGES="--no-install-package torch --no-install-package torchvision --no-install-package triton --no-install-package nvidia-cublas-cu12 --no-install-package nvidia-cuda-cupti-cu12 --no-install-package nvidia-cuda-nvrtc-cu12 --no-install-package nvidia-cuda-runtime-cu12 --no-install-package nvidia-cudnn-cu12 --no-install-package nvidia-cufft-cu12 --no-install-package nvidia-cufile-cu12 --no-install-package nvidia-curand-cu12 --no-install-package nvidia-cusolver-cu12 --no-install-package nvidia-cusparse-cu12 --no-install-package nvidia-cusparselt-cu12 --no-install-package nvidia-nccl-cu12 --no-install-package vllm --no-install-package flash-attn --no-install-package transformer-engine --no-install-package transformer-engine-cu12 --no-install-package transformer-engine-torch --no-install-package numpy"
ENV UV_NO_INSTALL_PACKAGES=${UV_NO_INSTALL_PACKAGES}
ENV PATH="/opt/nemo_rl_venv/bin:$PATH"

# First copy only the dependency files
COPY --from=nemo-rl pyproject.toml uv.lock ./
COPY --from=nemo-rl --link 3rdparty/ ./3rdparty/


RUN --mount=type=bind,from=build_vllm,source=/opt/,target=/tmp/build_vllm/ <<"EOF" bash -exu

# uv sync has a more reliable resolver than simple uv pip install which can fail
# The venv is symlinked to avoid bloating the layer size
uv venv --system-site-packages ${UV_PROJECT_ENVIRONMENT}
uv pip install --no-cache-dir --no-deps /tmp/build_vllm/vllm/vllm*.whl
uv sync --link-mode symlink --locked --inexact --extra vllm --extra mcore --extra automodel --all-groups --no-install-project $UV_NO_INSTALL_PACKAGES
EOF

ENV NEMO_RL_VENV_DIR=/opt/ray_venvs

WORKDIR /opt/nemo-rl

FROM hermetic AS release

ARG NEMO_RL_COMMIT
ARG NVIDIA_BUILD_ID
ARG NVIDIA_BUILD_REF
ENV UV_NO_SYNC=1
ENV NEMO_RL_COMMIT=${NEMO_RL_COMMIT:-<unknown>}
ENV NVIDIA_BUILD_ID=${NVIDIA_BUILD_ID:-<unknown>}
ENV NVIDIA_BUILD_REF=${NVIDIA_BUILD_REF:-<unknown>}
ENV NEMO_RL_PY_EXECUTABLES_SYSTEM=1
# The 25.06 Pytorch container is not compatible with vllm standalone compile so we disable it
ENV VLLM_USE_STANDALONE_COMPILE=0
LABEL com.nvidia.build.id="${NVIDIA_BUILD_ID}"
LABEL com.nvidia.build.ref="${NVIDIA_BUILD_REF}"

ENV NEMO_RL_VENV_DIR=/opt/ray_venvs

# Copy in source from build context (defaults to cloned repo, can be overridden)
COPY --from=nemo-rl . /opt/nemo-rl
# Unshallow the repo to get the full history (in the case it was from the scratch layer).
# Potentially not necessary if the repo is passed in as a complete repository (w/ full git history),
# so do a quick check before trying to unshallow.
RUN git rev-parse --is-shallow-repository | grep -q true && git fetch --unshallow || true
RUN UV_LINK_MODE=symlink uv sync --locked --inexact $UV_NO_INSTALL_PACKAGES
4 changes: 2 additions & 2 deletions docker/README.md
@@ -3,8 +3,8 @@ NOTE: *We use `docker buildx` instead of `docker build` for these containers*

This directory contains the `Dockerfile` for NeMo-RL Docker images.
You can build two types of images:
- A **base image**: A minimal image where Python dependencies can be specified at runtime.
- A **hermetic image**: An image that includes default dependencies for offline use.
- A **release image** (recommended): Contains everything from the hermetic image, plus the nemo-rl source code and pre-fetched virtual environments for isolated workers.
- A **hermetic image**: Includes the base image plus pre-fetched NeMo RL python packages in the `uv` cache.


For detailed instructions on building these images, please see [docs/docker.md](../docs/docker.md).
39 changes: 39 additions & 0 deletions docs/adding-new-models.md
@@ -152,3 +152,42 @@ uv run --extra vllm tools/model_diagnostics/2.long_generation_decode_vs_prefill.
# ...
# [Qwen/Qwen2.5-1.5B] ALL GOOD!
```

## [3.check_hf_model_embeddings_untrained.py](https://github.com/NVIDIA-NeMo/RL/blob/main/tools/model_diagnostics/3.check_hf_model_embeddings_untrained.py)

Detects untrained or improperly initialized Hugging Face model embeddings by scanning for near-zero rows and rows with near-identical values in both input and output embeddings. The script also reports whether word embeddings are tied and summarizes basic statistics.

```sh
# Example run
uv run --extra mcore tools/model_diagnostics/3.check_hf_model_embeddings_untrained.py --model nvidia/Nemotron-H-8B-Base-8K

# ....
#================================================================================
#EMBEDDING SUMMARIES
#================================================================================
#
#--- Input Embeddings Summary ---
#Shape: torch.Size([131072, 4096]), Dtype: torch.bfloat16
#Near-zero embeddings (abs < 1.00e-10): 1039/131072 (0.8%)
# Indices: 0-1,3-999,1192-1193,1245-1255,55014,77579,81772,81819,82312,82500,82725,82737,82977,84020,84121,84521,84794,85015,86409,87411,89412,90320,91368,94485,96385,104097,108262,112147,112327,112497,114755
#Identical embeddings (std < 1.00e-08): 1041/131072 (0.8%)
# Indices: 0-1,3-999,1192-1193,1245-1255,55014,77579,81772,81819,82312,82500,82725,82737,82977,83855,84020,84121,84521,84794,85015,86409,87411,89412,90320,91368,94485,96385,101707,104097,108262,112147,112327,112497,114755
#Statistics: mean_abs=0.007874, max_abs=0.196289, std_range=[0.000000, 0.015442]
#⚠️ POTENTIAL ISSUES: 1039 near-zero embeddings, 1041 identical embeddings
#
#--- Output Embeddings Summary (Tied: False) ---
#Shape: torch.Size([131072, 4096]), Dtype: torch.bfloat16
#Near-zero embeddings (abs < 1.00e-10): 0/131072 (0.0%)
#Identical embeddings (std < 1.00e-08): 0/131072 (0.0%)
#Statistics: mean_abs=0.006775, max_abs=0.200195, std_range=[0.004089, 0.021240]
#✅ No obvious untrained patterns detected
#
#=== Final Summary ===
#Model: nvidia/Nemotron-H-8B-Base-8K
#Analysis complete.
```

- Thresholds can be adjusted via flags:
  - `--near-zero-threshold` (default: `1e-10`)
  - `--identical-threshold` (default: `1e-8`)
- If any near-zero or identical rows are reported, the model may become numerically unstable during post-training (e.g., inf grad norms) whenever one of these problematic tokens is encountered. We have observed this when special tokens are reserved in the tokenizer and embedding but never encountered during pre-training. It may help to initialize these embeddings similarly to how they were initialized during pre-training.
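
To make the two checks concrete, here is a minimal sketch of how rows might be flagged. The script's actual criteria are not shown in this diff, so this assumes a row is "near-zero" when its largest absolute value is below the threshold and "identical" when its per-row standard deviation is below the threshold. It uses plain Python lists rather than `torch` tensors, and `flag_rows` and `compress_ranges` are hypothetical names.

```python
import statistics


def flag_rows(rows, near_zero_threshold=1e-10, identical_threshold=1e-8):
    """Return (near_zero, identical) lists of row indices.

    Assumed criteria: max |x| below the first threshold marks a near-zero
    row; population std below the second marks a row of identical values.
    """
    near_zero, identical = [], []
    for i, row in enumerate(rows):
        if max(abs(x) for x in row) < near_zero_threshold:
            near_zero.append(i)
        if statistics.pstdev(row) < identical_threshold:
            identical.append(i)
    return near_zero, identical


def compress_ranges(indices):
    """Format sorted indices as ranges, e.g. [0, 1, 3] -> '0-1,3'."""
    parts, start, prev = [], None, None
    for i in indices:
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = i
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


if __name__ == "__main__":
    # Toy embedding table: rows 0, 1, 3 are all zeros, row 4 is constant.
    table = [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.1, -0.2, 0.3],
        [0.0, 0.0, 0.0],
        [0.5, 0.5, 0.5],
    ]
    near_zero, identical = flag_rows(table)
    print("Near-zero:", compress_ranges(near_zero))
    print("Identical:", compress_ranges(identical))
```

Under these assumed criteria every near-zero row is also flagged as identical, which is consistent with the sample report above, where the identical count (1041) slightly exceeds the near-zero count (1039).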