Skip to content

[Bug] Duplicate Megatron-LM installation in build_conda.shΒ #1236

@kaysonyu

Description

@kaysonyu

Description

The build_conda.sh script contains duplicate Megatron-LM cloning and installation steps, which causes unnecessary redundancy and incorrect directory structure during the build process.

Current Behavior

The script clones and installs Megatron-LM twice:

  1. First installation (lines 49-51):
    git clone https://github.com/NVIDIA/Megatron-LM.git --recursive && \
        cd Megatron-LM && git checkout ${MEGATRON_COMMIT} && \
        pip install -e .
  • Location: $BASE_DIR/sglang/Megatron-LM (incorrect, since the script is still in the sglang directory from line 29)
  • Uses undefined variable ${MEGATRON_COMMIT} (not set anywhere in the script)
  1. Second installation (lines 58-61):
    cd $BASE_DIR
    git clone https://github.com/NVIDIA/Megatron-LM.git --recursive &&
    cd Megatron-LM/ && git checkout core_v0.14.0 &&
    pip install -e .
    - Location: $BASE_DIR/Megatron-LM (correct)
    - Explicitly uses core_v0.14.0 version

The patch application at line 84 uses $BASE_DIR/Megatron-LM, which relies on the second installation.

Expected Behavior

Megatron-LM should be cloned and installed only once at the correct location with the proper version specified.

Comparison with Dockerfile

The docker/Dockerfile handles this correctly:

  • Defines MEGATRON_COMMIT as an ARG (line 7)
  • Clones Megatron-LM only once (lines 52-54)
  • Has proper error handling for patch application
Image

I'm uncertain about this. If this counts as a bug, I will fix it and submit a pull request.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions