-
Notifications
You must be signed in to change notification settings - Fork 391
Closed
Description
Description
The build_conda.sh script contains duplicate Megatron-LM cloning and installation steps, which causes unnecessary redundancy and incorrect directory structure during the build process.
Current Behavior
The script clones and installs Megatron-LM twice:
- First installation (lines 49-51):
git clone https://github.com/NVIDIA/Megatron-LM.git --recursive && \ cd Megatron-LM && git checkout ${MEGATRON_COMMIT} && \ pip install -e .
- Location: $BASE_DIR/sglang/Megatron-LM (incorrect, since the script is still in the sglang directory from line 29)
- Uses undefined variable ${MEGATRON_COMMIT} (not set anywhere in the script)
- Second installation (lines 58-61):
cd $BASE_DIR
git clone https://github.com/NVIDIA/Megatron-LM.git --recursive &&
cd Megatron-LM/ && git checkout core_v0.14.0 &&
pip install -e .
- Location: $BASE_DIR/Megatron-LM (correct)
- Explicitly uses core_v0.14.0 version
The patch application at line 84 uses $BASE_DIR/Megatron-LM, which relies on the second installation.
Expected Behavior
Megatron-LM should be cloned and installed only once at the correct location with the proper version specified.
Comparison with Dockerfile
The docker/Dockerfile handles this correctly:
- Defines MEGATRON_COMMIT as an ARG (line 7)
- Clones Megatron-LM only once (lines 52-54)
- Has proper error handling for patch application
I'm uncertain about this. If this counts as a bug, I will fix it and submit a pull request.
Metadata
Metadata
Assignees
Labels
No labels