
Conversation

@kwen2501
Contributor

Change 1

Add an argument parser to accept the model name and other arguments, e.g. the pipeline-parallel (PP) degree.

$ torchrun dist_run.py -h
usage: dist_run.py [-h] [--pp PP] {llama2-7b-chat,llama3}

positional arguments:
  {llama2-7b-chat,llama3}
                        Name of the model to load

options:
  -h, --help            show this help message and exit
  --pp PP               Pipeline parallel degree

Supported model names:

llama2-7b-chat
llama3

(following the model aliases in models.json)
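A minimal sketch of what the new argument parsing could look like, based on the usage text above (the alias list and the default PP degree are assumptions, not necessarily the actual dist_run.py code):

import argparse

# Sketch only; the real dist_run.py may differ.
MODEL_NAMES = ["llama2-7b-chat", "llama3"]  # aliases from models.json

parser = argparse.ArgumentParser(prog="dist_run.py")
parser.add_argument(
    "model_name",
    choices=MODEL_NAMES,
    help="Name of the model to load",
)
parser.add_argument(
    "--pp",
    type=int,
    default=1,
    help="Pipeline parallel degree",
)
args = parser.parse_args()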

Change 2

Renamed variable hf_model_name to distribution.

A distribution looks like this:
meta-llama/Llama-2-7b-chat-hf
meta-llama/Meta-Llama-3-8B-Instruct
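A hedged sketch of the alias-to-distribution lookup (the schema of models.json and the key name distribution_path are assumptions):

import json

# Assumed schema; the actual models.json may differ.
with open("models.json") as f:
    model_configs = json.load(f)

model_name = "llama3"  # CLI alias from the parser above
distribution = model_configs[model_name]["distribution_path"]
# e.g. "meta-llama/Meta-Llama-3-8B-Instruct"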

Change 3

Fix an error where output was undefined when PP = 1.
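The likely shape of the bug and fix, sketched from the description (schedule, stage, and input_ids are assumed names, not the actual diff): when PP = 1 the single stage is both first and last, so branching only on "first" vs. "last" can leave output unassigned.

# Sketch only; not the actual diff.
if stage.is_first and stage.is_last:
    # PP = 1: the sole stage both consumes the input and produces the output.
    output = schedule.step(input_ids)
elif stage.is_first:
    schedule.step(input_ids)   # first stage only feeds the pipeline
elif stage.is_last:
    output = schedule.step()   # last stage produces the final output
else:
    schedule.step()            # middle stages just relay activations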

@pytorch-bot

pytorch-bot bot commented Sep 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1148

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 980d1ff with merge base 26c1d8b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Sep 15, 2024
@kwen2501 requested a review from lessw2020 on Sep 15, 2024 at 01:04
# Distribute model on TP mesh
model.distribute(tp_mesh)
logger.info(f"Model: {model}")
if rank == 0:
Contributor

nit: we're already filtering by rank in the dist_run.sh launcher, rather than having explicit rank-based logging in the code.

Contributor Author

Yeah, it's a hard choice... I don't know whether regular users use the log filter (I never remember that option's name), and I want a minimal command line to just work.
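For context, the two approaches being weighed (the torchrun flag below is presumably the filter option referred to; treat the exact name as an assumption):

# Option A: explicit rank-based logging in the code
if rank == 0:
    logger.info(f"Model: {model}")

# Option B: log unconditionally and filter at launch time, e.g.
#   torchrun --local-ranks-filter 0 dist_run.py llama3
logger.info(f"Model: {model}")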

Contributor

@lessw2020 left a comment


looks good!
I also have toml config support in distributed/config_manager.py that I integrated earlier, so ultimately we should add both command-line and toml support (similar to titan, where you can use either).
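A rough sketch of combining the two, where toml supplies defaults and command-line flags override them (file layout and keys are hypothetical; the actual distributed/config_manager.py may differ):

import argparse
import tomllib  # Python 3.11+; use the tomli package on older versions

parser = argparse.ArgumentParser()
parser.add_argument("--config", help="Path to a toml config file")
parser.add_argument("--pp", type=int, help="Pipeline parallel degree")
args = parser.parse_args()

config = {"pp": 1}  # built-in defaults
if args.config:
    with open(args.config, "rb") as f:
        config.update(tomllib.load(f))
if args.pp is not None:  # the command line wins over toml
    config["pp"] = args.pp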

@Jack-Khuu
Contributor

Rebasing on main to pick up CI fix

@lessw2020 merged commit 7a4f0d1 into main on Sep 15, 2024
51 checks passed
@kwen2501
Contributor Author

Rebasing on main to pick up CI fix

Thanks @Jack-Khuu !
