This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Conversation

Contributor

@Gasoonjia Gasoonjia commented Sep 16, 2024

This PR adopts the same pipeline to construct both the chat model and the tune model; previously we used TransformerArgs to construct the chat-backend model but a plain dictionary for the tune-backend model.
It also fixes some hacky configuration handling.
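
For context, a minimal sketch of what such a unified pipeline could look like. TransformerArgs here is a stand-in dataclass, and build_modules, chat_builder, and tune_builder are hypothetical names for illustration, not the torchchat API.

from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class TransformerArgs:
    # Hypothetical subset of fields; the real class has more.
    dim: int = 4096
    n_heads: int = 32
    n_local_heads: int = 32
    block_size: int = 2048

def build_modules(
    transformer_args: Dict[str, Dict[str, Any]],
    chat_builder: Callable[[TransformerArgs], Any],
    tune_builder: Callable[..., Any],
    use_tune_backend: bool,
) -> Dict[str, Any]:
    # One pipeline for both backends: every module is described by a plain dict.
    modules = {}
    for name, params in transformer_args.items():
        if use_tune_backend:
            # tune-backend builders take keyword parameters directly
            modules[name] = tune_builder(**params)
        else:
            # chat-backend modules get a typed TransformerArgs
            modules[name] = chat_builder(TransformerArgs(**params))
    return modules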

@pytorch-bot

pytorch-bot bot commented Sep 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1153

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit f224da7 with merge base 16b3d64:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Sep 16, 2024
# when applying TP. We need this change to ensure KVCache has the correct
# size as k and v.
model.config.transformer_args["text"].n_local_heads = model.config.transformer_args["text"].n_local_heads // tp_mesh.size()
model.model.config.n_local_heads = model.model.config.n_local_heads // tp_mesh.size()
Contributor

@Jack-Khuu Jack-Khuu Sep 17, 2024

model.model is really hard to reason about... what type is it?

The former was clunky, but legible. I'm not sure about this

Contributor

@Jack-Khuu Jack-Khuu Sep 17, 2024

I'm not happy with "text" either; it's not sustainable, especially if the number of modules increases.
It needs fixing, and model.model might not be perfect yet, but it's close.

Contributor Author

That's annoying, I 100% agree.
I will remove model.model as soon as I can.
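
For readers skimming the thread: a toy sketch (made-up sizes, not the torchchat KVCache class) of why n_local_heads has to be divided by the TP mesh size, so each rank allocates a cache that matches its sharded k/v tensors.

import torch

# Made-up numbers for illustration only.
n_local_heads, head_dim, max_batch, max_seq = 32, 128, 1, 2048
tp_world_size = 4  # corresponds to tp_mesh.size()

# Each TP rank holds only n_local_heads // tp_world_size of the k/v heads,
# so its cache must be allocated with the per-rank head count.
heads_per_rank = n_local_heads // tp_world_size
k_cache = torch.zeros(max_batch, heads_per_rank, max_seq, head_dim)
v_cache = torch.zeros(max_batch, heads_per_rank, max_seq, head_dim)
assert k_cache.shape[1] == 8  # matches the sharded k/v tensors on this rank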


class Transformer(nn.Module):
    def __init__(self, config: TransformerArgs) -> None:
    def __init__(self, config: Dict[str, Any]) -> None:
Contributor

Not a fan of this one: Transformer taking TransformerArgs is the most intuitive setup and matches the other classes.

Contributor Author

SG. Will bring it back.
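
A minimal sketch of the arrangement this thread settles on: Transformer keeps a typed TransformerArgs, and the per-module dict is converted at the boundary. The dataclass fields and the from_params helper are illustrative assumptions, not the torchchat implementation.

from dataclasses import dataclass, fields
from typing import Any, Dict
import torch.nn as nn

@dataclass
class TransformerArgs:
    # illustrative fields only
    dim: int = 4096
    n_heads: int = 32
    n_layers: int = 32

    @classmethod
    def from_params(cls, params: Dict[str, Any]) -> "TransformerArgs":
        # keep only known fields so extra keys in the dict don't break construction
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in params.items() if k in known})

class Transformer(nn.Module):
    def __init__(self, config: TransformerArgs) -> None:  # typed args, as preferred in review
        super().__init__()
        self.config = config

# The outer pipeline can still carry a dict; conversion happens right before construction.
model = Transformer(TransformerArgs.from_params({"dim": 2048, "n_heads": 16, "n_layers": 24}))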

@Gasoonjia Gasoonjia changed the title from "unify model construction ppl" to "[1/n llava]unify model construction ppl" on Sep 17, 2024
)
elif generator_args.chat_mode:
if (
max_seq_length := self.model.config.transformer_args.get("text", None)
Contributor

Your changes are right; just calling out that the old implementation was broken in 26c1d8b

if text_transformer_args is not None
else 2048
),
encoded.size(0) + generator_args.max_new_tokens, max_seq_length
Contributor

@Jack-Khuu Jack-Khuu Sep 17, 2024

Note that this is a departure from the original code, where the second argument to min is block_size (which represents a different max_seq_length, confusing I know).

While we want to move away from using block_size, let's not do it in this diff.

Contributor Author

Good catch! Not sure why this happened, probably a typo. Will fix it.
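
A small illustration of the clamp being discussed (all values and names are made up): the target generation length is the prompt length plus max_new_tokens, clamped to the model's limit, with 2048 as the fallback when no text transformer config is present. Whether that limit should be the config's max_seq_length or block_size is exactly the point raised above.

# Illustrative values only.
prompt_len = 37                  # stands in for encoded.size(0)
max_new_tokens = 200             # stands in for generator_args.max_new_tokens
text_args_max_seq_length = None  # e.g. no "text" entry in transformer_args

# Fall back to 2048 when the text transformer config is absent, then clamp.
model_limit = text_args_max_seq_length if text_args_max_seq_length is not None else 2048
max_seq_length = min(prompt_len + max_new_tokens, model_limit)
print(max_seq_length)  # 237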

model_type (ModelType): The type of the model. This attribute is used to categorize the model into different classes.
transformer_args (Dict[str, Dict[str, Any]]): A dictionary containing the parameters for each transformer in the model.
The outer dictionary has transformer names as keys and inner dictionaries as values. Each inner dictionary contains
the parameter names and their corresponding values for the respective transformer.
Contributor

Each inner dictionary contains the parameter names and their corresponding values for the respective transformer.

This sounds like the intent of TransformerArgs; why can't we use that instead of Dict[str, Any]?

Contributor Author

For unification: this arg is responsible for describing the architecture of all models, including tune-backends, chat-backends, and even mixed-backends, so we need a unified way to describe how to set them up.
For chat-backend modules, the inner Dict will be converted into TransformerArgs afterwards.
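
An illustrative example of that nested Dict[str, Dict[str, Any]] shape (module names and parameters are made up): each top-level key names a transformer module, each inner dict holds its parameters, and for chat-backend modules the inner dict later becomes a TransformerArgs.

# Hypothetical multi-module configuration; keys and values are illustrative.
transformer_args = {
    "text": {"dim": 4096, "n_heads": 32, "n_layers": 32, "vocab_size": 32000},
    "encoder": {"dim": 1024, "n_heads": 16, "n_layers": 24, "patch_size": 14},
}

for name, params in transformer_args.items():
    # chat-backend: params -> TransformerArgs; tune-backend: params -> builder kwargs
    print(name, sorted(params))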

def __init__(
    self,
    transformer_args: Union[TransformerArgs, Dict[str, TransformerArgs]],
    transformer_args: Dict[str, Dict[str, Any]],
Contributor

We should find a way to reconcile Dict[str, Any] into a TransformerArgs in a future PR.

This works well for now since we have 3 "cases", but storing/passing around an untyped Dict makes me nervous.

Contributor Author

More than agree. My mental model would be creating an abstract class containing the essential APIs for all module configurations, and for different transformers (e.g. ours, tune's, etc.) we'd have different implementations. Dict[str, Any] is not a great way.
Let me add some comments in our codebase to highlight that.
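
A sketch of the abstract-class idea described above, with hypothetical names: a small base class pins down the APIs every module configuration must expose, and each backend supplies its own concrete implementation instead of a raw Dict[str, Any].

from abc import ABC, abstractmethod
from typing import Any, Dict

class ModuleConfig(ABC):
    # Hypothetical base class for per-module configuration.

    @abstractmethod
    def max_seq_length(self) -> int:
        ...

    @abstractmethod
    def builder_kwargs(self) -> Dict[str, Any]:
        # Arguments handed to this backend's module builder.
        ...

class ChatTransformerConfig(ModuleConfig):
    def __init__(self, dim: int, n_heads: int, block_size: int = 2048) -> None:
        self.dim, self.n_heads, self.block_size = dim, n_heads, block_size

    def max_seq_length(self) -> int:
        return self.block_size

    def builder_kwargs(self) -> Dict[str, Any]:
        return {"dim": self.dim, "n_heads": self.n_heads, "block_size": self.block_size}

class TuneTransformerConfig(ModuleConfig):
    def __init__(self, params: Dict[str, Any]) -> None:
        self.params = params

    def max_seq_length(self) -> int:
        return int(self.params.get("max_seq_len", 2048))

    def builder_kwargs(self) -> Dict[str, Any]:
        return dict(self.params)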

super().__init__()
self.config = config
self.model = self.build_model()
self.text_transformer_args = None
Contributor

Comment on this since it is a special case
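
Following up on the special-case note: a self-contained sketch (DummyModule and the class layout are hypothetical, not the torchchat Model class) of a build_model loop that constructs every module from the config dict, plus a separately cached text_transformer_args that generation-time code can read without digging through the module dict.

from typing import Any, Dict, Optional
import torch.nn as nn

class DummyModule(nn.Module):
    # Stand-in for a real transformer module.
    def __init__(self, **params: Any) -> None:
        super().__init__()
        self.params = params

class Model(nn.Module):
    def __init__(self, transformer_args: Dict[str, Dict[str, Any]]) -> None:
        super().__init__()
        self.transformer_args = transformer_args
        self.model = self.build_model()
        # Special case: keep the text module's args at hand for generation-time
        # checks (e.g. max sequence length); None when there is no text module.
        self.text_transformer_args: Optional[Dict[str, Any]] = transformer_args.get("text")

    def build_model(self) -> nn.Module:
        return nn.ModuleDict(
            {name: DummyModule(**params) for name, params in self.transformer_args.items()}
        )

m = Model({"text": {"dim": 512, "n_heads": 8}})
print(m.text_transformer_args)  # {'dim': 512, 'n_heads': 8}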

@Gasoonjia Gasoonjia merged commit b0c933c into main Sep 17, 2024
49 of 51 checks passed

Labels

CLA Signed (This label is managed by the Meta Open Source bot.)


4 participants