[1/n llava] unify model construction ppl #1153
Conversation
distributed/parallelize_llama.py
Outdated
  # when applying TP. We need this change to ensure KVCache has the correct
  # size as k and v.
- model.config.transformer_args["text"].n_local_heads = model.config.transformer_args["text"].n_local_heads // tp_mesh.size()
+ model.model.config.n_local_heads = model.model.config.n_local_heads // tp_mesh.size()
model.model is really hard to reason about... what type is it?
The former was clunky, but legible. I'm not sure about this
I'm not happy with "text" either; it wasn't sustainable, especially if the number of modules increases.
It needs fixing. model.model might not be perfect yet, but it's close.
That's annoying; I agree 100%.
I will remove model.model as soon as I can.
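For context on why the division matters, here is a minimal self-contained sketch; all sizes are made up for illustration and are not torchchat defaults:

```python
import torch

# Illustrative numbers only. Under tensor parallelism each rank keeps
# n_heads // tp_size attention heads, so the per-rank KV cache must be
# allocated with the reduced head count -- which is what the diff above
# writes back into the config as n_local_heads.
n_heads, head_dim, tp_size = 32, 128, 4
n_local_heads = n_heads // tp_size  # 8 heads per rank after sharding

max_batch_size, max_seq_length = 1, 2048
k_cache = torch.zeros(max_batch_size, n_local_heads, max_seq_length, head_dim)
v_cache = torch.zeros(max_batch_size, n_local_heads, max_seq_length, head_dim)
assert k_cache.shape[1] == 8  # matches the sharded k/v tensors produced under TP
```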
torchchat/model.py
Outdated
  class Transformer(nn.Module):
-     def __init__(self, config: TransformerArgs) -> None:
+     def __init__(self, config: Dict[str, Any]) -> None:
Not a fan of this one; Transformer taking TransformerArgs is the most intuitive setup and matches the other classes.
SG. Will bring it back.
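For reference, a minimal sketch of the setup being asked for, with an illustrative subset of fields (torchchat's real TransformerArgs has many more):

```python
from dataclasses import dataclass

import torch.nn as nn

@dataclass
class TransformerArgs:
    # illustrative subset of fields; the real dataclass is much richer
    dim: int = 4096
    n_layers: int = 32
    n_heads: int = 32

class Transformer(nn.Module):
    # typed config, as preferred above: readers and type checkers both
    # know exactly what the constructor expects
    def __init__(self, config: TransformerArgs) -> None:
        super().__init__()
        self.config = config
```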
  )
  elif generator_args.chat_mode:
      if (
          max_seq_length := self.model.config.transformer_args.get("text", None)
Your changes are right; just calling out that the old implementation was broken in 26c1d8b
torchchat/generate.py
Outdated
      if text_transformer_args is not None
      else 2048
  ),
  encoded.size(0) + generator_args.max_new_tokens, max_seq_length
Note that this is a departure from the original code, where the second argument to min is block_size (which represents a different max_seq_length; confusing, I know).
While we want to move away from using block_size, let's not do it in this diff.
Good catch! Not sure why this happened; probably a typo. Will fix it.
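To make the fix concrete, a self-contained sketch with dummy stand-ins; in generate.py these values come from the encoded prompt, generator_args, and the text transformer's config:

```python
import torch

# Dummy stand-ins for illustration only.
encoded = torch.zeros(16, dtype=torch.int64)  # stand-in for the encoded prompt
max_new_tokens = 200                          # stand-in for generator_args.max_new_tokens
block_size = 2048                             # stand-in for text_transformer_args.block_size

# As in the original code, the second argument to min is block_size,
# the model's maximum context length.
max_seq_length = min(encoded.size(0) + max_new_tokens, block_size)
assert max_seq_length == 216
```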
  model_type (ModelType): The type of the model. This attribute is used to categorize the model into different classes.
  transformer_args (Dict[str, Dict[str, Any]]): A dictionary containing the parameters for each transformer in the model.
      The outer dictionary has transformer names as keys and inner dictionaries as values. Each inner dictionary contains
      the parameter names and their corresponding values for the respective transformer.
> Each inner dictionary contains the parameter names and their corresponding values for the respective transformer.

This sounds like the intent of TransformerArgs; why can't we use that instead of Dict[str, Any]?
For unification: this arg is responsible for describing the architecture of all models, including tune backends, chat backends, and even mixed backends, so we need a unified way to describe how to set them up.
For chat-backend modules, the inner Dict will be converted into TransformerArgs afterwards.
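A hypothetical illustration of that layout (module names and parameter values are invented for the example):

```python
from typing import Any, Dict

# Invented example: outer keys name each transformer module, and each inner
# dict carries that module's parameters. For chat-backend modules the inner
# dict is converted into TransformerArgs later in construction.
transformer_args: Dict[str, Dict[str, Any]] = {
    "text": {"dim": 4096, "n_layers": 32, "n_heads": 32},  # chat backend
    "encoder": {"embed_dim": 1280, "patch_size": 14},      # e.g. a tune-backend vision tower
}
```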
  def __init__(
      self,
-     transformer_args: Union[TransformerArgs, Dict[str, TransformerArgs]],
+     transformer_args: Dict[str, Dict[str, Any]],
We should find a way to reconcile Dict[str, Any] into a TransformerArgs in a future PR.
This works well for now since we have 3 "cases", but storing/passing around an untyped Dict makes me nervous.
More than agree. My mental model would be an abstract class containing the essential APIs for all module configurations, with a different implementation for each kind of transformer (e.g. ours, tune, etc.). Dict[str, Any] is not a great way to do it.
Let me add some comments in our codebase to highlight that.
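A rough sketch of that mental model; every name below is hypothetical and only illustrates the proposed shape, not anything that exists in torchchat today:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class ModuleConfig(ABC):
    """Hypothetical base class: the essential API every module
    configuration would implement, replacing raw Dict[str, Any] plumbing."""

    @classmethod
    @abstractmethod
    def from_dict(cls, params: Dict[str, Any]) -> "ModuleConfig":
        """Build a typed config from the untyped inner dict."""

class ChatTransformerConfig(ModuleConfig):
    """One concrete flavor (ours); a tune-backend config would be another."""

    def __init__(self, dim: int, n_layers: int) -> None:
        self.dim = dim
        self.n_layers = n_layers

    @classmethod
    def from_dict(cls, params: Dict[str, Any]) -> "ChatTransformerConfig":
        return cls(dim=params["dim"], n_layers=params["n_layers"])
```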
  super().__init__()
  self.config = config
  self.model = self.build_model()
  self.text_transformer_args = None
Please add a comment on this, since it is a special case.
This PR adopts the same pipeline to construct both the chat model and the tune model; previously we used TransformerArgs to construct the chat-backend model but a plain dictionary for the tune-backend model.
It also fixes some annoyingly hacky configuration handling.
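To make "same pipeline" concrete, a self-contained sketch of the idea; the builder registry and the Linear stand-ins are invented for illustration and are not torchchat's actual API:

```python
from typing import Any, Callable, Dict

import torch.nn as nn

# Invented stand-ins: in torchchat the entries would construct the real
# chat-backend Transformer and tune-backend modules respectively.
MODULE_BUILDERS: Dict[str, Callable[[Dict[str, Any]], nn.Module]] = {
    "text": lambda p: nn.Linear(p["dim"], p["dim"]),
    "encoder": lambda p: nn.Linear(p["embed_dim"], p["embed_dim"]),
}

def build_model(transformer_args: Dict[str, Dict[str, Any]]) -> nn.ModuleDict:
    # one dict-driven loop builds every submodule, regardless of backend
    return nn.ModuleDict(
        {name: MODULE_BUILDERS[name](params) for name, params in transformer_args.items()}
    )

model = build_model({"text": {"dim": 64}, "encoder": {"embed_dim": 32}})
print(model)
```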