Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion src/forge/actors/reference_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,11 @@ def __post_init__(self):
@endpoint
async def setup(self):
engine_config = {f.name: getattr(self, f.name) for f in fields(self)}
self.engine = ForgeEngine(ForgeJobConfig(**engine_config))
engine_config = ForgeJobConfig(**engine_config)
engine_config.checkpoint.folder = (
"" # hardcode to empty to force load from initial_load_path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does ref model get the initial weights after the change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So maybe add a validator on the ref model's config and only allow fields that differs per model. for example, Also put enable: true and initial_load_in_hf: true in the override together with the folder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a todo to make this configurable in the future? Sometimes you do want to update your reference model just very infrequently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JenniferWang that would have to be on the titan side right? Also, I think this module will be reusable for any "forward pass" model in the near future and not just for reference model. So I don't know if it's worth doing anything more than this now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this not just be accomplished by setting this in the config?

  checkpoint:
    enable: true
    initial_load_path: hf://${model}
    folder: "" 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@allenwang28 Yes, it is the same. But I thought it would be a little confusing to the users.
@pbontrager If user want to change the reference model, they can just update the initial_load_path in the config. Does it resolve some of your concern?

)
self.engine = ForgeEngine(engine_config)
self.engine.checkpointer.load()
self.model = self.engine.model_parts[0] # No pipeline parallelism yet
self.model.eval()
Expand Down
Loading