This issue might be capturing my own misconceptions about the current state of the code-base.
I'm opening the issue to get feedback from @jakelorocco and @frreiss. It's possible that I am misunderstanding things and all that's needed is better documentation. If that's the case, then we can probably iterate on this issue description to explain how things work (or should work).
If I state anything false or misleading in this issue, please quote and explain what I have wrong.
The Immediate Problem
While fixing issues related to our `m tune` command (see PR #422), I found that none of our custom alora support has worked since the intrinsics change.
(These examples aren't in the code-base's test suite because they require training and loading a custom model; we should find a way to mock that somehow in the future.)
The cause for the break is as follows:
- all requirement-checking adapters must be intrinsics, but
- all intrinsics must be listed in `_INTRINSICS_CATALOG_ENTRIES`, and
- there is no way to add custom entries to `_INTRINSICS_CATALOG_ENTRIES` (and even if there were, that would be ugly).
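To make the failure mode concrete, here is a minimal sketch of the lookup pattern described above. The names `_INTRINSICS_CATALOG_ENTRIES` and `fetch_intrinsic_metadata` come from the issue; the entry contents and the `"stembolt_defect_check"` name are invented for illustration:

```python
# Sketch of a static intrinsics catalog and its lookup function.
# The real table lives in adapters.catalog; the entries below are invented.
_INTRINSICS_CATALOG_ENTRIES = {
    "requirement_check": {"adapter_class": "GraniteCommonAdapter"},
}


def fetch_intrinsic_metadata(intrinsic_name: str) -> dict:
    """Look up an intrinsic by name; fails for anything not hard-coded above."""
    try:
        return _INTRINSICS_CATALOG_ENTRIES[intrinsic_name]
    except KeyError:
        raise KeyError(
            f"Unknown intrinsic {intrinsic_name!r}: there is no way to "
            "register custom entries in the catalog."
        ) from None


# A custom requirement-checking adapter (e.g. a stembolt checker) has no
# catalog entry, so constructing an AloraRequirement around it would fail here:
try:
    fetch_intrinsic_metadata("stembolt_defect_check")
except KeyError as err:
    print(err)
```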
The failure point is in the `Intrinsic.__init__` call within the `AloraRequirement` initializer:

```python
Intrinsic.__init__(
    self,
    intrinsic_name=intrinsic_name,
    intrinsic_kwargs={"requirement": f"{self.description}"},
)
```

This calls the `adapters.catalog.fetch_intrinsic_metadata(intrinsic_name: str)` function, which does a lookup against the `_INTRINSICS_CATALOG_ENTRIES` table. This means that you cannot use any adapter for requirement checking that is not part of the `_INTRINSICS_CATALOG_ENTRIES` table. That's a problem because we might want custom lora/alora requirement checkers that are implemented as adapters (as in the canonical stembolt example).
I tried to hack around this by adding to the catalog, but that also failed because the adapter must be a `GraniteCommonAdapter` (`huggingface.py:354`), which opened a whole other can of worms.
Larger Issue
After digging through the current `Intrinsic` code-base, I think this is a symptom of a larger problem: we have not properly documented the architectural choices associated with intrinsics and adapters. We may also have made some bad decisions, but I can't tell, because I am still not quite sure how everything is supposed to work.
I'm capturing some other alora questions here, which aren't directly related to getting `m tune` working again.
- It's unclear to me how you're supposed to write the logic for activating a generic adapter without turning it into an intrinsic. Our current training code doesn't auto-generate the JSONs for, e.g., activation sequences (trivial to fix), and I think you might also need to modify `granite.common` itself in order to use a custom adapter, at least in our current code-base (less trivial to fix / does require some redesign). In any case, it should be possible to train an adapter and use it to implement a component's generation without bundling it as an intrinsic. Open to being wrong here.
- We tout that the primary advantage of aLoRAs is KV-cache reuse, but then implement aLoRAs in terms of chat rewrites. If you rewrite prior history, then you have to recompute the KV cache anyway. So why not use a LoRA?
- Our code currently allows the use of multiple aLoRAs against a single model. The original paper states that this is not possible. Has that changed since then, or do we have changes in the pipeline to enable it? If not, we should enforce the invariant that only one aLoRA may be loaded / used with any particular backend: you must unload the old aLoRA before loading a new one, and that unload triggers a cache invalidation.
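If we do enforce that invariant, the backend-side logic could look roughly like the following. This is only a sketch; `Backend`, `load_alora`, `unload_alora`, and the cache flag are invented names, not the project's actual API:

```python
class Backend:
    """Sketch of a backend that permits at most one loaded aLoRA at a time."""

    def __init__(self) -> None:
        self._active_alora: str | None = None
        self._kv_cache_valid = True

    def load_alora(self, name: str) -> None:
        # Enforce the invariant: a different aLoRA must be unloaded first.
        if self._active_alora is not None and self._active_alora != name:
            raise RuntimeError(
                f"aLoRA {self._active_alora!r} is loaded; unload it before "
                f"loading {name!r}"
            )
        self._active_alora = name

    def unload_alora(self) -> None:
        # Swapping aLoRAs invalidates the KV cache, per the invariant above.
        self._active_alora = None
        self._kv_cache_valid = False
```

The explicit unload step makes the cache invalidation visible at the call site instead of happening silently on load.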