38 changes: 38 additions & 0 deletions docs/dev/intrinsics_and_adapters.md
@@ -0,0 +1,38 @@
# Intrinsics and Adapters
Note: Mellea currently only supports GraniteCommonAdapters and Intrinsics.

## Basics
In Mellea, intrinsics are a type of Component that signals one or more of the following to a backend:
- a special adapter must be used for generation
- the input/output for generation must be transformed in a particular way
- the model options must be modified in a particular way

These changes only happen when the intrinsic is the "action" of the request. Intrinsics should usually not be used as an item in the context of generation (in fact, by default, Intrinsics have no string representation).

These changes are specified by the Adapter that corresponds to a given Intrinsic. Matching happens based on the adapter name and type.
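
For example, adapters are registered under a qualified name (`name + "_" + adapter_type`), and the `get_adapter_for_intrinsic` helper added in `mellea/backends/adapters/adapter.py` (see the diff below) performs the lookup. A minimal sketch:

```python
from mellea.backends.adapters.adapter import (
    AdapterType,
    GraniteCommonAdapter,
    get_adapter_for_intrinsic,
)

# Qualified name is "requirement_check_alora" (name + adapter type).
req_adapter = GraniteCommonAdapter("requirement_check")
available = {req_adapter.qualified_name: req_adapter}

# Tries "requirement_check_alora" first, then "requirement_check_lora".
adapter = get_adapter_for_intrinsic(
    "requirement_check", [AdapterType.ALORA, AdapterType.LORA], available
)
```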

## Parts of an Intrinsic
Intrinsics specify:
- an adapter name (e.g., `requirement_check`)
- the types of adapters suitable for use (e.g., `alora`)
- any kwargs necessary (e.g., a requirement like "make sure the last user message is...")
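
A minimal sketch of constructing one (mirroring the example in `docs/examples/intrinsics/intrinsics.py` below):

```python
from mellea.stdlib.intrinsics.intrinsic import Intrinsic

# The name must match an available adapter; intrinsic_kwargs are passed to the
# adapter's input transformation.
check = Intrinsic(
    "requirement_check",
    intrinsic_kwargs={"requirement": "The assistant is helpful."},
)
```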

## Parts of an Adapter
Adapters specify:
- compatible backends
- adapter type
- functions for getting a path to load them
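
A sketch of creating and registering the one adapter implementation currently shipped, `GraniteCommonAdapter`, on an adapter-capable backend (taken from the example below; assumes a locally running vLLM server):

```python
from mellea.backends.adapters.adapter import GraniteCommonAdapter
from mellea.backends.openai import OpenAIBackend

# GraniteCommonAdapters default to the ALORA adapter type.
req_adapter = GraniteCommonAdapter("requirement_check")

# Any backend implementing AdapterMixin works (e.g., OpenAIBackend, LocalHFBackend).
backend = OpenAIBackend(
    model_id="ibm-granite/granite-3.3-8b-instruct",
    base_url="http://0.0.0.0:8000/v1",
    api_key="EMPTY",
)
backend.add_adapter(req_adapter)
```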

## Using Intrinsics
Mellea Intrinsics currently use the [granite-common](https://github.com/ibm-granite/granite-common) package for loading adapters and formatting inputs and outputs. This means Mellea only allows intrinsics/adapters that follow this pattern.
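
Concretely, `GraniteCommonAdapter` resolves its config and weights through granite-common utilities; a sketch of the underlying calls (matching `mellea/backends/adapters/adapter.py` below):

```python
import granite_common

# Fetch the I/O config and the (a)LoRA weights for an intrinsic, keyed by the
# intrinsic name and the base model name.
io_yaml = granite_common.intrinsics.util.obtain_io_yaml(
    "requirement_check", "granite-3.3-8b-instruct", alora=True
)
lora_path = granite_common.intrinsics.util.obtain_lora(
    "requirement_check", "granite-3.3-8b-instruct", alora=True
)
```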

## Needed Future Work
### Custom Adapters / Intrinsics
Mellea should support custom intrinsic / adapter implementations. To do this:
- make backend `_generate_from_intrinsic` functions generic and utilize only common adapter functions
- adapters must specify a transformation function that encapsulates the input/output modifications necessary for their generation requests
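
One possible shape for that transformation hook (purely hypothetical; these method names are not part of the current API):

```python
from mellea.backends.adapters.adapter import Adapter


class MyCustomAdapter(Adapter):
    """Hypothetical custom adapter that supplies its own I/O transformations."""

    def transform_input(self, messages: list[dict], model_options: dict) -> tuple[list[dict], dict]:
        # Rewrite the chat messages / model options for this adapter's request.
        ...

    def transform_output(self, raw_output: str) -> str:
        # Parse this adapter's output format into something Mellea can consume.
        ...
```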

### Concurrency Checks
Some backends that allow adapters to be loaded (currently only `LocalHFBackend`) cannot use an adapter for one request without impacting other concurrent generation requests.

These backends should support a generation lock that ensures requests are only performed when the correct set of adapters (or no adapters) are active.
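
A minimal sketch of such a lock (illustrative only; the names here are hypothetical):

```python
import threading


class GenerationLockMixin:
    """Hypothetical: serializes generation so adapter state cannot change mid-request."""

    _generation_lock = threading.Lock()

    def _generate_with_adapters(self, request, required_adapters: set[str]):
        with self._generation_lock:
            # Activate exactly `required_adapters` (possibly none), then run
            # the generation while still holding the lock.
            ...
```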
9 changes: 5 additions & 4 deletions docs/dev/requirement_aLoRA_rerouting.md
@@ -14,14 +14,14 @@ The actual rule is slightly more complicated.

## The Actual Rule

If a `Requirement` is validated using a backend that could either use a `constraint` aLoRA or perform an LLMaJ prompt on the underlying model, then the aLoRA is used for validation, even if the `backend.generate_from_context` method is called instead of the `alora.generate_from_strings` method.
If a `Requirement` is validated using a backend that could either use a `requirement_check` aLoRA or perform an LLMaJ prompt on the underlying model, then the aLoRA is used for validation, even if the `backend.generate_from_context` method is called instead of the `backend._generate_from_intrinsic` method.

There are three exceptions to this rule:
1. `Backend.default_to_constraint_checking_alora` is set to `False` (this parameter defaults to `True`).
2. The `Requirement` has a more specific subtype that indicates a more specific intent (`LLMaJRequirement`).
3. The `ALoRA` requirement checker throws an exception.

There is an exception (or disambiguation) to the first exception: If the user provides an `ALoRARequirement`, then the `backend.generate_from_context` call is rerouted to the constraint checking LoRA, regardless of the value of `deault_to_constraint_checking_alora`.
There is an exception (or disambiguation) to the first exception: If the user provides an `ALoRARequirement`, then the `backend.generate_from_context` call is rerouted to the constraint checking LoRA, regardless of the value of `default_to_constraint_checking_alora`.
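
Putting the rule and its exceptions together (a sketch mirroring the `use_alora` helper this PR removes from `mellea/backends/_utils.py`; the thrown-exception fallback is handled at call time):

```python
from mellea.stdlib.requirement import ALoRARequirement, LLMaJRequirement, Requirement


def should_reroute_to_alora(action, alora_available: bool, default_to_constraint_checking_alora: bool) -> bool:
    # Sketch of the stated rule, not the literal implementation.
    if not isinstance(action, Requirement):
        return False
    reroute = alora_available                      # general rule
    if not default_to_constraint_checking_alora:   # exception 1
        reroute = False
    if isinstance(action, LLMaJRequirement):       # exception 2
        reroute = False
    if isinstance(action, ALoRARequirement):       # overrides exception 1
        reroute = True
    return reroute
```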

## Decision Rationale

Expand All @@ -33,12 +33,13 @@ Suppose that the user creates a backend and then adds a generic constraint check

```python
from mellea import start_session
from mellea.backends.aloras.granite_aloras import add_granite_aloras
from mellea.stdlib.requirement import Requirement

m = start_session(
"huggingface.LocalHFBackend:ibm-granite/granite-3.2-8b-instruct")
add_granite_aloras(m) # This will load the Constraint checint aLoRA.

# By default, the AloraRequirement uses a GraniteCommonAdapter with "requirement_check".
m.backend.add_adapter(GraniteCommonAdapter("requirement_check"))

m.instruct(
"Corporate wants you to find the difference between these two strings:\n\naaa\naba")
45 changes: 45 additions & 0 deletions docs/examples/intrinsics/intrinsics.py
@@ -0,0 +1,45 @@
from mellea.backends.openai import OpenAIBackend, _ServerType
from mellea.backends.adapters.adapter import AdapterType, GraniteCommonAdapter
from mellea.stdlib.base import ChatContext, ModelOutputThunk
from mellea.stdlib.chat import Message
import mellea.stdlib.funcs as mfuncs
from mellea.stdlib.intrinsics.intrinsic import Intrinsic

# Create the Adapter. GraniteCommonAdapters default to the ALORA adapter type.
req_adapter = GraniteCommonAdapter("requirement_check")

# Create the backend. Assumes a locally running vLLM server.
backend = OpenAIBackend(
model_id="ibm-granite/granite-3.3-8b-instruct",
base_url="http://0.0.0.0:8000/v1",
api_key="EMPTY",
)

# If using a remote vLLM server, use the `test/backends/test_openai_vllm/serve.sh`
# script with `export VLLM_DOWNLOAD_RAG_INTRINSICS=True`. This will download the granite_common
# adapters on the server.
backend._server_type = _ServerType.REMOTE_VLLM

# Add the adapter to the backend.
backend.add_adapter(req_adapter)

ctx = ChatContext()
ctx = ctx.add(Message("user", "Hi, can you help me?"))
ctx = ctx.add(Message("assistant", "Hello; yes! What can I help with?"))

# Generate from an intrinsic with the same name as the adapter. By default, it will look for
# ALORA and then LORA adapters.
out, new_ctx = mfuncs.act(
Intrinsic(
"requirement_check",
intrinsic_kwargs={"requirement": "The assistant is helpful."},
),
ctx,
backend,
)

# Print the output. The requirement_check adapter has a specific output format:
print(out) # {"requirement_likelihood": 1.0}

# The AloraRequirement uses this adapter and automatically parses this output
# when validating.
25 changes: 0 additions & 25 deletions mellea/backends/_utils.py
@@ -4,7 +4,6 @@
from collections.abc import Callable
from typing import Any, Literal

from mellea.backends.aloras import Alora
from mellea.backends.formatter import Formatter
from mellea.backends.tools import parse_tools
from mellea.helpers.fancy_logger import FancyLogger
@@ -57,30 +56,6 @@ def to_chat(
return ctx_as_conversation


def use_alora(
action: Component | CBlock,
alora: Alora | None,
default_to_constraint_checking_alora: bool,
) -> bool:
"""Returns True when the condition for using alora is met.

See `docs/dev/requirement_aLoRA_rerouting.md` for an explanation of the following code block.
"""
if issubclass(type(action), Requirement):
# The general rule is that we reroute to the alora if it exists.
reroute_to_alora = alora is not None
# However, there are some exceptions:
if not default_to_constraint_checking_alora:
reroute_to_alora = False
if issubclass(type(action), LLMaJRequirement):
reroute_to_alora = False
if issubclass(type(action), ALoraRequirement):
reroute_to_alora = True
return reroute_to_alora
else:
return False


def to_tool_calls(
tools: dict[str, Callable], decoded_result: str
) -> dict[str, ModelToolCall] | None:
224 changes: 224 additions & 0 deletions mellea/backends/adapters/adapter.py
@@ -0,0 +1,224 @@
"""Module for adapters to backends."""

import abc
import pathlib
from enum import Enum
from typing import Any, TypeVar, cast

import granite_common

from mellea.backends import Backend
from mellea.backends.types import _ServerType


class AdapterType(Enum):
"""Possible types of adapters for a backend."""

LORA = "lora"
ALORA = "alora"


class Adapter(abc.ABC):
"""An adapter that can be added to a single backend."""

def __init__(self, name: str, adapter_type: AdapterType):
"""An adapter that can be added to a backend.

Note: An adapter can only be added to a single backend.

Args:
name: name of the adapter; when referencing this adapter, use adapter.qualified_name
            adapter_type: enum describing what type of adapter it is (e.g., LORA / ALORA)
"""
self.name = name
self.adapter_type = adapter_type
self.qualified_name = name + "_" + adapter_type.value
"""the name of the adapter to use when loading / looking it up"""

self.backend: Backend | None = None
"""set when the adapter is added to a backend"""

self.path: str | None = None
"""set when the adapter is added to a backend"""


class OpenAIAdapter(Adapter):
"""Adapter for OpenAIBackends."""

@abc.abstractmethod
def get_open_ai_path(
self,
base_model_name: str,
server_type: _ServerType = _ServerType.LOCALHOST,
remote_path: str | None = None,
) -> str:
"""Returns the path needed to load the adapter.

Args:
base_model_name: the base model; typically the last part of the huggingface model id like "granite-3.3-8b-instruct"
            server_type: the server type (e.g., LOCALHOST / OPENAI); usually the backend has information on this
remote_path: optional; used only if the server_type is REMOTE_VLLM; base path at which to find the adapter
"""
...


class LocalHFAdapter(Adapter):
"""Adapter for LocalHFBackends."""

@abc.abstractmethod
def get_local_hf_path(self, base_model_name: str) -> str:
"""Returns the path needed to load the adapter.

Args:
base_model_name: the base model; typically the last part of the huggingface model id like "granite-3.3-8b-instruct"
"""
...


class GraniteCommonAdapter(OpenAIAdapter, LocalHFAdapter):
"""Adapter for intrinsics that utilize the GraniteCommon library."""

def __init__(
self,
name: str,
adapter_type: AdapterType = AdapterType.ALORA,
config_file: str | pathlib.Path | None = None,
config_dict: dict | None = None,
base_model_name: str | None = None,
):
"""An adapter that can be added to either an `OpenAIBackend` or a `LocalHFBackend`. Most rag-lib-intrinsics support lora or alora adapter types.

Args:
name: name of the adapter; when referencing this adapter, use adapter.qualified_name
            adapter_type: enum describing what type of adapter it is (e.g., LORA / ALORA)
config_file: optional; file for defining the intrinsic / transformations
config_dict: optional; dict for defining the intrinsic / transformations
            base_model_name: optional; if provided with no config_file/config_dict, it will be used to look up the granite_common config for this adapter
"""
assert adapter_type == AdapterType.ALORA or adapter_type == AdapterType.LORA, (
f"{adapter_type} not supported"
)
super().__init__(name, adapter_type)

self.base_model_name = base_model_name

# If any of the optional params are specified, attempt to set up the
# config for the intrinsic here.
config: dict | None = None
if config_file is not None or config_dict is not None:
config = granite_common.intrinsics.util.make_config_dict(
config_file=config_file, config_dict=config_dict
)
config = cast(
dict, config
) # Can remove if util function gets exported properly.

if config is None and self.base_model_name is not None:
            is_alora = self.adapter_type == AdapterType.ALORA
io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
self.name, self.base_model_name, alora=is_alora
)
config = granite_common.intrinsics.util.make_config_dict(
config_file=io_yaml_file
)
config = cast(
dict, config
) # Can remove if util function gets exported properly.

self.config: dict | None = config

def get_open_ai_path(
self,
base_model_name: str,
server_type: _ServerType = _ServerType.LOCALHOST,
remote_path: str | None = None,
) -> str:
"""Returns the path needed to load the adapter.

Args:
base_model_name: the base model; typically the last part of the huggingface model id like "granite-3.3-8b-instruct"
            server_type: the server type (e.g., LOCALHOST / OPENAI); usually the backend has information on this
remote_path: optional; used only if the server_type is REMOTE_VLLM; base path at which to find the adapter
"""
if server_type == _ServerType.LOCALHOST:
path = self.download_and_get_path(base_model_name)
elif server_type == _ServerType.REMOTE_VLLM:
if remote_path is None:
remote_path = "rag-intrinsics-lib"
path = self.get_path_on_remote(base_model_name, remote_path)
else:
raise ValueError(
f"{self} not supported for OpenAIBackend with server_type: {server_type}"
)

return path

def get_local_hf_path(self, base_model_name: str) -> str:
"""Returns the path needed to load the adapter.

Args:
base_model_name: the base model; typically the last part of the huggingface model id like "granite-3.3-8b-instruct"
"""
return self.download_and_get_path(base_model_name)

def download_and_get_path(self, base_model_name: str) -> str:
"""Downloads the required rag intrinsics files if necessary and returns the path to the them.

Args:
base_model_name: the base model; typically the last part of the huggingface model id like "granite-3.3-8b-instruct"

Returns:
a path to the files
"""
is_alora = self.adapter_type == AdapterType.ALORA
return str(
granite_common.intrinsics.util.obtain_lora(
self.name, base_model_name, alora=is_alora
)
)

def get_path_on_remote(self, base_model_name: str, base_path: str) -> str:
"""Assumes the files have already been downloaded on the remote server."""
return f"./{base_path}/{self.name}/{self.adapter_type.value}/{base_model_name}"


T = TypeVar("T")


def get_adapter_for_intrinsic(
intrinsic_name: str,
intrinsic_adapter_types: list[AdapterType],
available_adapters: dict[str, T],
) -> T | None:
"""Finds an adapter from a dict of available adapters based on the intrinsic name and its allowed adapter types.

Args:
intrinsic_name: the name of the intrinsic, like "answerability"
intrinsic_adapter_types: the adapter types allowed for this intrinsic, like ALORA / LORA
available_adapters: the available adapters to choose from; maps adapter.qualified_name to the Adapter

Returns:
an Adapter if found; else None
"""
adapter = None
for adapter_type in intrinsic_adapter_types:
qualified_name = intrinsic_name + "_" + adapter_type.value
adapter = available_adapters.get(qualified_name, None)
if adapter is not None:
break

return adapter


class AdapterMixin(abc.ABC):
"""Mixin class for backends capable of utilizing adapters."""

    @abc.abstractmethod
    def add_adapter(self, *args, **kwargs):
        """Adds the given adapter to the backend. The adapter must not have been added to a different backend."""

    @abc.abstractmethod
    def load_adapter(self, adapter_qualified_name: str):
        """Loads the given adapter for the backend. The adapter must have previously been added."""

    @abc.abstractmethod
    def unload_adapter(self, adapter_qualified_name: str):
        """Unloads the given adapter from the backend."""