-
Notifications
You must be signed in to change notification settings - Fork 639
FEAT Breaking: Registry protocol + ScorerRegistry #1308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
FEAT Breaking: Registry protocol + ScorerRegistry #1308
Conversation
romanlutz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsurprisingly, I like practically everything about this 😆
| | **Class Registry** | Classes (Type[T]) | Components instantiated with user-provided parameters | | ||
| | **Instance Registry** | Pre-configured instances | Components requiring complex setup before use | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this distinction!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought based on the current implementation. Do we want to force some kind of parity between the two registries if they share objects? Suppose we have an instance registry of pre-configured default targets and a class registry of a subset of PromptTarget classes. Do we want these to be related in any way, e.g. instance registry is valid if class registry is valid, but not the other way around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I think I'm a fan of just keeping target registry an instance registry. I can't think of anything offhand I'd want to be both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explained that badly, sorry. What I meant is that in a sense an instance registry is a class registry as all instances have a parent class (even if hypothetically it would just be Python's object). We implement them differently, but I could imagine eventually having a situation where we have the same types (like PromptTarget) appear in both an instance registry (use these specific targets paired with this initializer) and a class registry (here are all the kinds of targets I want to make available for some purpose).
I don't think it's in scope for this PR, but I can see this becoming a point of confusion for users down the line. Though I'm not sure what to do about it
ValbuenaVC
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing I think we may want to add before merging is encoding/decoding logic for registries. The protocol choice is fantastic, but it requires every registry to store classes or instances differently. Aside from that looks great!
| | **Class Registry** | Classes (Type[T]) | Components instantiated with user-provided parameters | | ||
| | **Instance Registry** | Pre-configured instances | Components requiring complex setup before use | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought based on the current implementation. Do we want to force some kind of parity between the two registries if they share objects? Suppose we have an instance registry of pre-configured default targets and a class registry of a subset of PromptTarget classes. Do we want these to be related in any way, e.g. instance registry is valid if class registry is valid, but not the other way around?
| _instances: Dict[type, "BaseInstanceRegistry[Any, Any]"] = {} | ||
|
|
||
| @classmethod | ||
| def get_registry_singleton(cls) -> "BaseInstanceRegistry[T, MetadataT]": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious: is there a reason we do this instead of using the Singleton metaclass?
| self.register(scorer, name=name) | ||
| logger.debug(f"Registered scorer instance: {name} ({scorer.__class__.__name__})") | ||
|
|
||
| def get_instance_by_name(self, name: str) -> Optional["Scorer"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not in the PR, but a get_instances_by_scorer_type may be useful in the future for the GUI I'd imagine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing instances has the risk of race conditions since state is shared across parallel scoring processes. Don't think this is a big deal since we don't modify state within our target and scoring methods, but I do see one example of this in prompt_shield_scorer.py in score_piece_async where self._conversation_id is modified. Might be worth to take a look to make sure there aren't more cases like this in scorers and targets and make sure any state modifying methods are private.
Refactored registry logic into a shared pyrit.registry module and added a new ScorerRegistry for managing pre-configured scorer instances.
Changes