Skip to content

Conversation

@rlundeen2
Copy link
Contributor

Refactored registry logic into a shared pyrit.registry module and added a new ScorerRegistry for managing pre-configured scorer instances.

Changes

  • Abstracted common registry patterns into registry module with base classes for class registries (Type[T]) and instance registries (pre-configured objects)
  • Migrated ScenarioRegistry and InitializerRegistry to inherit from BaseClassRegistry
  • Added ScorerRegistry extending BaseInstanceRegistry for managing scorer instances
  • Added documentation in registry with overview and usage examples

Copy link
Contributor

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsurprisingly, I like practically everything about this 😆

Comment on lines +18 to +19
| **Class Registry** | Classes (Type[T]) | Components instantiated with user-provided parameters |
| **Instance Registry** | Pre-configured instances | Components requiring complex setup before use |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this distinction!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought based on the current implementation. Do we want to force some kind of parity between the two registries if they share objects? Suppose we have an instance registry of pre-configured default targets and a class registry of a subset of PromptTarget classes. Do we want these to be related in any way, e.g. instance registry is valid if class registry is valid, but not the other way around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I think I'm a fan of just keeping target registry an instance registry. I can't think of anything offhand I'd want to be both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained that badly, sorry. What I meant is that in a sense an instance registry is a class registry as all instances have a parent class (even if hypothetically it would just be Python's object). We implement them differently, but I could imagine eventually having a situation where we have the same types (like PromptTarget) appear in both an instance registry (use these specific targets paired with this initializer) and a class registry (here are all the kinds of targets I want to make available for some purpose).

I don't think it's in scope for this PR, but I can see this becoming a point of confusion for users down the line. Though I'm not sure what to do about it

Copy link
Contributor

@ValbuenaVC ValbuenaVC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I think we may want to add before merging is encoding/decoding logic for registries. The protocol choice is fantastic, but it requires every registry to store classes or instances differently. Aside from that looks great!

Comment on lines +18 to +19
| **Class Registry** | Classes (Type[T]) | Components instantiated with user-provided parameters |
| **Instance Registry** | Pre-configured instances | Components requiring complex setup before use |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought based on the current implementation. Do we want to force some kind of parity between the two registries if they share objects? Suppose we have an instance registry of pre-configured default targets and a class registry of a subset of PromptTarget classes. Do we want these to be related in any way, e.g. instance registry is valid if class registry is valid, but not the other way around?

_instances: Dict[type, "BaseInstanceRegistry[Any, Any]"] = {}

@classmethod
def get_registry_singleton(cls) -> "BaseInstanceRegistry[T, MetadataT]":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: is there a reason we do this instead of using the Singleton metaclass?

self.register(scorer, name=name)
logger.debug(f"Registered scorer instance: {name} ({scorer.__class__.__name__})")

def get_instance_by_name(self, name: str) -> Optional["Scorer"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not in the PR, but a get_instances_by_scorer_type may be useful in the future for the GUI I'd imagine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing instances has the risk of race conditions since state is shared across parallel scoring processes. Don't think this is a big deal since we don't modify state within our target and scoring methods, but I do see one example of this in prompt_shield_scorer.py in score_piece_async where self._conversation_id is modified. Might be worth to take a look to make sure there aren't more cases like this in scorers and targets and make sure any state modifying methods are private.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants