 """
 Backend interface and registry for generative AI model interactions.

-This module provides the abstract base class and interface for implementing
-backends that communicate with generative AI models. Backends handle the
-lifecycle of generation requests, including startup, validation, request
-processing, and shutdown phases.
+Provides the abstract base class for implementing backends that communicate with
+generative AI models. Backends handle the lifecycle of generation requests.

 Classes:
     Backend: Abstract base class for generative AI backends with registry support.
@@ -42,44 +40,38 @@ class Backend(
     """
     Abstract base class for generative AI backends with registry and lifecycle.

-    This class defines the interface for implementing backends that communicate with
-    generative AI models. It combines the registry pattern for automatic discovery
-    with a well-defined lifecycle for process-based distributed execution.
+    Provides a standard interface for backends that communicate with generative AI
+    models. Combines the registry pattern for automatic discovery with a defined
+    lifecycle for process-based distributed execution.

-    The backend lifecycle consists of four main phases:
-    1. Creation and initial configuration (constructor and factory methods)
-    2. Process startup - Initialize resources within a worker process
-    3. Validation - Verify backend readiness and configuration
-    4. Request resolution - Process generation requests iteratively
-    5. Process shutdown - Clean up resources when process terminates
+    Backend lifecycle phases:
+    1. Creation and configuration
+    2. Process startup - Initialize resources in worker process
+    3. Validation - Verify backend readiness
+    4. Request resolution - Process generation requests
+    5. Process shutdown - Clean up resources

-    All backend implementations must ensure that their state (excluding resources
-    created during process_startup) is pickleable to support transfer across
-    process boundaries in distributed execution environments.
+    Backend state (excluding process_startup resources) must be pickleable for
+    distributed execution across process boundaries.

     Example:
     ::
-        # Register a custom backend implementation
         @Backend.register("my_backend")
         class MyBackend(Backend):
             def __init__(self, api_key: str):
                 super().__init__("my_backend")
                 self.api_key = api_key

             async def process_startup(self):
-                # Initialize process-specific resources
                 self.client = MyAPIClient(self.api_key)

-            ...
-
-        # Create backend instance using factory method
         backend = Backend.create("my_backend", api_key="secret")
     """

     @classmethod
     def create(cls, type_: BackendType, **kwargs) -> "Backend":
         """
-        Factory method to create a backend instance based on the backend type.
+        Create a backend instance based on the backend type.

         :param type_: The type of backend to create.
         :param kwargs: Additional arguments for backend initialization.
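The docstring example in this hunk relies on `Backend.register` and `Backend.create`, but the diff does not show how the registry is implemented. The following is only a minimal sketch of one way a decorator-based registry with a factory method could work. `SketchBackend`, its `_registry` dictionary, and the `ValueError` on unknown types are assumptions made for illustration, not the module's actual code.

```python
from typing import Callable, ClassVar, Dict, Type


class SketchBackend:
    """Hypothetical stand-in for Backend; not the module's real registry code."""

    # Maps a type name to the subclass registered under it (assumed layout).
    _registry: ClassVar[Dict[str, Type["SketchBackend"]]] = {}

    def __init__(self, type_: str):
        self.type_ = type_

    @classmethod
    def register(
        cls, name: str
    ) -> Callable[[Type["SketchBackend"]], Type["SketchBackend"]]:
        """Return a class decorator that records the subclass under `name`."""

        def decorator(subclass: Type["SketchBackend"]) -> Type["SketchBackend"]:
            cls._registry[name] = subclass
            return subclass

        return decorator

    @classmethod
    def create(cls, type_: str, **kwargs) -> "SketchBackend":
        """Look up the registered subclass and construct it with kwargs."""
        if type_ not in cls._registry:
            raise ValueError(f"Unsupported backend type: {type_}")
        return cls._registry[type_](**kwargs)


@SketchBackend.register("my_backend")
class MyBackend(SketchBackend):
    def __init__(self, api_key: str):
        super().__init__("my_backend")
        self.api_key = api_key


# The factory resolves the name to MyBackend and forwards the keyword arguments.
backend = SketchBackend.create("my_backend", api_key="secret")
```

Keeping the lookup inside `create` means callers only need the type name, which is what lets new backends be discovered without changes to the calling code.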
@@ -93,65 +85,72 @@ def create(cls, type_: BackendType, **kwargs) -> "Backend":

     def __init__(self, type_: BackendType):
         """
-        Initialize a backend instance with the specified type.
+        Initialize a backend instance.

-        :param type_: The backend type identifier for this instance.
+        :param type_: The backend type identifier.
         """
         self.type_ = type_

     @property
     def processes_limit(self) -> Optional[int]:
         """
-        :return: The maximum number of worker processes supported by the
-            backend. None if not limited.
+        :return: Maximum number of worker processes supported. None if unlimited.
         """
         return None

     @property
     def requests_limit(self) -> Optional[int]:
         """
-        :return: The maximum number of concurrent requests that can be processed
-            at once globally by the backend. None if not limited.
+        :return: Maximum number of concurrent requests supported globally.
+            None if unlimited.
         """
         return None

+    @abstractmethod
+    def info(self) -> dict[str, Any]:
+        """
+        :return: Backend metadata including model information, endpoints, and
+            configuration data for reporting and diagnostics.
+        """
+        ...
+
     @abstractmethod
     async def process_startup(self):
         """
         Initialize process-specific resources and connections.

-        This method is called when a backend instance is transferred to a worker
-        process and needs to establish connections, initialize clients, or set up
-        any other resources required for request processing. All resources created
-        here are process-local and do not need to be pickleable.
-        If there are any errors during startup, this method should raise an
-        appropriate exception.
+        Called when a backend instance is transferred to a worker process.
+        Creates connections, clients, and other resources required for request
+        processing. Resources created here are process-local and need not be
+        pickleable.

-        Must be called before validate() or resolve() can be used.
+        Must be called before validate() or resolve().
+
+        :raises: Exception if startup fails.
         """
         ...

     @abstractmethod
-    async def validate(self):
+    async def process_shutdown(self):
         """
-        Validate backend configuration and readiness for request processing.
+        Clean up process-specific resources and connections.

-        This method verifies that the backend is properly configured and can
-        successfully communicate with the target model service. It should be
-        called after process_startup() and before resolve() to ensure the
-        backend is ready to handle generation requests.
-        If the backend cannot connect to the service or is not ready,
-        this method should raise an appropriate exception.
+        Called when the worker process is shutting down. Cleans up resources
+        created during process_startup(). After this method, validate() and
+        resolve() should not be used.
         """
+        ...

     @abstractmethod
-    async def process_shutdown(self):
+    async def validate(self):
         """
-        Clean up process-specific resources and connections.
+        Validate backend configuration and readiness.

-        This method is called when the worker process is shutting down and
-        should clean up any resources created during process_startup(). After
-        this method is called, validate() and resolve() should not be used.
+        Verifies the backend is properly configured and can communicate with the
+        target model service. Should be called after process_startup() and before
+        resolve().
+
+        :raises: Exception if backend is not ready or cannot connect.
         """
         ...

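The docstrings in this hunk separate picklable backend state from the process-local resources created in `process_startup()`. The sketch below illustrates one way to honor that split, assuming the backend drops its client in `__getstate__` and rebuilds it after transfer. `FakeClient`, `PicklableBackend`, and the synchronous `process_startup` are hypothetical simplifications; the interface in the diff is async and does not prescribe `__getstate__`.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical client that owns a process-local, unpicklable resource."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.lock = threading.Lock()  # thread locks cannot be pickled


class PicklableBackend:
    """Hypothetical backend; only configuration crosses process boundaries."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.client = None  # created later, inside the worker process

    def __getstate__(self):
        # Drop the process-local client so only plain configuration is pickled.
        state = self.__dict__.copy()
        state["client"] = None
        return state

    def process_startup(self):
        # Synchronous here for brevity; the interface in the diff is async.
        self.client = FakeClient(self.api_key)


parent = PicklableBackend(api_key="secret")
parent.process_startup()  # the parent holds an unpicklable client at this point

# Simulate the transfer to a worker: the copy arrives without a client...
worker_copy = pickle.loads(pickle.dumps(parent))
# ...and rebuilds its own process-local resources on startup.
worker_copy.process_startup()
```

An alternative that needs no `__getstate__` is to never assign the client before the instance is sent to the worker; the explicit hook just makes the rule robust to accidental early startup.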
@@ -167,37 +166,23 @@ async def resolve(
         """
         Process a generation request and yield progressive responses.

-        This method processes a generation request through the backend's model
-        service, yielding intermediate responses as the generation progresses.
-        The final yielded item contains the complete response and timing data.
-
-        The request_info parameter is updated with timing metadata and other
-        tracking information throughout the request processing lifecycle.
+        Processes a generation request through the backend's model service,
+        yielding intermediate responses as generation progresses. The final
+        yielded item contains the complete response and timing data.

-        :param request: The generation request containing content and parameters.
-        :param request_info: Request tracking information to be updated with
-            timing and progress metadata during processing.
+        :param request: The generation request with content and parameters.
+        :param request_info: Request tracking information updated with timing
+            and progress metadata during processing.
         :param history: Optional conversation history for multi-turn requests.
-            Each tuple contains a previous request-response pair that provides
-            context for the current generation.
-        :yields: Tuples of (response, updated_request_info) as the generation
-            progresses. The final tuple contains the complete response.
-        """
-        ...
-
-    @abstractmethod
-    async def info(self) -> dict[str, Any]:
-        """
-        :return: Dictionary containing backend metadata such as model
-            information, service endpoints, version details, and other
-            configuration data useful for reporting and diagnostics.
+            Each tuple contains a previous request-response pair.
+        :yields: Tuples of (response, updated_request_info) as generation
+            progresses. Final tuple contains the complete response.
         """
         ...

     @abstractmethod
     async def default_model(self) -> str:
         """
-        :return: The model name or identifier that this backend is
-            configured to use by default for generation requests.
+        :return: The default model name or identifier for generation requests.
         """
         ...
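To make the lifecycle ordering described in these docstrings concrete (startup, then validation, then request resolution, then shutdown), here is a small sketch of a driver loop around a toy backend. `LifecycleBackend`, `EchoBackend`, and `run_once` are hypothetical names, and the `resolve` signature is simplified because the diff does not show its full parameter list; treat this as an illustration under those assumptions rather than the module's real interface.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, AsyncIterator, Dict, List, Tuple


class LifecycleBackend(ABC):
    """Hypothetical, trimmed-down version of the interface in the diff."""

    @abstractmethod
    async def process_startup(self) -> None: ...

    @abstractmethod
    async def validate(self) -> None: ...

    @abstractmethod
    def resolve(
        self, request: str, request_info: Dict[str, Any]
    ) -> AsyncIterator[Tuple[str, Dict[str, Any]]]: ...

    @abstractmethod
    async def process_shutdown(self) -> None: ...


class EchoBackend(LifecycleBackend):
    """Toy backend that streams the request back one word at a time."""

    def __init__(self) -> None:
        self.started = False

    async def process_startup(self) -> None:
        self.started = True  # stands in for opening clients or connections

    async def validate(self) -> None:
        if not self.started:
            raise RuntimeError("process_startup() must be called first")

    async def resolve(self, request, request_info):
        text = ""
        for word in request.split():
            text = f"{text} {word}".strip()
            request_info["chunks"] = request_info.get("chunks", 0) + 1
            yield text, request_info  # the last tuple carries the full response

    async def process_shutdown(self) -> None:
        self.started = False  # stands in for closing clients or connections


async def run_once(request: str) -> List[str]:
    """Drive one request through startup, validation, resolution, shutdown."""
    backend = EchoBackend()
    await backend.process_startup()
    try:
        await backend.validate()
        return [resp async for resp, _ in backend.resolve(request, {})]
    finally:
        # Shutdown runs even if validation or resolution raises.
        await backend.process_shutdown()


responses = asyncio.run(run_once("hello streaming world"))
```

Placing `process_shutdown()` in a `finally` block is a design choice of this sketch: it guarantees cleanup of process-local resources when validation or resolution fails.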