Conversation
Pulled this down to verify against my examples; a few notes:
`04_scaling_performance/01_autoscaling/gpu_worker.py` configures scaling strategies:

```python
scale_to_zero_config = LiveServerless(...)
```

This controls how autoscaling decides to add workers: QUEUE_DELAY scales based on how long jobs wait in the queue. `Endpoint()` doesn't have these params, so there's no way to express this. What we'd want:

```python
@endpoint(name="worker", gpu=GpuGroup.ANY, workers=(0, 3), ...)
```

Could we add `scaler_type` / `scaler_value` (or a combined `scaler=` param)?
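To pin down the QUEUE_DELAY semantics being asked for, here is a minimal plain-Python sketch (not the SDK; `desired_workers` and its parameter names are illustrative) of a queue-delay scaling policy:

```python
def desired_workers(queue_delay_s: float, scaler_value: float,
                    min_workers: int, max_workers: int,
                    current: int) -> int:
    """Illustrative QUEUE_DELAY policy: add a worker while jobs wait
    longer than the threshold, release workers when the queue drains."""
    if queue_delay_s > scaler_value:
        target = current + 1          # queue is backing up: scale out
    elif queue_delay_s == 0:
        target = current - 1          # idle: scale in (down to zero)
    else:
        target = current              # within threshold: hold steady
    return max(min_workers, min(max_workers, target))


if __name__ == "__main__":
    # workers=(0, 3) with a 4-second queue-delay threshold
    print(desired_workers(10.0, 4.0, 0, 3, current=1))  # scale out -> 2
    print(desired_workers(0.0, 4.0, 0, 3, current=1))   # scale in  -> 0
```

The point is that the threshold (`scaler_value`) and the policy kind (`scaler_type`) are exactly the two knobs `Endpoint` would need to expose.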
Another example builds a pod template:

```python
template = PodTemplate(...)
gpu_config = ServerlessEndpoint(...)
```

`Endpoint(image=)` only takes the image name string. The other template features, such as `dockerArgs` (e.g. shared memory size), have no equivalent. Could we either add a `template=` param that accepts a `PodTemplate`, or surface these as top-level kwargs on `Endpoint`?
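To make the `template=` ask concrete, here is a rough sketch using plain dataclasses as stand-ins (the class and field names below are assumptions for illustration, not the SDK's real `PodTemplate` API):

```python
from dataclasses import dataclass, field


@dataclass
class PodTemplateSketch:
    # Hypothetical stand-in for PodTemplate; field names are illustrative.
    image: str
    docker_args: str = ""            # e.g. "--shm-size=8g" for shared memory
    env: dict = field(default_factory=dict)


@dataclass
class EndpointSketch:
    # Hypothetical: an Endpoint that accepts a full template,
    # not just an image name string.
    name: str
    template: PodTemplateSketch


ep = EndpointSketch(
    name="worker",
    template=PodTemplateSketch(
        image="myorg/worker:latest",
        docker_args="--shm-size=8g",
    ),
)
print(ep.template.docker_args)  # --shm-size=8g
```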
Two examples use `@remote` on a class for stateful workers. Here's the pattern:

```python
@remote(resource_config=gpu_config, dependencies=["diffusers", "torch", "transformers"])
```

The class is instantiated once when the worker boots; the model stays in GPU memory via `self.pipe`, and every request reuses it. With function-based `@endpoint`, there's no `self` to hold state:

```python
@endpoint(name="worker", gpu=GpuGroup.ANY, dependencies=["diffusers", "torch"])
```

Does `@endpoint(...)` support decorating classes the same way `@remote` does? If not, we'd need a workaround (module-level state, for example).
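The reason the class form matters is that expensive setup runs once per worker rather than once per request. A self-contained plain-Python sketch (no SDK; `load_model` stands in for loading the real diffusers pipeline):

```python
class StatefulWorker:
    """Instantiated once at worker boot; state persists across requests."""

    def __init__(self):
        # Expensive one-time setup (stands in for loading a pipeline
        # onto the GPU and keeping it resident in self.pipe).
        self.pipe = self.load_model()
        self.requests_served = 0

    def load_model(self):
        return {"weights": "loaded"}  # placeholder for the real model

    def handle(self, prompt: str) -> str:
        # Every request reuses self.pipe instead of reloading the model.
        self.requests_served += 1
        return f"generated:{prompt}"


worker = StatefulWorker()        # happens once, at boot
worker.handle("a red fox")
worker.handle("a blue bird")
print(worker.requests_served)    # 2
```

A function-based endpoint would have to rebuild (or globally cache) `self.pipe` itself, which is the workaround we would rather avoid.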
The PR's skeleton templates use `GpuType`:

```python
@endpoint(name="gpu_worker", gpu=GpuType.ANY, dependencies=["torch"])
```

But existing examples all use `GpuGroup`:

```python
@endpoint(name="worker", gpu=GpuGroup.ADA_24)
```

Both work (`Endpoint(gpu=)` accepts either), but they mean different things: `GpuType` names a specific GPU model (e.g. a particular RTX card).
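The practical difference can be sketched with plain enums (the member names below are illustrative assumptions, not the SDK's real values): a type pins one model, while a group names a family the scheduler can choose from.

```python
from enum import Enum


class GpuTypeSketch(Enum):
    # Hypothetical: one specific GPU model per member.
    RTX_4090 = "NVIDIA GeForce RTX 4090"
    L4 = "NVIDIA L4"


class GpuGroupSketch(Enum):
    # Hypothetical: a family of interchangeable models;
    # a wider pool makes scaling easier.
    ADA_24 = (GpuTypeSketch.RTX_4090, GpuTypeSketch.L4)


# A group request can be satisfied by any of its member types.
print(GpuTypeSketch.RTX_4090 in GpuGroupSketch.ADA_24.value)  # True
```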
will work on adding those parameters
👍
Endpoint does support classes
we should prefer GpuType in simpler examples, since it is easier to understand, but expand to GpuGroup for situations where more scale is important
QA Report

Status: WARN

CI Status

All 6 Quality Gates pass (Python 3.10–3.14 + Build Package). No CI regressions detected.
PR Scope
Test File Summary
PR Diff Analysis
Observations & Issues

1. Dual-purpose methods create a subtle API surface:

```python
ep = Endpoint(name="my-api")
ep.post("/compute")  # returns a decorator

ep_client = Endpoint(id="x")
ep_client.post("/compute")  # returns a coroutine
```

The distinction is tested, but the boundary between "no data arg = decorator" vs "data=None = client call" is not explicitly tested. A user calling …

2. …

3. Endpoint with …

Test Quality Assessment

Strengths:
Missing Coverage:
Suggested Improvements:
Review Comments Integration

The PR already addresses the 4 review items from @runpod-Henrik:
Recommendation

MERGE WITH NOTES

The PR is solid: 161 tests, CI green on all Python versions, comprehensive coverage of the new Endpoint API. The dual-purpose method design and …
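For reference, the "no data arg = decorator" vs "data=None = client call" boundary noted above can be implemented with a sentinel default; a minimal plain-Python sketch of the pattern, not the SDK's actual implementation:

```python
import asyncio

_MISSING = object()  # sentinel so an explicit data=None still means "client call"


class DualEndpoint:
    """Illustrative sketch (not the SDK): .post() is a route decorator
    when called without `data`, and a client coroutine when `data` is
    passed, including an explicit data=None."""

    def __init__(self, name=None, id=None):
        self.name, self.id = name, id
        self.routes = {}

    def post(self, path, data=_MISSING):
        if data is _MISSING:
            # Server mode: register a handler for this path.
            def decorator(fn):
                self.routes[path] = fn
                return fn
            return decorator
        # Client mode: return a coroutine invoking the route.
        async def call():
            return self.routes[path](data)
        return call()


ep = DualEndpoint(name="my-api")

@ep.post("/compute")
def compute(payload):
    return {"result": payload}

out = asyncio.run(ep.post("/compute", data=None))
print(out)  # {'result': None}
```

A boundary test would then assert that `post(path)` returns a callable decorator while `post(path, data=None)` returns an awaitable.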
Generated by flash-qa agent
Unified Endpoint API
Replaces 8 resource config classes (`LiveServerless`, `CpuLiveServerless`, `LiveLoadBalancer`, `CpuLiveLoadBalancer`, `ServerlessEndpoint`, `CpuServerlessEndpoint`, `LoadBalancerSlsResource`, `CpuLoadBalancerSlsResource`) and the `@remote` decorator with a single `Endpoint` class.

Fixes AE-2259
- Queue-based
- Load-balanced
- Client mode
What changed