feat: GPU Memory Service #5286
Merged
Changes from 14 commits
Commits
16 commits
1893933 feat: add GPU Memory Service core library and component wrapper (galletas1712)
a67af51 Merge branch 'main' into schwinns/gms-core-library (galletas1712)
4009838 Merge branch 'main' into schwinns/gms-core-library (galletas1712)
e56f858 add setup to other dockerfiles (galletas1712)
88c180b Merge branch 'main' into schwinns/gms-core-library (galletas1712)
de47b9f Merge branch 'main' into schwinns/gms-core-library (galletas1712)
e4d830a Updated copyright (galletas1712)
3a11549 Add install gating (galletas1712)
a0199ec Merge branch 'main' into schwinns/gms-core-library (galletas1712)
0f07aee Merge branch 'main' into schwinns/gms-core-library (galletas1712)
188bdda Merge branch 'main' into schwinns/gms-core-library (galletas1712)
8d0bdd4 Address CodeRabbit review feedback for GPU Memory Service (galletas1712)
f60cc61 Merge branch 'main' into schwinns/gms-core-library (galletas1712)
8bd0a6e Fix linting issues in GPU Memory Service (galletas1712)
a5c9a09 Update components/src/dynamo/gpu_memory_service/__init__.py (galletas1712)
79fd045 Update components/src/dynamo/gpu_memory_service/__init__.py (galletas1712)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| | ||
| """GPU Memory Service component for Dynamo. | ||
| | ||
| This module provides the Dynamo component wrapper around the gpu_memory_service package. | ||
| The core functionality is in the gpu_memory_service package; this module provides: | ||
| - CLI entry point (python -m dynamo.gpu_memory_service) | ||
| - Re-exports for backwards compatibility | ||
| """ | ||
| | ||
| # Re-export core functionality from gpu_memory_service package | ||
| from gpu_memory_service import ( | ||
| GMSClientMemoryManager, | ||
| StaleMemoryLayoutError, | ||
| get_gms_client_memory_manager, | ||
| get_or_create_gms_client_memory_manager, | ||
| ) | ||
| | ||
| # Re-export extensions (built separately) | ||
| from gpu_memory_service.client.torch.extensions import _allocator_ext | ||
| | ||
| # Re-export module utilities | ||
| from gpu_memory_service.client.torch.module import ( | ||
| materialize_module_from_gms, | ||
| register_module_tensors, | ||
| ) | ||
| | ||
| __all__ = [ | ||
| # Core | ||
| "GMSClientMemoryManager", | ||
| "StaleMemoryLayoutError", | ||
| # GMS client memory manager | ||
| "get_or_create_gms_client_memory_manager", | ||
| "get_gms_client_memory_manager", | ||
| # Tensor utilities | ||
| "register_module_tensors", | ||
| "materialize_module_from_gms", | ||
| # Extensions | ||
| "_allocator_ext", | ||
| ] | ||
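
The `_allocator_ext` re-export in this file is unconditional, and its comment notes that the extension is built separately, so importing the wrapper would presumably raise `ImportError` in environments where the extension has not been built. The following is a minimal sketch of one way such an import could be gated. The helper `optional_import` and the module names used to exercise it are hypothetical and are not part of this PR or of `gpu_memory_service`.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be imported.

    Hypothetical helper for illustration only; not part of gpu_memory_service.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package and a failing extension load are both handled here.
        return None


# Stand-in module names so the sketch runs anywhere.
json_mod = optional_import("json")
missing = optional_import("no_such_extension_module_for_demo")
```

With this shape, a wrapper could export the name only when the module is present, at the cost of callers having to handle `None`.
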
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| | ||
| from dynamo.gpu_memory_service.server import main | ||
| | ||
| if __name__ == "__main__": | ||
| main() | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| | ||
| """Argument parsing for GPU Memory Service server component.""" | ||
| | ||
| import argparse | ||
| import logging | ||
| from dataclasses import dataclass | ||
| | ||
| logger = logging.getLogger(__name__) | ||
| | ||
| | ||
| @dataclass | ||
| class Config: | ||
| """Configuration for GPU Memory Service server.""" | ||
| | ||
| # GPU Memory Service specific | ||
| device: int | ||
| socket_path: str | ||
| verbose: bool | ||
| | ||
| | ||
| def parse_args() -> Config: | ||
| """Parse command line arguments for GPU Memory Service server.""" | ||
| parser = argparse.ArgumentParser( | ||
| description="GPU Memory Service allocation server for Dynamo." | ||
| ) | ||
| | ||
| # GPU Memory Service specific arguments | ||
| parser.add_argument( | ||
| "--device", | ||
| type=int, | ||
| required=True, | ||
| help="CUDA device ID to manage memory for.", | ||
| ) | ||
| parser.add_argument( | ||
| "--socket-path", | ||
| type=str, | ||
| default=None, | ||
| help="Path for Unix domain socket. Default: /tmp/gpu_memory_service_{device}.sock. " | ||
| "Supports {device} placeholder for multi-GPU setups.", | ||
| ) | ||
| parser.add_argument( | ||
| "--verbose", | ||
| "-v", | ||
| action="store_true", | ||
| help="Enable verbose logging.", | ||
| ) | ||
| | ||
| args = parser.parse_args() | ||
| | ||
| # Generate default socket path if not provided | ||
| socket_path = args.socket_path | ||
| if socket_path is None: | ||
| socket_path = f"/tmp/gpu_memory_service_{args.device}.sock" | ||
| else: | ||
| # Expand {device} placeholder | ||
| socket_path = socket_path.format(device=args.device) | ||
| | ||
| config = Config( | ||
| device=args.device, | ||
| socket_path=socket_path, | ||
| verbose=args.verbose, | ||
| ) | ||
| | ||
| return config | ||
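
The socket-path handling in this file can be summarised as: use the default template when `--socket-path` is omitted, otherwise expand the `{device}` placeholder in the supplied value. The sketch below mirrors that logic outside of argparse and also shows how the resolved path might be turned into the worker flags that the server docstring mentions (`--load-format` and `--model-loader-extra-config`). The function names `resolve_socket_path` and `worker_flags` are illustrative assumptions, not names from the PR.

```python
import json
from typing import List, Optional

# Default taken from the --socket-path help text in the diff.
DEFAULT_SOCKET_TEMPLATE = "/tmp/gpu_memory_service_{device}.sock"


def resolve_socket_path(device: int, socket_path: Optional[str] = None) -> str:
    """Return the default path, or expand {device} in a user-supplied one."""
    template = DEFAULT_SOCKET_TEMPLATE if socket_path is None else socket_path
    return template.format(device=device)


def worker_flags(socket_path: str) -> List[str]:
    """Build the worker flags, letting json.dumps handle the quoting."""
    extra = json.dumps({"gpu_memory_service_socket_path": socket_path})
    return [
        "--load-format",
        "gpu_memory_service",
        "--model-loader-extra-config",
        extra,
    ]


default_path = resolve_socket_path(0)
custom_path = resolve_socket_path(3, "/run/gms_{device}.sock")
flags = worker_flags(default_path)
```

One caveat of relying on `str.format`: a path containing any other brace field, such as `/tmp/{name}.sock`, raises `KeyError` rather than being passed through unchanged.
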
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| | ||
| """GPU Memory Service allocation server component for Dynamo. | ||
| | ||
| This component wraps the GMSRPCServer from gpu_memory_service to manage | ||
| GPU memory allocations with connection-based RW/RO locking. | ||
| | ||
| Workers connect via the socket path, which should be passed to vLLM/SGLang via: | ||
| --load-format gpu_memory_service | ||
| --model-loader-extra-config '{"gpu_memory_service_socket_path": "/tmp/gpu_memory_service_{device}.sock"}' | ||
| | ||
| Usage: | ||
| python -m dynamo.gpu_memory_service --device 0 | ||
| python -m dynamo.gpu_memory_service --device 0 --socket-path /tmp/gpu_memory_service_{device}.sock | ||
| """ | ||
| | ||
| import asyncio | ||
| import logging | ||
| import signal | ||
| | ||
| import uvloop | ||
| from gpu_memory_service.server import GMSRPCServer | ||
| | ||
| from .args import parse_args | ||
| | ||
| logging.basicConfig( | ||
| level=logging.INFO, | ||
| format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", | ||
| ) | ||
| logger = logging.getLogger(__name__) | ||
| | ||
| | ||
| async def worker() -> None: | ||
| """Main async worker function.""" | ||
| config = parse_args() | ||
| | ||
| # Configure logging level | ||
| if config.verbose: | ||
| logging.getLogger().setLevel(logging.DEBUG) | ||
| logging.getLogger("dynamo.gpu_memory_service").setLevel(logging.DEBUG) | ||
| | ||
| logger.info(f"Starting GPU Memory Service Server for device {config.device}") | ||
| logger.info(f"Socket path: {config.socket_path}") | ||
| | ||
| server = GMSRPCServer(config.socket_path, device=config.device) | ||
| | ||
| # Set up shutdown handling | ||
| shutdown_event = asyncio.Event() | ||
| | ||
| def signal_handler(): | ||
| logger.info("Received shutdown signal") | ||
| shutdown_event.set() | ||
| | ||
| loop = asyncio.get_running_loop() | ||
| for sig in (signal.SIGTERM, signal.SIGINT): | ||
| loop.add_signal_handler(sig, signal_handler) | ||
| | ||
| await server.start() | ||
| | ||
| logger.info("GPU Memory Service Server ready, waiting for connections...") | ||
| logger.info( | ||
| f"To connect vLLM workers, use: --load-format gpu_memory_service " | ||
| f'--model-loader-extra-config \'{{"gpu_memory_service_socket_path": "{config.socket_path}"}}\'' | ||
| ) | ||
| | ||
| # Wait for shutdown signal | ||
| try: | ||
| await shutdown_event.wait() | ||
| finally: | ||
| logger.info("Shutting down GPU Memory Service Server...") | ||
| await server.stop() | ||
| logger.info("GPU Memory Service Server shutdown complete") | ||
| | ||
| | ||
| def main() -> None: | ||
| """Entry point for GPU Memory Service server.""" | ||
| uvloop.install() | ||
| asyncio.run(worker()) | ||
| | ||
| | ||
| if __name__ == "__main__": | ||
| main() | ||
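
The shutdown handling in `worker()` follows a common asyncio pattern: signal handlers set an `asyncio.Event`, the coroutine awaits that event, and a `finally` block guarantees `server.stop()` runs. The sketch below isolates that pattern so it can be run without a GPU or uvloop. `StubServer` is a hypothetical stand-in for `GMSRPCServer` that models only `start()` and `stop()`, and the signal is simulated with a timer, so this shows the control flow rather than the real server.

```python
import asyncio
import signal
from typing import List


class StubServer:
    """Hypothetical stand-in for GMSRPCServer; records start/stop calls."""

    def __init__(self) -> None:
        self.events: List[str] = []

    async def start(self) -> None:
        self.events.append("start")

    async def stop(self) -> None:
        self.events.append("stop")


async def serve_until_shutdown(server: StubServer, shutdown_event: asyncio.Event) -> None:
    """Start the server, block until the event is set, then always stop it."""
    await server.start()
    try:
        await shutdown_event.wait()
    finally:
        await server.stop()


def install_handlers(loop: asyncio.AbstractEventLoop, shutdown_event: asyncio.Event) -> None:
    """Wire SIGTERM/SIGINT to the event, as the diff does."""
    try:
        for sig in (signal.SIGTERM, signal.SIGINT):
            loop.add_signal_handler(sig, shutdown_event.set)
    except (NotImplementedError, RuntimeError, ValueError):
        # Not supported on this platform or outside the main thread.
        pass


async def demo() -> List[str]:
    server = StubServer()
    shutdown_event = asyncio.Event()
    loop = asyncio.get_running_loop()
    install_handlers(loop, shutdown_event)
    # Simulate a signal arriving shortly after startup.
    loop.call_later(0.01, shutdown_event.set)
    await serve_until_shutdown(server, shutdown_event)
    return server.events


events = asyncio.run(demo())
```

Because `stop()` sits in `finally`, it also runs if the waiting task is cancelled, which is the property the diff relies on for cleanup.
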