|
| 1 | +Dev Box Serverless GPU Compute |
| 2 | + |
| 3 | +Overview |
| 4 | + |
| 5 | +Enterprises are increasingly looking for flexible, scalable, and cost-efficient solutions to run high-performance AI workloads. Traditional GPU provisioning often requires long-term commitments and significant upfront investments, making it challenging for organizations to optimize resources and control costs, especially for sporadic, high-intensity workloads. |
| 6 | + |
| 7 | +The Dev Box Serverless GPU Compute feature addresses this challenge by integrating Microsoft Dev Box with Azure Container Apps (ACA), enabling on-demand access to powerful GPU resources without requiring long-term provisioning. Developers can dynamically allocate GPU power within their Dev Box based on the demands of their AI tasks, such as model training, fine-tuning, and data preprocessing. |
| 8 | + |
| 9 | +Beyond compute flexibility, Dev Box also provides a secure development environment for AI workloads that require access to sensitive corporate data. Many enterprises need to train models on proprietary datasets that are restricted by network-layer security policies. Since Dev Box is already embedded within an organization’s secure network and governance framework, it enables AI engineers to access and process protected data while ensuring compliance with corporate security standards. |
| 10 | + |
| 11 | +This integration delivers a unique combination of flexibility, security, and cost optimization, ensuring that enterprises can scale GPU resources efficiently while maintaining tight control over data access and compliance. By eliminating the complexities of provisioning and securing AI development environments, Dev Box enables developers to focus on innovation rather than infrastructure management. |
| 12 | + |
| 13 | +Architecture |
| 14 | + |
| 15 | +The Dev Box Serverless GPU Compute feature leverages a tight integration with Azure Container Apps (ACA) to provide on-demand, high-performance GPU compute for AI workloads attached to the customer’s private network. This architecture is designed to be seamless for developers, enabling powerful compute resources without the need for manual setup or long-term provisioning. |
| 16 | + |
| 17 | +Integration with Azure Container Apps (ACA) |
| 18 | + |
| 19 | +At the core of the Dev Box serverless GPU compute solution is the integration with Azure Container Apps Serverless GPU. This integration ensures that developers can access GPU resources on-demand, scaling as required by their AI workloads. ACA abstracts the complexity of GPU provisioning, allowing Dev Box to handle resource allocation and usage automatically without requiring intervention from the developer. |
| 20 | + |
| 21 | +Seamless User Experience: With this integration, users will interact with Dev Box as usual, without needing to know that Azure Container Apps is behind the scenes or to create any resources or connections themselves. GPU resources will be allocated dynamically as part of the Dev Box infrastructure, abstracting the ACA technology and setup away from the developer. |
| 22 | + |
| 23 | +MOBO Architecture Model: We will adopt the MOBO architecture model for ACA integration. In this model, ACA instances will be created and managed within the customer’s subscription, giving customers a more controlled and streamlined management experience. The Dev Box service can then manage ACA sessions effectively and securely without introducing additional complexity. |
| 24 | + |
| 25 | +GPU Hardware Availability |
| 26 | + |
| 27 | +ACA currently supports two primary GPU options for AI workloads: |
| 28 | + |
| 29 | +NVIDIA T4 GPUs – Readily available with minimal quota concerns |
| 30 | + |
| 31 | +NVIDIA A100 GPUs – More powerful but available in limited capacity |
| 32 | + |
| 33 | +These GPU resources are currently available in the following Azure regions: |
| 34 | + |
| 35 | +West US 3 |
| 36 | + |
| 37 | +Sweden Central |
| 38 | + |
| 39 | +Australia East |
| 40 | + |
| 41 | +While the initial rollout focuses on these locations, ACA’s GPU support can be expanded into additional regions based on demand. The v0 integration will support only T4 GPUs. |
| 42 | + |
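This is a minimal sketch of the v0 availability rule. It assumes a request is described by a region and a GPU type. ACA offers T4 and A100, but v0 accepts only T4, and only in a region with ACA GPU capacity. `GpuRequest` and `validate_v0_request` are hypothetical names, not a Dev Box or ACA API. The region set is passed in rather than hard-coded because regional availability is expected to expand.

```python
from dataclasses import dataclass

# GPU types ACA offers per this document; the v0 Dev Box integration exposes T4 only.
ACA_GPU_TYPES = {"T4", "A100"}
V0_GPU_TYPES = {"T4"}


@dataclass(frozen=True)
class GpuRequest:
    region: str
    gpu_type: str


def validate_v0_request(req: GpuRequest, gpu_regions: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the request is valid for v0."""
    problems = []
    if req.region not in gpu_regions:
        problems.append(f"region {req.region!r} has no ACA GPU capacity")
    if req.gpu_type not in ACA_GPU_TYPES:
        problems.append(f"unknown GPU type {req.gpu_type!r}")
    elif req.gpu_type not in V0_GPU_TYPES:
        problems.append(f"{req.gpu_type} is not supported in v0 (T4 only)")
    return problems


regions = {"West US 3", "Australia East"}  # illustrative subset of GPU-enabled regions
print(validate_v0_request(GpuRequest("West US 3", "T4"), regions))    # []
print(validate_v0_request(GpuRequest("West US 3", "A100"), regions))  # ['A100 is not supported in v0 (T4 only)']
```

The function returns every problem it finds instead of stopping at the first one. A caller could then report both an unsupported region and an unsupported GPU type in a single message.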
| 43 | +Consideration for vNet Injection |
| 44 | + |
| 45 | +We recognize that vNet injection will likely be a common customer ask. vNet injection will allow customers to integrate their network and security protocols with the serverless GPU environment. Although this capability is not a requirement for the POC, it will be prioritized for public preview and general availability (GA), giving customers tighter control over network and security configurations. |
| 46 | + |
| 47 | +Enabling Serverless GPUs at the Project Level |
| 48 | + |
| 49 | +Serverless GPUs will be enabled per project using Dev Center project properties. This allows administrators to control which projects within an organization can access GPU resources, ensuring that GPU usage aligns with organizational requirements and budget considerations. See the Admin Controls section for details on specific configurations. |
| 50 | + |
| 51 | +Access Control and Serverless GPU Granting |
| 52 | + |
| 53 | +Access to serverless GPU resources in Dev Box will be managed through project-level properties. When the serverless GPU feature is enabled for a project, all Dev Boxes within that project will automatically have access to GPU compute. |
| 54 | + |
| 55 | +This shift simplifies the access model by removing the need for custom roles or pool-based configurations. Instead, GPU access is governed centrally through project properties. Future iterations will integrate these properties with Dev Center’s project policy infrastructure. |
| 56 | + |
| 57 | +For more information on how admins can enable this feature, define GPU types, and set GPU concurrency limits, see the Admin Controls section. |
| 58 | + |
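Access is a project property rather than a role or pool setting. As a result, deciding whether a Dev Box may open a GPU session comes down to looking up its project. The sketch below is hypothetical: `serverless_gpu_enabled` is an illustrative property name, not the actual Dev Center schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    # Hypothetical property name; the real Dev Center project property may differ.
    serverless_gpu_enabled: bool


@dataclass(frozen=True)
class DevBox:
    name: str
    project: str


def can_use_serverless_gpu(dev_box: DevBox, projects: dict[str, Project]) -> bool:
    """Every Dev Box inherits GPU access from its project; no per-user role is consulted."""
    project = projects.get(dev_box.project)
    return project is not None and project.serverless_gpu_enabled


projects = {
    "ml-research": Project("ml-research", serverless_gpu_enabled=True),
    "web-frontend": Project("web-frontend", serverless_gpu_enabled=False),
}
print(can_use_serverless_gpu(DevBox("alice-box", "ml-research"), projects))  # True
print(can_use_serverless_gpu(DevBox("bob-box", "web-frontend"), projects))   # False
```

Toggling the single project property grants or revokes GPU access for every Dev Box in that project at once.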
| 59 | +Developer Experience |
| 60 | + |
| 61 | +The goal of the Developer Experience for Dev Box Serverless GPU Compute is to make accessing GPU resources seamless and native, with no setup required from the developer. The aim is to create a new kind of shell that has built-in access to GPU compute via an ACA session. This shell will be available across platforms like Windows Terminal, Visual Studio, and VS Code in a native, in-box experience. |
| 62 | + |
| 63 | +Shell Extension for Windows Terminal |
| 64 | + |
| 65 | +Windows Terminal serves as a terminal emulator for different kinds of shells. To enable GPU access, we will introduce a new shell, tentatively called "DevBoxGPU Shell". This shell will be connected to a serverless GPU ACA session, allowing developers to run GPU-powered workloads directly from the terminal. |
| 66 | + |
| 67 | +When a new shell instance is launched, an ACA session will start running in the background, providing GPU access. |
| 68 | + |
| 69 | +The ACA instance will remain active as long as the shell is open, and resource usage will be billed accordingly. |
| 70 | + |
| 71 | +Once the shell is closed, the ACA instance will automatically shut down, stopping any further resource usage and billing. |
| 72 | + |
| 73 | +This ensures that developers have access to GPU resources with zero manual configuration, providing a clean and efficient workflow. |
| 74 | + |
| 75 | +[Figure: screenshot of a computer program] |
| 78 | + |
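The shell lifecycle described above maps naturally onto a scoped resource. A session starts when the shell opens, stays active (and billed) while it is open, and stops when it closes. The Python sketch below assumes a hypothetical backend object whose `start` and `stop` calls stand in for whatever service creates and deletes ACA sessions. It is not the Dev Box or ACA API.

```python
import contextlib
import uuid
from typing import Iterator


class FakeSessionBackend:
    """Hypothetical stand-in for the service that creates and deletes ACA GPU sessions."""

    def __init__(self) -> None:
        self.active: set[str] = set()

    def start(self) -> str:
        session_id = str(uuid.uuid4())
        self.active.add(session_id)  # in the real system, GPU billing would begin here
        return session_id

    def stop(self, session_id: str) -> None:
        self.active.discard(session_id)  # ...and end here


@contextlib.contextmanager
def devbox_gpu_shell(backend: FakeSessionBackend) -> Iterator[str]:
    """Bind one ACA session to the lifetime of one DevBoxGPU Shell instance."""
    session_id = backend.start()  # shell launched: session starts in the background
    try:
        yield session_id  # shell open: session stays active
    finally:
        backend.stop(session_id)  # shell closed, even after an error: session shuts down


backend = FakeSessionBackend()
with devbox_gpu_shell(backend) as sid:
    print(sid in backend.active)  # True
print(len(backend.active))  # 0
```

The `finally` clause handles normal exits and exceptions inside the shell host. A process that is killed outright never runs it, however. A real implementation would therefore likely also need a server-side idle timeout to guarantee that billing stops.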
| 79 | +Visual Studio |
| 80 | + |
| 81 | +Since Visual Studio hosts Windows Terminal natively and can expose various shells, it allows us to extend this seamless GPU access directly within the IDE. By creating GPU-powered shells within Visual Studio, developers will be able to launch GPU-intensive tasks directly from their development environment, further streamlining their workflow: |
| 82 | + |
| 83 | +[Figure: screenshot] |
| 86 | + |
| 87 | +AI Toolkit for VS Code |
| 88 | + |
| 89 | +The AI Toolkit for VS Code provides a rich ecosystem for AI development as a VS Code extension, including fine-tuning, inference, and an integrated model marketplace. Dev Box Serverless GPU Compute will seamlessly integrate with the AI Toolkit’s ACA-based backend, enabling developers to: |
| 90 | + |
| 91 | +Instantly access serverless GPUs for AI workloads without additional setup. |
| 92 | + |
| 93 | +Utilize the AI Toolkit’s model marketplace to select and deploy AI models efficiently. |
| 94 | + |
| 95 | +Leverage built-in fine-tuning and inference capabilities powered by ACA. |
| 96 | + |
| 97 | +Use an integrated playground to test and iterate on AI models in real-time. |
| 98 | + |
| 99 | +This integration ensures that developers can take advantage of serverless GPU compute provided via Dev Box directly within VS Code, making AI development more accessible and frictionless. |
| 100 | + |
| 101 | +Multiple Shell Instances |
| 102 | + |
| 103 | +From an architectural standpoint, there are several options regarding how new instances of the DevBoxGPU Shell can interact with ACA sessions. Below are the key options we are considering: |
| 104 | + |
| 105 | +Option 1: Multiple instances of the DevBoxGPU Shell share a single ACA session. In this setup, the same GPU is allocated across multiple shell instances, allowing them to share GPU compute resources. |
| 106 | + |
| 107 | +Option 2: Each new instance of the DevBoxGPU Shell is assigned to a separate ACA session, with each instance having its own dedicated GPU. This means that a user can access multiple GPUs simultaneously by running separate instances of the shell. |
| 108 | + |
| 109 | +Option 3: The system allocates dedicated GPUs to each instance of the DevBoxGPU Shell until the user’s maximum GPU allocation is reached. After this limit is hit, additional shell instances will begin sharing GPU compute across sessions. |
| 110 | + |
| 111 | +For the POC, we will pursue Option 2, where each shell instance gets its own dedicated ACA session and GPU, ensuring clear isolation of resources. |
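The sketch below makes the three options concrete by mapping each shell instance to the index of the ACA session (and GPU) it would use. The policy names are hypothetical. The round-robin sharing in Option 3 is an assumption, because the options above say only that extra shells "begin sharing", not how.

```python
from enum import Enum


class SessionPolicy(Enum):
    SHARED = 1                 # Option 1: all shells share one ACA session / GPU
    DEDICATED = 2              # Option 2 (POC): one ACA session / GPU per shell
    DEDICATED_THEN_SHARED = 3  # Option 3: dedicated up to the user's GPU limit, then shared


def assign_sessions(policy: SessionPolicy, shell_count: int, max_gpus: int) -> list[int]:
    """Return, for each shell instance in launch order, the index of the session it uses."""
    if max_gpus < 1:
        raise ValueError("max_gpus must be at least 1")
    if policy is SessionPolicy.SHARED:
        return [0] * shell_count
    if policy is SessionPolicy.DEDICATED:
        return list(range(shell_count))
    # DEDICATED_THEN_SHARED: the first max_gpus shells get their own session; later
    # shells are spread round-robin over the existing sessions (assumed sharing scheme).
    return [i % max_gpus for i in range(shell_count)]


for policy in SessionPolicy:
    print(policy.name, assign_sessions(policy, shell_count=5, max_gpus=2))
# SHARED [0, 0, 0, 0, 0]
# DEDICATED [0, 1, 2, 3, 4]
# DEDICATED_THEN_SHARED [0, 1, 0, 1, 0]
```

Under Option 2, the number of sessions grows with the number of open shells. This is why a project-level concurrency cap matters. Option 3 instead bounds sessions at the user's limit, at the cost of shells contending for the same GPU.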
| 112 | + |
| 113 | + |
| 114 | + |
| 115 | +Admin controls |
| 116 | + |
| 117 | +Project Policies |
| 118 | + |
| 119 | +Serverless GPU access is controlled through project properties. Admins will be able to manage serverless GPU settings via API or a forthcoming Project Configuration blade in the portal. |
| 120 | + |
| 121 | +Key capabilities include: |
| 122 | + |
| 123 | +Enable/Disable GPU Access: Serverless GPU compute can be toggled at the project level through a dedicated property. |
| 124 | + |
| 125 | +Set Max Concurrent GPU Count: Each project can specify the maximum number of GPUs that can be used concurrently across all Dev Boxes in that project. This acts as a soft cap for total GPU usage, helping control overall consumption. |
| 126 | + |
| 127 | +GPU Type: Because only T4 GPUs will be available for v0, the GPU type is not configurable in v0. |
| 128 | + |
| 129 | +Note: While project policies (as known today) do not directly govern GPU access, future enhancements will integrate project policies more tightly with these GPU properties, enabling better governance and centralized enforcement. |
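The two project-level controls combine into a simple admission check that runs when a developer opens a new GPU shell. In this sketch, `ProjectGpuSettings` and its field names are hypothetical. The soft cap is modeled as "allow but flag" because the document calls it a soft cap without defining how it is enforced; a hard cap would deny instead.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ALLOW_OVER_SOFT_CAP = "allow_over_soft_cap"
    DENY_DISABLED = "deny_disabled"


@dataclass(frozen=True)
class ProjectGpuSettings:
    # Hypothetical property names; the real API / portal names are not defined here.
    gpu_enabled: bool
    max_concurrent_gpus: int


def admit_new_session(settings: ProjectGpuSettings, gpus_in_use: int) -> Decision:
    """Decide whether one more GPU session may start in a project.

    gpus_in_use counts GPUs currently held by all Dev Boxes in the project.
    """
    if not settings.gpu_enabled:
        return Decision.DENY_DISABLED
    if gpus_in_use + 1 > settings.max_concurrent_gpus:
        # Soft cap: let the session start but flag it; a hard cap would deny here.
        return Decision.ALLOW_OVER_SOFT_CAP
    return Decision.ALLOW


settings = ProjectGpuSettings(gpu_enabled=True, max_concurrent_gpus=2)
print([admit_new_session(settings, n).value for n in range(3)])
# ['allow', 'allow', 'allow_over_soft_cap']
```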
| 130 | + |
| 131 | + |
| 132 | + |
| 133 | +Additional Cost Controls |
| 134 | + |
| 135 | +For proof of concept (POC) purposes, subscription quota will be used for cost management. This means overall GPU usage across projects will be bounded by the customer’s subscription quota. However, as the feature evolves, we may need to consider per-project GPU quotas at the project policy level to provide finer-grained control over costs. |
| 136 | + |
| 137 | +Image Management |
| 138 | + |
| 139 | +Each ACA instance will be tied to a Linux image. While ACA provides a broad set of pre-configured images, we anticipate that Dev Box customers may prefer to use their own custom images to better meet their specific requirements. To support this, we are evaluating options for custom image management. |
| 140 | + |
| 141 | +One current option is to bring your own image by providing an Azure Container Registry (ACR) that contains the desired image. This would allow admins to upload and manage custom images for use within ACA. |
| 142 | + |
| 143 | +For POC purposes, we will use ACA’s pre-canned images (https://learn.microsoft.com/en-us/azure/container-apps/sessions-code-interpreter#preinstalled-packages). |
| 144 | + |
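If admins bring their own image through ACR, the project configuration would presumably store a full image reference. Below is a minimal parser. It assumes references of the form `<registry>.azurecr.io/<repository>:<tag>` in the public Azure cloud (sovereign clouds use other registry suffixes) and does not handle digest references. `ImageRef` and `parse_acr_image` are illustrative names, not part of any Dev Box or ACA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRef:
    registry: str    # ACR login server, e.g. contoso.azurecr.io
    repository: str  # may contain '/' namespaces
    tag: str


def parse_acr_image(ref: str) -> ImageRef:
    """Split '<registry>.azurecr.io/<repository>[:<tag>]' into its parts."""
    registry, sep, rest = ref.partition("/")
    if not sep or not registry.endswith(".azurecr.io"):
        raise ValueError(f"not an ACR image reference: {ref!r}")
    if "@" in rest:
        raise ValueError("digest references are not handled in this sketch")
    repository, sep, tag = rest.rpartition(":")
    if not sep:
        # No explicit tag: container tooling defaults to 'latest'.
        repository, tag = rest, "latest"
    if not repository or not tag:
        raise ValueError(f"missing repository or tag in {ref!r}")
    return ImageRef(registry, repository, tag)


print(parse_acr_image("contoso.azurecr.io/ml/pytorch-cuda:2.3"))
# ImageRef(registry='contoso.azurecr.io', repository='ml/pytorch-cuda', tag='2.3')
```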
| 145 | +Scenarios |
| 146 | + |
| 147 | +The Dev Box Serverless GPU Compute feature is designed to support a wide range of CLI-driven tasks that benefit from on-demand, high-performance compute. This flexibility allows developers to run a variety of compute-intensive workflows without the need for dedicated GPU infrastructure. Some key scenarios include: |
| 148 | + |
| 149 | +AI Model Training and Inference: On-demand GPU access for tasks like training large models, fine-tuning, and running inference workloads. |
| 150 | + |
| 151 | +Data Processing and Preprocessing: Accelerated data manipulation and transformation for large datasets. |
| 152 | + |
| 153 | +High-Performance Computing (HPC): Support for simulations, scientific computations, and other resource-intensive tasks. |
| 154 | + |
| 155 | +Cloud-Native Development: Scaling GPU resources for cloud-native, containerized workflows in AI and beyond. |
| 156 | + |
| 157 | +CLI-Based Workflows: Developers can leverage GPUs for any other CLI-based task that benefits from intensive compute, whether for AI, simulations, or other specialized domains. |
| 158 | + |
| 159 | +Why Dev Box? |
| 160 | + |
| 161 | +Dev Box brings several key advantages to enterprises looking to leverage serverless GPU compute for AI and other compute-heavy tasks: |
| 162 | + |
| 163 | +No Need for Resource Creation Permissions: In many enterprises, developers lack access to the broader cloud infrastructure or the permissions required to create and manage GPU resources like ACA instances. With Dev Box, developers can access serverless GPU compute without needing to manage or create the underlying resources themselves. |
| 164 | + |
| 165 | +Instant Access to GPU Compute: Dev Box allows developers to get up and running with serverless GPU compute with just a single click. There's no need for manual configuration or setup, ensuring developers can focus on their work rather than worrying about infrastructure. |
| 166 | + |
| 167 | +Centralized Control for Admins: Dev Box integrates seamlessly with Dev Center's project policies, giving administrators granular control over serverless GPU access. Admins can define consumption limits, enable or disable GPU access on a per-project basis, and set permissions for users, all within the familiar Dev Center infrastructure. |
| 168 | + |
| 169 | +Secure Private Network Integration: Dev Box runs within a private, enterprise-managed network. This ensures that sensitive corporate data used for AI workloads—such as proprietary models, internal datasets, or compliance-bound information—remains isolated and secure at the network layer. This added layer of security is crucial for enterprises handling regulated or confidential data. |
| 170 | + |
| 171 | +POC Plan |
| 172 | + |
| 173 | +Stage 1 – ETA 1-2 weeks – Eng: Nick Depinet |
| 174 | + |
| 175 | +Develop a shell (Windows Terminal extension) that communicates with ACA and can be launched from within Dev Box. |
| 176 | + |
| 177 | +AI Toolkit Integration |
| 178 | + |
| 179 | +Checkpoint: Begin collecting internal developer feedback on shell functionality and integration. |
| 180 | + |
| 181 | +Stage 2 – ETA 2-3 weeks – Eng: Sneha |
| 182 | + |
| 183 | +Implement the Agent Management Service (AMS) to handle authentication, session management, and related tasks. |
| 184 | + |
| 185 | +Stage 3 – ETA 3-4 weeks |
| 186 | + |
| 187 | +Introduce admin controls |
| 188 | + |
| 189 | +HOBO provisioning |
| 190 | + |
| 191 | +Begin planning for vNet injection support as a future enhancement. |
| 192 | + |
| 193 | +Stage 4 – ETA 4-5 weeks |
| 194 | + |
| 195 | +Finalize portal experience integration, enabling a seamless user interface for Dev Box users to manage GPU compute access. |
| 196 | + |
| 197 | +Open questions |
| 198 | + |
| 199 | +What is the data persistence story? |
| 200 | + |
| 201 | +What is the user experience around handling GPU limits per user? |
| 202 | + |
| 203 | +How do we think about GPU pooling? |
| 204 | + |
| 205 | +Where does the session pool live in Dev Center infrastructure? |
| 206 | + |
| 207 | +Rude FAQ |
| 208 | + |
| 209 | +Experience related |
| 210 | + |
| 211 | +Why is the GPU accessible only as an external process? Why can't I use the GPU to accelerate my DevBox graphics? |
| 212 | + |
| 213 | +Why do I have to request GPU quota separately? Why can’t you auto-grant GPU quota to match the size of my Dev Box pool? |
| 214 | + |
| 215 | +As an IT Admin for an Enterprise customer, why should I procure Serverless GPU through DevBox instead of directly procuring ACA Serverless GPU? |
| 216 | + |
| 217 | +Current limitations / Roadmap related |
| 218 | + |
| 219 | +Why can I only access GPUs via Shell? Why isn't there a GUI? |
| 220 | + |
| 221 | +Why aren't you giving me the latest generation GPUs? I really need H100s. |
| 222 | + |
| 223 | +I need multiple GPUs attached to a single DevBox, why are you making me create multiple shells which get 1 GPU each instead of giving me N GPUs in a single shell? |
| 224 | + |
| 225 | +I want to run Windows-only software such as GameMaker on Serverless GPUs. Why am I limited to Linux only? |
| 226 | + |
| 227 | + |