# Overview

Enterprises are increasingly looking for flexible, scalable, and
cost-efficient solutions to run high-performance AI workloads.
Traditional GPU provisioning often requires long-term commitments and
significant upfront investments, making it challenging for organizations
to optimize resources and control costs, especially for sporadic,
high-intensity workloads.

The Dev Box Serverless GPU Compute feature addresses this challenge by
integrating Microsoft Dev Box with Azure Container Apps (ACA), enabling
on-demand access to powerful GPU resources without requiring long-term
provisioning. Developers can dynamically allocate GPU power within their
Dev Box based on the demands of their AI tasks, such as model training,
fine-tuning, and data preprocessing.

Beyond compute flexibility, Dev Box also provides a secure development
environment for AI workloads that require access to sensitive corporate
data. Many enterprises need to train models on proprietary datasets that
are restricted by network-layer security policies. Since Dev Box is
already embedded within an organization’s secure network and governance
framework, it enables AI engineers to access and process protected data
while ensuring compliance with corporate security standards.

This integration delivers a unique combination of flexibility, security,
and cost optimization, ensuring that enterprises can scale GPU resources
efficiently while maintaining tight control over data access and
compliance. By eliminating the complexities of provisioning and securing
AI development environments, Dev Box enables developers to focus on
innovation rather than infrastructure management.

# Architecture

The Dev Box Serverless GPU Compute feature leverages a tight integration
with Azure Container Apps (ACA) to provide on-demand, high-performance
GPU compute for AI workloads attached to the customer’s private network.
This architecture is designed to be seamless for developers, enabling
powerful compute resources without the need for manual setup or
long-term provisioning.

## Integration with Azure Container Apps (ACA)

At the core of the Dev Box serverless GPU compute solution is the
integration with Azure Container Apps Serverless GPU. This integration
ensures that developers can access GPU resources on demand, scaling as
required by their AI workloads. ACA abstracts the complexity of GPU
provisioning, allowing Dev Box to handle resource allocation and usage
automatically without requiring intervention from the developer.

- **Seamless User Experience**: With this integration, users will
  interact with Dev Box as usual, without needing to know that Azure
  Container Apps is running behind the scenes or to create any resources
  or connections themselves. GPU resources will be allocated dynamically
  as part of the Dev Box infrastructure, abstracting the ACA technology
  and setup away from the developer.

- **HOBO Architecture Model**: We will adopt the HOBO (hosted on behalf
  of) architecture model for ACA integration. In this model, ACA
  instances will be created and managed within the customer’s
  subscription, providing a more controlled and streamlined management
  experience for customers. The Dev Box service can effectively and
  securely manage ACA sessions without introducing additional
  complexity.

## GPU Hardware Availability

ACA currently supports two primary GPU options for AI workloads:

- NVIDIA T4 GPUs – Readily available with minimal quota concerns

- NVIDIA A100 GPUs – More powerful but available in limited capacity

These GPU resources are currently available in three Azure regions:

- West US 3

- Sweden Central

- Australia East

While the initial rollout focuses on these locations, ACA’s GPU support
can be expanded into additional regions based on demand. The v0
integration will support only T4 GPUs.
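
As a concrete reference, the availability matrix above can be expressed
as a small lookup table. This is an illustrative sketch only: the region
identifiers and SKU sets below are transcribed from this document (not
queried live from Azure), and the helper name is an assumption.

```python
# Illustrative sketch: GPU availability as described in this document,
# not a live query against Azure. Region keys are assumed identifiers.
GPU_AVAILABILITY = {
    "westus3": {"T4", "A100"},
    "swedencentral": {"T4", "A100"},
    "australiaeast": {"T4", "A100"},
}

# The v0 integration supports T4 only.
V0_SUPPORTED_SKUS = {"T4"}

def available_skus(region: str, v0: bool = True) -> set:
    """Return the GPU SKUs a Dev Box in `region` could request."""
    skus = GPU_AVAILABILITY.get(region, set())
    return skus & V0_SUPPORTED_SKUS if v0 else skus
```

For example, `available_skus("westus3")` yields only T4 under the v0
restriction, while passing `v0=False` exposes the full regional set.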

## Considerations for vNet Injection

We recognize that vNet injection will likely be a common customer ask.
vNet injection will allow customers to integrate their network and
security protocols with the serverless GPU environment. Although this
capability is not a requirement for the POC, it will be prioritized for
public preview and general availability (GA), giving customers tighter
control over network and security configurations.

## Enabling Serverless GPUs at the Project Level

Serverless GPUs will be enabled per project using Dev Center Project
Policies. This allows administrators to define and control which
projects within an organization can access GPU resources, ensuring that
GPU usage is in line with organizational requirements and budget
considerations. See the [Admin Controls](#_Admin_Controls) section for
details on specific configurations.

## Access Control and Serverless GPU Granting

Access to serverless GPU resources in Dev Box will be managed through
project-level properties. When the serverless GPU feature is enabled for
a project, all Dev Boxes within that project will automatically have
access to GPU compute.

This shift simplifies the access model by removing the need for custom
roles or pool-based configurations. Instead, GPU access is governed
centrally through project properties; future iterations will integrate
these properties with Dev Center’s **project policy infrastructure**.

For more information on how admins can enable this feature, define GPU
types, and set per-user limits, see the **Admin Controls** section.

# Developer Experience

The goal of the Developer Experience for Dev Box Serverless GPU Compute
is to make accessing GPU resources seamless and native, with no setup
required from the developer. The aim is to create a new kind of shell
that has built-in access to GPU compute via an ACA session. This shell
will be available across platforms like Windows Terminal, Visual Studio,
and VS Code in a native, in-box experience.

## Shell Extension for Windows Terminal

Windows Terminal serves as a terminal emulator for different kinds of
shells. To enable GPU access, we will introduce a new shell, tentatively
called "DevBoxGPU Shell". This shell will be connected to a serverless
GPU ACA session, allowing developers to run GPU-powered workloads
directly from the terminal.

- When a new shell instance is launched, an ACA session will start
  running in the background, providing GPU access.

- The ACA instance will remain active as long as the shell is open, and
  resource usage will be billed accordingly.

- Once the shell is closed, the ACA instance will automatically shut
  down, stopping any further resource usage and billing.

This ensures that developers have access to GPU resources with zero
manual configuration, providing a clean and efficient workflow.
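
The lifecycle described above — the ACA session starts when the shell
opens and shuts down when it closes — can be sketched as a context
manager. This is an illustrative sketch only; `AcaSessionClient` and its
methods are hypothetical stand-ins, not a real Dev Box or ACA API.

```python
import contextlib

class AcaSessionClient:
    """Hypothetical stand-in for the ACA session management API."""

    def __init__(self):
        self.active = False

    def start_session(self, gpu_sku: str = "T4") -> str:
        self.active = True   # billing begins when the session starts
        return "session-001"

    def stop_session(self, session_id: str) -> None:
        self.active = False  # billing stops when the session shuts down

@contextlib.contextmanager
def gpu_shell(client):
    """Tie a GPU-backed ACA session to the lifetime of one shell instance."""
    session_id = client.start_session()
    try:
        yield session_id     # shell is open; GPU work runs in the session
    finally:
        client.stop_session(session_id)  # shell closed; session torn down
```

Opening the shell maps to entering the `with` block; closing it — even
on an error — tears the session down, so no billing continues after the
shell exits.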

<img src="media/concept-serverless-gpu-v2/image1.png" style="width:6.5in;height:4.19167in"
alt="Screenshot of the DevBoxGPU Shell running in Windows Terminal." />

## Visual Studio

Since Visual Studio hosts Windows Terminal natively and can expose
various shells, it allows us to extend this seamless GPU access directly
within the IDE. By creating GPU-powered shells within Visual Studio,
developers will be able to launch GPU-intensive tasks directly from
their development environment, further streamlining their workflow:

<img src="media/concept-serverless-gpu-v2/image2.png" style="width:6.5in;height:2.05417in"
alt="Screenshot of a GPU-powered shell hosted inside Visual Studio." />

## AI Toolkit for VS Code

The AI Toolkit for VS Code provides a rich ecosystem for AI development
as a VS Code extension, including fine-tuning, inference, and an
integrated model marketplace. Dev Box Serverless GPU Compute will
seamlessly integrate with the AI Toolkit’s ACA-based backend, enabling
developers to:

- Instantly access serverless GPUs for AI workloads without additional
  setup.

- Utilize the AI Toolkit’s model marketplace to select and deploy AI
  models efficiently.

- Leverage built-in fine-tuning and inference capabilities powered by
  ACA.

- Use an integrated playground to test and iterate on AI models in
  real time.

This integration ensures that developers can take advantage of
serverless GPU compute provided via Dev Box directly within VS Code,
making AI development more accessible and frictionless.

## Multiple Shell Instances

From an architectural standpoint, there are several options regarding
how new instances of the DevBoxGPU Shell can interact with ACA sessions.
Below are the key options we are considering:

- **Option 1**: Multiple instances of the DevBoxGPU Shell share a single
  ACA session. In this setup, the same GPU is allocated across multiple
  shell instances, allowing them to share GPU compute resources.

- **Option 2**: Each new instance of the DevBoxGPU Shell is assigned to
  a separate ACA session, with each instance having its own dedicated
  GPU. This means that a user can access multiple GPUs simultaneously by
  running separate instances of the shell.

- **Option 3**: The system allocates dedicated GPUs to each instance of
  the DevBoxGPU Shell until the user’s maximum GPU allocation is
  reached. After this limit is hit, additional shell instances will
  begin sharing GPU compute across sessions.

For the POC, we will pursue **Option 2**, where each shell instance gets
its own dedicated ACA session and GPU, ensuring clear isolation of
resources.
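
As an illustration of the Option 2 allocation rule — one dedicated
session (and GPU) per shell instance, bounded by a per-user limit — the
logic might look like the sketch below. The class, its method names, and
the limit are assumptions for illustration, not part of any announced
API.

```python
class GpuAllocator:
    """Sketch of Option 2: every shell instance gets its own ACA session."""

    def __init__(self, max_gpus_per_user: int):
        self.max_gpus_per_user = max_gpus_per_user
        self.sessions = {}   # user -> list of that user's active session ids
        self._next_id = 0    # monotonic counter so session ids never collide

    def open_shell(self, user: str) -> str:
        owned = self.sessions.setdefault(user, [])
        if len(owned) >= self.max_gpus_per_user:
            raise RuntimeError("per-user GPU limit reached")
        self._next_id += 1
        session_id = f"{user}-session-{self._next_id}"
        owned.append(session_id)  # a dedicated GPU backs each session
        return session_id

    def close_shell(self, user: str, session_id: str) -> None:
        self.sessions[user].remove(session_id)  # frees that shell's GPU
```

Under Option 3, the `RuntimeError` branch would instead fall back to
attaching the new shell to an existing shared session.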

# Admin Controls

## Project Properties

Serverless GPU access is controlled through **project properties**.
Admins will be able to manage serverless GPU settings via API or a
forthcoming **Project Configuration** blade in the portal.

Key capabilities include:

- **Enable/Disable GPU Access**: Serverless GPU compute can be toggled
  at the project level through a dedicated property.

- **Set Max Concurrent GPU Count**: Each project can specify the maximum
  number of GPUs that can be used concurrently across all Dev Boxes in
  that project. This acts as a soft cap for total GPU usage, helping
  control overall consumption.

Because only T4 GPUs will be available in v0, a GPU type selection
property is not needed at this stage.

*Note: While project policies (as known today) do not directly govern
GPU access, future enhancements will integrate project policies more
tightly with these GPU properties, enabling better governance and
centralized enforcement.*
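
To make the property model concrete, a project's serverless GPU settings
might look roughly like the sketch below. The property names
(`serverlessGpu`, `enabled`, `maxConcurrentGpus`, `gpuSku`) are
illustrative assumptions; the actual schema is not defined in this
document.

```python
# Hypothetical shape of the project-level serverless GPU properties.
project_properties = {
    "serverlessGpu": {
        "enabled": True,         # toggles GPU access for the whole project
        "maxConcurrentGpus": 4,  # soft cap across all Dev Boxes in the project
        "gpuSku": "T4",          # effectively fixed in v0, where only T4 ships
    }
}

def can_start_gpu_session(props: dict, active_gpu_count: int) -> bool:
    """Gate a new session on the project's enablement flag and soft cap."""
    gpu = props.get("serverlessGpu", {})
    return bool(gpu.get("enabled")) and active_gpu_count < gpu.get("maxConcurrentGpus", 0)
```

With this shape, a request for a fifth concurrent GPU in a project
capped at four would be rejected before any ACA session is created.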

## Additional Cost Controls

For Proof of Concept (POC) purposes, subscription quota will be utilized
for cost management. This means the overall GPU usage across projects
will be managed within a user’s subscription limits. However, as the
feature evolves, we may need to consider per-project GPU quotas at the
project policy level to provide further granularity and control over
costs.

## Image Management

Each ACA instance will be tied to a Linux image. While ACA provides a
broad set of pre-configured images, we anticipate that Dev Box customers
may prefer to use their own custom images to better meet their specific
requirements. To support this, we are evaluating options for custom
image management.

One current option is to bring your own image by providing an Azure
Container Registry (ACR) that contains the desired image. This would
allow admins to upload and manage custom images for use within ACA.

For POC purposes, we will utilize ACA’s pre-canned images
(<https://learn.microsoft.com/en-us/azure/container-apps/sessions-code-interpreter#preinstalled-packages>).

# Scenarios

The Dev Box Serverless GPU Compute feature is designed to support a wide
range of CLI-driven tasks that benefit from on-demand, high-performance
compute. This flexibility allows developers to run a variety of
compute-intensive workflows without the need for dedicated GPU
infrastructure. Some key scenarios include:

- **AI Model Training and Inference**: On-demand GPU access for tasks
  like training large models, fine-tuning, and running inference
  workloads.

- **Data Processing and Preprocessing**: Accelerated data manipulation
  and transformation for large datasets.

- **High-Performance Computing (HPC)**: Support for simulations,
  scientific computations, and other resource-intensive tasks.

- **Cloud-Native Development**: Scaling GPU resources for cloud-native,
  containerized workflows in AI and beyond.

- **CLI-Based Workflows**: Developers can leverage GPUs for any other
  CLI-based task that benefits from intensive compute, whether for AI,
  simulations, or other specialized domains.

# Why Dev Box?

Dev Box brings several key advantages to enterprises looking to leverage
serverless GPU compute for AI and other compute-heavy tasks:

- **No Need for Resource Creation Permissions**: In many enterprises,
  developers lack access to the broader cloud infrastructure or the
  permissions required to create and manage GPU resources like ACA
  instances. With Dev Box, developers can access serverless GPU compute
  without needing to manage or create the underlying resources
  themselves.

- **Instant Access to GPU Compute**: Dev Box allows developers to get up
  and running with serverless GPU compute with just a single click.
  There's no need for manual configuration or setup, ensuring developers
  can focus on their work rather than worrying about infrastructure.

- **Centralized Control for Admins**: Dev Box integrates seamlessly with
  Dev Center's project policies, giving administrators granular control
  over serverless GPU access. Admins can define consumption limits,
  enable or disable GPU access on a per-project basis, and set
  permissions for users, all within the familiar Dev Center
  infrastructure.

- **Secure Private Network Integration**: Dev Box runs within a private,
  enterprise-managed network. This ensures that sensitive corporate data
  used for AI workloads—such as proprietary models, internal datasets,
  or compliance-bound information—remains isolated and secure at the
  network layer. This added layer of security is crucial for enterprises
  handling regulated or confidential data.

# POC Plan

**Stage 1** – ETA 1-2 weeks – Eng: Nick Depinet

- Develop a shell (Windows Terminal extension) that communicates with
  ACA and can be launched from within Dev Box.

- AI Toolkit integration.

- **Checkpoint**: Begin collecting internal developer feedback on shell
  functionality and integration.

**Stage 2** – ETA 2-3 weeks – Eng: Sneha

- Implement the Agent Management Service (AMS) to handle authentication,
  session management, and related tasks.

**Stage 3** – ETA 3-4 weeks

- Introduce admin controls.

- HOBO provisioning.

- Begin planning for vNet injection support as a future enhancement.

**Stage 4** – ETA 4-5 weeks

- Finalize portal experience integration, enabling a seamless user
  interface for Dev Box users to manage GPU compute access.

# Open questions

- What is the data persistence story?

- What is the user experience around handling GPU limits per user?

- How do we think about GPU pooling?

- Where does the session pool live in the Dev Center infrastructure?

# Rude FAQ

Experience related:

- Why is the GPU accessible only as an external process? Why can't I use
  the GPU to accelerate my Dev Box graphics?

- Why do I have to request GPU quota separately? Why can’t you
  auto-grant GPU quota to match the size of my Dev Box pool?

- As an IT admin for an enterprise customer, why should I procure
  serverless GPU through Dev Box instead of directly procuring ACA
  Serverless GPU?

Current limitations / roadmap related:

- Why can I only access GPUs via a shell? Why isn't there a GUI?

- Why aren't you giving me the latest generation of GPUs? I really need
  H100s.

- I need multiple GPUs attached to a single Dev Box. Why are you making
  me create multiple shells that get one GPU each instead of giving me N
  GPUs in a single shell?

- I want to run Windows-only software such as GameMaker on serverless
  GPUs. Why am I limited to Linux only?