# Overview

Enterprises are increasingly looking for flexible, scalable, and
cost-efficient solutions to run high-performance AI workloads.
Traditional GPU provisioning often requires long-term commitments and
significant upfront investments, making it challenging for organizations
to optimize resources and control costs, especially for sporadic,
high-intensity workloads.

The Dev Box Serverless GPU Compute feature addresses this challenge by
integrating Microsoft Dev Box with Azure Container Apps (ACA), enabling
on-demand access to powerful GPU resources without requiring long-term
provisioning. Developers can dynamically allocate GPU power within their
Dev Box based on the demands of their AI tasks, such as model training,
fine-tuning, and data preprocessing.

Beyond compute flexibility, Dev Box also provides a secure development
environment for AI workloads that require access to sensitive corporate
data. Many enterprises need to train models on proprietary datasets that
are restricted by network-layer security policies. Since Dev Box is
already embedded within an organization’s secure network and governance
framework, it enables AI engineers to access and process protected data
while ensuring compliance with corporate security standards.

This integration delivers a unique combination of flexibility, security,
and cost optimization, ensuring that enterprises can scale GPU resources
efficiently while maintaining tight control over data access and
compliance. By eliminating the complexities of provisioning and securing
AI development environments, Dev Box enables developers to focus on
innovation rather than infrastructure management.

# Architecture

The Dev Box Serverless GPU Compute feature leverages a tight integration
with Azure Container Apps (ACA) to provide on-demand, high-performance
GPU compute for AI workloads attached to the customer’s private network.
This architecture is designed to be seamless for developers, enabling
powerful compute resources without the need for manual setup or
long-term provisioning.

## Integration with Azure Container Apps (ACA)

At the core of the Dev Box serverless GPU compute solution is the
integration with Azure Container Apps Serverless GPU. This integration
ensures that developers can access GPU resources on demand, scaling as
required by their AI workloads. ACA abstracts the complexity of GPU
provisioning, allowing Dev Box to handle resource allocation and usage
automatically without requiring intervention from the developer.

- **Seamless User Experience**: Users will interact with Dev Box as
  usual, without needing to be aware that Azure Container Apps is behind
  the scenes or to create any resources or connections themselves. GPU
  resources will be allocated dynamically as part of the Dev Box
  infrastructure, abstracting the ACA technology and setup away from the
  developer.

- **MOBO Architecture Model**: We will adopt the MOBO architecture model
  for ACA integration. In this model, ACA instances will be created and
  managed within the customer’s subscription, providing a more
  controlled and streamlined management experience for customers. The
  Dev Box service can effectively and securely manage ACA sessions on
  behalf of the Dev Box without introducing additional complexity.

## GPU Hardware Availability

ACA currently supports two primary GPU options for AI workloads:

- NVIDIA T4 GPUs – readily available with minimal quota concerns

- NVIDIA A100 GPUs – more powerful, but available in limited capacity

These GPU resources are currently available in the following Azure
regions:

- West US 3

- Sweden North

- Australia East

While the initial rollout focuses on these locations, ACA’s GPU support
can be expanded into additional regions based on demand. The v0
integration will only support T4 GPUs.

## Consideration for vNet Injection

We recognize that vNet injection will likely be a common customer ask.
vNet injection will allow customers to integrate their network and
security protocols with the serverless GPU environment. Although this
capability is not a requirement for the POC, it will be prioritized for
public preview and general availability (GA), giving customers tighter
control over network and security configurations.

## Enabling Serverless GPUs at the Project Level

Serverless GPUs will be enabled per project using Dev Center Project
Policies. This allows administrators to define and control which
projects within an organization can access GPU resources, ensuring that
GPU usage is in line with organizational requirements and budget
considerations. See the [Admin Controls](#admin-controls) section for
details on specific configurations.

## Access Control and Serverless GPU Granting

Access to serverless GPU resources in Dev Box will be managed through
project-level properties. When the serverless GPU feature is enabled for
a project, all Dev Boxes within that project will automatically have
access to GPU compute.

This shift simplifies the access model by removing the need for custom
roles or pool-based configurations. Instead, GPU access is governed
centrally through project properties. Future iterations will integrate
these properties with Dev Center’s **project policy infrastructure**.

For more information on how admins can enable this feature, define GPU
types, and set per-user limits, see the **Admin Controls** section.

# Developer Experience

The goal of the developer experience for Dev Box Serverless GPU Compute
is to make accessing GPU resources seamless and native, with no setup
required from the developer. The aim is to create a new kind of shell
that has built-in access to GPU compute via an ACA session. This shell
will be available across platforms like Windows Terminal, Visual Studio,
and VS Code in a native, in-box experience.

## Shell Extension for Windows Terminal

Windows Terminal serves as a terminal emulator for different kinds of
shells. To enable GPU access, we will introduce a new shell, tentatively
called "DevBoxGPU Shell". This shell will be connected to a serverless
GPU ACA session, allowing developers to run GPU-powered workloads
directly from the terminal.

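Windows Terminal discovers shells through profiles in its `settings.json`. As a rough illustration of how the new shell could surface there (the profile name and the `devboxgpu.exe connect` launcher are hypothetical placeholders, not a shipped binary), a profile entry might look like:

```json
{
  "profiles": {
    "list": [
      {
        "name": "DevBoxGPU Shell",
        "commandline": "devboxgpu.exe connect",
        "startingDirectory": "%USERPROFILE%"
      }
    ]
  }
}
```

In practice the shell would more likely register itself automatically as a dynamic profile via an installed extension rather than being hand-edited into settings.
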
- When a new shell instance is launched, an ACA session will start
  running in the background, providing GPU access.

- The ACA instance will remain active as long as the shell is open, and
  resource usage will be billed accordingly.

- Once the shell is closed, the ACA instance will automatically shut
  down, stopping any further resource usage and billing.

This ensures that developers have access to GPU resources with zero
manual configuration, providing a clean and efficient workflow.

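The lifecycle described above (session starts with the shell, always stops when it closes) maps naturally onto a context manager. A minimal sketch, where `allocate_session` and `release_session` are hypothetical stand-ins for the real Dev Box/ACA session calls:

```python
import uuid
from contextlib import contextmanager

# Hypothetical stand-ins for the service calls that would create and
# tear down an ACA GPU session; not a real API.
def allocate_session(gpu_type: str) -> str:
    session_id = f"aca-{gpu_type}-{uuid.uuid4().hex[:8]}"
    print(f"ACA session {session_id} started (billing begins)")
    return session_id

def release_session(session_id: str) -> None:
    print(f"ACA session {session_id} stopped (billing ends)")

@contextmanager
def devboxgpu_shell(gpu_type: str = "T4"):
    """Tie an ACA GPU session to the lifetime of one shell instance."""
    session_id = allocate_session(gpu_type)
    try:
        yield session_id             # shell is open; GPU is usable
    finally:
        release_session(session_id)  # shell closed; session shuts down

# Opening the shell allocates a session; closing it always releases it,
# even if the workload inside the shell fails.
with devboxgpu_shell() as session:
    print(f"running GPU workload in {session}")
```

The `finally` block is the important design point: billing stops whether the shell exits cleanly or crashes.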
<img src="media/concept-serverless-gpu-v2/image1.png" style="width:6.5in;height:4.19167in"
alt="Screenshot of the DevBoxGPU Shell concept in Windows Terminal." />

## Visual Studio

Since Visual Studio hosts Windows Terminal natively and can expose
various shells, it allows us to extend this seamless GPU access directly
within the IDE. By creating GPU-powered shells within Visual Studio,
developers will be able to launch GPU-intensive tasks directly from
their development environment, further streamlining their workflow:

<img src="media/concept-serverless-gpu-v2/image2.png" style="width:6.5in;height:2.05417in"
alt="Screenshot of a GPU-powered shell hosted in Visual Studio." />

## AI Toolkit for VS Code

The AI Toolkit for VS Code provides a rich ecosystem for AI development
as a VS Code extension, including fine-tuning, inference, and an
integrated model marketplace. Dev Box Serverless GPU Compute will
seamlessly integrate with the AI Toolkit’s ACA-based backend, enabling
developers to:

- Instantly access serverless GPUs for AI workloads without additional
  setup.

- Utilize the AI Toolkit’s model marketplace to select and deploy AI
  models efficiently.

- Leverage built-in fine-tuning and inference capabilities powered by
  ACA.

- Use an integrated playground to test and iterate on AI models in real
  time.

This integration ensures that developers can take advantage of
serverless GPU compute provided via Dev Box directly within VS Code,
making AI development more accessible and frictionless.

## Multiple Shell Instances

From an architectural standpoint, there are several options for how new
instances of the DevBoxGPU Shell can interact with ACA sessions. Below
are the key options we are considering:

- **Option 1**: Multiple instances of the DevBoxGPU Shell share a single
  ACA session. In this setup, the same GPU is allocated across multiple
  shell instances, allowing them to share GPU compute resources.

- **Option 2**: Each new instance of the DevBoxGPU Shell is assigned to
  a separate ACA session, with each instance having its own dedicated
  GPU. This means that a user can access multiple GPUs simultaneously by
  running separate instances of the shell.

- **Option 3**: The system allocates dedicated GPUs to each instance of
  the DevBoxGPU Shell until the user’s maximum GPU allocation is
  reached. After this limit is hit, additional shell instances will
  begin sharing GPU compute across sessions.

For the POC, we will pursue **Option 2**, where each shell instance gets
its own dedicated ACA session and GPU, ensuring clear isolation of
resources.

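The difference between the options is essentially an allocation policy. A minimal sketch of Option 3's hybrid behavior (dedicated sessions up to a per-user cap, then sharing), with all names hypothetical:

```python
from dataclasses import dataclass, field
from itertools import cycle

@dataclass
class GpuAllocator:
    """Option 3 sketch: each shell gets a dedicated ACA session until a
    per-user cap is reached; shells beyond the cap round-robin onto
    existing sessions."""
    max_dedicated: int
    sessions: list = field(default_factory=list)
    _reuse: "cycle | None" = None

    def attach_shell(self, shell_id: str) -> str:
        if len(self.sessions) < self.max_dedicated:
            # Under the cap: allocate a fresh dedicated session.
            session = f"session-{len(self.sessions)}"
            self.sessions.append(session)
            return session
        # Over the cap: share existing sessions across new shells.
        if self._reuse is None:
            self._reuse = cycle(self.sessions)
        return next(self._reuse)

allocator = GpuAllocator(max_dedicated=2)
print([allocator.attach_shell(f"shell-{i}") for i in range(4)])
# First two shells get dedicated sessions; the rest share them.
```

Option 2 is the degenerate case of this policy with an effectively unlimited `max_dedicated`, which is why it is the simplest starting point for the POC.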
# Admin controls

## Project Policies

Serverless GPU access is controlled through **project properties**.
Admins will be able to manage serverless GPU settings via API or a
forthcoming **Project Configuration** blade in the portal.

Key capabilities include:

- **Enable/Disable GPU Access**: Serverless GPU compute can be toggled
  at the project level through a dedicated property.

- **Set Max Concurrent GPU Count**: Each project can specify the maximum
  number of GPUs that can be used concurrently across all Dev Boxes in
  that project. This acts as a soft cap for total GPU usage, helping
  control overall consumption.

Because only T4 GPUs will be available for v0, a GPU type selection
property is not required initially.

*Note: While project policies (as known today) do not directly govern
GPU access, future enhancements will integrate project policies more
tightly with these GPU properties, enabling better governance and
centralized enforcement.*

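As a rough illustration of the shape these settings could take (the property names below are hypothetical placeholders, not a published API contract), a project resource might carry:

```json
{
  "properties": {
    "serverlessGpu": {
      "status": "Enabled",
      "gpuType": "T4",
      "maxConcurrentGpus": 4
    }
  }
}
```
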
## Additional Cost Controls

For Proof of Concept (POC) purposes, subscription quota will be utilized
for cost management. This means the overall GPU usage across projects
will be managed within a user’s subscription limits. However, as the
feature evolves, we may need to consider per-project GPU quotas at the
project policy level to provide further granularity and control over
costs.

## Image Management

Each ACA instance will be tied to a Linux image. While ACA provides a
broad set of pre-configured images, we anticipate that Dev Box customers
may prefer to use their own custom images to better meet their specific
requirements. To support this, we are evaluating options for custom
image management.

One current option is to bring your own image by providing an Azure
Container Registry (ACR) that contains the desired image. This would
allow admins to upload and manage custom images for use within ACA.

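For reference, ACA’s dynamic session pools already support custom-container sessions sourced from a registry. A hedged sketch of the kind of CLI call involved (resource names are placeholders, and the exact flags should be verified against the current `az containerapp sessionpool` documentation):

```shell
# Illustrative only: create an ACA session pool backed by a custom
# image from an Azure Container Registry (all names are placeholders).
az containerapp sessionpool create \
  --name devbox-gpu-pool \
  --resource-group my-devbox-rg \
  --location westus3 \
  --container-type CustomContainer \
  --image myregistry.azurecr.io/ai-training:latest \
  --registry-server myregistry.azurecr.io \
  --cooldown-period 300
```
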
For the POC, we will utilize ACA’s pre-canned images
(<https://learn.microsoft.com/en-us/azure/container-apps/sessions-code-interpreter#preinstalled-packages>).

# Scenarios

The Dev Box Serverless GPU Compute feature is designed to support a wide
range of CLI-driven tasks that benefit from on-demand, high-performance
compute. This flexibility allows developers to run a variety of
compute-intensive workflows without the need for dedicated GPU
infrastructure. Some key scenarios include:

- **AI Model Training and Inference**: On-demand GPU access for tasks
  like training large models, fine-tuning, and running inference
  workloads.

- **Data Processing and Preprocessing**: Accelerated data manipulation
  and transformation for large datasets.

- **High-Performance Computing (HPC)**: Support for simulations,
  scientific computations, and other resource-intensive tasks.

- **Cloud-Native Development**: Scaling GPU resources for cloud-native,
  containerized workflows in AI and beyond.

- **CLI-Based Workflows**: Developers can leverage GPUs for any other
  CLI-based task that benefits from intensive compute, whether for AI,
  simulations, or other specialized domains.

# Why Dev Box?

Dev Box brings several key advantages to enterprises looking to leverage
serverless GPU compute for AI and other compute-heavy tasks:

- **No Need for Resource Creation Permissions**: In many enterprises,
  developers lack access to the broader cloud infrastructure or the
  permissions required to create and manage GPU resources like ACA
  instances. With Dev Box, developers can access serverless GPU compute
  without needing to manage or create the underlying resources
  themselves.

- **Instant Access to GPU Compute**: Dev Box allows developers to get up
  and running with serverless GPU compute with just a single click.
  There's no need for manual configuration or setup, ensuring developers
  can focus on their work rather than worrying about infrastructure.

- **Centralized Control for Admins**: Dev Box integrates seamlessly with
  Dev Center's project policies, giving administrators granular control
  over serverless GPU access. Admins can define consumption limits,
  enable or disable GPU access on a per-project basis, and set
  permissions for users, all within the familiar Dev Center
  infrastructure.

- **Secure Private Network Integration**: Dev Box runs within a private,
  enterprise-managed network. This ensures that sensitive corporate data
  used for AI workloads—such as proprietary models, internal datasets,
  or compliance-bound information—remains isolated and secure at the
  network layer. This added layer of security is crucial for enterprises
  handling regulated or confidential data.

# POC Plan

**Stage 1** – ETA 1-2 weeks – Eng: Nick Depinet

- Develop a shell (Windows Terminal extension) that communicates with
  ACA and can be launched from within Dev Box.

- AI Toolkit integration.

- **Checkpoint**: Begin collecting internal developer feedback on shell
  functionality and integration.

**Stage 2** – ETA 2-3 weeks – Eng: Sneha

- Implement the Agent Management Service (AMS) to handle authentication,
  session management, and related tasks.

**Stage 3** – ETA 3-4 weeks

- Introduce admin controls.

- HOBO provisioning.

- Begin planning for vNet injection support as a future enhancement.

**Stage 4** – ETA 4-5 weeks

- Finalize portal experience integration, enabling a seamless user
  interface for Dev Box users to manage GPU compute access.

# Open questions

- What is the data persistence story?

- What is the user experience around handling per-user GPU limits?

- How do we think about GPU pooling?

- Where does the session pool live in the Dev Center infrastructure?

# Rude FAQ

Experience related:

- Why is the GPU accessible only as an external process? Why can't I use
  the GPU to accelerate my Dev Box graphics?

- Why do I have to request GPU quota separately? Why can’t you
  auto-grant GPU quota to match the size of my Dev Box pool?

- As an IT admin for an enterprise customer, why should I procure
  serverless GPU through Dev Box instead of directly procuring ACA
  Serverless GPU?

Current limitations / roadmap related:

- Why can I only access GPUs via a shell? Why isn't there a GUI?

- Why aren't you giving me the latest generation of GPUs? I really need
  H100s.

- I need multiple GPUs attached to a single Dev Box. Why are you making
  me create multiple shells that get one GPU each instead of giving me N
  GPUs in a single shell?

- I want to run Windows-only software such as GameMaker on serverless
  GPUs. Why am I limited to Linux only?