Merged
2 changes: 1 addition & 1 deletion 06_gpu_and_ml/llm-serving/ministral3_inference.py
@@ -70,7 +70,7 @@

# Native hardware support for FP8 formats in [Tensor Cores](https://modal.com/gpu-glossary/device-hardware/tensor-core)
# is limited to the latest [Streaming Multiprocessor architectures](https://modal.com/gpu-glossary/device-hardware/streaming-multiprocessor-architecture),
- # like those of Modal's [Hopper H100/H200 and Blackwell B200 GPUs](https://modal.com/blog/announcing-h200-b200).
+ # like those of Modal's [Hopper H100/H200 and Blackwell B200 GPUs](https://modal.com/blog/introducing-b200-h200).

# At 80 GB VRAM, a single H100 GPU has enough space to store the 8B FP8 model weights (~8 GB)
# and a very large KV cache. A single H100 is also enough to serve the 14B model in full precision,
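The arithmetic behind these claims can be sanity-checked with a tiny sketch (assuming 1 byte per parameter for FP8 and 2 bytes per parameter for BF16 "full precision"; activations and the KV cache need additional headroom on top of the weights):

```python
# Back-of-envelope weight-memory estimate; `weight_gb` is a hypothetical
# helper for illustration, not part of the Modal API.
def weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes here)."""
    return n_params * bytes_per_param / 1e9

H100_VRAM_GB = 80

print(weight_gb(8e9, 1))   # 8B model in FP8  -> 8.0 GB
print(weight_gb(14e9, 2))  # 14B model in BF16 -> 28.0 GB, well under 80 GB
```

Both figures leave tens of gigabytes free on a single H100, which is what makes room for a large KV cache.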
2 changes: 1 addition & 1 deletion 06_gpu_and_ml/llm-serving/very_large_models.py
@@ -277,7 +277,7 @@ def _start_server() -> subprocess.Popen:
app = modal.App("example-serve-very-large-models", image=image)

# Most importantly, we need to decide what hardware to run on.
- # [H200 and B200 GPUs](https://modal.com/blog/introducting-b200-h200)
+ # [H200 and B200 GPUs](https://modal.com/blog/introducing-b200-h200)
# have over 100 GB of [GPU RAM](https://modal.com/gpu-glossary/device-hardware/gpu-ram) --
# 141 GB and 180 GB, respectively.
# The model's weights will be stored in this memory,
2 changes: 1 addition & 1 deletion 06_gpu_and_ml/llm-serving/vllm_inference.py
@@ -52,7 +52,7 @@
# We'll use an FP8 (eight-bit floating-point) post-training-quantized variant: `Qwen/Qwen3-4B-Thinking-2507-FP8`.
# Native hardware support for FP8 formats in [Tensor Cores](https://modal.com/gpu-glossary/device-hardware/tensor-core)
# is limited to the latest [Streaming Multiprocessor architectures](https://modal.com/gpu-glossary/device-hardware/streaming-multiprocessor-architecture),
- # like those of Modal's [Hopper H100/H200 and Blackwell B200 GPUs](https://modal.com/blog/announcing-h200-b200).
+ # like those of Modal's [Hopper H100/H200 and Blackwell B200 GPUs](https://modal.com/blog/introducing-b200-h200).

# You can swap this model out for another by changing the strings below.
# A single H100 GPU has enough VRAM to store a 4,000,000,000 parameter model,
@@ -74,7 +74,7 @@

# ## Using WebSockets to stream audio and diarization results

- # We use a Modal [ASGI](https://modal.com/docs/guide/asgi) app to serve the diarization results
+ # We use a Modal [ASGI](https://modal.com/docs/guide/webhooks) app to serve the diarization results
Contributor

🚩 ASGI link text doesn't match new webhooks destination

The link text still says [ASGI] but the URL was changed from /docs/guide/asgi to /docs/guide/webhooks. If the webhooks page doesn't cover ASGI specifics, this could be misleading to readers. The PR checklist notes this was reviewed and the old /docs/guide/asgi page no longer exists, so /docs/guide/webhooks is the closest valid destination. However, it may be worth updating the link text from ASGI to something like ASGI/webhooks to better match the destination page content.


# over WebSockets. This allows us to stream the diarization results to the client in real time.

# We use a simple queue-based architecture to handle the audio and diarization results.
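The queue-based pattern described above can be sketched in plain asyncio (a minimal sketch: `process_chunk` and `pipeline` are hypothetical stand-ins for the real diarization step and WebSocket handler, which the example wires up differently):

```python
import asyncio


async def process_chunk(chunk: bytes) -> str:
    # Stand-in for the real diarization work on one audio chunk.
    await asyncio.sleep(0)
    return f"speaker segment for {len(chunk)} bytes"


async def pipeline(chunks: list[bytes]) -> list[str]:
    audio_q: asyncio.Queue = asyncio.Queue()
    results: list[str] = []

    async def producer() -> None:
        # In the real app, chunks arrive over the WebSocket.
        for chunk in chunks:
            await audio_q.put(chunk)
        await audio_q.put(None)  # sentinel: no more audio

    async def consumer() -> None:
        # Drain the queue and collect (in the real app, stream) results.
        while (chunk := await audio_q.get()) is not None:
            results.append(await process_chunk(chunk))

    await asyncio.gather(producer(), consumer())
    return results
```

The queue decouples the receive rate of audio from the processing rate of diarization, so a slow model step never blocks the socket reader.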
2 changes: 1 addition & 1 deletion 13_sandboxes/sandbox_agent.py
@@ -15,7 +15,7 @@

app = modal.App.lookup("example-sandbox-agent", create_if_missing=True)

- # First, we create a custom [Image](https://modal.com/docs/images) that has Claude Code
+ # First, we create a custom [Image](https://modal.com/docs/guide/images) that has Claude Code
# and git installed.

image = (