Implement flock-based bitmap GPU allocator for CI parallelization #370
Conversation
Pull request overview
Introduces a lightweight, flock-based GPU allocation mechanism for CI so matrix jobs can dynamically acquire the number of GPUs they need, rather than relying on hardcoded GPU assignments in the workflow.
Changes:
- Added a shared-state GPU allocator script that blocks/retries until enough GPUs are available and exports `GPU_DEVICES` for `HIP_VISIBLE_DEVICES`.
- Updated the test runner script to automatically acquire/release GPUs when not explicitly provided.
- Simplified the CI workflow matrix by removing static `gpu_devices` entries and always letting the runner auto-allocate.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `.github/workflows/iris-tests.yml` | Removes hardcoded GPU device mappings from the test matrices and relies on dynamic allocation. |
| `.github/scripts/run_tests.sh` | Sources the allocator and acquires/releases GPUs automatically when devices aren't provided. |
| `.github/scripts/gpu_allocator.sh` | Implements the flock-protected shared-state allocator and cleanup logic. |
@copilot apply changes based on the comments in this thread
… state validation

- Remove top-level `set -e` to avoid side effects when sourced
- Add state file validation in `init_gpu_state` (handles invalid/out-of-range values)
- Fix race condition in `acquire_gpus` by capturing start index while holding lock
- Change out-of-order release to preserve state instead of resetting to 0 (prevents GPU overlap)
- Make cleanup trap optional and composable via `enable_gpu_cleanup_trap()`
- Fix `GPU_DEVICES` env variable handling in `run_tests.sh` to honor environment
- Add shellcheck disable for intentional trap expansion

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Applied all review feedback; the changes are in commit eb6bae9. All changes tested and verified.
Changed the error message from "Need GPUs 8-8 but only 0-7 available" to "Not enough GPUs: need 1, only 0 available (next free GPU: 8)", which is much clearer about how many GPUs are needed versus how many are available.

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… references

**Changes:**
1. Removed unused `enable_gpu_cleanup_trap()` function from gpu_allocator.sh (user request)
2. Fixed release_gpus.sh to check only variables actually set by acquire_gpus.sh:
   - Changed from checking `ALLOCATED_GPU_START` and `ALLOCATED_GPU_COUNT`
   - To checking `ALLOCATED_GPU_BITMAP` (the actual variable exported)
   - Removed logging of undefined variables

These changes clean up unused code and fix bugs in release_gpus.sh where it was checking/logging variables that were never set by the allocator.

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
**Changes:**
1. Removed `--worker-mode` dummy argument from run_tests_distributed.py
   - The argument was not handled by pytest and could cause warnings
   - Detection via environment variables (RANK/WORLD_SIZE) is sufficient
2. Set timeout-minutes to 180 (3 hours) for all workflow jobs
   - iris-tests.yml: Added timeout to build-container-image, test-git, test-editable
   - iris-performance-regression-test.yml: Updated from 20/30 to 180 minutes
   - iris-external-validation-test.yml: Added timeout to build-container-image, external-validation-test
   - Aligns with the GPU allocator's 3-hour wait time for consistency

These changes ensure workflows have sufficient time to wait for GPUs and remove unnecessary dummy arguments from the test runner.

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Refactored run_tests_distributed.py to remove the recursive launcher pattern:
- Removed launcher logic that parsed arguments and spawned torchrun
- Script now runs directly as a torchrun worker
- Takes pytest arguments directly from the command line (no environment variable)

Updated run_tests.sh to invoke torchrun directly:
- Changed from: `python run_tests_distributed.py --num_ranks N <test> <args>`
- Changed to: `torchrun --nproc_per_node=N --standalone run_tests_distributed.py <test> <args>`

Benefits:
- Simpler, more direct execution path
- No recursive script invocation
- Easier to understand and debug
- Eliminates dummy arguments and environment variable passing

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
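For reference, a hypothetical invocation under the new scheme (the test path and pytest flags below are illustrative, not taken from this PR):

```bash
# Launch 4 torchrun workers directly; everything after the script name is
# passed through to pytest unchanged.
torchrun --nproc_per_node=4 --standalone \
  tests/run_tests_distributed.py tests/unittests -x -q
```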
```python
# torchrun sets these environment variables automatically
rank = int(os.environ.get("RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))
local_rank = int(os.environ.get("LOCAL_RANK", 0))

# Set the correct GPU for this specific process
if torch.cuda.is_available():
    torch.cuda.set_device(local_rank)

# Initialize distributed - torchrun already set up the environment
dist.init_process_group(
    backend="nccl",
    rank=rank,
    world_size=world_size,
    device_id=torch.device(f"cuda:{local_rank}") if torch.cuda.is_available() else None,
)
```
run_tests_distributed.py now assumes it is always launched via torchrun (env:// rendezvous). Running it directly (e.g., python tests/run_tests_distributed.py ...) will fail because MASTER_ADDR/MASTER_PORT are not set and there's no fallback. Consider adding a guard that either (a) errors with a clear message telling users to run via torchrun, or (b) retains the previous mp.spawn fallback when RANK/WORLD_SIZE/LOCAL_RANK are missing.
```python
# Initialize distributed - torchrun already set up the environment
dist.init_process_group(
    backend="nccl",
    rank=rank,
    world_size=world_size,
    device_id=torch.device(f"cuda:{local_rank}") if torch.cuda.is_available() else None,
)
```
The process-group backend is hard-coded to nccl. If this script is ever executed in an environment without CUDA/HIP available, dist.init_process_group(backend="nccl") will fail. Consider selecting backend="nccl" if torch.cuda.is_available() else "gloo" (matching the pattern used elsewhere in the repo).
```diff
 # Run Iris tests in a container with automatic GPU allocation
 # Usage: run_tests.sh <test_dir> <num_ranks> [gpu_devices] [install_method]
 # test_dir: subdirectory under tests/ (e.g., examples, unittests, ccl)
 # num_ranks: number of GPU ranks (1, 2, 4, or 8)
-# gpu_devices: comma-separated GPU device IDs (optional)
+# gpu_devices: comma-separated GPU device IDs (optional, if not provided will use allocator)
 # install_method: pip install method - "git", "editable", or "install" (optional, default: "editable")
```
The header comment says this script provides "automatic GPU allocation" / "will use allocator" when gpu_devices isn't provided, but the implementation only warns when GPU_DEVICES is empty and never calls the allocator. Suggest updating the header/usage text to reflect the actual behavior (workflow-level acquire/release), or implement the documented fallback.
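If the documented fallback is actually wanted, a minimal sketch of it is below; it assumes the sourced allocator exposes `acquire_gpus`/`release_gpus` and that `acquire_gpus` exports `GPU_DEVICES` on success (the function names appear in this PR, but their exact signatures are not shown here, and `NUM_RANKS` stands in for however the script names its rank count):

```bash
# Sketch only: auto-allocate when the caller did not pass gpu_devices.
if [ -z "$GPU_DEVICES" ]; then
  # shellcheck source=/dev/null
  source "$(dirname "$0")/gpu_allocator.sh"
  acquire_gpus "$NUM_RANKS"        # assumed to block until NUM_RANKS GPUs are free
  trap release_gpus EXIT           # hand the GPUs back when this script exits
fi
```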
```diff
@@ -40,10 +44,9 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
   cd /iris_workspace
   pip install -e .
-  python examples/${EXAMPLE_PATH}/benchmark.py \
+  torchrun --nproc_per_node=8 examples/${EXAMPLE_PATH}/benchmark.py \
     --benchmark \
     --validate \
-    -r 8 \
     ${BENCHMARK_ARGS} \
     --output_file perf_result.json
```
GPU_DEVICES can be overridden via the environment, but the benchmark always runs torchrun --nproc_per_node=8. If GPU_DEVICES is set to fewer than 8 devices, this will oversubscribe and fail. Consider deriving --nproc_per_node from the requested/available GPU count (e.g., count of comma-separated IDs), or enforce that GPU_DEVICES must contain 8 GPUs when using this script.
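One way to implement this, sketched under the assumption that `GPU_DEVICES` is a comma-separated list such as `0,1,2,3` (`EXAMPLE_PATH` and `BENCHMARK_ARGS` are the existing variables from the snippet above):

```bash
# Derive the rank count from the number of allocated devices instead of
# hard-coding 8, so an override with fewer GPUs does not oversubscribe.
GPU_DEVICES="${GPU_DEVICES:-0,1,2,3,4,5,6,7}"
NPROC=$(awk -F',' '{print NF}' <<< "$GPU_DEVICES")
torchrun --nproc_per_node="$NPROC" examples/${EXAMPLE_PATH}/benchmark.py \
  --benchmark --validate ${BENCHMARK_ARGS} --output_file perf_result.json
```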
```python
world_size = int(os.environ.get("WORLD_SIZE", 1))
init_url = os.environ.get("MASTER_ADDR", "127.0.0.1") + ":" + os.environ.get("MASTER_PORT", "29500")
_worker(rank, world_size, f"tcp://{init_url}", args)
```
In torchrun mode, this passes rank=int(os.environ["RANK"]) into _worker, but _worker uses that parameter as local_rank for rank= and device_id=torch.device(f"cuda:{local_rank}"). That’s only correct when global rank == local rank. Please use LOCAL_RANK for the first _worker argument / device selection, and keep RANK for the global rank.
Suggested change:

```diff
-world_size = int(os.environ.get("WORLD_SIZE", 1))
-init_url = os.environ.get("MASTER_ADDR", "127.0.0.1") + ":" + os.environ.get("MASTER_PORT", "29500")
-_worker(rank, world_size, f"tcp://{init_url}", args)
+local_rank = int(os.environ.get("LOCAL_RANK", 0))
+world_size = int(os.environ.get("WORLD_SIZE", 1))
+init_url = os.environ.get("MASTER_ADDR", "127.0.0.1") + ":" + os.environ.get("MASTER_PORT", "29500")
+_worker(local_rank, world_size, f"tcp://{init_url}", args)
```
```bash
# Ensure the bitmap is within valid range (0-255 for 8 GPUs)
elif [ "$current_state" -lt 0 ] || [ "$current_state" -gt 255 ]; then
  echo "0" > "$GPU_STATE_FILE"
  echo "[GPU-ALLOC] Detected out-of-range GPU bitmap ($current_state); reset to 0" >&2
```
State validation hard-codes the max bitmap value to 255, but the allocator also supports overriding MAX_GPUS. If MAX_GPUS is changed, this range check becomes incorrect. Consider computing the max as (1 << MAX_GPUS) - 1 to keep validation consistent with configuration.
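A sketch of that adjustment, assuming `MAX_GPUS` is already defined before the check runs:

```bash
# Derive the maximum valid bitmap from MAX_GPUS instead of hard-coding 255.
max_state=$(( (1 << MAX_GPUS) - 1 ))
if [ "$current_state" -lt 0 ] || [ "$current_state" -gt "$max_state" ]; then
  echo "0" > "$GPU_STATE_FILE"
  echo "[GPU-ALLOC] Detected out-of-range GPU bitmap ($current_state); reset to 0" >&2
fi
```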
```yaml
- name: Acquire GPUs
  run: |
    bash .github/scripts/cleanup_ports.sh
    bash .github/scripts/acquire_gpus.sh 2
```
This job acquires GPUs (sets GPU_DEVICES), but the subsequent container invocation in this workflow does not pass --gpus "$GPU_DEVICES", so the container will still see all GPUs and can overlap with other jobs. Update the container step to consume the allocated GPU_DEVICES (same pattern as external-gluon-validation-test).
```python
rank = int(os.environ.get("RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))
init_url = os.environ.get("MASTER_ADDR", "127.0.0.1") + ":" + os.environ.get("MASTER_PORT", "29500")
_worker(rank, world_size, f"tcp://{init_url}", args)
```
In torchrun mode, this uses RANK as the first argument to _worker, but _worker treats that parameter as the per-process GPU index (local_rank) and uses it for device_id. For multi-node torchrun (or any case where global rank != local rank), this will select the wrong device. Use LOCAL_RANK for device selection / _worker's first argument, and use RANK only for the global rank passed to the process group.
Suggested change:

```diff
-rank = int(os.environ.get("RANK", 0))
-world_size = int(os.environ.get("WORLD_SIZE", 1))
-init_url = os.environ.get("MASTER_ADDR", "127.0.0.1") + ":" + os.environ.get("MASTER_PORT", "29500")
-_worker(rank, world_size, f"tcp://{init_url}", args)
+local_rank = int(os.environ.get("LOCAL_RANK", 0))
+world_size = int(os.environ.get("WORLD_SIZE", 1))
+init_url = os.environ.get("MASTER_ADDR", "127.0.0.1") + ":" + os.environ.get("MASTER_PORT", "29500")
+_worker(local_rank, world_size, f"tcp://{init_url}", args)
```
```bash
if [ ! -f "$GPU_STATE_FILE" ]; then
  # Initialize with all GPUs free (bitmap = 0)
  echo "0" > "$GPU_STATE_FILE"
  echo "[GPU-ALLOC] Initialized GPU bitmap: 0 (all GPUs free)" >&2
```
The GPU state file is created in /tmp via echo "0" > "$GPU_STATE_FILE" without setting permissions. On multi-user runners this can allow unintended modification/corruption of allocator state. Consider creating it with restrictive permissions (e.g., umask 077 for the write, or chmod 600 "$GPU_STATE_FILE" after creation).
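A minimal sketch of that hardening, reusing the names from the quoted snippet:

```bash
if [ ! -f "$GPU_STATE_FILE" ]; then
  # Create the state file readable/writable only by the current user (mode 600),
  # then initialize it with all GPUs free (bitmap = 0).
  ( umask 077; echo "0" > "$GPU_STATE_FILE" )
  echo "[GPU-ALLOC] Initialized GPU bitmap: 0 (all GPUs free)" >&2
fi
```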
| echo "[RELEASE-GPUS] No GPU allocation found, nothing to release" | ||
| exit 0 | ||
| fi | ||
|
|
release_gpus.sh proceeds even if ALLOCATED_GPU_BITMAP is missing (as long as GPU_DEVICES is set), but release_gpus uses only the bitmap mask. In that scenario, the script will log details and then effectively release nothing, leaking allocations in the shared bitmap. Consider either requiring ALLOCATED_GPU_BITMAP to be set (fail fast), or reconstructing the mask from GPU_DEVICES as a fallback.
Suggested change:

```bash
# Fail fast if GPU_DEVICES is set but the bitmap mask is missing
if [ -n "$GPU_DEVICES" ] && [ -z "$ALLOCATED_GPU_BITMAP" ]; then
  echo "[RELEASE-GPUS] ERROR: GPU_DEVICES is set but ALLOCATED_GPU_BITMAP is missing; cannot safely release GPUs"
  exit 1
fi
```
Summary
Implements a lightweight, flock-based GPU allocator for CI parallelization on our 8-GPU runner. Jobs can request a variable number of GPUs (1, 2, 4, or 8), eliminating hardcoded assignments and enabling true parallel execution.
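At the workflow level, the intended pattern looks roughly like the sketch below (these run as separate workflow steps in practice, and the `2` is just an example request size):

```bash
# Step 1: block until 2 GPUs are free; the script exports GPU_DEVICES
# (and ALLOCATED_GPU_BITMAP) via GITHUB_ENV for the later steps.
bash .github/scripts/acquire_gpus.sh 2

# Step 2: run the tests pinned to the allocated devices.
bash .github/scripts/run_tests.sh unittests 2 "$GPU_DEVICES"

# Step 3: always release, even on failure (the workflow uses `if: always()` for this step).
bash .github/scripts/release_gpus.sh
```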
Key Features
**GPU Allocator Core** (`.github/scripts/gpu_allocator.sh`)
- Shared state protected by `flock` on `/tmp/iris_gpu_state`
- Safe under `set -e`: works correctly in scripts with errexit enabled

**Workflow-Level GPU Management**
- Exports `GPU_DEVICES` to `GITHUB_ENV`
- Release is covered by an `if: always()` guarantee

**Distributed Test Runner** (`tests/run_tests_distributed.py`)
- Avoids `EADDRINUSE` port conflicts through torchrun's automatic port management

**Performance Benchmarks** (`examples/`)
- Support both `torchrun` and `mp.spawn` modes:
  - `examples/07_gemm_all_scatter/benchmark.py`
  - `examples/11_gemm_all_scatter_producer_consumer/benchmark.py`
  - `examples/12_gemm_all_scatter_bulk_synchronous/benchmark.py`

**External Validation Tests**
- Run via `torchrun --nproc_per_node=2 <test_file>`

**Test Runner Integration**
- Reads `GPU_DEVICES` from the environment (set by the workflow)

**Workflow Simplification**
- Removed `gpu_devices` from all workflow matrices

Implementation Details
**Bitmap Allocator**
- Each bit in the shared bitmap represents one GPU; the `ALLOCATED_GPU_BITMAP` env var stores the allocation mask for cleanup.

Example bitmap states:
- `0b00000000` (0): All GPUs free
- `0b00000011` (3): GPUs 0 and 1 allocated
- `0b00011111` (31): GPUs 0-4 allocated
- `0b11111111` (255): All 8 GPUs allocated

Advantages over index-based allocation: jobs can release in any order, since each one clears only its own mask (see the sketch below).
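To make the encoding concrete, here is a minimal bash sketch of the bitmap arithmetic for an 8-GPU box; the real allocator performs this read-modify-write under `flock`, which is omitted here for brevity:

```bash
count=2                                    # GPUs requested by this job
state=$(cat /tmp/iris_gpu_state)           # shared bitmap, e.g. 3 (0b00000011)

# Find the first contiguous run of free bits large enough for the request.
for ((start = 0; start + count <= 8; start++)); do
  mask=$(( ((1 << count) - 1) << start ))  # e.g. start=2, count=2 -> 0b00001100
  if (( (state & mask) == 0 )); then
    echo $(( state | mask )) > /tmp/iris_gpu_state            # mark the GPUs as taken
    export ALLOCATED_GPU_BITMAP=$mask
    GPU_DEVICES=$(seq -s, "$start" $(( start + count - 1 )))  # e.g. "2,3"
    export GPU_DEVICES
    break
  fi
done

# Release clears only this job's mask, so jobs can finish in any order:
#   new_state=$(( state & ~ALLOCATED_GPU_BITMAP ))
```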
Impact
Before: Sequential job execution, static GPU assignment, port conflicts, potential deadlocks from out-of-order completion.
After: Parallel execution constrained only by GPU availability. A 1-GPU job and 4-GPU job can run concurrently with proper isolation. Jobs complete in any order without blocking. All distributed components use torchrun for automatic port management.