
ChrisRackauckas-Claude

Summary

  • Adds GPU type detection for CUDA and Metal devices to LinearSolveAutotune's system information collection
  • Enhances telemetry reporting with GPU specifications: name, memory, and compute capability (memory and compute capability are reported for CUDA devices only)
  • Improves debugging and performance analysis by recording the GPU hardware alongside the existing system details

Changes

New Functions in gpu_detection.jl

  1. get_cuda_gpu_info() - Retrieves CUDA GPU information:

    • GPU name/type using CUDA.name(device)
    • Number of GPUs available
    • GPU memory in GB using CUDA.totalmem(device)
    • CUDA compute capability using CUDA.capability(device)
    • Lists all GPU types for multi-GPU systems
  2. get_metal_gpu_info() - Detects Metal GPU information:

    • Infers the Apple Silicon GPU type (M1/M2/M3/M4) from the CPU model string
    • Reports GPU count
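A minimal sketch of how get_cuda_gpu_info() could gather these fields, assuming CUDA.jl's `functional`, `devices`, `name`, `totalmem`, and `capability` functions. It is not the merged implementation: the `loaded_module` helper is hypothetical and relies on the internal `Base.loaded_modules` table so that CUDA.jl stays an optional dependency, and memory is reported in GiB (1024^3 bytes).

```julia
# Hypothetical helper: return the already-loaded module called `name`, or `nothing`.
# Base.loaded_modules is an internal Dict{Base.PkgId,Module}; using it avoids a
# hard dependency on CUDA.jl.
function loaded_module(name::AbstractString)
    for (pkgid, mod) in Base.loaded_modules
        pkgid.name == name && return mod
    end
    return nothing
end

function get_cuda_gpu_info()
    info = Dict{String,Any}()
    CUDA = loaded_module("CUDA")
    CUDA === nothing && return info            # CUDA.jl is not loaded
    try
        CUDA.functional() || return info       # no usable driver or device
        devs = collect(CUDA.devices())
        isempty(devs) && return info
        dev = first(devs)
        info["gpu_type"] = CUDA.name(dev)
        info["gpu_count"] = length(devs)
        info["gpu_memory_gb"] = round(CUDA.totalmem(dev) / 1024^3; digits = 2)
        cap = CUDA.capability(dev)             # a VersionNumber such as v"8.6.0"
        info["gpu_capability"] = "$(cap.major).$(cap.minor)"
        if length(devs) > 1                    # multi-GPU: list every device name
            info["gpu_types"] = [CUDA.name(d) for d in devs]
        end
    catch err
        @debug "CUDA GPU detection failed" exception = err
        empty!(info)                           # never report partial data
    end
    return info
end
```

Called after `using CUDA` on a machine with a working driver, this returns a Dict keyed like the example output in this description; otherwise it returns an empty Dict.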

Updated Functions

  • get_system_info() - Now includes GPU fields when GPUs are detected
  • get_detailed_system_info() - Includes GPU information in detailed system data export
  • format_system_info_markdown() in telemetry.jl - Formats GPU information for display
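A hypothetical sketch of the merge-and-format step: GPU fields are merged into the system-info Dict only when detection returned something, and the markdown formatter emits one line per GPU field that is present. The `add_gpu_fields!` and `format_gpu_markdown` names and the labels in `GPU_LABELS` are illustrative; the actual output of format_system_info_markdown() may differ.

```julia
# Merge GPU fields into the system-info Dict only when detection found a GPU.
function add_gpu_fields!(sysinfo::Dict{String,Any}, gpu::Dict{String,Any})
    isempty(gpu) && return sysinfo   # no GPU detected: leave sysinfo unchanged
    return merge!(sysinfo, gpu)
end

# Illustrative display labels, in the order they should be printed.
const GPU_LABELS = [
    "gpu_type" => "GPU Type",
    "gpu_count" => "GPU Count",
    "gpu_memory_gb" => "GPU Memory (GB)",
    "gpu_capability" => "CUDA Compute Capability",
]

# Emit a markdown bullet for each GPU field present; fields that are absent
# (e.g. memory on Apple Silicon) are simply skipped.
function format_gpu_markdown(sysinfo::Dict{String,Any})
    lines = String[]
    for (key, label) in GPU_LABELS
        haskey(sysinfo, key) && push!(lines, "- **$label**: $(sysinfo[key])")
    end
    return join(lines, "\n")
end
```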

Implementation Details

The implementation uses the CUDA.jl API to query device properties when CUDA is available. For Metal GPUs on Apple Silicon, it infers the GPU type from the CPU model since Metal.jl doesn't expose detailed device names.
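The CPU-model inference for Metal can be sketched as follows. This is an illustration under stated assumptions, not the merged code: it assumes `Sys.cpu_info()[1].model` returns strings such as "Apple M2" or "Apple M3 Max" on Apple Silicon, assumes one integrated GPU per chip for the count, and omits the check that Metal.jl is loaded. `infer_apple_gpu_type` is a hypothetical helper, and the keyword arguments exist only so the logic can be exercised on non-Apple machines.

```julia
# Hypothetical helper: map a CPU model string to a GPU label, or `nothing`
# when the string does not look like an Apple Silicon chip.
function infer_apple_gpu_type(cpu_model::AbstractString)
    m = match(r"Apple (M\d+(?: (?:Pro|Max|Ultra))?)", cpu_model)
    m === nothing && return nothing
    return "Apple $(m.captures[1]) GPU"
end

function get_metal_gpu_info(; cpu_model::AbstractString = Sys.cpu_info()[1].model,
                            apple_silicon::Bool = Sys.isapple() && Sys.ARCH === :aarch64)
    info = Dict{String,Any}()
    apple_silicon || return info          # Metal GPUs only on Apple Silicon here
    gpu_type = infer_apple_gpu_type(cpu_model)
    gpu_type === nothing && return info
    info["gpu_type"] = gpu_type
    info["gpu_count"] = 1                 # assumption: one integrated GPU per chip
    return info
end
```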

The code gracefully handles:

  • Missing GPU hardware
  • Unloaded GPU packages (CUDA.jl/Metal.jl)
  • Extension loading failures

When GPUs are not available, the functions return empty dictionaries and no GPU information is added to the system info.
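One way to get this behavior is to wrap each detector in a try/catch that converts any failure into an empty Dict. In this hypothetical sketch, `safe_detect` and `collect_gpu_info` are illustrative names, and returning the first non-empty result (for example, CUDA before Metal) is an assumed policy rather than documented behavior.

```julia
# Run one GPU detector; any exception (missing driver, extension load failure)
# is logged at debug level and treated as "no GPU information".
function safe_detect(detector)::Dict{String,Any}
    try
        return detector()
    catch err
        @debug "GPU detection failed" exception = (err, catch_backtrace())
        return Dict{String,Any}()
    end
end

# Try detectors in order; the first one that finds a GPU wins.
function collect_gpu_info(detectors...)
    for detect in detectors
        info = safe_detect(detect)
        isempty(info) || return info
    end
    return Dict{String,Any}()   # nothing found: caller adds no GPU fields
end
```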

Testing

Created a test script that verifies:

  • System information collection works correctly
  • GPU fields are populated when GPUs are available
  • No errors occur when GPUs are absent
  • Both get_system_info() and get_detailed_system_info() include GPU data

The test successfully runs on systems with and without GPUs.

Example Output

When a CUDA GPU is available:

gpu_type: NVIDIA GeForce RTX 3090
gpu_count: 1
gpu_memory_gb: 24.0
gpu_capability: 8.6

On Apple Silicon with Metal:

gpu_type: Apple M2 GPU
gpu_count: 1

🤖 Generated with Claude Code

This commit enhances LinearSolveAutotune's system information collection
to include detailed GPU information when CUDA or Metal GPUs are available.

Changes:
- Added `get_cuda_gpu_info()` function to retrieve CUDA GPU details:
  - GPU name/type via CUDA.name()
  - Number of GPUs
  - GPU memory in GB via CUDA.totalmem()
  - CUDA compute capability via CUDA.capability()
  - All GPU types for multi-GPU systems

- Added `get_metal_gpu_info()` function to detect Metal GPUs:
  - Infers GPU type from CPU model (M1/M2/M3/M4)
  - Reports GPU count

- Updated `get_system_info()` to include GPU information fields
- Updated `get_detailed_system_info()` to include GPU fields
- Enhanced telemetry markdown formatting to display GPU details

The implementation gracefully handles missing GPU hardware or packages,
returning empty information when GPUs are not available.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
ChrisRackauckas merged commit 238a09f into SciML:main on Aug 10, 2025
115 of 118 checks passed