A diagnostic and repair script that resolves the amdgpu kernel module blacklisting issue preventing ROCm from initializing on AMD Strix Halo APUs, part of the Ryzen AI Max 300 series (Ryzen AI Max and Max+). Tested on a Ryzen AI Max+ 395 based system.
The AMD Ryzen AI Max 300 series (codenamed "Strix Halo") is AMD's high-performance mobile APU lineup, combining CPU, GPU, and NPU in a single package with a unified memory architecture. The Ryzen AI Max+ 395 is the flagship configuration, featuring:
| Component | Specification |
|---|---|
| CPU | 16 Zen 5 cores (32 threads) |
| iGPU | Radeon 8060S - RDNA 3.5 architecture, 40 CUs, gfx1151 target |
| NPU | XDNA 2 architecture, 50 TOPS AI performance |
| Memory | Unified memory architecture - CPU, GPU, and NPU share up to 128GB system RAM |
The unified memory architecture is particularly significant for AI/ML workloads, because the iGPU can access the full system memory pool without PCIe bandwidth limitations. This makes a working ROCm stack essential for using the GPU's compute capabilities from frameworks such as PyTorch, TensorFlow, and ONNX Runtime.
After installing ROCm on systems with AMD Strix Halo integrated graphics, users may encounter a situation where:
- rocminfo returns no agents or fails entirely
- the /dev/kfd (Kernel Fusion Driver) device node is missing
- /dev/dri/renderD* nodes are absent
- GPU compute workloads fail to initialize
Root Cause: The amdgpu kernel module gets blacklisted, preventing the GPU driver from loading at boot.
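Linux kernel modules are pieces of code that can be loaded into the kernel on demand to extend functionality (drivers, filesystems, etc.). Blacklisting prevents a module from loading automatically, even if the hardware it supports is present. Strictly speaking, a blacklist entry only disables the module's hardware aliases, so an explicit modprobe amdgpu still works; an install directive (for example install amdgpu /bin/false) blocks explicit loading as well.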
Blacklist configurations are stored in /etc/modprobe.d/ as .conf files with entries like:
blacklist amdgpu
Blacklisting is a legitimate system administration technique for enforcing a specific hardware configuration. NVIDIA and CUDA installers deliberately blacklist competing GPU drivers (nouveau in particular, and on some setups amdgpu and radeon) to create a predictable, single-vendor GPU environment. This prevents driver conflicts, gives CUDA exclusive GPU access, and simplifies debugging, but it becomes a problem on systems where you actually want to use the AMD GPU.
A parallel example is the nouveau driver (open-source NVIDIA driver): NVIDIA's proprietary installer blacklists nouveau to prevent it from competing with their closed-source driver. This is standard practice and works well for dedicated NVIDIA systems, but causes issues on hybrid setups or when switching GPU vendors.
Several scenarios can cause the amdgpu module to be blacklisted:
| Cause | Description |
|---|---|
| NVIDIA Driver Installation | Proprietary NVIDIA drivers (via apt, runfiles, or CUDA installers) often blacklist competing GPU drivers to prevent conflicts |
| Legacy Driver Conflicts | Systems with both integrated AMD graphics and discrete GPUs may have conflicting driver requirements |
| Ubuntu Pro/Livepatch | Some enterprise configurations blacklist modules for stability |
| Manual Intervention | Previous troubleshooting attempts may have added blacklist entries |
| Installer Bugs | Some ROCm or driver installer versions incorrectly create blacklist files |
| initramfs Persistence | Even after removing blacklist files, old configurations persist in the initial RAM filesystem |
/etc/modprobe.d/ # Primary configuration directory
├── blacklist.conf # General blacklist (check for amdgpu entries)
├── blacklist-amdgpu.conf # Dedicated amdgpu blacklist (if exists)
├── nvidia-installer-*.conf # NVIDIA installer generated
├── nvidia-graphics-drivers.conf # Ubuntu NVIDIA package
└── *.conf # Any file can contain blacklist directives
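Because any of these files can hold the directive, and modprobe.d(5) also reads same-named directories under /run, /usr/local/lib, /lib and /usr/lib, a grep of /etc/modprobe.d alone can miss an entry. The bash function below is a minimal sketch, not part of fix_rocm_boot.sh, and its name is hypothetical: it scans all of those directories, reads only *.conf files as modprobe does, and also reports install amdgpu ... lines, which block even an explicit modprobe. As a cross-check, modprobe -c | grep amdgpu prints the merged configuration modprobe actually uses.

```bash
# Sketch (not part of fix_rocm_boot.sh): list modprobe directives that stop
# amdgpu from loading. With no arguments it scans the directories documented
# in modprobe.d(5); pass directories explicitly to scan something else.
find_amdgpu_blocks() {
    local dirs=("$@") dir file real
    local -A seen=()
    if [ ${#dirs[@]} -eq 0 ]; then
        dirs=(/etc/modprobe.d /run/modprobe.d /usr/local/lib/modprobe.d
              /lib/modprobe.d /usr/lib/modprobe.d)
    fi
    for dir in "${dirs[@]}"; do
        [ -d "$dir" ] || continue
        # /lib is often a symlink to /usr/lib; scan each real directory once
        real=$(readlink -f "$dir")
        [ -n "${seen[$real]:-}" ] && continue
        seen[$real]=1
        for file in "$dir"/*.conf; do
            [ -f "$file" ] || continue   # modprobe only reads *.conf files
            # "blacklist amdgpu" disables autoloading via hardware aliases;
            # "install amdgpu ..." also overrides explicit modprobe calls.
            # Commented-out lines never match because of the ^ anchor.
            grep -HnE '^[[:space:]]*(blacklist|install)[[:space:]]+amdgpu([[:space:]]|$)' "$file" || true
        done
    done
    return 0
}
# Usage: find_amdgpu_blocks   (no output means no blocking directive found)
```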
lsmod | grep amdgpu
Expected output (working system):
amdgpu 12345678 0
drm_ttm_helper 1234 1 amdgpu
ttm 56789 1 amdgpu
drm_exec 1234 1 amdgpu
gpu_sched 12345 1 amdgpu
drm_buddy 1234 1 amdgpu
drm_display_helper 12345 1 amdgpu
i2c_algo_bit 1234 1 amdgpu
If empty: The module is not loaded.
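lsmod only formats /proc/modules, so a script can test the same thing without parsing lsmod's column layout. The minimal bash sketch below uses a hypothetical function name; its optional second argument exists only so the logic can be tried on a sample file. It assumes amdgpu is built as a loadable module, as on stock Ubuntu kernels, since built-in drivers never appear in /proc/modules.

```bash
# Sketch: succeed if the named module is currently loaded. lsmod reads
# /proc/modules, so this checks the same data without parsing lsmod output.
# The second argument is a hypothetical override used only for testing.
module_loaded() {
    local name=${1//-/_}            # /proc/modules always uses underscores
    local modules_file=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}
# Usage:
#   module_loaded amdgpu && echo "amdgpu loaded" || echo "amdgpu NOT loaded"
```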
# Search all modprobe config files for amdgpu references
grep -r "amdgpu" /etc/modprobe.d/
# Check specifically for blacklist directives
grep -r "blacklist.*amdgpu" /etc/modprobe.d/
Problem indicator:
/etc/modprobe.d/blacklist-amdgpu.conf:blacklist amdgpu
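A blacklist can also come from the kernel command line, which no search of modprobe.d will find: modprobe.blacklist= is honored by modprobe itself, and module_blacklist= is enforced by the kernel. The bash sketch below checks both for amdgpu; the function name is hypothetical, and its optional argument stands in for /proc/cmdline so it can be tested.

```bash
# Sketch: print any kernel command-line parameter that blacklists amdgpu.
# modprobe.blacklist= is applied by modprobe; module_blacklist= by the kernel.
# Both take comma-separated module names. The optional argument is a
# hypothetical stand-in for /proc/cmdline, used only for testing.
cmdline_blacklists_amdgpu() {
    local cmdline_file=${1:-/proc/cmdline} param
    local -a params
    read -r -a params < "$cmdline_file" || true
    for param in "${params[@]}"; do
        case $param in
            modprobe.blacklist=*|module_blacklist=*)
                # Wrap in commas so only the exact name "amdgpu" matches
                case ",${param#*=}," in
                    *,amdgpu,*) echo "$param"; return 0 ;;
                esac
                ;;
        esac
    done
    return 1   # nothing found
}
# Usage: cmdline_blacklists_amdgpu || echo "kernel command line is clean"
```

If it reports a match, the parameter has to be removed from the bootloader configuration (on Ubuntu typically GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by sudo update-grub). None of the operations listed for fix_rocm_boot.sh touch the kernel command line.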
# ROCm compute device (Kernel Fusion Driver)
ls -la /dev/kfd
# GPU render nodes
ls -la /dev/dri/render*
Missing nodes indicate the driver is not loaded.
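The two device checks can be combined with the permission side of the problem: ROCm needs read/write access to /dev/kfd, which the sample output of this README shows as owned by the render group. The bash sketch below has a hypothetical function name, and its optional directory argument replaces /dev only for testing.

```bash
# Sketch: check the ROCm device node and DRM render nodes, including access
# for the current user. The optional argument replaces /dev (testing only).
check_gpu_nodes() {
    local dev_root=${1:-/dev} status=0 found=0 node
    if [ ! -e "$dev_root/kfd" ]; then
        echo "MISSING: $dev_root/kfd (amdgpu/KFD not initialized)"
        status=1
    elif [ ! -r "$dev_root/kfd" ] || [ ! -w "$dev_root/kfd" ]; then
        echo "NO ACCESS: $dev_root/kfd (is $(id -un) in the render group?)"
        status=1
    else
        echo "OK: $dev_root/kfd"
    fi
    for node in "$dev_root"/dri/renderD*; do
        [ -e "$node" ] || continue   # an unmatched glob stays literal
        echo "OK: $node"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "MISSING: no $dev_root/dri/renderD* nodes"
        status=1
    fi
    return "$status"
}
# Usage: check_gpu_nodes || echo "GPU device nodes are not usable yet"
```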
# View amdgpu-related kernel messages
sudo dmesg | grep -i amdgpu
# Check for module loading errors
sudo dmesg | grep -i "module.*blacklist\|amdgpu.*error"
# Check what would happen if we tried to load amdgpu
modprobe --dry-run --verbose amdgpu
The fix_rocm_boot.sh script performs these operations:
rm /etc/modprobe.d/blacklist-amdgpu.conf
Deletes the blacklist configuration file that prevents amdgpu from loading.
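Deleting the file is the right move when it contains nothing but the amdgpu entry. When the directive sits in a shared file such as blacklist.conf or one of the NVIDIA-generated files listed earlier, a gentler alternative is to comment out only the matching lines and keep a backup. The bash sketch below assumes GNU sed, and its function name is hypothetical. The backup is written as FILE.bak, which modprobe ignores because it only reads *.conf files. The initramfs still has to be rebuilt afterwards.

```bash
# Sketch: comment out amdgpu "blacklist"/"install" lines in one modprobe.d
# file instead of deleting it. Requires GNU sed (-E, -i with suffix).
comment_out_amdgpu_block() {
    local file=$1
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    # & is the whole matched line; the original is kept as "$file.bak"
    sed -E -i.bak \
        's/^[[:space:]]*(blacklist|install)[[:space:]]+amdgpu([[:space:]].*)?$/# disabled for ROCm: &/' \
        "$file"
}
# Usage (as root):
#   comment_out_amdgpu_block /etc/modprobe.d/blacklist.conf && update-initramfs -u
```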
Creates /etc/modprobe.d/amdgpu.conf with the following driver options for Strix Halo:
| Option | Effect | Purpose |
|---|---|---|
| dc=1 | Enable | Display Core - modern display engine for HDMI/DP |
| dpm=1 | Enable | Dynamic Power Management - power states and thermal control |
| si_support=0 | Disable | Southern Islands (GCN 1.0) - not needed for RDNA 3.5 |
| cik_support=0 | Disable | Sea Islands (GCN 2.0) - not needed for RDNA 3.5 |
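Disabling legacy GPU support does not change how the Strix Halo iGPU is driven: si_support=0 and cik_support=0 only stop amdgpu from claiming Southern Islands and Sea Islands GPUs, leaving them to the legacy radeon driver. On mixed systems this prevents the two drivers from competing for older hardware.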
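The exact text the script writes to /etc/modprobe.d/amdgpu.conf is not reproduced here. Assuming it expresses the four options from the table, modprobe.d syntax puts them on a single options line. Once the module is loaded, the values actually in effect can be read back from /sys/module/amdgpu/parameters/. The bash sketch below uses hypothetical function names, and its root argument only redirects the write for testing. If the running kernel was built without SI or CIK support, those two parameters do not exist: the kernel logs them as unknown and ignores them, and the read-back reports them as not exposed.

```bash
# Sketch: write the options from the table above as a single modprobe.d line
# and read back what the loaded driver uses. The script's actual file may
# differ; the root argument is a hypothetical prefix used only for testing.
write_amdgpu_options() {
    local conf="${1:-/}"
    conf="${conf%/}/etc/modprobe.d/amdgpu.conf"
    mkdir -p "$(dirname "$conf")"
    cat > "$conf" <<'EOF'
# amdgpu driver options for Strix Halo
options amdgpu dc=1 dpm=1 si_support=0 cik_support=0
EOF
    echo "$conf"
}

# After "modprobe amdgpu": print the parameter values the driver is using.
show_amdgpu_params() {
    local p f
    for p in dc dpm si_support cik_support; do
        f=/sys/module/amdgpu/parameters/$p
        if [ -r "$f" ]; then
            printf '%s=%s\n' "$p" "$(cat "$f")"
        else
            printf '%s: not exposed (module not loaded or option not built)\n' "$p"
        fi
    done
}
```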
Creates /etc/modules-load.d/amdgpu.conf:
amdgpu
This makes systemd-modules-load.service load the module explicitly early in the boot process, before display managers or user services start, instead of relying solely on udev hardware autoloading.
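systemd-modules-load.service reads every *.conf file in /etc/modules-load.d/ at boot and loads one module per non-comment line. The bash sketch below writes that entry so the script can be re-run safely: it only appends the name when it is not already listed. The function name is hypothetical, and the optional second argument overrides the target file.

```bash
# Sketch: make sure a module name is listed in a modules-load.d file, without
# duplicating it when the script is run again. The optional second argument
# overrides the target file (testing only).
ensure_module_autoload() {
    local module=$1
    local file=${2:-/etc/modules-load.d/$1.conf}
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # Exact whole-line match, so "amdgpu" does not match e.g. "amdgpu_foo"
    if ! grep -qxF "$module" "$file"; then
        # Add a newline first if the file does not already end with one
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s\n' "$module" >> "$file"
    fi
}
# Usage (as root): ensure_module_autoload amdgpu
# After a reboot:  systemctl status systemd-modules-load.service
```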
update-initramfs -u
Critical step: The initial RAM filesystem (initramfs) is a temporary root filesystem loaded at boot. It contains:
- Essential kernel modules
- Module configuration (including blacklists)
- Early boot scripts
Without updating initramfs, the old blacklist configuration remains embedded and continues to prevent module loading, even after the source file is deleted.
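On initramfs-tools systems (Ubuntu, Debian), lsinitramfs /boot/initrd.img-$(uname -r) | grep modprobe.d lists the modprobe files that were packed into the image. A cheaper heuristic for checking whether update-initramfs has run since the last change is to compare modification times. Deleting a file updates its directory's mtime, so removals count as changes too. The bash sketch below uses a hypothetical function name and returns success when the image looks stale. A timestamp comparison cannot prove what the image actually contains.

```bash
# Sketch: succeed (return 0) if anything under the modprobe config directory
# changed after the initramfs image was built. Deleting a file updates the
# directory's own mtime, so removals are detected too. Heuristic only: it
# compares timestamps and does not inspect the image's contents.
initramfs_is_stale() {
    local image=${1:-/boot/initrd.img-$(uname -r)}
    local confdir=${2:-/etc/modprobe.d}
    if [ ! -e "$image" ]; then
        echo "no initramfs at $image" >&2
        return 2
    fi
    # find prints the first entry (directory or file) newer than the image
    [ -n "$(find "$confdir" -newer "$image" -print -quit)" ]
}
# Usage: initramfs_is_stale && echo "run: sudo update-initramfs -u"
```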
modprobe amdgpu
Loads the driver without requiring a reboot, for immediate testing.
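udev creates the device nodes after the module finishes loading, so checking for /dev/kfd immediately after modprobe can race. The bash sketch below implements a bounded wait. It first lets udevadm settle drain the event queue, then polls once per second. The function name is hypothetical, and the 10-second default is an arbitrary choice.

```bash
# Sketch: wait up to TIMEOUT seconds (default 10, an arbitrary choice) for a
# device node to appear after loading a module.
wait_for_node() {
    local node=$1 timeout=${2:-10} waited=0
    # Let udev finish the events queued by the module load (ignore failures,
    # e.g. where udevadm is missing or udevd is not running)
    if command -v udevadm >/dev/null 2>&1; then
        udevadm settle --timeout="$timeout" 2>/dev/null || true
    fi
    while [ ! -e "$node" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s waiting for $node" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$node is present"
}
# Usage: sudo modprobe amdgpu && wait_for_node /dev/kfd
```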
Validates the fix by checking:
- /dev/kfd existence (ROCm compute support)
- /dev/dri/render* nodes (GPU access)
- Module loaded in lsmod
- rocminfo output (ROCm stack verification)
- Ubuntu 22.04/24.04 or compatible distribution
- ROCm installed (rocminfo in PATH)
- Root/sudo access
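These prerequisites can be checked up front, before anything under /etc is modified. The bash sketch below uses hypothetical function names. An unsupported distribution only triggers a warning, matching "or compatible distribution" above, while a missing root or rocminfo is reported as an error. sudo typically resets PATH to its secure_path, so rocminfo may be found in your shell but not under sudo if ROCm's bin directory (commonly /opt/rocm/bin) is not on that path.

```bash
# Sketch: check the prerequisites before touching /etc. Function names are
# hypothetical; the os-release argument is a testing override.
distro_supported() {
    local os_release=${1:-/etc/os-release}
    [ -r "$os_release" ] || return 1
    # os-release is a shell-compatible KEY=value file; source it in a subshell
    ( . "$os_release"
      case "${ID:-}:${VERSION_ID:-}" in
          ubuntu:22.04|ubuntu:24.04) exit 0 ;;
          *) exit 1 ;;
      esac )
}

preflight() {
    local status=0
    if [ "$(id -u)" -ne 0 ]; then
        echo "ERROR: must run as root (use sudo)" >&2
        status=1
    fi
    if ! command -v rocminfo >/dev/null 2>&1; then
        echo "ERROR: rocminfo not in PATH; is ROCm installed?" >&2
        status=1
    fi
    if ! distro_supported; then
        echo "WARN: not Ubuntu 22.04/24.04; continuing, but untested" >&2
    fi
    return "$status"
}
# Usage at the top of a script: preflight || exit 1
```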
# Make executable
chmod +x fix_rocm_boot.sh
# Run with root privileges
sudo ./fix_rocm_boot.sh
=== ROCm Boot Fix Script ===
[INFO] Step 1: Removing amdgpu blacklist...
[INFO] Removed /etc/modprobe.d/blacklist-amdgpu.conf
[INFO] Step 2: Creating amdgpu driver options...
[INFO] Created /etc/modprobe.d/amdgpu.conf
[INFO] Step 3: Configuring amdgpu to load at boot...
[INFO] Created /etc/modules-load.d/amdgpu.conf
[INFO] Step 4: Updating initramfs (this may take a moment)...
[INFO] Initramfs updated
[INFO] Step 5: Loading amdgpu module now...
[INFO] amdgpu module loaded successfully
[INFO] Step 6: Verifying GPU devices...
[INFO] /dev/kfd exists - ROCm compute support available
crw-rw---- 1 root render 234, 0 Dec 10 12:00 /dev/kfd
[INFO] Render nodes found:
crw-rw----+ 1 root render 226, 128 Dec 10 12:00 /dev/dri/renderD128
[INFO] Loaded amdgpu modules:
amdgpu 15728640 0
[INFO] Step 7: Testing ROCm...
ROCm Runtime Version: 6.x.x
...
=== Summary ===
[INFO] Configuration changes applied:
- Removed: /etc/modprobe.d/blacklist-amdgpu.conf
- Created: /etc/modprobe.d/amdgpu.conf
- Created: /etc/modules-load.d/amdgpu.conf
- Updated: initramfs
[INFO] ROCm should now work! Test with: rocminfo
[INFO] Done!
# List ROCm agents (should show your GPU)
rocminfo
# Check OpenCL devices
clinfo
# For PyTorch users
python3 -c "import torch; print(torch.cuda.is_available())"  # ROCm builds of PyTorch expose HIP through the torch.cuda API
If you are using Ubuntu with a desktop environment, another good sign is that you can now change the display resolution. Without the driver loaded, this is not possible.
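For scripted checks, the GPU agent can be pulled out of rocminfo's output instead of reading it by eye; for Strix Halo the expected target is gfx1151. The bash sketch below uses a hypothetical function name and assumes rocminfo lists each GPU agent with a two-field line of the form Name: gfx1151. ISA lines such as amdgcn-amd-amdhsa--gfx1151 are deliberately not matched. Running rocminfo as a normal user also requires the render group membership covered in the troubleshooting steps.

```bash
# Sketch: print GPU agent targets (e.g. gfx1151) found in rocminfo output on
# stdin. Assumes each GPU agent has a two-field line "Name:   gfxNNNN";
# ISA lines like "amdgcn-amd-amdhsa--gfx1151" and "Marketing Name:" lines
# are not matched.
gpu_agents() {
    awk '$1 == "Name:" && NF == 2 && $2 ~ /^gfx[0-9a-z]+$/ { print $2 }'
}
# Usage:
#   if rocminfo | gpu_agents | grep -qx gfx1151; then
#       echo "Strix Halo iGPU is visible to ROCm"
#   fi
```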
After reboot:
# Confirm module loads at boot
lsmod | grep amdgpu
# Confirm devices exist
ls /dev/kfd /dev/dri/render*
# Confirm no blacklist remains
grep -r "blacklist.*amdgpu" /etc/modprobe.d/
- Check kernel support:
  grep CONFIG_HSA_AMD /boot/config-$(uname -r)   # Should include: CONFIG_HSA_AMD=y
- Verify user permissions:
  # Add user to render and video groups
  sudo usermod -aG render,video $USER
  # Log out and back in
- Check for conflicting drivers:
  lsmod | grep -E "radeon|nvidia"
- Check HSA status:
  cat /sys/class/kfd/kfd/topology/nodes/*/properties
- Verify ROCm installation:
  apt list --installed | grep rocm
  dpkg -l | grep amdgpu-dkms
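The raw properties dump of the KFD topology nodes is long. The bash sketch below prints just each node's gfx_target_version and decodes it to a gfx name. The function names are hypothetical, and the optional argument replaces the sysfs path only for testing. The decoding assumes the KFD encoding major*10000 + minor*100 + stepping, with minor and stepping shown as single hex digits in the target name. Under that assumption, a healthy Strix Halo iGPU should appear as 110501 (gfx1151), while CPU nodes report 0.

```bash
# Sketch: summarize KFD topology nodes. The optional argument replaces the
# sysfs path (testing only). Decoding assumes gfx_target_version =
# major*10000 + minor*100 + stepping.
gfx_name() {
    local v=$1
    # Major in decimal; minor and stepping as single hex digits (gfx90a style)
    printf 'gfx%d%x%x\n' $((v / 10000)) $((v / 100 % 100)) $((v % 100))
}

kfd_nodes() {
    local topo=${1:-/sys/class/kfd/kfd/topology/nodes} dir ver
    for dir in "$topo"/*/; do
        [ -f "${dir}properties" ] || continue
        ver=$(awk '$1 == "gfx_target_version" { print $2 }' "${dir}properties")
        if [ -z "$ver" ]; then
            printf 'node %s: gfx_target_version not reported\n' "$(basename "$dir")"
        elif [ "$ver" -eq 0 ]; then
            printf 'node %s: gfx_target_version=0 (no GPU target; CPU node)\n' "$(basename "$dir")"
        else
            printf 'node %s: gfx_target_version=%s (%s)\n' \
                "$(basename "$dir")" "$ver" "$(gfx_name "$ver")"
        fi
    done
}
# Usage: kfd_nodes
```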
The script enables Display Core (dc=1). If you experience display problems:
# Temporarily disable DC for debugging
# (unloading only works while nothing uses the GPU: stop the display manager and run this from a text console)
sudo modprobe -r amdgpu
sudo modprobe amdgpu dc=0
| File | Action | Purpose |
|---|---|---|
| /etc/modprobe.d/blacklist-amdgpu.conf | Removed | Eliminate blacklist preventing driver load |
| /etc/modprobe.d/amdgpu.conf | Created | Set driver options for Strix Halo |
| /etc/modules-load.d/amdgpu.conf | Created | Ensure module loads at boot |
| /boot/initrd.img-* | Updated | Embed new configuration in boot image |
┌─────────────────────────────────────────────────────────────┐
│ User Space │
├─────────────────────────────────────────────────────────────┤
│ ROCm Runtime │ OpenCL │ HIP │ PyTorch/TensorFlow │
├─────────────────────────────────────────────────────────────┤
│ libdrm / libhsakmt │
├─────────────────────────────────────────────────────────────┤
│ Kernel Space │
├─────────────────────────────────────────────────────────────┤
│ amdgpu.ko (DRM driver) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ DC │ │ DPM │ │ KFD │ │ GPU Sched │ │
│ │ Display │ │ Power │ │ Compute │ │ Workload │ │
│ │ Core │ │ Mgmt │ │ Driver │ │ Manager │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Hardware │
│ AMD Strix Halo (RDNA 3.5 iGPU) │
└─────────────────────────────────────────────────────────────┘
- amdgpu.ko: Unified kernel driver for all modern AMD GPUs (GCN, RDNA)
- KFD (Kernel Fusion Driver): HSA-compatible compute interface, exposes /dev/kfd
- DC (Display Core): Modern display engine for HDMI 2.1, DP, eDP
- DPM (Dynamic Power Management): Power states, clocking, thermal management
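In this stack, DRM drivers with render-node support (amdgpu included) expose a renderD* node per GPU, and sysfs links that node back to the kernel driver bound to the underlying PCI device. Following that link is a quick way to confirm that the Strix Halo iGPU is really driven by amdgpu rather than by another driver on a hybrid system. The bash sketch below uses a hypothetical function name, and its optional argument replaces /sys/class/drm for testing. If amdgpu never loaded, the iGPU has no render node at all, so nothing is printed for it.

```bash
# Sketch: show which kernel driver owns each DRM render node by following
# the sysfs "device/driver" symlink. The optional argument replaces
# /sys/class/drm (testing only).
render_node_drivers() {
    local sys_drm=${1:-/sys/class/drm} node drv
    for node in "$sys_drm"/renderD*; do
        [ -e "$node" ] || continue   # no render nodes at all
        if [ -L "$node/device/driver" ]; then
            # e.g. .../bus/pci/drivers/amdgpu -> "amdgpu"
            drv=$(basename "$(readlink -f "$node/device/driver")")
        else
            drv="(no driver bound)"
        fi
        printf '%s -> %s\n' "$(basename "$node")" "$drv"
    done
}
# Usage: render_node_drivers   (expect a line such as "renderD128 -> amdgpu")
```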
The AMD Strix Halo (Ryzen AI Max 300 series) features:
- RDNA 3.5 integrated graphics (Radeon 8060S)
- 40 Compute Units
- ROCm support via the gfx1151 target
- Requires the amdgpu driver (not legacy radeon)
- ROCm Documentation
- AMDGPU Kernel Driver Documentation
- Linux Kernel Module Blacklisting
- Ubuntu initramfs Documentation
MIT License - See LICENSE for details.
- Fork the repository
- Create a feature branch
- Submit a pull request
Juergen Fey - SmartTechlabs.de. 12-2025. Created to resolve ROCm initialization issues on AMD Strix Halo systems.