
How to Resolve Compatibility and Runtime Issues During Integration of XSched with XpuOS/llama.cpp? #23

@kylin1019

Description


Dear XSched Development Team,

During testing of the integration between your team's modified XpuOS/llama.cpp (adapted for XSched) and XSched, I successfully compiled the code on an NVIDIA A100 GPU (Ampere architecture, compute capability 8.0). However, multiple runtime exceptions occur during execution, blocking further integration testing. I am reporting the detailed issues, abnormal behaviors, and relevant context below and sincerely request your support and guidance.

1. Integration Environment

  1. Hardware: NVIDIA A100 GPU (only supports Level 1 preemption, corresponding to the enum kPreemptLevelBlock).
  2. Software: Your team's modified XpuOS/llama.cpp, XSched scheduler.
  3. Current Status: Code compiled successfully; XSched loads normally; basic XQueue creation and priority management work. However, the process crashes immediately when executing inference tasks.

2. Specific Issues & Abnormal Behaviors

2.1 Hardcoded Preemption Level Incompatible with A100

In your modified XpuOS/llama.cpp, the preemption level is hardcoded to Level 2 via kPreemptLevelDeactivate (note: this is not a function but an enum constant that selects Level 2 preemption for a queue).
However, the NVIDIA A100 only supports Level 1 preemption (kPreemptLevelBlock) and does NOT support Level 2.

This mismatch causes:

  • CUDA error 907 (operation not permitted)
  • Followed by a segmentation fault (core dumped)
  • The process crashes immediately when inference starts.

2.2 Missing Automatic Hardware Adaptation; Environment Variable Not Effective

XpuOS/llama.cpp currently lacks automatic detection/adaptation logic for preemption levels.
It only uses the hardcoded Level 2 and cannot automatically fall back to Level 1 based on hardware capabilities.

I attempted to manually set the environment variable:

XSCHED_AUTO_XQUEUE_LEVEL=1

But this variable has no effect on the explicit queue creation in XpuOS/llama.cpp — Level 2 is still enforced. Switching levels currently requires modifying the source code, which severely slows down integration.

2.3 Conflict Between CUDA Graph and XSched Event Recording

XpuOS/llama.cpp enables CUDA Graph for batch processing to improve inference performance.
However, XSched inserts scheduling events during GPU task execution, leading to a conflict.

Symptom:
While a CUDA Graph capture is active on a stream, XSched still attempts to record scheduling events on that stream; such event operations are not permitted during graph capture. This invalidates the capture, destabilizes the run, and crashes the process before inference can complete.

3. Questions & Requests

Regarding the above integration issues, I would appreciate clarification and solutions:

  1. Could you provide a patch or modify the source code to make the hardcoded Level 2 preemption in XpuOS/llama.cpp configurable, or add automatic hardware preemption level detection logic to achieve compatibility with GPUs that only support Level 1, such as the A100?
  2. How can XSCHED_AUTO_XQUEUE_LEVEL=1 be made to actually take effect during queue creation in XpuOS/llama.cpp, without manual code changes?
  3. For the CUDA Graph vs. XSched event conflict, is there a feasible solution (e.g., disable specific features, adjust event recording logic) to ensure stable inference execution?

Thank you for your excellent work. I look forward to your reply and support so that I can complete the integration testing.
