This file provides guidance to GitHub Copilot when working with code in this repository.
SlangPy is a native Python extension that provides a high-level interface for working with low-level graphics APIs (Vulkan, Direct3D 12, CUDA). The native side wraps the slang-rhi project (external/slang-rhi) using nanobind bindings. The project also contains a "functional API" that allows users to call Slang functions on the GPU with Python function call syntax.
| Directory | Description |
|---|---|
| `src/sgl/` | Native C++ code (core GPU abstraction layer) |
| `src/slangpy_ext/` | Python bindings (nanobind) |
| `src/slangpy_torch/` | Native torch integration extension |
| `slangpy/` | Python package implementation |
| `slangpy/tests/` | Python tests (pytest) |
| `tests/` | C++ tests (doctest) |
| `tools/` | General utility scripts |
| `.github/workflows/` | CI workflows |
| `examples/`, `samples/examples/`, `samples/experiments/` | Example code |
| `docs/` | Documentation |
| `external/` | External C++ dependencies |
- If I tell you that you are wrong, think about whether or not you think that's true and respond with facts.
- Avoid apologizing or making conciliatory statements.
- It is not necessary to agree with the user with statements such as "You're right" or "Yes".
- Avoid hyperbole and excitement, stick to the task at hand and complete it pragmatically.
- New Python APIs must have tests in `slangpy/tests/`
- Always build before running tests
- Run pre-commit after completing tasks (`pre-commit run --all-files`; re-run if it modifies files)
- Use type annotations for all Python function arguments
- Minimize new dependencies — the project has minimal external deps
The project has three main layers:
- Python Layer (`slangpy/`) — High-level API with Module, Function, Device classes
- C++ Binding Layer (`src/slangpy_ext/`) — Nanobind-based Python-C++ interface
- Core SGL Layer (`src/sgl/`) — Low-level GPU device management and shader compilation

C++ types typically map to slang-rhi counterparts (e.g., `Device` wraps `rhi::IDevice`).
- Module: Container for Slang shader code, loaded from `.slang` files
- Function: Callable GPU function with automatic Python↔GPU marshalling
- Device: GPU context managing resources and compute dispatch
- CallData: Cached execution plans for optimized repeated calls
- Buffer/Texture: GPU memory resources with Python array interface
- Python-layer errors raise standard Python exceptions (typically `ValueError`, `TypeError`, or `SlangPyError`).
- C++ errors are translated to Python exceptions via nanobind. Shader compile errors surface as exceptions containing the Slang compiler diagnostic text.
- GPU errors (device lost, out of memory) propagate as exceptions from the RHI layer.
- When debugging, set `SLANGPY_PRINT_GENERATED_SHADERS=1` to see the generated kernel code that gets compiled.
Replace `<PLATFORM>` with `windows-msvc`, `linux-gcc`, or `macos-arm64-clang` as appropriate:

```shell
cmake --preset <PLATFORM>                 # Configure
cmake --build --preset <PLATFORM>-debug   # Build (debug)
cmake --preset <PLATFORM> --fresh         # Reconfigure from scratch
```

Available presets: `windows-msvc`, `windows-arm64-msvc`, `linux-gcc`, `macos-arm64-clang`.
Always build before running tests.
```shell
pytest slangpy/tests -v                                    # All Python tests
pytest samples/tests -vra                                  # Example tests
python tools/ci.py unit-test-cpp                           # C++ unit tests
pytest slangpy/tests/slangpy_tests/test_X.py -v            # Specific file
pytest slangpy/tests/slangpy_tests/test_X.py::test_fn -v   # Specific function
```

Debug generated shaders (PowerShell):

```shell
$env:SLANGPY_PRINT_GENERATED_SHADERS="1"; pytest slangpy/tests/slangpy_tests/test_X.py -v
```

- C++: Classes: PascalCase | Functions/variables: snake_case | Members: `m_` prefix
- Python: Classes: PascalCase | Functions/variables: snake_case | Public members: no prefix | Private members: `_` prefix
- All Python function arguments must have type annotations
```cpp
/// Description.
void do_something();

/// Pack two float values to 8-bit snorm.
/// @param v Float values in [-1,1].
/// @param options Packing options.
/// @return 8-bit snorm values in low bits, high bits all zero.
uint32_t pack_snorm2x8(float2 v, const PackOptions options = PackOptions::safe);
```

```python
def myfunc(x: int, y: int) -> int:
    """
    Description.

    :param x: Some parameter.
    :param y: Some parameter.
    :return: Some return value.
    """
```

Slang is a shader language based on HLSL. Key patterns used in this project:
- `[shader("compute")]` attribute marks GPU entry points
- `StructuredBuffer<T>`/`RWStructuredBuffer<T>` for typed GPU arrays
- `uint3 tid : SV_DispatchThreadID` for thread indexing
- Generics via `<T>`, interfaces via `interface IFoo`, conformance via `struct Foo : IFoo`
- Differentiable functions: `[Differentiable] float foo(float x)` with `bwd_diff(foo)` for backprop
- See `.slang` files in `slangpy/tests/` for project-specific patterns
The CI uses GitHub Actions (`.github/workflows/ci.yml`) and calls `tools/ci.py`:

```shell
python tools/ci.py configure   # CMake configure
python tools/ci.py --help      # All available commands
```

- Python runtime: `requirements.txt` | Dev/tests: `requirements-dev.txt`
- C++ dependencies: `external/`
- Testing: pytest (Python), doctest (C++)
- Shading language: Slang
- Formatting: `pre-commit` hooks (Black for Python, clang-format for C++)
Many issues in SlangPy originate from the Slang compiler itself (shader-slang/slang repo).
```shell
# 1. Query for path to built local slang (eg c:/sw/slang)
# 2. Reconfigure fresh with local slang
cmake --preset windows-msvc --fresh -DSGL_LOCAL_SLANG=ON -DSGL_LOCAL_SLANG_DIR=<slang dir> -DSGL_LOCAL_SLANG_BUILD_DIR=build/Debug
# 3. Build as normal
cmake --build --preset windows-msvc-debug
```

| CMake Option | Default | Description |
|---|---|---|
| `SGL_LOCAL_SLANG` | `OFF` | Enable to use a local Slang build |
| `SGL_LOCAL_SLANG_DIR` | `../slang` | Path to the local Slang repository |
| `SGL_LOCAL_SLANG_BUILD_DIR` | `build/Debug` | Build directory within the Slang repo |
- Reproduce the issue in SlangPy
- Clone and build Slang locally (above)
- Read `external/slang/CLAUDE.md`
- Edit Slang source → rebuild Slang → rebuild SlangPy → test

- Use `python tools/ci.py` for most build/test tasks — handles platform-specific config
- PyTorch integration is automatic when PyTorch is installed
- Hot-reload is supported for shader development
The functional API allows calling Slang GPU functions from Python with automatic type marshalling, vectorization, and kernel generation.
```slang
// myshader.slang
float add(float a, float b) { return a + b; }
```

```python
import slangpy as spy
import numpy as np

device = spy.Device()
module = spy.Module.load_from_file(device, "myshader.slang")

a = spy.Tensor.from_numpy(device, np.array([1, 2, 3], dtype=np.float32))
b = spy.Tensor.from_numpy(device, np.array([4, 5, 6], dtype=np.float32))
result = module.add(a, b)  # Returns Tensor([5, 7, 9])
```

```
Python call → Phase 1: Signature Lookup (C++) → Cache hit? → Phase 3: Dispatch (C++)
                                                    ↓ Cache miss
                                        Phase 2: Kernel Generation (Python)
```
| File | Purpose |
|---|---|
| `slangpy/core/function.py` | `FunctionNode` — Python entry point for function calls |
| `slangpy/core/calldata.py` | `CallData` — kernel generation and caching |
| `slangpy/core/callsignature.py` | Type resolution and binding helpers |
| `slangpy/bindings/boundvariable.py` | `BoundCall`/`BoundVariable` — tracks Python↔Slang bindings |
| `slangpy/bindings/marshall.py` | `Marshall` base class for type marshalling |
| `slangpy/bindings/typeregistry.py` | Maps Python types to their `Marshall` implementations |
| `src/slangpy_ext/utils/slangpyfunction.cpp` | `NativeFunctionNode::call` — native call entry |
| `src/slangpy_ext/utils/slangpy.cpp` | `NativeCallData::exec` — native dispatch |
| Class | Layer | Purpose |
|---|---|---|
| `FunctionNode` | Python | Represents a callable Slang function with modifiers |
| `CallData` | Python | Generated kernel data (bindings, compiled shader) |
| `BoundCall` | Python | Collection of `BoundVariable` for a single call |
| `BoundVariable` | Python | Pairs Python value with Slang parameter |
| `NativeMarshall` | Both | Type-specific marshalling (shape, data binding) |
| `NativeCallData` | C++ | Native call data with cached dispatch info |
| `NativeCallDataCache` | C++ | Signature → CallData cache |
Location: `src/slangpy_ext/utils/slangpyfunction.cpp`

Runs on every call — must be fast. Builds a unique signature string from the function node chain and argument types/properties, then looks it up in `NativeCallDataCache`. A cache hit skips to Phase 3; a cache miss triggers Phase 2.
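The lookup can be sketched in pure Python (illustrative only; `build_signature` and `CallDataCache` are hypothetical stand-ins for the native C++ implementation, which hashes more properties than just the argument types):

```python
# Illustrative sketch of signature-based call caching (hypothetical names;
# the real NativeCallDataCache is C++ and includes more argument properties).

def build_signature(func_name: str, args: tuple) -> str:
    """Build a cache key from the function name and argument types."""
    arg_sig = ",".join(type(a).__name__ for a in args)
    return f"{func_name}({arg_sig})"

class CallDataCache:
    def __init__(self) -> None:
        self._cache: dict[str, str] = {}

    def lookup(self, func_name: str, args: tuple) -> str:
        sig = build_signature(func_name, args)
        if sig in self._cache:
            return self._cache[sig]        # Cache hit → straight to Phase 3
        plan = f"kernel for {sig}"         # Cache miss → Phase 2 generates it
        self._cache[sig] = plan
        return plan

cache = CallDataCache()
cache.lookup("add", (1.0, 2.0))            # miss: generates a plan
plan = cache.lookup("add", (3.0, 4.0))     # hit: same types reuse the plan
```

The key point is that the cache key depends on argument *types*, not values, so repeated calls with different data reuse the same compiled kernel.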
Location: `slangpy/core/calldata.py` → `CallData.__init__()`

Runs once per unique call signature. The pipeline:

- Unpack arguments — recursively resolve `IThis` wrappers via `get_this()`
- Build BoundCall — create a `BoundVariable` per argument, each assigned a `NativeMarshall` based on Python type (`int`/`float` → `ScalarMarshall`, `Tensor` → `TensorMarshall`, `dict` → recursive children)
- Apply explicit vectorization — user-specified `function.map()` dimension/type mappings
- Type resolution (`slangpy/reflection/typeresolution.py`) — each marshall's `resolve_types()` determines compatible Slang types; overloaded functions are resolved by best match
- Bind parameters — pair each `BoundVariable` with its resolved Slang parameter
- Apply implicit vectorization — calculate per-argument dimensionality
- Calculate call dimensionality — max across all arguments
- Create return value binding — auto-creates `ValueRef` (dim 0) or `Tensor` (dim > 0) for `_result`
- Finalize mappings — resolve Python→kernel dimension mappings (default: right-aligned)
- Calculate differentiability — determine gradient support per argument
- Generate code — produce Slang compute kernel source
- Compile shader — compile via Slang; cache the `CallData`
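The default right-aligned dimension mapping in the "Finalize mappings" step can be sketched in plain Python (`default_mapping` is a hypothetical helper for illustration, not a slangpy API):

```python
# Illustrative sketch of right-aligned dimension mapping (hypothetical helper,
# not a slangpy API; the actual mapping logic lives in CallData).

def default_mapping(arg_dims: int, call_dims: int) -> tuple[int, ...]:
    """Map an argument's dims onto the last `arg_dims` kernel dimensions."""
    return tuple(range(call_dims - arg_dims, call_dims))

# For a 2D kernel (call_dims=2):
assert default_mapping(2, 2) == (0, 1)   # 2D arg covers both kernel dims
assert default_mapping(1, 2) == (1,)     # 1D arg maps to the innermost dim
assert default_mapping(0, 2) == ()       # scalar broadcasts over all dims
```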
| Python Value | Slang Parameter | Resolved Binding |
|---|---|---|
| `Tensor[float, 2D]` | `float` | `float` (elementwise) |
| `Tensor[float, 2D]` | `Tensor<float,2>` | `Tensor<float,2>` (whole) |
| `Tensor[float, 2D]` | `float2` | `float2` (row as vector) |
| `Tensor[float, 2D]` | `vector<T,2>` | `vector<float,2>` (generic) |
| Python Value | Slang Parameter | Dimensionality |
|---|---|---|
| `Tensor[float, 2D shape=(H,W)]` | `float` | 2 (one thread per element) |
| `float` | `float` | 0 (single thread) |
| `Tensor[float, 2D]` | `Tensor<float,2>` | 0 (whole tensor per thread) |
| `Tensor[float, 2D shape=(H,W)]` | `float2` | 1 (one thread per row) |
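The rule behind the table can be sketched in plain Python (an illustrative simplification; `arg_dimensionality` and `call_dimensionality` are hypothetical helpers, not the real `resolve_dimensionality()` implementations): an argument contributes its container dimensionality minus whatever the Slang parameter consumes, and the call dimensionality is the maximum across arguments.

```python
# Illustrative sketch of implicit vectorization (simplified; the real logic
# lives in each marshall's resolve_dimensionality()).

def arg_dimensionality(python_dims: int, param_dims: int) -> int:
    """Dimensions left over after the Slang parameter consumes its share."""
    return max(python_dims - param_dims, 0)

def call_dimensionality(per_arg_dims: list[int]) -> int:
    """Call dimensionality is the max across all arguments."""
    return max(per_arg_dims, default=0)

# The rows of the table above:
assert arg_dimensionality(2, 0) == 2   # 2D Tensor → float: per element
assert arg_dimensionality(0, 0) == 0   # float → float: single thread
assert arg_dimensionality(2, 2) == 0   # 2D Tensor → Tensor<float,2>: whole
assert arg_dimensionality(2, 1) == 1   # 2D Tensor → float2: per row
assert call_dimensionality([2, 0]) == 2
```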
Location: `src/slangpy_ext/utils/slangpy.cpp`

Runs on every call, entirely in C++:

- Unpack arguments (native `unpack_args`/`unpack_kwargs`)
- Calculate call shape — each marshall's `get_shape()` returns concrete dimensions; call shape determines thread count
- Allocate return value — create output Tensor/ValueRef if `_result` not provided
- Bind uniforms & dispatch — marshalls write data to GPU via `write_shader_cursor_pre_dispatch()`, then `dispatch()`
- Read results — post-dispatch readback via `read_calldata()` and `read_output()`
To support a new Python type in the functional API:
- Create a Marshall in `slangpy/bindings/` or `slangpy/builtin/` — subclass `Marshall` and implement `resolve_types()`, `resolve_dimensionality()`, `gen_calldata()`. See existing marshalls (e.g., `TensorMarshall`) for the pattern.
- Register in `slangpy/bindings/typeregistry.py` — add an entry to the `PYTHON_TYPES` dict.
- (Optional) Native signature — for performance, add a type signature handler in the `NativeCallDataCache` constructor (`src/slangpy_ext/utils/slangpyfunction.cpp`).
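The registration step can be sketched as follows (hypothetical, heavily simplified; see `slangpy/bindings/typeregistry.py` for the real `PYTHON_TYPES` mapping and `Marshall` interface):

```python
# Illustrative sketch of a type-to-marshall registry (hypothetical, simplified;
# the real PYTHON_TYPES mapping lives in slangpy/bindings/typeregistry.py).

class Marshall:
    def resolve_types(self, value): ...
    def resolve_dimensionality(self, value): ...
    def gen_calldata(self, value): ...

class ScalarMarshall(Marshall):
    def resolve_dimensionality(self, value) -> int:
        return 0  # scalars map to a single thread

# Registry: Python type → marshall class
PYTHON_TYPES: dict[type, type[Marshall]] = {
    int: ScalarMarshall,
    float: ScalarMarshall,
}

def marshall_for(value) -> Marshall:
    """Look up a marshall by the value's Python type."""
    marshall_cls = PYTHON_TYPES.get(type(value))
    if marshall_cls is None:
        raise TypeError(f"No marshall registered for {type(value).__name__}")
    return marshall_cls()

assert isinstance(marshall_for(1.5), ScalarMarshall)
assert marshall_for(3).resolve_dimensionality(3) == 0
```

Registering a new type is then a one-line addition to the dict, which is why step 2 above is the only mandatory wiring.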