OpenTela (a.k.a. OpenFabric) is a distributed computing platform designed to orchestrate computing resources across a decentralized network. It leverages peer-to-peer networking and CRDT-based state management to create a resilient and scalable network of computing resources. It powers the serving system of the SwissAI Initiative.
*Tela* is the Latin word for "fabric," referring to the interconnected network of computing resources that OpenTela manages.
- [2026/02] 💡 How SwissAI Leverages OpenTela: We wrote a case study on how SwissAI uses OpenTela to orchestrate its distributed GPU nodes for scalable model serving. Read more.
- Decentralized Orchestration: OpenTela eliminates the need for a central coordinator by using a gossip-based P2P network. It maintains a Conflict-free Replicated Data Type (CRDT) registry to manage service discovery, health monitoring, and routing across distributed nodes. This architecture allows the system to remain operational and maintain a global view of resources even during network partitions.
- Non-Invasive HPC Integration: Designed specifically for the constraints of supercomputing environments, the system operates entirely as a user-space overlay. It bridges the gap between batch schedulers (like Slurm) and interactive serving engines (like vLLM or SGLang) without requiring root privileges or kernel modifications. This allows researchers to spin up "cloud-like" serving clusters using standard permissions.
- Robust Fault Tolerance and Elasticity: OpenTela is built for high-churn environments where resources are volatile or preemptible (e.g., scavenger queues, preemptible cloud instances, or Slurm preemption). It uses peer-to-peer heartbeats to detect node failures within seconds, automatically marking failed nodes as "LEFT" and rerouting traffic to healthy replicas without service interruption.
- OpenTela is used to power SwissAI Serving. It acts as the decentralized orchestration layer, routing inference requests to distributed GPU nodes while managing state, metrics, and peer discovery to ensure resilient and scalable model serving.
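The CRDT registry underlying the decentralized orchestration can be illustrated with a minimal last-writer-wins map. This is a hedged sketch, not OpenTela's actual data structure: the class, field names, and timestamps are invented for illustration. The key property it demonstrates is that replicas which merge pairwise (as gossip does) converge to the same view without a central coordinator.

```python
from dataclasses import dataclass, field

@dataclass
class LWWRegistry:
    """Hypothetical last-writer-wins map: node_id -> (timestamp, status).

    Merging two replicas keeps, per node, the entry with the higher
    timestamp, so any two replicas that exchange state converge to the
    same registry regardless of the order updates were observed in.
    """
    entries: dict = field(default_factory=dict)

    def update(self, node_id: str, status: str, ts: int) -> None:
        # Accept an observation only if it is newer than what we hold.
        current = self.entries.get(node_id)
        if current is None or ts > current[0]:
            self.entries[node_id] = (ts, status)

    def merge(self, other: "LWWRegistry") -> None:
        # CRDT merge: idempotent, commutative, associative.
        for node_id, (ts, status) in other.entries.items():
            self.update(node_id, status, ts)

# Two replicas observe different updates...
a, b = LWWRegistry(), LWWRegistry()
a.update("gpu-01", "ALIVE", ts=1)
b.update("gpu-01", "LEFT", ts=2)   # a later observation wins
b.update("gpu-02", "ALIVE", ts=1)

# ...and converge after a pairwise exchange, as gossip would perform.
a.merge(b)
b.merge(a)
assert a.entries == b.entries == {"gpu-01": (2, "LEFT"), "gpu-02": (1, "ALIVE")}
```

A production registry would use a richer clock (e.g., Lamport or hybrid logical clocks) and tombstone garbage collection, but the convergence argument is the same.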
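The heartbeat-based failure detection can likewise be sketched in a few lines. This is an assumed simplification (the monitor class, timeout value, and node names are invented): peers that go silent past a timeout are marked "LEFT", and routing only considers the remaining healthy set.

```python
class HeartbeatMonitor:
    """Hypothetical sketch: mark peers LEFT when no heartbeat
    arrives within `timeout` seconds."""

    def __init__(self, timeout: float = 5.0):
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}
        self.status: dict[str, str] = {}

    def heartbeat(self, node_id: str, now: float) -> None:
        # Record a peer-to-peer heartbeat; a silent node keeps its old stamp.
        self.last_seen[node_id] = now
        self.status[node_id] = "ALIVE"

    def sweep(self, now: float) -> list[str]:
        # Return nodes newly marked LEFT; the router would drop them
        # and reroute traffic to the surviving replicas.
        failed = []
        for node_id, seen in self.last_seen.items():
            if self.status[node_id] == "ALIVE" and now - seen > self.timeout:
                self.status[node_id] = "LEFT"
                failed.append(node_id)
        return failed

    def healthy(self) -> list[str]:
        return [n for n, s in self.status.items() if s == "ALIVE"]

mon = HeartbeatMonitor(timeout=5.0)
mon.heartbeat("gpu-01", now=0.0)
mon.heartbeat("gpu-02", now=0.0)
mon.heartbeat("gpu-02", now=4.0)   # gpu-01 has gone silent
assert mon.sweep(now=6.0) == ["gpu-01"]   # detected within seconds
assert mon.healthy() == ["gpu-02"]
```

In a gossip setting each node runs such a monitor locally and the resulting "LEFT" markings propagate through the shared registry, so the cluster converges on the failure without any central health checker.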
Contributions are welcome! Please follow the code of conduct and submit pull requests for any enhancements or bug fixes.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.