
abhishekshivanna/xray


XRay

A platform for end-to-end ML model development.

XRay gives ML engineers self-service development environments on Kubernetes. Create a workspace, SSH in, write code, train models -- without waiting for ops to provision machines.

This project is a work in progress. Some features described below are planned but not yet implemented.

What it does

  • Workspaces -- Spin up GPU-enabled dev environments with one command. SSH in from your terminal or IDE (VS Code, Cursor). Stop and start them as needed.
  • Multi-tenant -- Organizations, teams, and projects. Each user gets their own workspaces scoped to a project.
  • Multi-cluster -- Connect multiple Kubernetes clusters. Workspaces are created on the cluster you choose.
  • Job submission -- Submit training jobs to shared clusters. (planned)
  • Queuing and priority -- Fair-share scheduling across teams with priority levels. (planned)
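To make the multi-tenant scoping concrete, here is a minimal Go sketch of how a workspace access check could walk the organization, team, and project hierarchy. The types (Org, Team, Project, Workspace, Directory) and the rule that a user must both own the workspace and belong to the project's team are assumptions for illustration. They are not XRay's actual schema or authorization model.

```go
package main

import "fmt"

// Hypothetical model of the Organization -> Team -> Project hierarchy.
type Org struct{ ID string }
type Team struct{ ID, OrgID string }
type Project struct{ ID, TeamID string }

// Workspace is owned by one user and scoped to one project (assumed shape).
type Workspace struct {
	ID, ProjectID, OwnerID string
}

// Directory is an in-memory stand-in for the control plane's database.
type Directory struct {
	Projects    map[string]Project
	TeamMembers map[string]map[string]bool // teamID -> set of userIDs
}

// CanAccess reports whether userID may use ws under the assumed rule:
// the user owns the workspace AND belongs to the team owning its project.
func (d Directory) CanAccess(userID string, ws Workspace) bool {
	if ws.OwnerID != userID {
		return false
	}
	proj, ok := d.Projects[ws.ProjectID]
	if !ok {
		return false // unknown project: deny
	}
	// Indexing a missing team yields a nil map, which reads as false.
	return d.TeamMembers[proj.TeamID][userID]
}

func main() {
	d := Directory{
		Projects: map[string]Project{"proj-1": {ID: "proj-1", TeamID: "team-a"}},
		TeamMembers: map[string]map[string]bool{
			"team-a": {"alice": true, "bob": true},
		},
	}
	ws := Workspace{ID: "ws-1", ProjectID: "proj-1", OwnerID: "alice"}
	fmt.Println(d.CanAccess("alice", ws)) // true
	fmt.Println(d.CanAccess("bob", ws))   // false: on the team, but not the owner
}
```

Keeping ownership and team membership as separate checks makes it easy to relax one of them later, for example to let teammates view each other's workspaces.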

How it works

There are three main components:

  1. Control Plane -- A Go server with a REST API and web UI. Manages users, teams, clusters, and workspaces. Stores state in PostgreSQL.
  2. Cluster Agent -- A lightweight binary that runs in each Kubernetes cluster. Connects to the control plane over a WebSocket and creates or deletes workspace pods on command.
  3. CLI -- A command-line tool (xray) for logging in, managing workspaces, and configuring SSH access.
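The Cluster Agent's role can be sketched as a message dispatcher. This Go example assumes a hypothetical JSON command format (type, workspace_id, image) and a hypothetical PodManager interface standing in for Kubernetes API calls. The WebSocket transport is omitted and an in-memory fake replaces the cluster; none of these names come from XRay's actual agent protocol.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Command is a hypothetical control-plane -> agent message.
type Command struct {
	Type        string `json:"type"`
	WorkspaceID string `json:"workspace_id"`
	Image       string `json:"image,omitempty"`
}

// PodManager abstracts the Kubernetes calls the agent would make.
type PodManager interface {
	CreatePod(workspaceID, image string) error
	DeletePod(workspaceID string) error
}

// fakePods records pods in memory instead of calling the Kubernetes API.
type fakePods struct{ pods map[string]string }

func (f *fakePods) CreatePod(id, image string) error {
	if _, exists := f.pods[id]; exists {
		return fmt.Errorf("pod for workspace %q already exists", id)
	}
	f.pods[id] = image
	return nil
}

func (f *fakePods) DeletePod(id string) error {
	if _, exists := f.pods[id]; !exists {
		return fmt.Errorf("no pod for workspace %q", id)
	}
	delete(f.pods, id)
	return nil
}

// handleMessage decodes one raw message and dispatches it to pm.
func handleMessage(raw []byte, pm PodManager) error {
	var cmd Command
	if err := json.Unmarshal(raw, &cmd); err != nil {
		return fmt.Errorf("decode command: %w", err)
	}
	switch cmd.Type {
	case "create_workspace":
		return pm.CreatePod(cmd.WorkspaceID, cmd.Image)
	case "delete_workspace":
		return pm.DeletePod(cmd.WorkspaceID)
	default:
		return fmt.Errorf("unknown command type %q", cmd.Type)
	}
}

func main() {
	pm := &fakePods{pods: map[string]string{}}
	msgs := []string{
		`{"type":"create_workspace","workspace_id":"ws-1","image":"python:3.12"}`,
		`{"type":"delete_workspace","workspace_id":"ws-1"}`,
		`{"type":"reboot","workspace_id":"ws-1"}`,
	}
	for _, m := range msgs {
		// Prints <nil> for handled commands, an error otherwise.
		fmt.Println(handleMessage([]byte(m), pm))
	}
}
```

Keeping pod operations behind an interface lets the dispatch logic be tested without a live cluster. A real agent would back it with a Kubernetes client and read messages from the WebSocket connection in a loop.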

Documentation

For detailed architecture documentation, see .memory/architecture/:

  • Architecture Overview - Tech stack, project structure, and architecture layers
  • Authentication - OAuth flows (web + CLI PKCE), token refresh, user-identity pattern
  • Teams & Organizations - 3-level hierarchy, memberships, authorization model
  • Clusters & Agent - Cluster registration, agent WebSocket protocol, bastion, pod management, storage
  • Compute Templates - Admin-defined resource presets, scheduling hints, admin/user views
  • Workspaces - Workspace lifecycle, SSH key management, CLI SSH config, connect-info
  • Jobs - Batch compute, multi-node execution, submission patterns
  • Database Schema - ER diagram, migration history, query patterns
  • Frontend - React SPA structure, auth state machine, API communication
  • E2E Testing - Kind cluster setup, Docker images, test infrastructure
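The Authentication doc mentions PKCE for the CLI login flow. As a sketch, assuming the CLI uses the S256 challenge method defined in RFC 7636, generating a code verifier and its challenge in Go could look like this. The function names newVerifier and challengeS256 are hypothetical.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newVerifier returns a high-entropy PKCE code_verifier: 32 random bytes
// base64url-encoded without padding, giving 43 characters (RFC 7636 4.1).
func newVerifier() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// challengeS256 derives the code_challenge for the S256 method:
// BASE64URL(SHA256(ASCII(verifier))) with no padding (RFC 7636 4.2).
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	v, err := newVerifier()
	if err != nil {
		panic(err)
	}
	// The CLI would send the challenge with the authorization request and
	// keep the verifier to present later at the token endpoint.
	fmt.Println("verifier: ", v)
	fmt.Println("challenge:", challengeS256(v))
}
```

challengeS256 can be checked against the example in RFC 7636 Appendix B, where the verifier dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk maps to the challenge E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM.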

Work in progress. Contributions and feedback welcome.
