Commit 955883c

Support multiple backends (#596)
* initial implementation
* Use entry points instead of jupyter server setting traitlets for backend definition and discovery
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* adjust details page field display order
* add advancedOptionsOverride token to avoid token mismatch in extensions
* Encode backend into job IDs, add backend field to job definitions, use database_manager_class to detect SQL storage needs
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* rename legacy backend, cleanup tests and comments
* add python backend
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add dynamic context menu registration
* Only validate notebooks have a kernel for .ipynb files
* add stdout and stderr for py files
* Auto-select backend by file extension, fall back to default
* Create stdout/stderr files only when there's actual content
* Only add output files that actually exist to job_files / output files
* hide backend picker only while loading
* rename default backends
* change AdvancedOptions imports
* demo: output file formats
* demo: hyperpod backend
* support listing from multiple backends, add hardcoded output formats
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* optimize backend_id logic
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix output format picker
* update tests
* update output_formats.name to id
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update tests
* update UpdateJob model
* comment out # jupyter_server_nb, jupyter_server_py, sagemaker_hyperpod backends
* make list jobs async
* cleanup code
* fix backend picker
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix backend picker
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* simplify default backend choice logic
* Remove allowed_backends / blocked_backends
* remove unused code SageMakerHyperPodBackend, OutputFormatDescriptor
* type BaseBackend.OutputFormats
* streamline customization of the default backend per file extension via the preferred_backends traitlet, and of the backend used for legacy jobs via the legacy_job_backend traitlet
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* raise error on an unknown backend instead of silently falling back on legacy / default backend
* abstract scheduler resolution into helper function
* add playwright test for multiple backends
* simplify docstrings
* add autoflake to CI and run it to remove unused imports, statements, vars
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* run lint
* remove duplicate test
* remove unnecessary tests and comments
* reorder backends.py structure
* remove advanced options override
* update snapshots
* refactor helper functions
* add comments
* change id formulation
* rename backend to backend_id, remove `local` fallback logic, fill in backend_id for legacy jobs at the backend, update snapshots
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* comment out flaky snapshot comparisons
* implement comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* return error strings with 500
* properly map ValidationError to 400
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update logger.error to logger.exception to catch trace
* check for absence of colon in the backend_id
* add typing
* make the logic that auto-selects a valid backend when the preferred backend doesn't support it more readable
* reference stderr rather than embed part of the output
* fix test assertions
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 9d5e2ea commit 955883c
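
The commit message mentions encoding the backend into job IDs, rejecting colons in backend IDs, and filling in a backend for legacy (UUID-only) jobs. The sketch below illustrates how those three pieces could fit together. It assumes a hypothetical `backend_id:uuid` layout; the actual ID format is not visible in this part of the diff, and `make_job_id` and `parse_job_id` are illustrative names, not functions from the commit.

```python
import uuid
from typing import Optional, Tuple


def make_job_id(backend_id: str) -> str:
    """Compose a job ID as '<backend_id>:<uuid>' (assumed layout, not confirmed by the diff)."""
    if ":" in backend_id:
        # A colon inside the backend ID would make the delimiter ambiguous.
        raise ValueError(f"Backend ID cannot contain ':': '{backend_id}'")
    return f"{backend_id}:{uuid.uuid4()}"


def parse_job_id(job_id: str) -> Tuple[Optional[str], str]:
    """Split a job ID into (backend_id, uuid).

    A legacy, UUID-only ID has no delimiter, so backend_id is None and the
    caller would route it to the configured legacy backend.
    """
    if ":" not in job_id:
        return None, job_id
    backend_id, raw_uuid = job_id.split(":", 1)
    return backend_id, raw_uuid


new_id = make_job_id("jupyter_server_nb")
print(parse_job_id(new_id)[0])
print(parse_job_id("0b0f3c8e-5a41-4a63-9d1c-2f6d0c1a7e55")[0])
```

Under this assumption, forbidding `:` in backend IDs is what keeps the split unambiguous.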

56 files changed, +3047 -394 lines changed

.pre-commit-config.yaml

Lines changed: 5 additions & 0 deletions
@@ -15,6 +15,11 @@ repos:
       - id: check-builtin-literals
       - id: trailing-whitespace
 
+  - repo: https://github.com/PyCQA/autoflake
+    rev: v2.3.1
+    hooks:
+      - id: autoflake
+
   - repo: https://github.com/psf/black
     rev: 24.2.0
     hooks:

conftest.py

Lines changed: 19 additions & 1 deletion
@@ -1,16 +1,33 @@
 from pathlib import Path
+from unittest.mock import patch
 
 import pytest
 from sqlalchemy import create_engine
 from sqlalchemy.orm import sessionmaker
 
 from jupyter_scheduler.orm import Base
 from jupyter_scheduler.scheduler import Scheduler
-from jupyter_scheduler.tests.mocks import MockEnvironmentManager
+from jupyter_scheduler.tests.mocks import MockEnvironmentManager, MockTestBackend
 
 pytest_plugins = ("jupyter_server.pytest_plugin", "pytest_jupyter.jupyter_server")
 
 
+def _mock_discover_backends(*args, **kwargs):
+    """Return test backends for testing."""
+    from jupyter_scheduler.backends import JupyterServerNotebookBackend
+
+    return {"jupyter_server_nb": JupyterServerNotebookBackend, "test": MockTestBackend}
+
+
+@pytest.fixture(autouse=True)
+def mock_backend_discovery():
+    """Patch backend discovery to include test backend for all tests."""
+    with patch(
+        "jupyter_scheduler.extension.discover_backends", side_effect=_mock_discover_backends
+    ):
+        yield
+
+
 @pytest.fixture(scope="session")
 def static_test_files_dir() -> Path:
     return Path(__file__).parent.resolve() / "jupyter_scheduler" / "tests" / "static"
@@ -51,6 +68,7 @@ def jp_scheduler_db(jp_scheduler_db_url):
     session = Session()
     yield session
     session.close()
+    engine.dispose()
 
 
 @pytest.fixture
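
The autouse fixture above patches `discover_backends` under the `jupyter_scheduler.extension` namespace, presumably because that is where the extension looks the name up. The sketch below shows the same patch-where-used pattern in isolation. It builds a throwaway module named `demo_extension` (a stand-in, not part of jupyter-scheduler) so that it runs without the package installed.

```python
import sys
import types
from unittest.mock import patch

# Throwaway module standing in for the module that *uses* discover_backends.
demo = types.ModuleType("demo_extension")
source = '''
def discover_backends():
    return {"real": "RealBackend"}

def list_backend_ids():
    # Looks up discover_backends in this module's namespace at call time.
    return sorted(discover_backends())
'''
exec(source, demo.__dict__)
sys.modules["demo_extension"] = demo


def _mock_discover_backends(*args, **kwargs):
    """Replacement used only while the patch is active."""
    return {"test": "MockTestBackend"}


with patch("demo_extension.discover_backends", side_effect=_mock_discover_backends):
    patched_ids = demo.list_backend_ids()

restored_ids = demo.list_backend_ids()
print(patched_ids, restored_ids)
```

Once the `with` block exits, the original function is restored. The fixture relies on the same behaviour by yielding inside the patch context.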

dev/seed.py

Lines changed: 1 addition & 1 deletion
@@ -155,7 +155,7 @@ async def load_data(jobs_count: int, job_defs_count: int, db_path: str):
         f"\nCreated {jobs_count} jobs and {job_defs_count} job definitions in the scheduler database"
     )
     click.echo(f"present at {db_path}. Copy the following command")
-    click.echo(f"to start JupyterLab with this database.\n")
+    click.echo("to start JupyterLab with this database.\n")
     click.echo(f"`jupyter lab --SchedulerApp.db_url={db_url}`\n")
 
 

jupyter_scheduler/__init__.py

Lines changed: 0 additions & 1 deletion
@@ -1,6 +1,5 @@
 """Scheduling API for JupyterLab"""
 
-from ._version import __version__
 from .extension import SchedulerApp
 
 

jupyter_scheduler/_version.py

Lines changed: 0 additions & 3 deletions
@@ -1,6 +1,3 @@
-import json
-from pathlib import Path
-
 __all__ = ["__version__"]
 
 version_info = (2, 11, 0, "", "")
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
+import logging
+from typing import Any, Dict, List, Optional, Type
+
+from jupyter_scheduler.backends import BackendConfig, DescribeBackendResponse
+from jupyter_scheduler.environments import EnvironmentManager
+from jupyter_scheduler.orm import create_tables
+from jupyter_scheduler.pydantic_v1 import BaseModel
+
+logger = logging.getLogger(__name__)
+
+
+def import_class(class_path: str) -> Type:
+    """Import a class from a fully qualified path like 'module.submodule.ClassName'."""
+    module_path, class_name = class_path.rsplit(".", 1)
+    module = __import__(module_path, fromlist=[class_name])
+    return getattr(module, class_name)
+
+
+class BackendInstance(BaseModel):
+    """A running backend with its configuration and initialized scheduler."""
+
+    config: BackendConfig
+    scheduler: Any  # BaseScheduler at runtime, but Any to support test mocks
+
+
+class BackendRegistry:
+    """Registry for storing, initializing, and routing to scheduler backends."""
+
+    def __init__(
+        self,
+        configs: List[BackendConfig],
+        legacy_job_backend: str,
+        preferred_backends: Optional[Dict[str, str]] = None,
+    ):
+        self._configs = configs
+        self._backends: Dict[str, BackendInstance] = {}
+        self._legacy_job_backend = legacy_job_backend
+        self._preferred_backends = preferred_backends or {}
+        self._extension_map: Dict[str, List[str]] = {}
+
+    def initialize(
+        self,
+        root_dir: str,
+        environments_manager: EnvironmentManager,
+        db_url: str,
+        config: Optional[Any] = None,
+    ):
+        """Instantiate all backends from configs."""
+        seen_ids = set()
+        for cfg in self._configs:
+            if cfg.id in seen_ids:
+                raise ValueError(f"Duplicate backend ID: '{cfg.id}'")
+            if ":" in cfg.id:
+                raise ValueError(f"Backend ID cannot contain ':': '{cfg.id}'")
+            seen_ids.add(cfg.id)
+
+        for cfg in self._configs:
+            try:
+                instance = self._create_backend(cfg, root_dir, environments_manager, db_url, config)
+                self._backends[cfg.id] = instance
+
+                for ext in cfg.file_extensions:
+                    ext_lower = ext.lower().lstrip(".")
+                    if ext_lower not in self._extension_map:
+                        self._extension_map[ext_lower] = []
+                    self._extension_map[ext_lower].append(cfg.id)
+
+                logger.info(f"Initialized backend: {cfg.id} ({cfg.name})")
+            except Exception as e:
+                logger.error(f"Failed to initialize backend {cfg.id}: {e}")
+                raise
+
+    def _create_backend(
+        self,
+        cfg: BackendConfig,
+        root_dir: str,
+        environments_manager: EnvironmentManager,
+        global_db_url: str,
+        config: Optional[Any] = None,
+    ) -> BackendInstance:
+        """Import scheduler class, instantiate it, and return a BackendInstance.
+
+        Creates database tables if not found and backend uses default SQLAlchemy storage.
+        """
+        scheduler_class = import_class(cfg.scheduler_class)
+
+        backend_db_url = cfg.db_url or global_db_url
+
+        # Create SQL tables only if backend uses default SQLAlchemy storage.
+        # Backends with custom database_manager_class handle their own storage.
+        if backend_db_url and cfg.database_manager_class is None:
+            create_tables(backend_db_url)
+
+        scheduler = scheduler_class(
+            root_dir=root_dir,
+            environments_manager=environments_manager,
+            db_url=backend_db_url,
+            config=config,
+            backend_id=cfg.id,
+        )
+
+        if cfg.execution_manager_class:
+            scheduler.execution_manager_class = import_class(cfg.execution_manager_class)
+
+        return BackendInstance(config=cfg, scheduler=scheduler)
+
+    def get_backend(self, backend_id: str) -> Optional[BackendInstance]:
+        """Return a backend with matching ID, None if none is found."""
+        return self._backends.get(backend_id)
+
+    def get_legacy_job_backend(self) -> BackendInstance:
+        """Get the backend for routing legacy jobs (UUID-only IDs from pre-3.0).
+
+        Raises:
+            KeyError: If the configured legacy_job_backend ID is not found.
+        """
+        if self._legacy_job_backend not in self._backends:
+            raise KeyError(f"Legacy job backend '{self._legacy_job_backend}' not found in registry")
+        return self._backends[self._legacy_job_backend]
+
+    def get_for_file(self, input_uri: str) -> BackendInstance:
+        """Auto-select backend by file extension. Prefers configured backend, else alphabetical.
+
+        Raises:
+            ValueError: If no backend supports the file extension.
+        """
+        ext = ""
+        if "." in input_uri:
+            ext = input_uri.rsplit(".", 1)[-1].lower()
+
+        candidate_ids = self._extension_map.get(ext, [])
+        if not candidate_ids:
+            raise ValueError(f"No backend supports file extension '.{ext}'")
+
+        # 1. Check explicit preference for this extension
+        preferred_id = self._preferred_backends.get(ext)
+        if preferred_id and preferred_id in candidate_ids:
+            return self._backends[preferred_id]
+
+        # 2. Otherwise return min by name (first alphabetically)
+        candidate_instances = [self._backends[bid] for bid in candidate_ids]
+        return min(candidate_instances, key=lambda b: b.config.name)
+
+    def describe_backends(self) -> List[DescribeBackendResponse]:
+        """Return backend descriptions sorted alphabetically by name. Frontend uses first as default."""
+        backends_sorted = sorted(self._backends.values(), key=lambda b: b.config.name)
+        return [
+            DescribeBackendResponse(
+                id=b.config.id,
+                name=b.config.name,
+                description=b.config.description,
+                file_extensions=b.config.file_extensions,
+                output_formats=b.config.output_formats,
+            )
+            for b in backends_sorted
+        ]
+
+    @property
+    def backends(self) -> List[BackendInstance]:
+        """Return all backend instances."""
+        return list(self._backends.values())
+
+    def __len__(self) -> int:
+        return len(self._backends)
+
+    def __contains__(self, backend_id: str) -> bool:
+        return backend_id in self._backends
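
The selection rule in `get_for_file` (explicit per-extension preference first, otherwise the candidate whose display name sorts first) can be exercised without the scheduler stack. This is a minimal sketch of that rule on plain dictionaries. `select_backend_id` and the `"test"` backend named "A Test Backend" are made up for illustration and do not appear in the commit.

```python
from typing import Dict, List, Optional


def select_backend_id(
    input_uri: str,
    extension_map: Dict[str, List[str]],
    names: Dict[str, str],
    preferred: Optional[Dict[str, str]] = None,
) -> str:
    """Pick a backend ID for a file, mirroring the two-step rule in get_for_file."""
    preferred = preferred or {}
    ext = input_uri.rsplit(".", 1)[-1].lower() if "." in input_uri else ""

    candidates = extension_map.get(ext, [])
    if not candidates:
        raise ValueError(f"No backend supports file extension '.{ext}'")

    # 1. An explicit preference wins, but only if that backend handles the extension.
    preferred_id = preferred.get(ext)
    if preferred_id in candidates:
        return preferred_id

    # 2. Otherwise fall back to the backend whose display name sorts first.
    return min(candidates, key=lambda backend_id: names[backend_id])


extension_map = {"ipynb": ["jupyter_server_nb", "test"], "py": ["jupyter_server_py"]}
names = {
    "jupyter_server_nb": "Jupyter Server Notebook",
    "jupyter_server_py": "Jupyter Server Python",
    "test": "A Test Backend",  # hypothetical second notebook backend
}

print(select_backend_id("report.ipynb", extension_map, names))  # test
print(select_backend_id("report.ipynb", extension_map, names, {"ipynb": "jupyter_server_nb"}))  # jupyter_server_nb
```

Note that a preference pointing at a backend that does not support the extension is ignored rather than treated as an error, which matches the `preferred_id in candidate_ids` guard in the diff.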

jupyter_scheduler/backend_utils.py

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+import logging
+from importlib.metadata import entry_points
+from typing import Dict, Optional, Type
+
+from jupyter_scheduler.backends import DEFAULT_FALLBACK_BACKEND_ID
+from jupyter_scheduler.base_backend import BaseBackend
+
+ENTRY_POINT_GROUP = "jupyter_scheduler.backends"
+
+logger = logging.getLogger(__name__)
+
+
+def discover_backends(
+    log: Optional[logging.Logger] = None,
+) -> Dict[str, Type[BaseBackend]]:
+    """Discover backends registered in the 'jupyter_scheduler.backends' entry point group."""
+    if log is None:
+        log = logger
+
+    backends: Dict[str, Type[BaseBackend]] = {}
+
+    eps = entry_points()
+    if hasattr(eps, "select"):
+        backend_eps = eps.select(group=ENTRY_POINT_GROUP)
+    else:
+        backend_eps = eps.get(ENTRY_POINT_GROUP, [])
+
+    for ep in backend_eps:
+        try:
+            backend_class = ep.load()
+        except ImportError as e:
+            missing_package = getattr(e, "name", str(e))
+            log.warning(
+                f"Unable to load backend '{ep.name}': missing dependency '{missing_package}'. "
+                f"Install the required package to enable this backend."
+            )
+            continue
+        except Exception as e:
+            log.warning(f"Unable to load backend '{ep.name}': {e}")
+            continue
+
+        if not hasattr(backend_class, "id"):
+            log.warning(f"Backend '{ep.name}' does not define 'id' attribute. Skipping.")
+            continue
+
+        backend_id = backend_class.id
+        backends[backend_id] = backend_class
+        log.info(f"Registered backend '{backend_id}' ({backend_class.name})")
+
+    return backends
+
+
+def get_legacy_job_backend_id(
+    available_backends: Dict[str, Type[BaseBackend]],
+    legacy_job_backend: Optional[str] = None,
+) -> str:
+    """Get backend ID for routing legacy jobs (UUID-only IDs from pre-3.0)."""
+    if not available_backends:
+        raise ValueError("No scheduler backends available.")
+
+    if legacy_job_backend and legacy_job_backend in available_backends:
+        return legacy_job_backend
+
+    if DEFAULT_FALLBACK_BACKEND_ID in available_backends:
+        return DEFAULT_FALLBACK_BACKEND_ID
+
+    raise ValueError(
+        f"No backend for legacy jobs. Set SchedulerApp.legacy_job_backend. "
+        f"Available: {list(available_backends.keys())}"
+    )
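
`discover_backends` branches on `hasattr(eps, "select")` because `importlib.metadata.entry_points()` returns a selectable object on newer Python versions and a plain dict keyed by group on older ones. The sketch below isolates that compatibility branch and shows, in a comment, how a third-party package might register a backend. `select_group`, `my_backend`, and `my_package.backends:MyBackend` are hypothetical names used only for illustration.

```python
from importlib.metadata import entry_points

ENTRY_POINT_GROUP = "jupyter_scheduler.backends"

# A third-party package could advertise a backend in its pyproject.toml
# (hypothetical package and class names):
#
#   [project.entry-points."jupyter_scheduler.backends"]
#   my_backend = "my_package.backends:MyBackend"


def select_group(eps, group: str) -> list:
    """Return the entry points in `group` for both entry_points() return styles."""
    if hasattr(eps, "select"):
        # Newer API: a selectable collection of entry points.
        return list(eps.select(group=group))
    # Older API: a dict mapping group name to a list of entry points.
    return list(eps.get(group, []))


# Lists whatever backends are installed in the current environment (possibly none).
installed = sorted(ep.name for ep in select_group(entry_points(), ENTRY_POINT_GROUP))
print(installed)
```

Loading each entry point with `ep.load()` is deliberately left out here. The diff wraps that call in `try`/`except` so a backend with a missing dependency is skipped with a warning instead of breaking startup.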

jupyter_scheduler/backends.py

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+from typing import Any, Dict, List, Optional
+
+from jupyter_scheduler.base_backend import BaseBackend
+from jupyter_scheduler.models import OutputFormat
+from jupyter_scheduler.pydantic_v1 import BaseModel, Field
+
+JUPYTER_SERVER_NB_BACKEND_ID = "jupyter_server_nb"
+JUPYTER_SERVER_PY_BACKEND_ID = "jupyter_server_py"
+DEFAULT_FALLBACK_BACKEND_ID = JUPYTER_SERVER_NB_BACKEND_ID
+
+
+class BackendConfig(BaseModel):
+    """Runtime configuration for an initialized backend instance."""
+
+    id: str
+    name: str
+    description: str
+    scheduler_class: str
+    execution_manager_class: str
+    database_manager_class: Optional[str] = None
+    db_url: Optional[str] = None
+    file_extensions: List[str] = Field(default_factory=list)
+    output_formats: List[Dict[str, str]] = Field(default_factory=list)
+    metadata: Optional[Dict[str, Any]] = None
+
+
+class DescribeBackendResponse(BaseModel):
+    """API response model for GET /scheduler/backends.
+
+    Backends are returned sorted alphabetically by name for consistent UI ordering.
+    Use preferred_backends config to control which backend is pre-selected per file extension.
+    """
+
+    id: str
+    name: str
+    description: str
+    file_extensions: List[str]
+    output_formats: List[OutputFormat]
+
+    class Config:
+        orm_mode = True
+
+
+class JupyterServerNotebookBackend(BaseBackend):
+    """Built-in backend executing notebooks via nbconvert on the Jupyter server."""
+
+    id = JUPYTER_SERVER_NB_BACKEND_ID
+    name = "Jupyter Server Notebook"
+    description = "Execute notebooks on the Jupyter server"
+    scheduler_class = "jupyter_scheduler.scheduler.Scheduler"
+    execution_manager_class = "jupyter_scheduler.executors.DefaultExecutionManager"
+    file_extensions = ["ipynb"]
+    output_formats = [
+        {"id": "ipynb", "label": "Notebook", "description": "Executed notebook with outputs"},
+        {"id": "html", "label": "HTML", "description": "HTML export of notebook"},
+    ]
+
+
+class JupyterServerPythonBackend(BaseBackend):
+    """Built-in backend executing Python scripts via subprocess on the Jupyter server."""
+
+    id = JUPYTER_SERVER_PY_BACKEND_ID
+    name = "Jupyter Server Python"
+    description = "Execute Python scripts on the Jupyter server"
+    scheduler_class = "jupyter_scheduler.scheduler.Scheduler"
+    execution_manager_class = "jupyter_scheduler.python_executor.PythonScriptExecutionManager"
+    file_extensions = ["py"]
+    output_formats = [
+        {"id": "stdout", "label": "Output", "description": "Standard output from script"},
+        {"id": "stderr", "label": "Errors", "description": "Standard error from script"},
+        {"id": "json", "label": "JSON", "description": "JSON result if script produces one"},
+    ]
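
Both built-in backends are declared purely through class attributes, so a custom backend would presumably follow the same shape. The sketch below declares a hypothetical shell-script backend and flattens it into the fields that `BackendConfig` lists. `BaseBackendStandIn` substitutes for the real `BaseBackend`, whose interface lives in `base_backend.py` and is not shown in this part of the diff. `ShellScriptBackend`, `to_config_dict`, and the `my_package` path are invented for illustration.

```python
from typing import Any, Dict, List


class BaseBackendStandIn:
    """Stand-in for jupyter_scheduler.base_backend.BaseBackend (real interface not shown here)."""

    id: str
    name: str
    description: str
    scheduler_class: str
    execution_manager_class: str
    file_extensions: List[str] = []
    output_formats: List[Dict[str, str]] = []


class ShellScriptBackend(BaseBackendStandIn):
    """Hypothetical backend, declared the same way as the built-in ones."""

    id = "example_sh"  # must not contain ':' according to the registry's validation
    name = "Example Shell"
    description = "Execute shell scripts (illustrative only)"
    scheduler_class = "jupyter_scheduler.scheduler.Scheduler"
    execution_manager_class = "my_package.executors.ShellExecutionManager"  # hypothetical path
    file_extensions = ["sh"]
    output_formats = [
        {"id": "stdout", "label": "Output", "description": "Standard output from script"},
    ]


CONFIG_FIELDS = (
    "id",
    "name",
    "description",
    "scheduler_class",
    "execution_manager_class",
    "file_extensions",
    "output_formats",
)


def to_config_dict(backend_cls: type) -> Dict[str, Any]:
    """Collect the class attributes that correspond to BackendConfig fields."""
    return {field: getattr(backend_cls, field) for field in CONFIG_FIELDS}


config = to_config_dict(ShellScriptBackend)
print(config["id"], config["file_extensions"])
```

How the real extension converts a backend class into a `BackendConfig` is not visible here, so the attribute-to-field mapping is an assumption based on the matching field names.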
