What is the preferred way to use datajoint on a cluster? #1277
Replies: 2 comments
-
This is a great question! This is solved by the full DataJoint platform, which builds on the open-source DataJoint Python library. Please feel free to reach out to set this up: https://github.com/datajoint/datajoint-specs/blob/main/SPECS_2_0.md#open-source-development-and-the-datajoint-standard
-
## DataJoint 2.0 Update

DataJoint 2.0 introduces Jobs 2.0, a redesigned job management system that makes cluster usage more straightforward.

### Key Features for Cluster Usage

**1. Automatic Job Discovery**

```python
# Each worker discovers and reserves jobs automatically
MyTable.populate()  # Workers compete fairly for pending jobs
```

**2. Priority-Based Scheduling**

```python
# Refresh jobs with priority (lower = more urgent)
MyTable.jobs.refresh(priority=1)   # High-priority jobs
MyTable.jobs.refresh(priority=10)  # Lower-priority jobs
```

**3. Delayed Scheduling**

```python
# Schedule jobs to start later
MyTable.jobs.refresh(delay=3600)  # Available in 1 hour
```

**4. Orphan Detection**

```python
# Clean up jobs from crashed workers
MyTable.jobs.refresh(orphan_timeout=3600)  # Re-pend jobs orphaned > 1 hour
```

**5. Progress Monitoring**

```python
MyTable.jobs.progress()
# {"pending": 100, "reserved": 5, "success": 50, "error": 2, "ignore": 0, "total": 157}
```

### Cluster Deployment Pattern

```python
# On each cluster node, simply run:
import datajoint as dj

schema = dj.Schema("my_pipeline")  # hypothetical schema name

@schema
class MyComputation(dj.Computed):
    ...

# Each worker will:
# 1. Reserve available pending jobs
# 2. Process them
# 3. Mark complete or error
# 4. Repeat until no jobs remain
MyComputation.populate()
```

No need to manually specify process counts per node: each worker independently pulls from the shared job queue. The database handles coordination via atomic job reservation (sketched below).
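Under the hood, "atomic job reservation" means that claiming a job is a single database operation that can succeed for only one worker. A conceptual sketch of the idea (not DataJoint's actual implementation; the jobs table and its columns are made up):

```python
# Conceptual sketch of atomic job reservation (NOT DataJoint's real code).
# Each job is identified by its primary key; the database's uniqueness
# constraint guarantees that at most one worker can claim it.
import pymysql


def try_reserve(conn: pymysql.connections.Connection, job_key: str) -> bool:
    """Attempt to claim a job; return True if this worker won the race."""
    try:
        with conn.cursor() as cur:
            # The INSERT succeeds for exactly one worker; everyone else
            # hits the primary-key constraint and moves on to another job.
            cur.execute(
                "INSERT INTO jobs (job_key, status) VALUES (%s, 'reserved')",
                (job_key,),
            )
        conn.commit()
        return True
    except pymysql.err.IntegrityError:
        conn.rollback()
        return False
```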
### Configuration

```python
# Optional tuning via config
dj.config.jobs.default_priority = 5
dj.config.jobs.stale_timeout = 86400  # Clean up old jobs after 24h
```

This approach scales naturally: just start more workers on more nodes, and they will coordinate automatically through the job queue.
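Because workers coordinate only through the queue, a per-node worker script can also size its own process pool from the local core count instead of hard-coding it. A minimal sketch, assuming populate() still accepts the processes argument from DataJoint 0.x and a hypothetical my_pipeline module:

```python
# worker.py: per-node worker sketch (my_pipeline is a hypothetical module).
# The same script runs unchanged on heterogeneous nodes because it sizes
# its process pool from the node's own core count.
import multiprocessing

from my_pipeline import MyComputation  # hypothetical pipeline module

if __name__ == "__main__":
    MyComputation.populate(processes=multiprocessing.cpu_count())
```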
-
Currently I can submit a job to multiple nodes in a cluster (where all nodes have access to the MySQL server, have the same environment, etc.) using a script that calls populate with job reservation. This feels somewhat suboptimal, though, because I have to know how many cores each node has and manually specify the number of processes requested on each one. A sketch of this pattern appears below.
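A rough sketch of that per-node script (the my_pipeline module and process count are illustrative):

```python
# Per-node script using classic job reservation (DataJoint 0.x).
# The process count has to be hand-tuned to match each node's cores.
from my_pipeline import MyComputation  # hypothetical pipeline module

MyComputation.populate(
    reserve_jobs=True,  # coordinate through the schema's ~jobs table
    processes=16,       # hand-picked per node: the pain point described above
)
```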