What is the preferred way to use datajoint on a cluster? #1277
Replies: 2 comments
-
This is a great question! This is solved by the full DataJoint platform, which builds on the open-source DataJoint Python library. Please feel free to reach out to set this up: https://github.com/datajoint/datajoint-specs/blob/main/SPECS_2_0.md#open-source-development-and-the-datajoint-standard
-
## DataJoint 2.0 Update

DataJoint 2.0 introduces Jobs 2.0, a redesigned job management system that makes cluster usage more straightforward.

### Key Features for Cluster Usage

**1. Automatic Job Discovery**

```python
# Each worker discovers and reserves jobs automatically
MyTable.populate()  # Workers compete fairly for pending jobs
```

**2. Priority-Based Scheduling**

```python
# Refresh jobs with priority (lower = more urgent)
MyTable.jobs.refresh(priority=1)   # High-priority jobs
MyTable.jobs.refresh(priority=10)  # Lower-priority jobs
```

**3. Delayed Scheduling**

```python
# Schedule jobs to start later
MyTable.jobs.refresh(delay=3600)  # Available in 1 hour
```

**4. Orphan Detection**

```python
# Clean up jobs from crashed workers
MyTable.jobs.refresh(orphan_timeout=3600)  # Re-pend jobs orphaned > 1 hour
```

**5. Progress Monitoring**

```python
MyTable.jobs.progress()
# {"pending": 100, "reserved": 5, "success": 50, "error": 2, "ignore": 0, "total": 157}
```

### Cluster Deployment Pattern

```python
# On each cluster node, simply run:
import datajoint as dj

schema = dj.Schema("my_pipeline")  # hypothetical schema name

@schema
class MyComputation(dj.Computed):
    ...

# Each worker will:
# 1. Reserve available pending jobs
# 2. Process them
# 3. Mark complete or error
# 4. Repeat until no jobs remain
MyComputation.populate()
```

No need to manually specify process counts per node: each worker independently pulls from the shared job queue. The database handles coordination via atomic job reservation (sketched below).
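Under the hood, "atomic job reservation" means that claiming a job is a single database operation that can succeed for only one worker. A conceptual sketch of the idea (not DataJoint's actual implementation; the jobs table and its columns are made up):

```python
# Conceptual sketch of atomic job reservation (NOT DataJoint's real code).
# Each job is identified by its primary key; the database's uniqueness
# constraint guarantees that at most one worker can claim it.
import pymysql


def try_reserve(conn: pymysql.connections.Connection, job_key: str) -> bool:
    """Attempt to claim a job; return True if this worker won the race."""
    try:
        with conn.cursor() as cur:
            # The INSERT succeeds for exactly one worker; everyone else
            # hits the primary-key constraint and moves on to another job.
            cur.execute(
                "INSERT INTO jobs (job_key, status) VALUES (%s, 'reserved')",
                (job_key,),
            )
        conn.commit()
        return True
    except pymysql.err.IntegrityError:
        conn.rollback()
        return False
```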
### Configuration

```python
# Optional tuning via config
dj.config.jobs.default_priority = 5
dj.config.jobs.stale_timeout = 86400  # Clean up old jobs after 24h
```

This approach scales naturally: just start more workers on more nodes, and they will coordinate automatically through the job queue.
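Because workers coordinate only through the queue, a per-node worker script can also size its own process pool from the local core count instead of hard-coding it. A minimal sketch, assuming populate() still accepts the processes argument from DataJoint 0.x and a hypothetical my_pipeline module:

```python
# worker.py: per-node worker sketch (my_pipeline is a hypothetical module).
# The same script runs unchanged on heterogeneous nodes because it sizes
# its process pool from the node's own core count.
import multiprocessing

from my_pipeline import MyComputation  # hypothetical pipeline module

if __name__ == "__main__":
    MyComputation.populate(processes=multiprocessing.cpu_count())
```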
-
Currently I can submit a job to multiple nodes in a cluster (where all nodes have access to the MySQL server, have the same environment, etc.) using a script that calls populate with job reservation. This feels somewhat suboptimal, though, because I have to know how many cores each node has and manually specify the number of processes requested on each one. A sketch of this pattern appears below.
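A rough sketch of that per-node script (the my_pipeline module and process count are illustrative):

```python
# Per-node script using classic job reservation (DataJoint 0.x).
# The process count has to be hand-tuned to match each node's cores.
from my_pipeline import MyComputation  # hypothetical pipeline module

MyComputation.populate(
    reserve_jobs=True,  # coordinate through the schema's ~jobs table
    processes=16,       # hand-picked per node: the pain point described above
)
```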