
srun on a single node #41

@soaringxmc


Hi, @m-kurz @b-fg

Currently, smartflow can run on CPU and GPU clusters. However, on a single node we can only use the local launcher with mpirun. It would be highly desirable to also use Slurm with srun on a single node of a cluster, with the database and the CFD runs located on the same node.
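To make the two single-node setups concrete, here is a minimal sketch of a hypothetical helper `pick_launcher` (not part of smartflow or SmartSim) that picks slurm + srun when the script runs inside a Slurm allocation and local + mpirun otherwise. It assumes that the presence of the `SLURM_JOB_ID` environment variable, which Slurm sets inside a job allocation, is a good enough signal.

```python
import os
from typing import Mapping, Tuple


def pick_launcher(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return a (launcher, run_command) pair for the current environment.

    Hypothetical helper: SLURM_JOB_ID is set by Slurm inside a job
    allocation, so its presence selects slurm + srun; otherwise fall
    back to the local launcher with mpirun.
    """
    if "SLURM_JOB_ID" in env:
        return "slurm", "srun"
    return "local", "mpirun"


if __name__ == "__main__":
    launcher, run_command = pick_launcher()
    print(f"launcher={launcher}, run_command={run_command}")
    # The pair would then be passed on as Experiment(..., launcher=launcher)
    # and exp.create_run_settings(..., run_command=run_command).
```

Keeping the choice in one function means the rest of the script does not need to know which of the two setups it is running under.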

In the code below, if the database is created, the echo experiment cannot start. However, if the database is not created (by commenting out the database lines), the echo experiment starts. Do you have any ideas for solving this issue?

```python
from smartsim import Experiment

exp = Experiment('envs', launcher='slurm')

db = exp.create_database(
    interface='lo',
    # Run the database on a single node of the current allocation
    db_nodes=1,
    batch=False,  # Important: this tells SmartSim to use the current allocation
)
exp.start(db)

models = []
for i in range(1):
    # Define run arguments
    run_args = {
        # 'cpus-per-task': 8,
        # 'gpus-per-task': 1,
    }

    run_settings = exp.create_run_settings(
        exe="echo",
        exe_args="Hello World",
        run_command="srun",
        run_args=run_args,
    )
    run_settings.set_tasks(1)

    model = exp.create_model(f"env_{i}", run_settings)
    exp.start(model, block=False, summary=False)
    models.append(model)
```
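One possible explanation, offered as an assumption rather than a confirmed diagnosis: since Slurm 20.11, srun job steps by default do not share CPUs with other running steps, so the echo step may be waiting for the database step to release the node. Slurm's `--overlap` option lets a new step share resources with steps already running. The sketch below uses a hypothetical helper `srun_step` (not a SmartSim API) to show what the model's srun command line would look like with that flag.

```python
import shlex
from typing import List, Sequence


def srun_step(exe: str, exe_args: Sequence[str], ntasks: int = 1,
              overlap: bool = True) -> List[str]:
    """Build the argv for one srun job step inside an existing allocation.

    Hypothetical helper. With overlap=True the step gets --overlap, which
    asks Slurm to let it share resources with steps that are already
    running (e.g. the database step) instead of waiting for them.
    """
    cmd = ["srun", f"--ntasks={ntasks}", "--nodes=1"]
    if overlap:
        cmd.append("--overlap")
    return cmd + [exe, *exe_args]


if __name__ == "__main__":
    print(shlex.join(srun_step("echo", ["Hello", "World"])))
    # prints: srun --ntasks=1 --nodes=1 --overlap echo Hello World
```

In the SmartSim script, the equivalent would presumably be `run_args = {"overlap": None}`, assuming SmartSim renders a `None`-valued run argument as a bare `--overlap` flag; this has not been tested here.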
