Daylily is an operator-focused framework for standing up short-lived AWS ParallelCluster environments around durable reference data and repeatable workflow launch paths. The goal is simple: make large Slurm-backed bioinformatics clusters easy to create, use, inspect, and destroy without turning the cluster itself into a permanent pet.
Daylily assumes the durable assets are the data, references, manifests, and workflow definitions - not the running compute fleet. That pushes the design toward:
- preflight validation before any expensive mutation happens
- region-scoped reference buckets that survive cluster turnover
- FSx for Lustre for shared cluster-time performance
- lightweight, repeatable head-node bootstrap
- explicit export and delete workflows once the run is complete
This keeps the operator workflow close to "create, validate, run, export, tear down" instead of "tune and babysit a permanent cluster".
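The "validate before any expensive mutation" posture can be sketched as a preflight runner that executes cheap, read-only checks and refuses to proceed on any failure. Everything below (the `Check` structure, the check names and results) is a hypothetical illustration of the pattern, not Daylily's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Check:
    """One cheap, read-only validation run before any expensive mutation."""
    name: str
    run: Callable[[], Tuple[bool, str]]  # returns (ok, detail)

def preflight(checks: List[Check]) -> bool:
    """Run every check, report all failures, and gate cluster creation."""
    failures = []
    for check in checks:
        ok, detail = check.run()
        print(f"[{'ok' if ok else 'FAIL'}] {check.name}: {detail}")
        if not ok:
            failures.append(check.name)
    return not failures

# Hypothetical stand-ins for the real quota / IAM / bucket probes.
checks = [
    Check("vcpu-quota", lambda: (True, "spot vCPU quota sufficient")),
    Check("iam-role", lambda: (True, "cluster role resolvable")),
    Check("reference-bucket", lambda: (False, "no omics-analysis bucket in region")),
]

proceed = preflight(checks)  # False here: the operator stops before creating anything
```

The point of the shape is that every check runs (so one pass surfaces all problems), but a single failure is enough to block the expensive step.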
The Daylily stack has three layers:
- Control plane: `daylily_ec` validates prerequisites, renders cluster YAML, applies spot pricing, creates the cluster, and records state.
- Data plane: a region-specific S3 bucket whose name includes `omics-analysis` is exposed through FSx for Lustre so references and staged data are shared across nodes.
- Workflow plane: repository metadata in `../config/daylily_available_repositories.yaml` tells the head node what workflow repos exist, where to clone them from, and which default ref to use.
The intended operator loop is:
- Prepare the AWS identity, key pair, and reference bucket for a region.
- Run `daylily-ec preflight` to catch quota, IAM, or bucket problems before provisioning.
- Create the cluster and let Daylily bootstrap the head node.
- Stage sample metadata and inputs from a laptop or directly on the head node.
- Launch a workflow through the head node helpers and monitor the run in Slurm and tmux.
- Export results to S3, check for drift if needed, and delete the cluster.
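The export-then-delete step implies some notion of drift checking before the cluster (and its filesystem) goes away. A minimal sketch, assuming drift is judged by comparing relative-path-to-size maps of the local results directory and its S3 export; the function name and the comparison criterion are illustrative, not Daylily's actual mechanism.

```python
def drift_report(local_manifest: dict, s3_listing: dict) -> dict:
    """Compare relative-path -> size maps for a results dir and its S3 export.

    Returns paths missing from S3 and paths whose sizes disagree; both lists
    empty means the export is consistent and the cluster is safe to delete.
    """
    missing = sorted(p for p in local_manifest if p not in s3_listing)
    mismatched = sorted(
        p for p, size in local_manifest.items()
        if p in s3_listing and s3_listing[p] != size
    )
    return {"missing": missing, "mismatched": mismatched}

# Hypothetical run: one file never uploaded, one truncated mid-transfer.
local = {"results/sample1.bam": 1024, "results/sample1.vcf": 90, "logs/run.log": 7}
remote = {"results/sample1.bam": 1024, "results/sample1.vcf": 20}
print(drift_report(local, remote))
# {'missing': ['logs/run.log'], 'mismatched': ['results/sample1.vcf']}
```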
That operational sequence is why Daylily ships a Python CLI as the canonical operator interface, with `bin/` helpers retained only as compatibility wrappers where practical.
The repo already carries a small workflow registry:
- `daylily-omics-analysis`: primary whole-genome and multiomics workflows
- `rna-seq-star-deseq2`: RNA-seq alignment and differential expression workflows
- `daylily-sarek`: a Sarek-based workflow entry
Those entries live in `../config/daylily_available_repositories.yaml`, and `day-clone` uses them on the head node.
Daylily is opinionated about cost visibility:
- preflight can stop before a bad cluster launch
- budgets and heartbeat notifications are part of the lifecycle model
- the CLI includes raw pricing inspection helpers
- the repo keeps benchmark and cost context alongside the operator docs
The shared filesystem layout is part of the operator value proposition. References, staged inputs, workflow repos, and analysis results land in predictable places so the cluster can stay ephemeral while the run outputs remain easy to export and inspect.
The repo keeps benchmark notes under `benchmarks/`. These are reference material, not the operator quickstart:

- `benchmarks/FS_performance.md`
- `benchmarks/aligner_benchmarks.md`
- `benchmarks/deduplication_benchmarks.md`
- `benchmarks/snv_calling.md`
- `benchmarks/sv_calling.md`
The remaining operator documentation lives alongside this file:

- `quickest_start.md` for the install and create flow
- `operations.md` for the day-2 operator workflow
- `archive/README.md` for historical material that is preserved but no longer canonical


