Rocky Slurm

From NIMBioS
Revision as of 18:18, 25 August 2022 by Jondale

Slurm

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

You can learn more at the Slurm website.


Commands

Slurm has many commands, but the ones you will use most often to run jobs are sbatch and srun.

srun: A blocking command that runs a job on the cluster in real time and returns when the job finishes.
sbatch: A non-blocking command that submits a batch script to the queue, to be run as resources allow.
squeue: Shows information about queued and running jobs.
scancel: Cancels pending or running jobs.
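
As a rough illustration of how these commands fit together, the sketch below writes a small batch script, submits it with sbatch, and checks on it with squeue. The file name hello_job.sh, the job name, the time limit, and the output file pattern are assumptions chosen for the example, not settings from this cluster; your site may also require options such as a partition or account.

```shell
#!/bin/bash
# Write a minimal batch script. The job name, time limit, and output
# file name below are illustrative assumptions, not site defaults.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out

# srun launches the command as a job step inside the allocation.
srun hostname
EOF

# Only contact the scheduler if the Slurm client tools are installed.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print just the job ID (optionally ";cluster").
    if job_id=$(sbatch --parsable hello_job.sh); then
        job_id=${job_id%%;*}
        squeue -j "$job_id"    # show the job while it is pending or running
        # scancel "$job_id"    # uncomment to cancel the job
    fi
else
    echo "sbatch not found; wrote hello_job.sh without submitting"
fi
```

Because sbatch is non-blocking, the script returns as soon as the job is queued; the output of hostname appears later in the hello_<jobid>.out file. Running "srun hostname" directly from a login shell would instead block until the command completes and print the result to the terminal.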