Difference between revisions of "Rocky Slurm"
From NIMBioS
Revision as of 18:18, 25 August 2022
Slurm
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
You can learn more at the Slurm website.
Commands
Slurm has many commands, but the most common ones you will use to run jobs are sbatch and srun.
srun | A blocking command that submits a job in real time to the cluster. |
sbatch | A non-blocking command that submits a job to the queue to be run as resources allow. |
squeue | Shows information about queued jobs. |
scancel | Cancels pending or running jobs. |
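As a minimal sketch of how these commands fit together, the script below writes a small batch script and submits it with sbatch when Slurm is available. The file name hello_job.sh, the job name, and the resource values are illustrative assumptions, not settings specific to this cluster; adjust them to your site's partitions and limits.

```shell
#!/bin/bash
# Write a minimal batch script. The file name and all resource values
# below are hypothetical examples, not cluster-specific recommendations.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out

# Inside a batch script, srun launches the command as a job step.
srun hostname
EOF

# Submit only when the Slurm client tools are installed on this machine.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh    # non-blocking: queues the job and returns
    squeue -u "$USER"      # list your own jobs in the queue
    # scancel <jobid>      # cancel a job, using the ID that sbatch reported
else
    echo "sbatch not found; wrote hello_job.sh without submitting"
fi
```

Running srun hostname directly from a login shell, rather than inside a batch script, would instead block until resources are allocated and the command finishes.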