Revision as of 19:47, 25 August 2022
Slurm
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
You can learn more at the Slurm website.
Commands
Slurm has many commands. Here are a few you'll use when submitting jobs:
srun    | A blocking command that submits a job in real time to the cluster.
sbatch  | A non-blocking command that queues a job to be run as resources allow.
squeue  | Shows information about queued jobs.
scancel | Cancel/stop a job.
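To illustrate the difference between the blocking and non-blocking commands, here is a quick sketch (hostname stands in for any program, and the script name is just an example; this requires access to a Slurm cluster):

```shell
# srun blocks: the shell waits until the job finishes and its output is printed
srun hostname

# sbatch returns immediately after queuing the script and prints a job ID,
# e.g. "Submitted batch job 12345"
sbatch myslurmjob.sh
```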
Submitting Jobs
The most common and preferred way to submit jobs is to create a shell script and then run it with sbatch.
myslurmjob.sh

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --output=output.log
srun myprogram
In the script above you can see the script interpreter on the first line, followed by some parameters to pass to the sbatch command, and finally srun to run our program. The parameters simply set a name for the job and the file to write the job's output to.
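sbatch accepts many more #SBATCH directives than the two shown. A sketch with a few common ones follows; the resource values here are placeholders you would tune for your own job, not recommendations:

```shell
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --output=output.log
#SBATCH --ntasks=4          # example: request 4 tasks
#SBATCH --time=01:00:00     # example: 1 hour wall-clock limit
#SBATCH --mem=4G            # example: 4 GB of memory per node

srun myprogram
```

Any option you could pass to sbatch on the command line can instead be recorded in the script as an #SBATCH line, which keeps job submissions reproducible.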
Now that our script is ready, we can queue it up with the sbatch command.
sbatch myslurmjob.sh
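After submitting, you can follow the job with the other commands from the table above. A typical sequence might look like this (the job ID is whatever sbatch printed; it is a placeholder here):

```shell
squeue -u $USER      # check the status of your queued and running jobs
cat output.log       # once the job finishes, view its output
scancel 12345        # cancel the job early if needed (ID from sbatch's output)
```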