Rocky Job Anatomy

From NIMBioS

Anatomy of a Rocky Job

Setting up a job on Rocky starts with creating your project's files in, or uploading them to, a project directory within your home directory on Rocky. These files include the code you've written, any data files it needs, and a batch script.


Your Code

Your code is what gets executed on Rocky's compute nodes once your job is submitted.
It can be written in any language available through Rocky's environment modules (Lmod).
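
As a rough sketch of how a language is usually located and loaded with Lmod: the module name below is taken from the example batch script on this page, and what is actually installed on Rocky may differ. The modules_checked variable is only an illustration aid.

```shell
#!/bin/bash
# Sketch: find and load an environment module with Lmod.
# The module name comes from the example batch script on this page.
r_module="R/4.2.1-foss-2022a"

if command -v module >/dev/null 2>&1; then
    module avail R               # list available modules whose names match "R"
    module load "$r_module"      # load one specific version
    module list                  # confirm what is now loaded
    modules_checked="yes"
else
    # Not on the cluster, or Lmod is not initialised in this shell.
    echo "module command not found; skipping" >&2
    modules_checked="no"
fi
```

The same module load line then goes into the batch script, so the compute node sees the same software as your login session.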


Your Data

If your job will be processing data, you'll need to upload that data to your project's directory.

Your home directory is shared among all compute nodes, so whichever node your job is assigned to, it will have access to your data.


Batch Script

The batch script is a shell script that brings everything together by defining job parameters, loading any environment modules needed, and finally executing your code.

Job parameters are defined one per line, each starting with #SBATCH.
All parameters are optional and have default values, but most batch scripts will set at least a few.

You can view all of the sbatch options at:
https://slurm.schedmd.com/sbatch.html
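
To illustrate the one-parameter-per-line layout, here is a minimal sketch that writes a tiny batch script and lists its #SBATCH lines. The file name demo_job.run and its option values are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Sketch: #SBATCH lines are ordinary comments to bash; only sbatch reads them.
# demo_job.run and its option values are hypothetical, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/demo_job.run" <<'EOF'
#!/bin/bash
#SBATCH --job-name=DEMO
#SBATCH --time=00:10:00
echo "hello from the job"
EOF

# List the job parameters the script defines, one per line.
grep '^#SBATCH' "$workdir/demo_job.run"

# Count them, and check the script for shell syntax errors without running it.
n_params=$(grep -c '^#SBATCH' "$workdir/demo_job.run")
bash -n "$workdir/demo_job.run" && echo "syntax OK, $n_params parameters"
```

Keep all #SBATCH lines at the top of the script: sbatch stops reading them once it reaches the first line that is neither a comment nor blank.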

Below is an example batch script using some of the most common options:

my_job.run

#!/bin/bash
#SBATCH --job-name=MY_JOB         ### job name
#SBATCH --output=my_job_%j.out    ### file to store job output
#SBATCH --time=1-00:00:00         ### maximum time limit for job (Days-HH:MM:SS)
#SBATCH --mem-per-cpu=2G          ### memory to allocate per CPU
#SBATCH --cpus-per-task=1         ### number of CPUs to allocate per task
#SBATCH --mail-user=me@test.com   ### email address to notify
#SBATCH --mail-type=END           ### send an email when the job ends

module load R/4.2.1-foss-2022a

Rscript my_code.R
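
Inside a running job, Slurm exposes some of the job parameters as environment variables. The lines below are a sketch of how the body of a batch script might read them. The variable names job_id and n_cpus and the fallback values are assumptions added for illustration, and the commented Rscript call assumes my_code.R accepts a CPU count as an argument.

```shell
#!/bin/bash
# Sketch: lines that could follow the #SBATCH header of a batch script.
# Slurm sets these variables inside a running job. The fallbacks after ":-"
# are assumptions added here so the lines also run outside of Slurm.
job_id="${SLURM_JOB_ID:-none}"
n_cpus="${SLURM_CPUS_PER_TASK:-1}"   # set when --cpus-per-task is given

# Record where and how the job is running at the top of the output file.
echo "Job $job_id on $(uname -n) using $n_cpus CPU(s)"

# A program could then be told how many CPUs it may use, for example:
#   Rscript my_code.R "$n_cpus"   # assumes my_code.R reads this argument
```

Reading the CPU count from the environment keeps the program in step with the --cpus-per-task value, so the number only has to be changed in one place.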


Running Job

Submitting Job

Jobs are submitted with the sbatch command, which takes your batch script as its argument. This adds your job to the queue.

sbatch my_job.run
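
If you want to reuse the job ID in later commands, one possible approach is to capture it at submission time. The sketch below uses sbatch's --parsable option. The helper name submit_job is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Sketch: submit a batch script and keep the job ID for later commands.
# submit_job is a hypothetical helper, not part of Slurm.
submit_job() {
    local out
    # --parsable makes sbatch print only "jobid" (or "jobid;cluster").
    out=$(sbatch --parsable "$1") || return 1
    # Strip the optional ";cluster" suffix.
    echo "${out%%;*}"
}

# Only attempt a submission where sbatch exists and the script is present.
if command -v sbatch >/dev/null 2>&1 && [ -f my_job.run ]; then
    job_id=$(submit_job my_job.run) && echo "Submitted job $job_id"
fi
```

Without --parsable, sbatch prints a line of the form "Submitted batch job 2947", which is fine to read by eye but less convenient to use in a script.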


Watching Job

While your job is in the queue or being executed, you can check its status using the squeue command. If the job is currently running, the output shows which node(s) it is assigned to.

[test_user@rocky7 ~]$ squeue
             JOBID PARTITION       NAME      USER ST      TIME  NODES NODELIST(REASON)
              2947 compute_all    my_job test_use  R      0:05      1 moose1
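
On a busy cluster the full squeue listing can be long. As a sketch, the options below narrow it to your own jobs or to a single job. The helper name job_state is hypothetical, and 2947 is the example JOBID from the output above.

```shell
#!/bin/bash
# Sketch: narrow down squeue output. job_state is a hypothetical helper.
job_state() {
    # -h drops the header, -j selects one job, -o "%T" prints its state
    # in long form (for example PENDING or RUNNING).
    squeue -h -j "$1" -o "%T" 2>/dev/null
}

if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER"                            # show only your own jobs
    echo "Job 2947 state: $(job_state 2947)"     # 2947 is the example JOBID
fi
```

In the ST column of the default output, R means the job is running and PD means it is pending in the queue. A job that has finished no longer appears, so job_state prints nothing for it.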


Cancelling Job

To cancel a job, use the scancel command and pass it the job's JOBID (as shown in the squeue output).

scancel 2947
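
Because cancelling cannot be undone, a small guard around scancel can help avoid typos. The wrapper below is only a sketch: cancel_job is a hypothetical helper, and it deliberately accepts plain numeric job IDs only.

```shell
#!/bin/bash
# Sketch: cancel a single job by ID. cancel_job is a hypothetical helper.
cancel_job() {
    # Refuse anything that is not a plain numeric job ID.
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: cancel_job JOBID" >&2
            return 2
            ;;
    esac
    scancel "$1"
}

# Other forms scancel accepts (left commented out here on purpose):
#   scancel -u "$USER"        # cancel all of your jobs
#   scancel --name=MY_JOB     # cancel jobs by job name
```

With this helper defined, cancel_job 2947 behaves like the scancel 2947 command above, while an empty or mistyped ID is rejected before scancel is called.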