Slurm Job Reference
This page provides a general reference for submitting and managing jobs on the HPC using the Slurm scheduler. For an introductory guide to submitting jobs, refer to our tutorial.
Common Slurm Commands Overview#
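Below is a brief, non-exhaustive sketch of the Slurm commands you will use most often; each is covered in more detail later on this page or in the official Slurm documentation. The job ID and script name are placeholders.

```bash
sbatch my_submit_script.sh   # submit a batch job script
srun --pty /bin/bash         # run a job interactively on a compute node
squeue -u $USER              # list your pending and running jobs
scancel 123456               # cancel a job by job ID
sinfo                        # show partition and node status
sacct -j 123456              # show accounting information for a job
```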
Detailed Information About Slurm Commands#
Submitting Jobs#
Use the sbatch command to submit jobs non-interactively (most common use-case):
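```bash
# 'my_submit_script.sh' is a placeholder for your own submit script
sbatch my_submit_script.sh
```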
Use the srun command to submit jobs interactively or as a substitute for mpirun in a Slurm submit script:
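```bash
# Run interactively: request a pseudo-terminal shell on a compute node
srun --pty /bin/bash

# Inside a submit script, launch an MPI program with srun instead of mpirun
# ('./my_mpi_program' is a placeholder for your own executable)
srun ./my_mpi_program
```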
- Note: `srun` is preferred over `mpirun`, because the former allows for better Slurm scheduler integration.
Common job submission parameters#
The following are the most common parameters used to specify job requirements when running srun or sbatch.
All of these are optional, but you will probably want to adjust many in your submit script.
| Option | Default | Explanation |
|---|---|---|
| `-J [jobname]` | Script name | Optional job name |
| `--mail-type=[type]` | `NONE` | Comma-separated list: `BEGIN`, `END`, `FAIL`, `REQUEUE`, or `ALL` |
| `-n [numtasks]` | 1 | Total number of tasks to run (note: lowercase n) |
| `-A [slurm_account_name]` | `genacc_q` | Slurm account (queue/partition) into which to submit the job (e.g. `backfill`, `genacc_q`) |
| `-t D-HH:MM:SS` | varies by Slurm account (see below) | Wall clock limit |
| `--mem-per-cpu` | varies by Slurm account (typically 3.9G) | Memory requested per CPU core in MiB (add `G` to specify GiB) |
| `-o [outfile]` | `slurm-%j.out` | File to write standard output (STDOUT) to |
| `-e [error_file]` | combined with `-o` | Separate error output (STDERR) from standard output |
| `-N [numnodes]` | 1 | Number of nodes |
| `-C [constraints]` | none | Run only on nodes with the specified features (e.g. `-C YEAR2014`); see below |
| `--exclusive` | disabled | Reserve entire node(s) for the job |
| `--gpus=[type:num]` | none | Number of GPUs to reserve in addition to CPUs |
Note that these are only a few of the many available Slurm options. Refer to the sbatch manpage for a complete list.
Connecting to running batch jobs#
You can connect to any currently running batch job by invoking a shell and using the --overlap option with srun.
Syntax is below:
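```bash
# Open a shell inside running job 123456 (substitute your own job ID)
srun --jobid=123456 --overlap --pty /bin/bash
```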
If your job is allocated to multiple nodes, Slurm will connect to the first node that the job is running on.
Constraint reference#
The HPC consists of nodes with various models of AMD and Intel processors. If it is important that your job run only on
a single type or generation of processor, you can specify feature constraints using the -C parameter.
To retrieve a list of available features, log in to the HPC and run `sinfo -o %f`:
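```bash
# List the feature tags available on cluster nodes (output varies by cluster)
sinfo -o %f
```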
To limit by feature, specify the -C parameter in your submit script (or as a command-line option):
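```bash
# In a submit script (the YEAR2014 feature is taken from the example above):
#SBATCH -C YEAR2014

# Or as a command-line option ('my_submit_script.sh' is a placeholder):
sbatch -C YEAR2014 my_submit_script.sh
```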
You can use boolean AND (&) and OR (|) operators in your constraint parameter. For example:
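The feature names below are illustrative; use values from your own `sinfo -o %f` output.

```bash
# Run only on nodes that have either feature (OR)
#SBATCH -C "YEAR2014|YEAR2015"

# Run only on nodes that have both features (AND)
#SBATCH -C "YEAR2014&intel"
```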
Common Slurm environment variables#
| Variable | Description |
|---|---|
| `SLURM_JOBID` | Auto-assigned job ID |
| `SLURM_SUBMIT_DIR` | Directory/folder from which the job was submitted |
| `SLURM_SUBMIT_HOST` | The name of the server from which the job was submitted (e.g., h22-login-24) |
| `SLURM_JOB_NODELIST` | The names of the node(s) that were allocated to this job |
| `SLURM_ARRAY_TASK_ID` | Task ID within a job array (if using job arrays) |
| `SLURM_JOB_CPUS_PER_NODE` | CPU cores per node allocated to the job |
| `SLURM_NNODES` | Number of nodes allocated to the job |
These are the most commonly used Slurm environment variables. A comprehensive list of environment variables is available at the official Slurm documentation.
The following script shows an example job submit script that reports the resources that Slurm has allocated:
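```bash
#!/bin/bash
# Minimal sketch: the job name, task count, and time limit are placeholders
#SBATCH -J "resource_report"   # job name
#SBATCH -n 4                   # total number of tasks
#SBATCH -t 00:10:00            # wall clock limit

echo "Job ID:              $SLURM_JOBID"
echo "Submit directory:    $SLURM_SUBMIT_DIR"
echo "Submit host:         $SLURM_SUBMIT_HOST"
echo "Allocated nodes:     $SLURM_JOB_NODELIST"
echo "CPU cores per node:  $SLURM_JOB_CPUS_PER_NODE"
echo "Number of nodes:     $SLURM_NNODES"
```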
Interactive jobs#
If you need to run a job interactively, use the srun command with the --pty command-line switch:
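```bash
# Request an interactive shell (the account and time limit are illustrative)
srun --pty -A genacc_q -t 01:00:00 /bin/bash
```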
- Note: `srun` accepts the same parameters as `sbatch`.
Job arrays#
Slurm allows you to submit a number of near-identical jobs simultaneously in the form of a job array. Your workload may be a good candidate for this if you have many jobs whose input differs only by some sort of index; for example, jobs that use the same parameters and code, but each read a different input/data file.
Syntax:
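```bash
# General form: ##-## is the index range; the optional %## caps concurrent tasks
# ('my_submit_script.sh' is a placeholder)
sbatch --array=##-##%## my_submit_script.sh
```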
`##-##` refers to the start and end of your index, and the optional `%##` refers to the number of tasks that are allowed to run concurrently, if you wish to limit that for any reason.
Below is an example:
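```bash
#!/bin/bash
#SBATCH -J "array_example"    # job name (placeholder)
#SBATCH --array=1-10%4        # run array tasks 1 through 10, at most 4 at a time
#SBATCH -t 00:30:00           # wall clock limit (placeholder)

# Each task reads a different input file selected by its array index
# ('./my_program' and 'input_1.dat' ... 'input_10.dat' are hypothetical names)
./my_program input_${SLURM_ARRAY_TASK_ID}.dat
```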
Note
Tasks in job arrays use the job ID format `[JOBID]_[TASK_ID]`. Each task in the array will also generate its own
output file unless you specify a custom output file in your submission script via the `-o` parameter. More information
is available in the official Slurm documentation.
Job dependencies#
Slurm supports job dependencies. You can submit jobs that will be deferred until other jobs have either completed or
terminated in a failed state. This allows you to break your workflow/task down into smaller atomic steps. Use the -d
option to specify dependencies:
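```bash
# General form; 'dependency_list' is described below, and the script name is a placeholder
sbatch -d [dependency_list] my_submit_script.sh
```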
The `dependency_list` parameter syntax takes the form of `dependency_type:job_id`. `dependency_type` can be any of the
following:
| Dependency Type Syntax | Example | Explanation |
|---|---|---|
| `after:JOB_ID` | `after:123456:123457` | Job will start after the specified job(s) have begun execution |
| `afterany:JOB_ID` | `afterany:123456:123457` | Job will start after the specified job(s) have finished (in any state) |
| `afternotok:JOB_ID` | `afternotok:123456` | Job will start after the specified job(s) have terminated in a failed state |
| `afterok:JOB_ID` | `afterok:123456` | Job will start after the specified job(s) have executed and terminated successfully |
More examples:
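The job IDs and script name below are placeholders.

```bash
# Start only after jobs 123456 and 123457 have both finished, in any state
sbatch -d afterany:123456:123457 my_submit_script.sh

# Start only if job 123456 terminated in a failed state (e.g. a cleanup or retry step)
sbatch -d afternotok:123456 my_submit_script.sh

# Start only after job 123456 completed successfully
sbatch -d afterok:123456 my_submit_script.sh
```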
Note
These are only a subset of the available dependency options. For a full reference, refer to the official Slurm documentation.
Singleton jobs#
A simpler way to ensure jobs run only one-at-a-time is to use the `-d singleton` dependency syntax. This ensures that
only one job with a given name will be running at any given time. You need to set a job name with the `-J`/`--job-name` parameter for this to work.
The same logic is used as `-d afterany:JOBID`; the job will begin only after any jobs with the same name have terminated
in any state.
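For example (the job name and script name are placeholders):

```bash
# Only one job named "nightly_pipeline" runs at a time; each new submission
# waits for earlier jobs with the same name to terminate in any state
sbatch -J nightly_pipeline -d singleton my_submit_script.sh
```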
Tips#
Default number of cores per node#
If you use the -N option to specify the number of nodes, Slurm will allocate only one core per node:
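```bash
# Requests 2 nodes with the default of 1 core on each (2 cores total)
#SBATCH -N 2
```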
To use more than one core per node, use the --ntasks-per-node option; e.g.:
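```bash
# Requests 2 nodes with 8 tasks on each node (16 cores total)
#SBATCH -N 2
#SBATCH --ntasks-per-node=8
```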
If you need to reserve complete nodes, use the --exclusive option:
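```bash
# Reserves 2 entire nodes, regardless of how many cores each node has
#SBATCH -N 2
#SBATCH --exclusive
```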
Warning
The above example will reserve two complete nodes in the cluster, but this job's wait time in-queue will likely be much higher than if you do not use this option. For further explanation, refer to our job resource planning guide.
Further reading#
- Information about completed jobs is available via the `sstat` and `seff` commands.
- All `sbatch` parameter options are listed on the official Slurm website.