# Slurm Job Reference
This page provides a general reference for submitting and managing jobs on the HPC using the Slurm scheduler. For an introductory guide to submitting jobs, refer to our tutorial.
## Common Slurm Commands Overview
## Detailed Information About Slurm Commands
Use the `sbatch` command to submit jobs non-interactively (most common use-case):
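A minimal invocation, assuming a submit script named `my_job.sh` (the script name is hypothetical):

```shell
# Submit the batch script to the scheduler; Slurm replies with the assigned job ID
sbatch my_job.sh
```

On success, `sbatch` prints a line such as `Submitted batch job 1234` and returns immediately; the job then waits in the queue.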
Use the `srun` command to submit jobs interactively or as a substitute for `mpirun` in a Slurm submit script. `srun` is preferred over `mpirun`, because the former allows for better Slurm scheduler integration.
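A sketch of a submit script that launches an MPI program with `srun` (the program name and resource numbers are hypothetical):

```shell
#!/bin/bash
#SBATCH -J mpi_example   # job name (hypothetical)
#SBATCH -n 8             # eight MPI tasks

# srun launches all eight tasks under Slurm's control, in place of mpirun
srun ./my_mpi_program
```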
## Common job submission parameters
The following are the most common parameters used to specify job requirements when running `sbatch`. All of these are optional, but you will probably want to adjust many in your submit script.
| Parameter | Default | Description |
|---|---|---|
| `-J [name]` | Script name | Optional job name |
| `--mail-type=[type]` | none | Comma separated list: `BEGIN`, `END`, `FAIL`, `ALL` |
| `-n [count]` | 1 | Total number of tasks to run (note: lowercase n) |
| `-p [name]` | cluster default | Slurm account (queue/partition) into which to submit job (e.g. backfill, genacc_q) |
| `-t [HH:MM:SS]` | varies by Slurm account (see below) | Wall clock limit |
| `--mem=[size]` | varies by Slurm account | Memory requested per node in MiB (add `G` suffix for GiB) |
| `-o [filename]` | `slurm-[JOBID].out` | File to write standard output to (STDOUT) |
| `-e [filename]` | merged with STDOUT | Separate error output (STDERR) from standard output |
| `-N [count]` | 1 | Number of nodes |
| `-C [features]` | none | Run only on nodes with specified features (see below) |
| `--exclusive` | disabled | Reserve entire node(s) for job* |
| `--gres=gpu:[#]` | none | Specify a number of GPUs to reserve in addition to CPUs |
Note that these are only a few of the many available Slurm options. Refer to the sbatch manpage for a complete list.
The HPC consists of nodes with various models of AMD and Intel processors. If it is important that your job run only on a single type or generation of processor, you can specify feature constraints using the `-C` (`--constraint`) parameter. To retrieve a list of available features, log in to the HPC and run `sinfo -o %f`:
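For example (the feature names below are hypothetical; your cluster's output will differ):

```shell
sinfo -o %f
# AVAIL_FEATURES
# intel,avx2
# amd,epyc
```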
To limit by feature, specify the `-C` parameter in your submit script (or as a command-line option):
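For example, to restrict the job to nodes with a given feature (the feature name `intel` is hypothetical):

```shell
#SBATCH -C intel   # run only on nodes that advertise the 'intel' feature
```

The same constraint can be passed on the command line, e.g. `sbatch -C intel my_job.sh`.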
You can use boolean AND (`&`) and OR (`|`) operators in your constraint parameter. For example:
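A sketch with hypothetical feature names:

```shell
# AND: run only on nodes that have both features
#SBATCH -C "intel&avx2"
```

Similarly, `-C "intel|amd"` would match nodes that have either feature.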
## Common Slurm environment variables
| Variable | Description |
|---|---|
| `$SLURM_JOB_ID` | Auto assigned Job ID |
| `$SLURM_SUBMIT_DIR` | Directory/folder from which the job was submitted |
| `$SLURM_SUBMIT_HOST` | The name of the server from which the job was submitted |
| `$SLURM_JOB_NODELIST` | The names of the node(s) that were allocated for this job |
| `$SLURM_ARRAY_TASK_ID` | Task ID within job array (if using job arrays) |
| `$SLURM_CPUS_ON_NODE` | CPU cores per node allocated to job |
| `$SLURM_JOB_NUM_NODES` | Number of nodes allocated to job |
These are the most commonly used Slurm environment variables. A comprehensive list of environment variables is available at the official Slurm documentation.
The following is an example job submit script that simply reports the resources that Slurm has allocated:
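A sketch of such a script (the job name and resource requests are hypothetical):

```shell
#!/bin/bash
#SBATCH -J resource_report   # job name (hypothetical)
#SBATCH -n 4                 # four tasks

echo "Job ID:          $SLURM_JOB_ID"
echo "Submit dir:      $SLURM_SUBMIT_DIR"
echo "Submit host:     $SLURM_SUBMIT_HOST"
echo "Node list:       $SLURM_JOB_NODELIST"
echo "Cores per node:  $SLURM_CPUS_ON_NODE"
echo "Number of nodes: $SLURM_JOB_NUM_NODES"
```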
If you need to run a job interactively, use the `srun` command with the `--pty` command-line switch:
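For example, to get an interactive shell on a compute node:

```shell
# Allocate resources and attach a pseudo-terminal running bash
srun --pty /bin/bash
```

Resource options (e.g. `-n`, `-t`, `-p`) can be added just as in a batch submission.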
`srun` accepts the same parameters as `sbatch`.
Slurm allows you to submit a number of near-identical jobs simultaneously in the form of a job array. Your workload may be a good candidate for this if your jobs use the same parameters and code, but each has a different input/data file that varies only by some sort of index.
In the `--array=##-##` parameter, `##-##` refers to the start and end of your index, and the optional `%##` suffix limits the number of tasks that are allowed to run concurrently, if you wish to limit that for any reason.
Below is an example:
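A sketch of an array submit script (the job name, input file naming, and limits are hypothetical):

```shell
#!/bin/bash
#SBATCH -J array_example    # job name (hypothetical)
#SBATCH --array=1-10%4      # run tasks with indices 1..10, at most 4 at a time

# Each task picks its own input file based on its array index
./my_program "input_${SLURM_ARRAY_TASK_ID}.dat"
```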
Tasks in job arrays use the job ID format `[JOBID]_[TASK_ID]`. Each task in the array will also generate its own output file unless you specify a custom output file in your submission script via the `-o` parameter. More information is available in the official Slurm documentation.
Slurm supports job dependencies. You can submit jobs that will be deferred until other jobs have either completed or terminated in a failed state. This allows you to break your workflow/task down into smaller atomic steps. Use the `-d` (`--dependency`) option to specify dependencies:
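For example, assuming an earlier job with ID 1234 (the job ID and script name are hypothetical):

```shell
# Run next_step.sh only after job 1234 finishes successfully
sbatch -d afterok:1234 next_step.sh
```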
The `dependency_list` parameter syntax takes the form `dependency_type:job_id`. The `dependency_type` can be any of the following:
| Dependency Type Syntax | Example | Explanation |
|---|---|---|
| `after:job_id[:job_id...]` | `-d after:1234` | Job will start after the specified job(s) have begun execution |
| `afterany:job_id[:job_id...]` | `-d afterany:1234` | Job will start after the specified job(s) have finished (in any state) |
| `afternotok:job_id[:job_id...]` | `-d afternotok:1234` | Job will start after the specified job(s) have terminated in a failed state |
| `afterok:job_id[:job_id...]` | `-d afterok:1234` | Job will start after the specified job(s) have executed and terminated successfully |
These are only a subset of the available dependency options. For a full reference, refer to the official Slurm documentation.
A simpler way to ensure jobs run only one at a time is to use the `-d singleton` dependency syntax. This ensures that only one job with a given name will be running at any given time. You need to use the `-J` (`--job-name`) parameter for this to work. The same logic is used as for `-d afterany:JOBID`; the job will begin only after any jobs with the same name have terminated in any state.
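A sketch, with a hypothetical job name and script:

```shell
# Only one job named 'nightly_run' executes at a time
sbatch -J nightly_run -d singleton my_job.sh
sbatch -J nightly_run -d singleton my_job.sh   # queued until the first terminates
```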
## Default number of cores per node
If you use the `-N` option to specify the number of nodes, Slurm will allocate only one core per node:
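For example, this request yields two nodes with a single core on each:

```shell
#SBATCH -N 2   # two nodes; by default, one core per node
```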
To use more than one core per node, use the `--ntasks-per-node` option; e.g.:
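A sketch requesting multiple cores on each node (the counts are hypothetical):

```shell
#SBATCH -N 2
#SBATCH --ntasks-per-node=8   # 8 cores on each of the 2 nodes (16 tasks total)
```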
If you need to reserve complete nodes, use the `--exclusive` option:
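For example:

```shell
#SBATCH -N 2
#SBATCH --exclusive   # no other jobs may share these two nodes
```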
The above example will reserve two complete nodes in the cluster, but this job's in-queue wait time will likely be much higher than if you do not use this option. For further explanation, refer to our job resource planning guide.
- Information about completed jobs is available via the `sacct` command
- Tutorials on the official Slurm website
- `sbatch` parameter options are listed on the official Slurm website