Slurm Job Reference
This page provides a general reference for submitting and managing jobs on the HPC using the Slurm scheduler. For an introductory guide to submitting jobs, refer to our tutorial.
Common Slurm Commands Overview#
Detailed Information About Slurm Commands#
Submitting Jobs#
Use the `sbatch` command to submit jobs non-interactively (most common use case):
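For example, a minimal submit script and its submission might look like the following (the script name, job name, account, and time limit are placeholders; adjust them for your workload):

```bash
#!/bin/bash
#SBATCH -J my_first_job      # optional job name
#SBATCH -A genacc_q          # Slurm account (queue/partition)
#SBATCH -n 1                 # one task
#SBATCH -t 0-01:00:00        # one hour wall clock limit

# The commands for your job go below; this example just prints the hostname
hostname
```

Submit it with:

```bash
sbatch my_first_job.sh
```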
Use the `srun` command to submit jobs interactively or as a substitute for `mpirun` in a Slurm submit script:
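For example, a hypothetical MPI submit script might launch its executable with `srun` instead of `mpirun` (the program name and MPI module below are placeholders, not part of any specific system):

```bash
#!/bin/bash
#SBATCH -n 16                # 16 MPI tasks
#SBATCH -t 0-04:00:00        # four hour wall clock limit

# Load your MPI environment here; the module name is an assumption
module load openmpi

# srun starts one copy of the program per allocated task
srun ./my_mpi_program
```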
- Note: `srun` is preferred over `mpirun`, because the former allows for better Slurm scheduler integration.
Common job submission parameters#
The following are the most common parameters used to specify job requirements when running `srun` or `sbatch`. All of these are optional, but you will probably want to adjust many of them in your submit script.
| Option | Default | Explanation |
|---|---|---|
| `-J [jobname]` | Script name | Optional job name |
| `--mail-type=[type]` | `NONE` | Comma separated list: `BEGIN`, `END`, `FAIL`, `REQUEUE`, or `ALL` |
| `-n [numtasks]` | `1` | Total number of tasks to run (note: lowercase n) |
| `-A [slurm_account_name]` | `genacc_q` | Slurm account (queue/partition) into which to submit the job (e.g. backfill, genacc_q) |
| `-t D-HH:MM:SS` | varies by Slurm account (see below) | Wall clock limit |
| `--mem-per-cpu` | varies by Slurm account (typically `3.9G`) | Memory requested per CPU core in MiB (add `G` to specify GiB) |
| `-o [outfile]` | `slurm-%j.out` | File to write standard output (STDOUT) to |
| `-e [error_file]` | combined with `-o` | Separate error output (STDERR) from standard output |
| `-N [numnodes]` | `1` | Number of nodes |
| `-C [constraints]` | none | Run only on nodes with the specified features (e.g. `-C YEAR2014`); see below |
| `--exclusive` | disabled | Reserve entire node(s) for job* |
| `--gpus=[type:num]` | none | Specify a number of GPUs to reserve in addition to CPUs |
Note that these are only a few of the many available Slurm options. Refer to the sbatch manpage for a complete list.
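As an illustration, a submit script that combines several of the parameters above might look like this (the job name, email settings, resource values, and executable are all placeholders):

```bash
#!/bin/bash
#SBATCH -J analysis_run            # job name
#SBATCH --mail-type=END,FAIL       # email when the job ends or fails
#SBATCH -n 4                       # four tasks
#SBATCH -A genacc_q                # Slurm account
#SBATCH -t 1-00:00:00              # one day wall clock limit
#SBATCH --mem-per-cpu=4G           # 4 GiB per CPU core
#SBATCH -o analysis_%j.out         # STDOUT (%j expands to the job ID)
#SBATCH -e analysis_%j.err         # STDERR, kept separate from STDOUT

srun ./my_analysis
```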
Connecting to running batch jobs#
You can connect to any currently running batch job by invoking a shell and using the `--overlap` option with `srun`. Syntax is below:
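A sketch of the command, assuming the running job has ID 123456:

```bash
srun --jobid=123456 --overlap --pty /bin/bash
```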
If your job is allocated to multiple nodes, Slurm will connect to the first node that the job is running on.
Constraint reference#
The HPC consists of nodes with various models of AMD and Intel processors. If it is important that your job run only on a single type or generation of processor, you can specify feature constraints using the `-C` parameter.
To retrieve a list of available features, log in to the HPC and run `sinfo -o %f`:
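The exact feature list varies by cluster; the output looks something like the following (all feature names here other than YEAR2014 are invented for illustration):

```bash
$ sinfo -o %f
AVAIL_FEATURES
YEAR2014,intel
YEAR2018,amd
```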
To limit by feature, specify the `-C` parameter in your submit script (or as a command-line option):
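For example, to restrict a job to nodes with the YEAR2014 feature used above:

```bash
# In a submit script:
#SBATCH -C YEAR2014
```

or on the command line (the script name is a placeholder):

```bash
sbatch -C YEAR2014 my_script.sh
```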
You can use boolean AND (`&`) and OR (`|`) operators in your constraint parameter. For example:
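A sketch using the YEAR2014 feature plus hypothetical feature names (check `sinfo -o %f` for the features that actually exist on the cluster):

```bash
# Require BOTH features (AND)
#SBATCH -C "YEAR2014&intel"

# Require EITHER feature (OR)
#SBATCH -C "YEAR2014|YEAR2018"
```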
Common Slurm environment variables#
| Variable | Description |
|---|---|
| `SLURM_JOBID` | Auto-assigned job ID |
| `SLURM_SUBMIT_DIR` | Directory/folder from which the job was submitted |
| `SLURM_SUBMIT_HOST` | The name of the server from which the job was submitted (e.g. `h22-login-24`) |
| `SLURM_JOB_NODELIST` | The names of the node(s) that were allocated for this job |
| `SLURM_ARRAY_TASK_ID` | Task ID within a job array (if using job arrays) |
| `SLURM_JOB_CPUS_PER_NODE` | CPU cores per node allocated to the job |
| `SLURM_NNODES` | Number of nodes allocated to the job |
These are the most commonly used Slurm environment variables. A comprehensive list of environment variables is available at the official Slurm documentation.
The following script shows an example job submit script that reports the resources that Slurm has allocated:
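A minimal sketch of such a script (the resource requests here are arbitrary):

```bash
#!/bin/bash
#SBATCH -J report_resources
#SBATCH -N 2
#SBATCH --ntasks-per-node=2
#SBATCH -t 0-00:10:00

echo "Job ID:              $SLURM_JOBID"
echo "Submit directory:    $SLURM_SUBMIT_DIR"
echo "Submit host:         $SLURM_SUBMIT_HOST"
echo "Allocated node(s):   $SLURM_JOB_NODELIST"
echo "Number of nodes:     $SLURM_NNODES"
echo "CPU cores per node:  $SLURM_JOB_CPUS_PER_NODE"
```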
Interactive jobs#
If you need to run a job interactively, use the `srun` command with the `--pty` command-line switch:
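For example, to open an interactive shell on a compute node (the account and time limit are placeholders):

```bash
srun -A genacc_q -t 0-02:00:00 --pty /bin/bash
```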
- Note: `srun` accepts the same parameters as `sbatch`.
Job arrays#
Slurm allows you to submit a number of near-identical jobs simultaneously in the form of a job array. Your workload may be a good candidate for this if you have many jobs whose inputs differ only by some sort of index. A good use of job arrays is when your jobs use the same parameters and code, but each has a different input/data file.
Syntax:
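In a submit script, the directive looks roughly like this:

```bash
#SBATCH --array=##-##%##      # the %## part is optional
```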
`##-##` refers to the start and end of your index, and the optional `%##` refers to the number of tasks that are allowed to run concurrently, if you wish to limit that for any reason.
Below is an example:
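This hypothetical script runs ten tasks, at most three at a time, each reading a differently numbered input file (the program and file names are placeholders):

```bash
#!/bin/bash
#SBATCH -J array_example
#SBATCH --array=1-10%3        # indexes 1 through 10, at most 3 running at once
#SBATCH -t 0-01:00:00

# Each task selects its own input file based on its array index
./my_program input_${SLURM_ARRAY_TASK_ID}.dat
```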
Note
Tasks in job arrays use the job ID format `[JOBID]_[TASK_ID]`. Each task in the array will also generate its own output file unless you specify a custom output file in your submission script via the `-o` parameter. More information is available in the official Slurm documentation.
Job dependencies#
Slurm supports job dependencies. You can submit jobs that will be deferred until other jobs have either completed or terminated in a failed state. This allows you to break your workflow/task down into smaller atomic steps. Use the `-d` option to specify dependencies:
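For example, to defer a job until a hypothetical job 123456 has finished successfully (the script name is a placeholder):

```bash
sbatch -d afterok:123456 second_step.sh
```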
The `dependency_list` parameter syntax takes the form `dependency_type:job_id`. `dependency_type` can be any of the following:
| Dependency Type Syntax | Example | Explanation |
|---|---|---|
| `after:JOB_ID` | `after:123456:123457` | Job will start after the specified job(s) have begun execution |
| `afterany:JOB_ID` | `afterany:123456:123457` | Job will start after the specified job(s) have finished (in any state) |
| `afternotok:JOB_ID` | `afternotok:123456` | Job will start after the specified job(s) have terminated in a failed state |
| `afterok:JOB_ID` | `afterok:123456` | Job will start after the specified job(s) have executed and terminated successfully |
More examples:
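A sketch that chains steps by capturing the first job's ID with `--parsable` (script names are placeholders):

```bash
# Submit the first step and capture its job ID
step1_id=$(sbatch --parsable step1.sh)

# Start the second step only if the first completes successfully
sbatch -d afterok:$step1_id step2.sh

# Run a cleanup job once the first step has finished in any state
sbatch -d afterany:$step1_id cleanup.sh
```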
Note
These are only a subset of the available dependency options. For a full reference, refer to the official Slurm documentation.
Singleton jobs#
A simpler way to ensure jobs will only run one at a time is to use the `-d singleton` dependency syntax. This ensures that only one job with a given name will be running at any given time. You need to use the `--job-name` parameter for this to work. The same logic is used as with `-d afterany:JOBID`; the job will begin only after any jobs with the same name have terminated in any state.
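For example, these two submissions share the (placeholder) job name nightly_analysis, so the second will not start until the first has finished:

```bash
sbatch --job-name=nightly_analysis -d singleton run_part1.sh
sbatch --job-name=nightly_analysis -d singleton run_part2.sh
```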
Tips#
Default number of cores per node#
If you use the `-N` option to specify the number of nodes, Slurm will allocate only one core per node:
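For example, this request yields two nodes with a single core on each:

```bash
#SBATCH -N 2        # two nodes, one core per node by default
```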
To use more than one core per node, use the `--ntasks-per-node` option; e.g.:
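For example, to request four cores on each of two nodes:

```bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=4    # four cores on each node
```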
If you need to reserve complete nodes, use the `--exclusive` option:
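For example:

```bash
#SBATCH -N 2
#SBATCH --exclusive    # reserve both nodes entirely, regardless of core count
```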
Warning
The above example will reserve two complete nodes in the cluster, but the job's wait time in the queue will likely be much higher than if you do not use this option. For further explanation, refer to our job resource planning guide.
Further reading#
- Information about completed jobs is available via the `sacct` and `seff` commands.
- All `sbatch` parameter options are listed on the official Slurm website.