Slurm Job Reference
This page provides a general reference for submitting and managing jobs on the HPC using the Slurm scheduler. For an introductory guide to submitting jobs, refer to our tutorial.
Common Slurm Commands Overview#
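Below is a brief, non-exhaustive sketch of the Slurm commands you will use most often; each is covered in more detail later on this page or in the official Slurm documentation. The job ID and script name are placeholders.

```bash
sbatch my_submit_script.sh   # submit a batch job script
srun --pty /bin/bash         # run a job interactively on a compute node
squeue -u $USER              # list your pending and running jobs
scancel 123456               # cancel a job by job ID
sinfo                        # show partition and node status
sacct -j 123456              # show accounting information for a job
```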
Detailed Information About Slurm Commands#
Submitting Jobs#
Use the sbatch command to submit jobs non-interactively (most common use-case):
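```bash
# 'my_submit_script.sh' is a placeholder for your own submit script
sbatch my_submit_script.sh
```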
Use the srun command to submit jobs interactively or as a substitute for mpirun in a Slurm submit script:
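```bash
# Run interactively: request a pseudo-terminal shell on a compute node
srun --pty /bin/bash

# Inside a submit script, launch an MPI program with srun instead of mpirun
# ('./my_mpi_program' is a placeholder for your own executable)
srun ./my_mpi_program
```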
- Note: `srun` is preferred over `mpirun`, because the former allows for better Slurm scheduler integration.
Common job submission parameters#
The following are the most common parameters used to specify job requirements when running srun or sbatch.
All of these are optional, but you will probably want to adjust many in your submit script.
| Option | Default | Explanation |
|---|---|---|
| `-J [jobname]` | Script name | Optional job name |
| `--mail-type=[type]` | `NONE` | Comma-separated list: `BEGIN`, `END`, `FAIL`, `REQUEUE`, or `ALL` |
| `-n [numtasks]` | 1 | Total number of tasks to run (note: lowercase n) |
| `-A [slurm_account_name]` | `genacc_q` | Slurm account (queue/partition) into which to submit the job (e.g. `backfill`, `genacc_q`) |
| `-t D-HH:MM:SS` | varies by Slurm account (see below) | Wall clock limit |
| `--mem-per-cpu` | varies by Slurm account (typically 3.9G) | Memory requested per CPU core in MiB (add `G` to specify GiB) |
| `-o [outfile]` | `slurm-%j.out` | File to write standard output (STDOUT) to |
| `-e [error_file]` | combined with `-o` | Separate error output (STDERR) from standard output |
| `-N [numnodes]` | 1 | Number of nodes |
| `-C [constraints]` | none | Run only on nodes with the specified features (e.g. `-C YEAR2014`); see below |
| `--exclusive` | disabled | Reserve entire node(s) for the job |
| `--gpus=[type:num]` | none | Number of GPUs to reserve in addition to CPUs |
Note that these are only a few of the many available Slurm options. Refer to the sbatch manpage for a complete list.
Connecting to running batch jobs#
You can connect to any currently running batch job by invoking a shell and using the --overlap option with srun.
Syntax is below:
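```bash
# Open a shell inside running job 123456 (substitute your own job ID)
srun --jobid=123456 --overlap --pty /bin/bash
```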
If your job is allocated to multiple nodes, Slurm will connect to the first node that the job is running on.
Constraint reference#
The HPC consists of nodes with various models of AMD and Intel processors. If it is important that your job run only on
a single type or generation of processor, you can specify feature constraints using the -C parameter.
To retrieve a list of available features, log in to the HPC and run `sinfo -o %f`:
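```bash
# List the feature tags available on cluster nodes (output varies by cluster)
sinfo -o %f
```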
To limit by feature, specify the -C parameter in your submit script (or as a command-line option):
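```bash
# In a submit script (the YEAR2014 feature is taken from the example above):
#SBATCH -C YEAR2014

# Or as a command-line option ('my_submit_script.sh' is a placeholder):
sbatch -C YEAR2014 my_submit_script.sh
```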
You can use boolean AND (&) and OR (|) operators in your constraint parameter. For example:
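The feature names below are illustrative; use values from your own `sinfo -o %f` output.

```bash
# Run only on nodes that have either feature (OR)
#SBATCH -C "YEAR2014|YEAR2015"

# Run only on nodes that have both features (AND)
#SBATCH -C "YEAR2014&intel"
```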
Common Slurm environment variables#
| Variable | Description |
|---|---|
| `SLURM_JOBID` | Auto-assigned job ID |
| `SLURM_SUBMIT_DIR` | Directory/folder from which the job was submitted |
| `SLURM_SUBMIT_HOST` | The name of the server from which the job was submitted (e.g., h22-login-24) |
| `SLURM_JOB_NODELIST` | The names of the node(s) that were allocated to this job |
| `SLURM_ARRAY_TASK_ID` | Task ID within a job array (if using job arrays) |
| `SLURM_JOB_CPUS_PER_NODE` | CPU cores per node allocated to the job |
| `SLURM_NNODES` | Number of nodes allocated to the job |
These are the most commonly used Slurm environment variables. A comprehensive list of environment variables is available at the official Slurm documentation.
The following script shows an example job submit script that reports the resources that Slurm has allocated:
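```bash
#!/bin/bash
# Minimal sketch: the job name, task count, and time limit are placeholders
#SBATCH -J "resource_report"   # job name
#SBATCH -n 4                   # total number of tasks
#SBATCH -t 00:10:00            # wall clock limit

echo "Job ID:              $SLURM_JOBID"
echo "Submit directory:    $SLURM_SUBMIT_DIR"
echo "Submit host:         $SLURM_SUBMIT_HOST"
echo "Allocated nodes:     $SLURM_JOB_NODELIST"
echo "CPU cores per node:  $SLURM_JOB_CPUS_PER_NODE"
echo "Number of nodes:     $SLURM_NNODES"
```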
Interactive jobs#
If you need to run a job interactively, use the srun command with the --pty command-line switch:
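```bash
# Request an interactive shell (the account and time limit are illustrative)
srun --pty -A genacc_q -t 01:00:00 /bin/bash
```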
- Note: `srun` accepts the same parameters as `sbatch`.
Job arrays#
Slurm allows you to submit a number of near-identical jobs simultaneously in the form of a job array. Your workload may be a good candidate for this if you have many jobs whose input differs only by some sort of index; for example, jobs that use the same parameters and code, but each read a different input/data file.
Syntax:
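```bash
# General form: ##-## is the index range; the optional %## caps concurrent tasks
# ('my_submit_script.sh' is a placeholder)
sbatch --array=##-##%## my_submit_script.sh
```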
`##-##` refers to the start and end of your index, and the optional `%##` refers to the number of tasks that are allowed to run concurrently, if you wish to limit that for any reason.
Below is an example:
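```bash
#!/bin/bash
#SBATCH -J "array_example"    # job name (placeholder)
#SBATCH --array=1-10%4        # run array tasks 1 through 10, at most 4 at a time
#SBATCH -t 00:30:00           # wall clock limit (placeholder)

# Each task reads a different input file selected by its array index
# ('./my_program' and 'input_1.dat' ... 'input_10.dat' are hypothetical names)
./my_program input_${SLURM_ARRAY_TASK_ID}.dat
```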
Note
Tasks in job arrays use the job ID format `[JOBID]_[TASK_ID]`. Each task in the array will also generate its own
output file unless you specify a custom output file in your submission script via the `-o` parameter. More information
is available in the official Slurm documentation.
Job dependencies#
Slurm supports job dependencies. You can submit jobs that will be deferred until other jobs have either completed or
terminated in a failed state. This allows you to break your workflow/task down into smaller atomic steps. Use the -d
option to specify dependencies:
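```bash
# General form; 'dependency_list' is described below, and the script name is a placeholder
sbatch -d [dependency_list] my_submit_script.sh
```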
The `dependency_list` parameter syntax takes the form of `dependency_type:job_id`. `dependency_type` can be any of the
following:
| Dependency Type Syntax | Example | Explanation |
|---|---|---|
| `after:JOB_ID` | `after:123456:123457` | Job will start after the specified job(s) have begun execution |
| `afterany:JOB_ID` | `afterany:123456:123457` | Job will start after the specified job(s) have finished (in any state) |
| `afternotok:JOB_ID` | `afternotok:123456` | Job will start after the specified job(s) have terminated in a failed state |
| `afterok:JOB_ID` | `afterok:123456` | Job will start after the specified job(s) have executed and terminated successfully |
More examples:
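The job IDs and script name below are placeholders.

```bash
# Start only after jobs 123456 and 123457 have both finished, in any state
sbatch -d afterany:123456:123457 my_submit_script.sh

# Start only if job 123456 terminated in a failed state (e.g. a cleanup or retry step)
sbatch -d afternotok:123456 my_submit_script.sh

# Start only after job 123456 completed successfully
sbatch -d afterok:123456 my_submit_script.sh
```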
Note
These are only a subset of the available dependency options. For a full reference, refer to the official Slurm documentation.
Singleton jobs#
A simpler way to ensure jobs run only one-at-a-time is to use the `-d singleton` dependency syntax. This ensures that
only one job with a given name will be running at any given time. You need to set a job name with the `-J`/`--job-name` parameter for this to work.
The same logic is used as `-d afterany:JOBID`; the job will begin only after any jobs with the same name have terminated
in any state.
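For example (the job name and script name are placeholders):

```bash
# Only one job named "nightly_pipeline" runs at a time; each new submission
# waits for earlier jobs with the same name to terminate in any state
sbatch -J nightly_pipeline -d singleton my_submit_script.sh
```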
Tips#
Default number of cores per node#
If you use the -N option to specify the number of nodes, Slurm will allocate only one core per node:
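```bash
# Requests 2 nodes with the default of 1 core on each (2 cores total)
#SBATCH -N 2
```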
To use more than one core per node, use the --ntasks-per-node option; e.g.:
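```bash
# Requests 2 nodes with 8 tasks on each node (16 cores total)
#SBATCH -N 2
#SBATCH --ntasks-per-node=8
```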
If you need to reserve complete nodes, use the --exclusive option:
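```bash
# Reserves 2 entire nodes, regardless of how many cores each node has
#SBATCH -N 2
#SBATCH --exclusive
```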
Warning
The above example will reserve two complete nodes in the cluster, but this job's wait time in-queue will likely be much higher than if you do not use this option. For further explanation, refer to our job resource planning guide.
Further reading#
- Information about completed jobs is available via the `sstat` and `seff` commands.
- All `sbatch` parameter options are listed on the official Slurm website.