
Submitting GPU jobs

The HPC supports GPU jobs and includes a growing number of compute nodes equipped with GPU hardware.

GPU resources available on the HPC#

GPU resources on the HPC are available through specific Slurm accounts (partitions):

| GPU Model | Number | Slurm accounts |
|---|---|---|
| NVIDIA GeForce GTX 1080 Ti | 12 (2x per node) | backfill2 (4-hour time limit; see below) |
| NVIDIA RTX A4000 | 26 (2x per node) | Owner accounts only |
| NVIDIA RTX A4500 | 32 (4x per node) | backfill2 (4-hour time limit; see below), gpu_q, and owner accounts |
| NVIDIA H100 Tensor Core GPU | 1 (1x per node) | Limited access for pilot programs; request access |

In addition, if your department, lab, or group has purchased GPU resources, they will be available on your owner-based Slurm account.
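
If you are not sure which Slurm accounts you can submit to, you can list your account associations with the standard Slurm accounting tool. This is a minimal sketch; the account names it prints will be specific to your group:

sacctmgr show associations user=$USER format=Account%30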

Free GPU jobs longer than four hours#

We accept requests to run GPU jobs longer than four hours in our general access queues on a case-by-case basis. Contact us to request access.

If you have already been approved and enrolled in the genacc_gpu_users group, submit your longer-running GPU jobs to the gpu_q:

#SBATCH -A gpu_q
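
For example, if you have been approved for gpu_q, a minimal submit-script header for a GPU job longer than four hours might look like the sketch below (the 12-hour walltime is illustrative only; check the queue's actual limits):

#SBATCH -A gpu_q
#SBATCH --gres=gpu:1      # request one GPU card
#SBATCH -t 12:00:00       # example walltime longer than four hours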

If your research group has purchased dedicated GPU resources on our system, then there is no need to request access. If you submit your job to your owner-based queue, you will be able to run GPU jobs for longer than four hours.

Submitting GPU Jobs#

Via Open OnDemand#

You can submit graphical jobs via Open OnDemand to run on GPU nodes. Note that this may increase your job wait time in the queue. Follow the instructions for submitting a job, and specify 1 or more GPUs in the "GPUs" field:

[Screenshot: selecting one or more GPU cards in the "GPUs" field of the Open OnDemand job submission form]

Note

Some interactive apps may not have this field. If you encounter this, please let us know.

Via the CLI#

Tip

If you have not yet submitted a job to the HPC cluster, please read our tutorial first.

If you wish to submit a job to node(s) that have GPUs, add the following line to your submit script:

#SBATCH --gres=gpu:[1-4] # <-- Choose a value between 1 and 4 cards

GPU nodes contain two to four GPU cards. Specify the number of GPU cards per node you wish to use after the --gres=gpu: directive. For example, if your job requires four GPU cards, specify 4:

#SBATCH --gres=gpu:4  # This job will reserve four GPU cards in a single node.
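
Keep in mind that the --gres count applies per node. A sketch requesting multiple nodes (hypothetical values) would therefore reserve that many cards on each node:

#SBATCH --nodes=2
#SBATCH --gres=gpu:2   # two GPU cards on each of the two nodes (four total)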

Full Example Submit Script#

The following HPC job will run on a GPU node and print information about the available GPU cards:

#!/bin/bash

#SBATCH --job-name="gpu_test"
#SBATCH --ntasks=1
#SBATCH --mail-type="ALL"
#SBATCH -t 1:00

# Here is the magic line to ensure we're running on a node with GPUs
#SBATCH --gres=gpu:1

# If your owner-based Slurm account has access to GPU nodes, you can use that. 
# For general access users, GPU jobs will run only on the `backfill2` or `gpu_q` accounts (`gpu_q` approval required; see above).
#SBATCH -A backfill2

# Not strictly necessary for this example, but most
# folks will want to load the CUDA module for GPU jobs
module load cuda

# Print out GPU information
/usr/bin/nvidia-smi -L

Your job output should look something like this:

GPU 0: NVIDIA Graphics Device (UUID: GPU-96cbe295-a053-3347-090d-b0adbb013646)
GPU 1: NVIDIA Graphics Device (UUID: GPU-62f15a0a-9c64-6bc4-4a88-f0cdea9a09c1)
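
To run the example, save the script (the filename gpu_test.sh below is arbitrary), submit it with sbatch, and check its status with squeue:

sbatch gpu_test.sh    # submit the job; Slurm prints the assigned job ID
squeue -u $USER       # list your pending and running jobs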

Using specific GPU models#

You can view the specific GPU models in the cluster with the sinfo command:

sinfo -o "%50N  %10c  %25f  %40G "| grep gpu

hpc-i36-[1,3,5,7,9,11,13,15]                        28          YEAR2017,intel             gpu:gtx1080i:4                           
hpc-i35-4-1,hpc-i35-5-1,hpc-i35-6-1                 32          YEAR2024,intel             gpu:a4500:2                              
hpc-i35-8                                           32          YEAR2023,intel             gpu:h100:2                               
hpc-i35-1-[1-2],hpc-i35-3-[1-2]                     32          YEAR2023,intel             gpu:a4000:2

Note

GPUs are only available in specific Slurm accounts/queues.

In your submit script, request a specific model using a gres line of the form gpu:[model]:[#] (e.g., gpu:a4500:2). For example:

# Request two NVIDIA 1080Ti GPUs on a single node
#SBATCH --gres=gpu:gtx1080i:2
#SBATCH --nodes=1
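
Putting it together, a complete submit script that targets a specific GPU model might look like the sketch below. It assumes the a4500 gres name from the sinfo output above and an account with access to those nodes (see the table at the top of this page); adjust both for your situation.

#!/bin/bash
#SBATCH --job-name="a4500_test"
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --gres=gpu:a4500:2   # two RTX A4500 cards on a single node
#SBATCH -A backfill2         # or gpu_q / your owner-based account (see above)
#SBATCH -t 1:00

module load cuda
/usr/bin/nvidia-smi -L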

More information#

For more information and examples, refer to our CUDA software documentation.