
Submitting GPU jobs

The HPC supports GPU jobs and includes a growing number of compute nodes equipped with GPU hardware.

GPU resources available on the HPC#

GPU resources on the HPC are available through specific Slurm accounts (partitions):

| GPU Model | Slurm accounts |
| --- | --- |
| NVIDIA GeForce GTX 1080 Ti | backfill2 (4-hour time limit; see below) |
| NVIDIA RTX A4000 | Owner accounts only |
| NVIDIA RTX A4500 | backfill2 (4-hour time limit; see below) and owner accounts |
| NVIDIA H100 Tensor Core GPU | Limited access for pilot programs; request access |

In addition, if your department, lab, or group has purchased GPU resources, they will be available on your owner-based Slurm account.
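
If you are not sure which Slurm accounts your username can submit to, a generic Slurm query such as the one below will list your associations. This is a sketch using a standard Slurm command; the account names and fields shown will vary by site:

# List the Slurm accounts associated with your username
sacctmgr show associations user=$USER format=account%20,partition%15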

Free GPU jobs longer than four hours#

If you want to run GPU jobs longer than four hours in our general-access queues, we accept requests on a case-by-case basis. Contact us to request access.

If your research group has purchased dedicated GPU resources on our system, there is no need to request access: submit your job to your owner-based queue, and you can run GPU jobs for longer than four hours.
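
As a rough sketch, a longer GPU job submitted through an owner-based account might include directives like the following (the account name "mylab" is a placeholder; substitute your group's actual account):

#SBATCH -A mylab              # placeholder: your owner-based account name
#SBATCH --gres=gpu:1          # request one GPU card
#SBATCH --time=24:00:00       # owner-based accounts are not limited to four hours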

Submitting GPU Jobs#

Via Open OnDemand#

You can submit graphical jobs via Open OnDemand to run on GPU nodes. Note that this may increase your job wait time in the queue. Follow the instructions for submitting a job, and specify 1 or more GPUs in the "GPUs" field:

(Screenshot: selecting one or more GPU cards in the "GPUs" field of the Open OnDemand job submission form.)

Note

Some interactive apps may not have this field. If you encounter this, please let us know.

Via the CLI#

Tip

If you have not yet submitted a job to the HPC cluster, please read our tutorial first.

If you wish to submit a job to node(s) that have GPUs, add the following line to your submit script:

#SBATCH --gres=gpu:[1-4] # <-- Choose a value between 1 and 4 cards

GPU nodes contain two to four GPU cards. Specify the number of cards per node you wish to use after the --gres=gpu: directive. For example, if your job requires four GPU cards, specify 4:

#SBATCH --gres=gpu:4  # This job will reserve four GPU cards in a single node.
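
The same --gres option works for interactive jobs. For example, a sketch of a one-hour interactive session on a GPU node, assuming the backfill2 account and its time limit:

srun -A backfill2 --gres=gpu:1 -t 1:00:00 --pty bash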

Full Example Submit Script#

The following HPC job will run on a GPU node and print information about the available GPU cards:

#!/bin/bash

#SBATCH --job-name="gpu_test"
#SBATCH --ntasks=1
#SBATCH --mail-type="ALL"
#SBATCH -t 1:00

# Here is the magic line to ensure we're running on a node with GPUs
#SBATCH --gres=gpu:1

# If your owner-based Slurm account has access to GPU nodes, you can use that.
# For general access users, GPU jobs will run only in the Slurm accounts indicated above.
#SBATCH -A backfill2

# Not strictly necessary for this example, but most
# folks will want to load the CUDA module for GPU jobs
module load cuda

# Print out GPU information
/usr/bin/nvidia-smi -L

Your job output should look something like this:

GPU 0: NVIDIA Graphics Device (UUID: GPU-96cbe295-a053-3347-090d-b0adbb013646)
GPU 1: NVIDIA Graphics Device (UUID: GPU-62f15a0a-9c64-6bc4-4a88-f0cdea9a09c1)
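
Inside a running job, Slurm also exposes the allocated cards through environment variables, which can be handy for debugging. A small sketch you could add to the script above (the exact variables available depend on the cluster's Slurm configuration):

# Show which GPU indices Slurm assigned to this job
echo "CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES"
echo "SLURM_GPUS_ON_NODE   = $SLURM_GPUS_ON_NODE"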

Using specific GPU models#

You can view the specific GPU models in the cluster with the sinfo command:

sinfo -o "%50N  %10c  %25f  %40G "| grep gpu

hpc-i36-[1,3,5,7,9,11,13,15]                        28          YEAR2017,intel             gpu:gtx1080i:4                           
hpc-i35-4-1,hpc-i35-5-1,hpc-i35-6-1                 32          YEAR2024,intel             gpu:a4500:2                              
hpc-i35-8                                           32          YEAR2023,intel             gpu:h100:2                               
hpc-i35-1-[1-2],hpc-i35-3-[1-2]                     32          YEAR2023,intel             gpu:a4000:2

Note

GPUs are only available in specific Slurm accounts/queues.

In your submit script, specify the model in the gres line using the form gpu:<model>:[#], where <model> matches one of the GPU types shown in the sinfo output (e.g., gtx1080i, a4000, a4500, h100). For example:

# Request two NVIDIA 1080Ti GPUs on a single node
#SBATCH --gres=gpu:gtx1080i:2
#SBATCH --nodes=1
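
Similarly, to request both RTX A4500 cards on one of the nodes shown in the sinfo output above:

# Request two NVIDIA RTX A4500 GPUs on a single node
#SBATCH --gres=gpu:a4500:2
#SBATCH --nodes=1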

More information#

For more information and examples, refer to our CUDA software documentation.