
R (statistical software)


R (statistical software) requires an environment module

To use R (statistical software), you must first load the appropriate environment module:

module load R

R is a popular, open-source, full-featured statistical application and programming platform. Multiple versions of R are installed on the HPC. To see the full list, connect to any login node and run the following command:

$ module avail R

You will see output similar to the following:

---------------------------- /opt/modulefiles/core -----------------------------
   R/3.5.2        R/4.2.0    kmergenie             openrefine    webproxy
   R/4.0.0        R/4.2.2    metashape-pro         relion
   R/4.1.0 (D)    gradle     metashape-standard    spark

-------------------- /usr/share/lmod/lmod/modulefiles/Core ---------------------

To load a version of R besides the default, append the version number to the module name:

# Load "R" environment module v4.2.2
$ module load R/4.2.2

# Run "R"
$ R
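
After loading a module, it can be useful to confirm which R version is actually first on your PATH. The following is a minimal sketch rather than part of the official instructions: the helper name `loaded_r_version` is hypothetical, and it assumes the first line of `R --version` has the usual release form `R version X.Y.Z (date) ...`.

```shell
# Hypothetical helper: print the version of whichever R is first on PATH.
# Assumes the first line of `R --version` looks like "R version X.Y.Z (...)".
loaded_r_version() {
    R --version 2>/dev/null | awk 'NR == 1 { print $3 }'
}

# Typical use on a login node (commented out: needs the module system):
#   module load R/4.2.2
#   loaded_r_version
```

If the printed version differs from the module you asked for, another R installation is probably earlier on your PATH.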

Submit R jobs to the HPC

To run R non-interactively on the HPC, invoke it with CMD BATCH on the executable line of your submit script. For example:

#!/bin/bash

#SBATCH -n 1                    # We'll request a single process for this example
#SBATCH --job-name="My R Task"  # Job name (optional)
#SBATCH -A backfill             # We will submit to the backfill Slurm account
#SBATCH -t 1:00:00              # Instruct the scheduler to reserve one hour of walltime
#SBATCH --mail-type=ALL         # Send emails on all job state change events (start, finish, etc)

module load R/4.2.2

R CMD BATCH yourRscript         # yourRscript is a text file containing the R commands to run
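
By default, `R CMD BATCH` does not print results to the terminal. Unless you pass a second file name, it writes the output to a file named after the script, with any `.R` extension removed and `.Rout` appended, in the directory the job ran in. A small sketch for locating that file once the job finishes (the helper name `rout_name` is made up for illustration):

```shell
# Hypothetical helper: print the default output file name that
# `R CMD BATCH <script>` uses when no output file is given.
rout_name() {
    script_base=$(basename "$1")
    # Strip a trailing ".R" if present, then append ".Rout"
    echo "${script_base%.R}.Rout"
}

# After the job has finished, inspect the end of the R output
# (commented out: the file exists only after a real run):
#   tail -n 20 "$(rout_name yourRscript)"
```

Assuming the submit script is saved under a placeholder name such as `submit.sh`, `sbatch submit.sh` queues the job, and the `.Rout` file appears in the submission directory as the job runs.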

Install R packages in your home directory

If you need a package in R that is not included with the base R installation, you can install it yourself in your home directory. This provides the most flexibility, since you can install whichever R packages you need without administrative privileges on our system. Here's how to do it:

# Substitute 'R' with a specific version of R if you need a version other than that provided by the default module
$ module load R 

# Invoke the R prompt
$ R

# Install package
> install.packages("PACKAGE_NAME_HERE")

# Type 'yes' when you see the following message
Installing package into ‘/opt/rcc/R/R-4.2.2/share/R/library’
(as ‘lib’ is unspecified)
Warning in install.packages("parallel") :
  'lib = "/opt/rcc/R/R-4.2.2/share/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes

# Select a CRAN mirror from the list (any will do, but we recommend choosing one based in the US)
--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors 

 1: 0-Cloud [https]
 2: Australia (Canberra) [https]
 3: Australia (Melbourne 1) [https]
 ...
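
The interactive prompts above can be avoided by naming the library directory and the CRAN mirror explicitly. The sketch below is one possible way to do that from the shell, under a few assumptions: the function name `install_r_package` is hypothetical, `R_LIBS_USER` is R's standard variable for the personal library location, and `https://cloud.r-project.org` is the address of the 0-Cloud mirror from the list above.

```shell
# Hypothetical helper: install one CRAN package into your personal R library
# without interactive prompts. Requires Rscript on PATH (e.g. after `module load R`).
# The R code creates the personal library directory if it is missing, then
# installs the package named by the first argument into it.
install_r_package() {
    Rscript -e '
        pkg <- commandArgs(trailingOnly = TRUE)[1]
        lib <- path.expand(Sys.getenv("R_LIBS_USER"))
        dir.create(lib, recursive = TRUE, showWarnings = FALSE)
        install.packages(pkg, lib = lib, repos = "https://cloud.r-project.org")
    ' "$1"
}
```

For example, after `module load R/4.2.2`, running `install_r_package data.table` would install `data.table` into your personal library; the package name is only an illustration.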

Parallel computing with R

There are several powerful parallel processing libraries for R. Using them is vital for leveraging the full power of the HPC in your research.

Submitting R jobs to the cluster

More information about job submissions is available on this website.

Single node, multicore job

#!/bin/bash
#SBATCH -J myRjob
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -p genacc_q
#SBATCH -t 10:00:00

module load R/4.2.2

R CMD BATCH myRjob.R
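
Requesting four cores does not by itself make an R script run in parallel; the script has to start the workers. As a rough sketch, assuming the `parallel` package that ships with R and assuming Slurm sets `SLURM_NTASKS` to the value requested with `-n`, the following shell snippet writes an illustrative `myRjob.R` that sizes its worker pool from the allocation. The file contents are an example, not a required layout, and the snippet overwrites any existing `myRjob.R` in the current directory.

```shell
# Write an example myRjob.R (illustrative contents only).
cat > myRjob.R <<'EOF'
library(parallel)

# Use as many workers as Slurm tasks were requested; fall back to 1
# when the script is run outside of a Slurm job.
n_workers <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "1"))

# Square the numbers 1..8, spreading the work over forked worker processes
results <- mclapply(1:8, function(i) i^2, mc.cores = n_workers)
print(unlist(results))
EOF
```

Reading the worker count from the environment keeps the R script and the `#SBATCH -n` line in agreement when the request changes.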

Multiple node, multicore job

#!/bin/bash

#SBATCH -J myRjob
#SBATCH -n 32
#SBATCH -p genacc_q
#SBATCH -t 10:00:00

module load R/4.2.2

R CMD BATCH myRjob.R
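
With `-n 32` and no node count, Slurm may place the tasks on more than one node, while `R CMD BATCH` itself starts a single R process on the first node. The R script therefore needs a parallel library that can reach the other nodes. Before tuning such a job, it can help to see how the tasks were distributed. The sketch below is an illustration under stated assumptions: `count_tasks_per_node` is a hypothetical helper that tallies host names read from standard input, and it is meant to be fed by `srun hostname` inside the submit script.

```shell
# Hypothetical helper: read one host name per line and print
# "<host> <number of tasks>" for each distinct host, sorted by host name.
count_tasks_per_node() {
    awk '{ count[$1]++ } END { for (host in count) print host, count[host] }' | sort
}

# Inside a submit script, before the R command (commented out: needs Slurm):
#   srun hostname | count_tasks_per_node
```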