MAFFT

A program for aligning multiple sets of genetic sequence data


MAFFT requires an environment module

In order to use MAFFT, you must first load the appropriate environment module:

module load gnu

MAFFT (Multiple Alignment using Fast Fourier Transform) is a bioinformatics tool that takes multiple genetic sequences and aligns them. The program provides several alignment strategies. Accuracy-oriented iterative methods such as L-INS-i are better suited to smaller datasets (the MAFFT manual recommends L-INS-i for fewer than about 200 sequences), while fast progressive methods such as FFT-NS-2 are better suited to larger datasets.

Using MAFFT on RCC Resources

Serially Running MAFFT on the HPC

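To run MAFFT, use the following format:

$ mafft -[OPTS] INPUT > OUTPUT
Here -[OPTS] stands for the command-line options you want to run your job with, INPUT is the required input sequence file, and OUTPUT is the file that receives the alignment. MAFFT writes the alignment to standard output, so the > redirection is what creates OUTPUT.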

MAFFT also contains a number of other related programs including linsi, ginsi, and mafft-profile. Detailed information on these can be found in the official MAFFT manual.
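To make the link between the named strategies and the -[OPTS] placeholder concrete, the sketch below prints the command line for a few strategies. The option sets are the ones the MAFFT manual lists for L-INS-i, G-INS-i, and FFT-NS-2; the helper name mafft_cmd and its strategy keywords are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the MAFFT command line for a named strategy.
# $1 = strategy keyword (made up for this sketch), $2 = input FASTA file.
mafft_cmd() {
    case "$1" in
        linsi)  echo "mafft --localpair --maxiterate 1000 $2" ;;   # L-INS-i
        ginsi)  echo "mafft --globalpair --maxiterate 1000 $2" ;;  # G-INS-i
        fftns2) echo "mafft --retree 2 --maxiterate 0 $2" ;;       # FFT-NS-2
        auto)   echo "mafft --auto $2" ;;                          # MAFFT picks a strategy
        *)      echo "unknown strategy: $1" >&2; return 1 ;;
    esac
}

# Print the command for each strategy without running anything.
for s in linsi ginsi fftns2 auto; do
    mafft_cmd "$s" TEST.fa
done
```

To run one of the printed commands, copy it and append > OUTPUT as in the format above. The linsi and ginsi programs mentioned above are shortcuts for the first two option sets.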

As a short example, if you have a FASTA-formatted file of genetic sequence data, you can align it and write the result to a file using the following commands:

$ module load gnu
$ mafft TEST.fa > OUTPUT

Replace TEST with the name of your sequence file and OUTPUT with the name of your output file.
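If you do not yet have a sequence file, a small test input is enough to try the commands above. The sketch below writes a made-up three-sequence FASTA file (the sequences are invented placeholders, not real data), counts its records, and runs the alignment only if mafft is on the PATH.

```shell
#!/bin/bash
# Write a tiny, made-up FASTA file with three short DNA sequences.
cat > TEST.fa <<'EOF'
>seq1
ATGCGTACGTTAGC
>seq2
ATGCGTCGTTAGC
>seq3
ATGCTACGTTAGCA
EOF

# Count the records: every FASTA record starts with a ">" header line.
n_in=$(grep -c '^>' TEST.fa)
echo "input sequences: $n_in"

# Align only if mafft is available, i.e. the environment module is loaded.
if command -v mafft >/dev/null 2>&1; then
    mafft TEST.fa > OUTPUT
    # The alignment should contain one record per input sequence.
    n_out=$(grep -c '^>' OUTPUT)
    echo "aligned sequences: $n_out"
else
    echo "mafft not found; load the environment module first" >&2
fi
```

Comparing the two counts is a quick sanity check that the run completed and that no sequence was dropped.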

Running MAFFT in Parallel on the HPC

Note

If you wish to run MAFFT in parallel on RCC machines, you will need to load the GNU OpenMPI module using the command module load gnu openmpi. This will give you the ability to use the srun command for your MAFFT jobs.

After loading GNU OpenMPI, you can run MAFFT by writing a Slurm script, which must be saved as a file with the .sh suffix. The example script below uses TEST.fa as the FASTA input file and writes the alignment to the file OUTPUT.

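#!/bin/bash
# Job name shown in the queue
#SBATCH -J MAFFT_Test
# Partition (queue) to submit to
#SBATCH -p genacc_q
# Number of tasks
#SBATCH -n 4
# Wall-clock time limit (HH:MM:SS)
#SBATCH -t 00:10:00
# Send email notifications for all job events
#SBATCH --mail-type=ALL

# Load the compiler and MPI environment that provides MAFFT and srun
module load gnu openmpi

# Launch MAFFT through Slurm and redirect the alignment to OUTPUT
srun mafft TEST.fa > OUTPUT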

Then submit your script using the following command, replacing YOURSCRIPT with the name of your script file:

$ sbatch YOURSCRIPT.sh
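MAFFT also has a built-in --thread option that controls how many threads a single mafft process uses. As an alternative to the script above, the sketch below requests one task with four CPU cores and passes the core count to --thread. It assumes the mafft build provided by the gnu module was compiled with thread support, which this page does not confirm. The job name MAFFT_Threads is made up, and the guards around module and mafft are there only so the script can be dry-run on a machine without them.

```shell
#!/bin/bash
#SBATCH -J MAFFT_Threads
#SBATCH -p genacc_q
#SBATCH -n 1
#SBATCH -c 4
#SBATCH -t 00:10:00

# Load the module only where the module command exists (on the cluster).
if command -v module >/dev/null 2>&1; then
    module load gnu
fi

# Slurm sets SLURM_CPUS_PER_TASK from the -c request; fall back to 1 elsewhere.
THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "running MAFFT with $THREADS thread(s)"

# Run a single multithreaded MAFFT process and redirect the alignment.
if command -v mafft >/dev/null 2>&1; then
    mafft --thread "$THREADS" TEST.fa > OUTPUT
else
    echo "mafft not found; check that the module loaded" >&2
fi
```

Submit it with sbatch in the same way as the script above. Because only one mafft process runs, only one process writes to OUTPUT.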