MAFFT
A program for aligning multiple sets of genetic sequence data
MAFFT requires an environment module
In order to use MAFFT, you must first load the appropriate environment module:
module load gnu
MAFFT (Multiple Alignment using Fast Fourier Transform) is a bioinformatics tool that takes in multiple genetic sequences and aligns them. The program provides several alignment algorithms: some, such as L-INS-i, favor accuracy and are better suited to smaller data sets, while others, such as FFT-NS-2, favor speed and are better suited to larger data sets.
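The choice between an accuracy-oriented and a speed-oriented algorithm can be sketched as a small helper. This is an illustration only: the function names `count_sequences` and `choose_mafft_options` are hypothetical, and the 200-sequence cutoff is an assumption based on the rough guidance in the MAFFT manual, so adjust it to your data. The option strings themselves are the ones the MAFFT manual lists for L-INS-i and FFT-NS-2.

```shell
#!/bin/bash
# Hypothetical helpers for picking a MAFFT strategy by data set size.

count_sequences() {
    # Each FASTA record starts with a ">" header line.
    grep -c '^>' "$1"
}

choose_mafft_options() {
    local nseq="$1"
    # Assumed cutoff: 200 sequences. Tune this for your own data.
    if [ "$nseq" -lt 200 ]; then
        # L-INS-i: iterative refinement using local pairwise alignments.
        echo "--localpair --maxiterate 1000"
    else
        # FFT-NS-2: fast progressive method.
        echo "--retree 2 --maxiterate 0"
    fi
}

# Example usage (TEST.fa is a placeholder file name):
#   opts=$(choose_mafft_options "$(count_sequences TEST.fa)")
#   mafft $opts TEST.fa > OUTPUT
```

MAFFT also accepts `--auto`, which lets the program pick a strategy from the size of the input, so a helper like this is mainly useful when you want the choice to be explicit.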
Using MAFFT on RCC Resources
Serially Running MAFFT on the HPC
To run MAFFT, use the format:
mafft -[OPTS] INPUT > OUTPUT
-[OPTS] is a list of command line options you wish to run your job with, while INPUT and OUTPUT are the required input and output files.
MAFFT also contains a number of other related programs including linsi, ginsi, and mafft-profile. Detailed information on these can be found in the official MAFFT manual.
As a short example, if you have a FASTA-formatted file of genetic sequence data, you could align it and write the result to a file using the following commands:
module load gnu
mafft TEST > OUTPUT
Replace TEST with the name of your sequence file and OUTPUT with the name of your output file.
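A serial run can fail in unhelpful ways if the input file is missing or is not FASTA. The wrapper below is a minimal sketch of checking for that first; the function name `align_fasta` is hypothetical, and `--auto` (MAFFT's automatic strategy selection) is used here as an assumption about what a reasonable default looks like.

```shell
#!/bin/bash
# Hypothetical wrapper: check the input, then align it with MAFFT.

align_fasta() {
    local input="$1" output="$2"

    # The input must exist and be non-empty.
    if [ ! -s "$input" ]; then
        echo "error: $input is missing or empty" >&2
        return 1
    fi
    # A FASTA file contains at least one ">" header line.
    if ! grep -q '^>' "$input"; then
        echo "error: $input does not look like FASTA" >&2
        return 1
    fi
    # mafft is only on the PATH after the environment module is loaded.
    if ! command -v mafft >/dev/null 2>&1; then
        echo "error: mafft not found; load the environment module first" >&2
        return 1
    fi

    # MAFFT writes the alignment to standard output.
    mafft --auto "$input" > "$output"
}

# Example usage:
#   module load gnu
#   align_fasta TEST OUTPUT
```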
Running MAFFT in Parallel on the HPC
Note
If you wish to run MAFFT in parallel on RCC machines, you will need to load the GNU OpenMPI module using the command module load gnu openmpi. This will give you the ability to use the srun command for your MAFFT jobs.
After loading GNU OpenMPI, you can run MAFFT by writing a Slurm script, which must be saved as a file with the .sh suffix. Below is an example script using TEST.fa as the FASTA data file, outputting to the file OUTPUT.
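A minimal sketch of such a script follows, written out with a here-document so it can be created from the command line. The file name `mafft_job.sh`, the job name, the CPU count, and the time limit are all placeholders rather than RCC defaults, and no partition is specified. The sketch assumes a single task that uses MAFFT's own `--thread` option for multithreading, which may differ from how your site intends parallel jobs to be launched.

```shell
#!/bin/bash
# Write a hypothetical Slurm submission script named mafft_job.sh.
# The quoted 'EOF' keeps $SLURM_CPUS_PER_TASK literal until the job runs.
cat > mafft_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mafft_align
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

# Load the modules named in this guide.
module load gnu openmpi

# One task; MAFFT runs with as many threads as CPUs were allocated.
srun --cpus-per-task="$SLURM_CPUS_PER_TASK" \
    mafft --thread "$SLURM_CPUS_PER_TASK" TEST.fa > OUTPUT
EOF
```

A single task is used because several tasks launched by srun would each run their own copy of MAFFT and write to the same output file.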
Then submit your script using the following command, replacing YOURSCRIPT with the name of your script file:
sbatch YOURSCRIPT