A program for aligning multiple sets of genetic sequence data
MAFFT requires an environment module
In order to use MAFFT, you must first load the appropriate environment module:
module load gnu
MAFFT (Multiple Alignment using Fast Fourier Transform) is a bioinformatics tool that takes in multiple genetic sequences and aligns them. The program provides several alignment strategies: some, such as L-INS-i, favor accuracy and are better suited to smaller inputs, while others, such as FFT-NS-2, favor speed and are better suited to larger inputs.
Using MAFFT on RCC Resources
Serially Running MAFFT on the HPC
To run MAFFT, use the format:
mafft -[OPTS] INPUT > OUTPUT
where -[OPTS] is a list of command line options you wish to run your job with, while INPUT and OUTPUT are the required input and output files.
MAFFT also contains a number of other related programs including linsi, ginsi, and mafft-profile. Detailed information on these can be found in the official MAFFT manual.
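The linsi program mentioned above is shorthand for mafft --localpair --maxiterate 1000 (the L-INS-i strategy), while FFT-NS-2 corresponds to mafft --retree 2 --maxiterate 0. The sketch below is illustrative only and is not part of MAFFT: it picks one of those option sets from the number of sequences in a FASTA file. The helper names count_seqs and pick_mafft_opts are hypothetical, and the 200-sequence cutoff is an assumption based on the MAFFT manual's general guidance for L-INS-i, not an RCC policy.

```shell
# Count the sequences in a FASTA file (one '>' header line per sequence).
count_seqs() {
    grep -c '^>' "$1"
}

# Print MAFFT options for a file: L-INS-i for small inputs, FFT-NS-2 otherwise.
# The cutoff of 200 sequences is an assumption; adjust it for your data.
pick_mafft_opts() {
    if [ "$(count_seqs "$1")" -lt 200 ]; then
        echo "--localpair --maxiterate 1000"   # L-INS-i: slower, more accurate
    else
        echo "--retree 2 --maxiterate 0"       # FFT-NS-2: fast progressive method
    fi
}

# Usage (INPUT and OUTPUT are placeholders):
#   mafft $(pick_mafft_opts INPUT) INPUT > OUTPUT
```

MAFFT also accepts --auto, which lets the program choose a strategy itself based on the data size, so a helper like this is mainly useful when you want the choice to be explicit in your job scripts.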
As a short example, if you have a FASTA-formatted file of genetic sequence data named TEST, you could align it by passing TEST to mafft as the input file and redirecting the alignment to a file named OUTPUT. Replace TEST with the name of your sequence file and OUTPUT with the name of your output file.
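A minimal end-to-end sketch of this example follows. The three short sequences are made up purely so the snippet is self-contained, and the file name TEST.fa is an assumption; substitute your own data. The guard around mafft only reports a hint if the program is not yet on your PATH.

```shell
# Write a tiny, made-up FASTA file so the example is self-contained.
cat > TEST.fa <<'EOF'
>seq1
ACGTACGTACGT
>seq2
ACGTTCGTACGT
>seq3
ACGACGTACG
EOF

# Align with default settings. MAFFT writes the alignment to standard
# output (redirected to OUTPUT here) and progress messages to standard error.
if command -v mafft >/dev/null 2>&1; then
    mafft TEST.fa > OUTPUT
else
    echo "mafft not found; load the environment module first (module load gnu)" >&2
fi
```

The resulting OUTPUT file is itself FASTA formatted, with gap characters inserted so that every sequence has the same length.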
Running MAFFT in Parallel on the HPC
If you wish to run MAFFT in parallel on RCC machines, you will need to load the GNU OpenMPI module using the command module load gnu openmpi. This will give you the ability to use the srun command for your MAFFT jobs.
After loading GNU OpenMPI, you can run MAFFT by writing a Slurm script, which must be saved as a file with the .sh suffix. A typical script takes a FASTA data file such as TEST.fa as input and writes the alignment to an output file.
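A minimal sketch of such a script is shown below, written to disk with a here-document so that it ends up in a file with the .sh suffix. Everything specific in it is an assumption rather than an RCC requirement: the file name mafft_job.sh, the job name, the resource requests, the output file TEST_aligned.fa, and the use of MAFFT's --thread option to run on several cores. Check the RCC job submission documentation for the partitions and limits that apply to your account.

```shell
# Save a hypothetical Slurm script as mafft_job.sh. The quoted 'EOF' keeps
# $SLURM_CPUS_PER_TASK from being expanded until the job actually runs.
cat > mafft_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mafft_align     # hypothetical job name
#SBATCH --ntasks=1                 # one MAFFT process
#SBATCH --cpus-per-task=4          # cores for that process (assumed value)
#SBATCH --time=01:00:00            # wall-clock limit (assumed value)

# Load the modules that provide MAFFT and srun support.
module load gnu openmpi

# Align TEST.fa using as many threads as cores were requested.
# TEST_aligned.fa is a placeholder output name.
srun --cpus-per-task="$SLURM_CPUS_PER_TASK" \
    mafft --thread "$SLURM_CPUS_PER_TASK" TEST.fa > TEST_aligned.fa
EOF
```

Requesting a single task with several CPUs matches how --thread works: MAFFT runs as one process and spreads the work across threads within it.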
Then submit your script using the following command, replacing
YOURSCRIPT with the name of your script file:
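Slurm batch scripts are submitted with sbatch. As a hedged sketch, assuming your script was saved as YOURSCRIPT.sh in the current directory, submission and a quick status check could look like this:

```shell
# Placeholder: replace with the name of your own script file.
SCRIPT="YOURSCRIPT.sh"

if command -v sbatch >/dev/null 2>&1; then
    sbatch "$SCRIPT"        # queue the job; Slurm prints the assigned job ID
    squeue -u "$USER"       # list your pending and running jobs
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```

By default, Slurm writes the job's standard output and error to a file named slurm-JOBID.out in the directory you submitted from, which is where MAFFT's progress messages will appear.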