Searching Biological Sequence Data for Regions of Similarity
FASTA requires an environment module
In order to use FASTA, you must first load the appropriate environment module:
module load gnu
FASTA is a set of bioinformatics programs that search biological sequence data, either DNA or protein sequences, for regions of similarity. The programs can find both locally and globally similar regions. RCC also provides a parallel version that uses MPI.
Using FASTA on RCC Resources
The FASTA software package includes a number of programs. To run them serially, load the gnu module. To run them in parallel, load one of the available MPI implementations instead, such as GNU OpenMPI.
Refer to the official homepage for a complete list of the programs included in the FASTA package. These programs include fasta36, which performs sequence comparison. The following is an example run of fasta36 on HPC:
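A minimal sketch of such a run, assuming the usual fasta36 argument order of query file followed by library file. The file names, the tiny sequences, and the output file results.txt are hypothetical placeholders, not RCC data; substitute your own FASTA-format files.

```shell
# Load the module that provides the FASTA programs.
# Guarded so the snippet does nothing on machines without environment modules.
if command -v module >/dev/null 2>&1; then
    module load gnu
fi

# Hypothetical inputs: a one-sequence query and a two-sequence library.
printf '>query1\nMKTAYIAKQRQISFVKSHFSRQ\n' > query.fasta
printf '>lib1\nMKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ\n>lib2\nGSHMLEDPVDAFKQ\n' > library.fasta

# Usage: fasta36 [options] query_file library_file
# -q runs quietly, without interactive prompts.
if command -v fasta36 >/dev/null 2>&1; then
    fasta36 -q query.fasta library.fasta > results.txt
else
    echo "fasta36 not found; load the gnu module first" >&2
fi
```

The alignment report is written to standard output, so redirecting it to a file keeps the results after the session ends.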
To run the program in parallel on RCC systems, you can either call mpirun directly or submit a Slurm job script. A sample job script for the fasta36 program is below:
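The following is a sketch of what such a job script might look like, not the script RCC publishes. Several details are assumptions: the module name gnu-openmpi, the task count and time limit, the input and output file names, and the name of the MPI-enabled binary (shown here as fasta36, which may differ on your system). The snippet writes the script to a hypothetical file, fasta36_job.sh.

```shell
# Write the sample job script to fasta36_job.sh (hypothetical file name).
cat > fasta36_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=fasta36
# Number of MPI tasks and wall-clock limit (illustrative values).
#SBATCH --ntasks=4
#SBATCH --time=00:30:00
# %j expands to the Slurm job ID.
#SBATCH --output=fasta36_%j.out

# Assumption: the OpenMPI module is named gnu-openmpi.
# Run "module avail" to see the names available on your system.
module load gnu-openmpi

# Assumption: the MPI-enabled build is invoked as fasta36.
# srun starts one process per task requested above.
srun fasta36 -q query.fasta library.fasta > results.txt
EOF
```

Adjust the resource requests to match the size of your sequence library before submitting.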
Then submit your script to Slurm with the sbatch command (sbatch YOURSCRIPT), replacing YOURSCRIPT with the name of your script file.
Note that the above examples can be applied to any of the other FASTA programs.