Conda and Anaconda
Conda is a Python package manager, virtual environment manager, and more.
Conda is a package manager, similar to pip. It installs, updates, and removes packages for you. Its advantages over pip are that it has built-in support for creating isolated environments for different projects, and that it can install data science software that is not written in Python (e.g., R or C libraries). It is widely used in data science.
Anaconda is a "batteries included" distribution of Python that includes over 150 data science packages. It uses Conda as its package manager.
Conda vs Pip
Both Conda and Pip are package managers written in Python. The following table shows the differences between the two [1]:
feature | conda | pip |
---|---|---|
manages | binaries | wheel or source |
can require compilers | no | yes |
package types | any | Python-only |
creates isolated environments | yes, built-in | no, requires virtualenv or venv |
dependency checks | yes | no |
package sources | Anaconda repo or cloud | PyPI |
recommended for | data science | Python-only code |
Setting up a Conda Environment
Initial setup
To use Conda, you need to configure your shell environment using the `conda init` command. You only need to do this once. From then on, your shell will be configured for Conda each time you log in to the HPC.
From a login node:
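The commands for this step did not survive in this copy of the page. A plausible sequence is sketched below; it assumes the environment module is named `anaconda` (the name used in the notes that follow) and that your login shell is bash. The guards simply skip a step when `module` or `conda` is not available.

```shell
# Assumption: the cluster exposes Anaconda as an environment module named "anaconda".
SHELL_NAME="bash"                # substitute tcsh, zsh, or fish if that is your shell

if command -v module >/dev/null 2>&1; then
    module load anaconda         # or anaconda/VERSION for a specific version
fi

if command -v conda >/dev/null 2>&1; then
    conda init "$SHELL_NAME"     # writes the conda block into your shell init file
    source ~/.bashrc             # bash only: pick up the change in this session
fi
```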
- If you need a specific version of Anaconda, use `anaconda/VERSION`. To see a list of versions available on the cluster, run `module avail anaconda`.
- If you are using a shell other than bash (e.g., `tcsh`, `zsh`, or `fish`), substitute that here.
- If you are using a shell other than bash, you will need to source your shell initialization file; if you do not know what that file name is, you can log out and log back in to the HPC.
Note
The conda initialization may cause your shell to take a second or two longer to load.
If you no longer need to use the conda package manager, you can edit your shell initialization script (`~/.bashrc` in the bash environment) and remove the lines between and including the `# >>> conda initialize >>>` and `# <<< conda initialize <<<` markers.
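As an alternative to editing the file by hand, the block can be deleted with `sed`. This is a sketch, not an official conda procedure: `remove_conda_init` is a hypothetical helper name, and it assumes both marker lines are present exactly as `conda init` writes them. A backup of the file is saved with a `.bak` suffix.

```shell
# Delete the conda-managed block (marker lines included) from a shell init file.
# The original file is preserved as <file>.bak.
remove_conda_init() {
    local rc_file="$1"
    sed -i.bak '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$rc_file"
}

# Usage (uncomment to apply it to your own file):
# remove_conda_init ~/.bashrc
```

Log out and back in afterwards so the change takes effect in a fresh shell.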
Managing Conda Environments
You can create as many Conda environments as you need. Each environment is isolated in its own directory.
For example, to create a Conda environment named `my_conda_app`:
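A minimal sketch of the create step, assuming conda is already on your `PATH` after the initial setup. `ENV_NAME` is only a shell variable introduced here for illustration.

```shell
ENV_NAME="my_conda_app"          # environment name from the example above

if command -v conda >/dev/null 2>&1; then
    conda create --name "$ENV_NAME"
else
    echo "conda not found; complete the initial setup first" >&2
fi
```

Package specifications can be appended to `conda create` to install them into the new environment at creation time.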
Read and accept the prompts. When the command completes, conda prints a short message showing how to activate and deactivate the new environment.
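Activate the environment by running `conda activate my_conda_app`. While the environment is active, its name appears in parentheses at the start of your shell prompt.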
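Work in your Conda environment as you would in any shell session; packages installed with `conda install` while it is active are placed in that environment.
To deactivate your Conda environment, run `conda deactivate`.
To list your Conda environments, run `conda env list`.
To delete a Conda environment, run `conda env remove --name my_conda_app`.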
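The steps above can also be run together without prompts, for instance from a script. The sketch below assumes bash and the `my_conda_app` environment from the example; `numpy` is used purely as an illustrative package. The `eval "$(conda shell.bash hook)"` line is included because non-interactive shells usually do not read the setup that `conda init` added to `~/.bashrc`.

```shell
ENV_NAME="my_conda_app"          # environment created earlier in this section

if command -v conda >/dev/null 2>&1; then
    eval "$(conda shell.bash hook)"      # make `conda activate` usable in this shell

    if conda activate "$ENV_NAME"; then
        conda install --yes numpy        # illustrative package, not from the original page
        python -c "import numpy; print(numpy.__version__)"
        conda deactivate
    fi

    conda env list                       # show all environments and their paths

    # Uncomment to delete the environment once it is no longer needed:
    # conda env remove --name "$ENV_NAME" --yes
fi
```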
Anaconda
We provide several pre-installed Anaconda environments globally on the HPC. To load the default Anaconda version, load the environment module:
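Assuming the module is named `anaconda`, as the `module avail anaconda` command on this page suggests, loading the default version would look like this:

```shell
MODULE_NAME="anaconda"           # assumed module name; check `module avail` on your cluster

if command -v module >/dev/null 2>&1; then
    module load "$MODULE_NAME"
else
    echo "The 'module' command is unavailable; run this on the HPC" >&2
fi
```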
If you need a specific version, you can use the `module avail anaconda` command, which will show the available versions on the HPC:
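A sketch of that lookup. `VERSION` is a placeholder to be replaced with one of the versions the cluster actually lists; no specific version is assumed here.

```shell
MODULE_NAME="anaconda"

if command -v module >/dev/null 2>&1; then
    module avail "$MODULE_NAME"          # list the installed Anaconda versions

    # Then load one of them, replacing VERSION with an entry from the list:
    # module load "$MODULE_NAME/VERSION"
fi
```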
You can confirm that Anaconda was activated successfully by checking the Python path using the `which` command:
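For example, the check might look like the following. The exact path printed is site-specific, so none is shown here; it should point inside the Anaconda installation rather than at the system Python.

```shell
# Resolve the python executable that is first on PATH (empty if none is found).
PYTHON_PATH="$(which python 2>/dev/null || true)"
echo "python resolves to: ${PYTHON_PATH:-<not found>}"
```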
Discovering Anaconda Packages
The list of packages included with Anaconda differs based on which module you use. The best way to determine what packages are installed in Anaconda is to run the `conda list` command.
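A short sketch of both forms of the command. The pattern `numpy` is only an illustration; `conda list` accepts an optional pattern and then shows just the matching package names.

```shell
PKG_PATTERN="numpy"              # illustrative pattern, not from the original page

if command -v conda >/dev/null 2>&1; then
    conda list                   # every package in the active environment
    conda list "$PKG_PATTERN"    # only packages whose names match the pattern
fi
```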
Installing Anaconda in your home directory
If you want to use the latest version of Anaconda, you can create a conda environment in your home directory and install Anaconda in it.
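One way to do this is sketched below, under the assumption that the `anaconda` metapackage is available from your configured channels. The environment name `my_anaconda` is hypothetical. On a shared installation where you cannot write to the central directory, conda typically places new environments under `~/.conda/envs`.

```shell
ENV_NAME="my_anaconda"           # hypothetical environment name

if command -v conda >/dev/null 2>&1; then
    # Installs the full Anaconda package set into a new environment.
    conda create --name "$ENV_NAME" anaconda
fi
```

The download is large, so check your home directory quota before running it.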
[1] Comparison table courtesy of Kumar Brar