User-Installed Software on Ceres with Conda
Table of Contents
- Installing Software
- Managing Environments
Conda is a software package manager for data science. It allows unprivileged (non-administrative) Linux or macOS users to search, fetch, install, upgrade, use, and manage supported open-source software packages and programming languages/libraries/environments (primarily Python and R, but also others such as Perl, Java, and Julia) in a directory they have write access to. Conda allows SCINet users to create reproducible scientific software environments (including outside of Ceres) without submitting a SCINet software request form for new software or contacting the VRSC to upgrade existing software.
Many open-source scientific software packages are available:
- Browse/search all conda packages
The Bioconda channel contains thousands of software packages that are useful for bioinformatics.
- Browse/search available Bioconda software packages
Before using conda or conda-installed software on Ceres, the miniconda environment module (which contains the conda software environment) must be loaded. To load the latest miniconda module available on Ceres:
[user.name@ceres ~]$ module load miniconda
You can see all available versions of miniconda on Ceres with:
[user.name@ceres ~]$ module spider miniconda
(Optional one-time setup for bioconda users) If you plan on installing software primarily from the bioconda channel, before using conda for the first time on Ceres, you may wish to configure conda per the bioconda documentation to search for software packages in the conda-forge, bioconda, and defaults channels (in that order):
[user.name@ceres ~]$ conda config --add channels defaults
[user.name@ceres ~]$ conda config --add channels bioconda
[user.name@ceres ~]$ conda config --add channels conda-forge
Otherwise, the conda-forge and then bioconda channels must be specified every time software is installed via conda install:
conda install -c conda-forge -c bioconda SOFTWARE_PACKAGE1 SOFTWARE_PACKAGE2...
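If you prefer not to change your conda configuration, the channel flags can be wrapped in a small shell function so they are not retyped for every install. This is only a sketch: bioconda_install is a hypothetical helper name, not part of conda or of the Ceres miniconda module, and it assumes the miniconda module is loaded and an environment is already active.

```shell
# Hypothetical helper (not part of conda): adds the channel flags shown
# above, in the same order, in front of the package names you pass in.
bioconda_install() {
    if [ "$#" -eq 0 ]; then
        echo "usage: bioconda_install PACKAGE [PACKAGE...]" >&2
        return 1
    fi
    conda install -c conda-forge -c bioconda "$@"
}
```

For example, `bioconda_install trinity kallisto` would run the same command as typing both `-c` options by hand.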
Software can be installed into separate environments (directories) that are managed separately. At least one environment must be created before installing software using the Ceres miniconda environment module.
On Ceres, suitable locations for conda environments housing conda packages include:
- Home directory (default; subdirectory of $HOME/.conda/envs/)
NOTE: Some conda packages (with dependencies) can take gigabytes of storage space. Use the Ceres command my_quota to check the available space in your home directory. Contact the VRSC (scinet_vrsc@USDA.GOV) if a home directory quota increase is needed.
- A user-specified directory within one’s project storage on the /KEEP file system, e.g., /KEEP/<MY_PROJECT>/<MY_ENVIRONMENT_DIRECTORY>
This environment is then usable by others in the project.
CAUTION: Avoid installing software into the /project file system if possible. It is a BeeGFS parallel file system that is tuned for fewer, larger files, and suffers degraded performance when used as the target for conda packages, which frequently contain many smaller files.
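Because conda environments tend to hold many small files, it can help to look at both the size and the file count of a directory when choosing a location. The following is a minimal sketch using standard du and find; dir_footprint is a hypothetical helper name, and the default of $HOME/.conda is taken from the home directory location described above.

```shell
# Hypothetical helper: report the disk usage and number of regular files
# under a directory, e.g. a conda environment or the package cache.
dir_footprint() {
    local dir="${1:-$HOME/.conda}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    local size files
    size=$(du -sh "$dir" | cut -f1)                    # human-readable total
    files=$(find "$dir" -type f | wc -l | tr -d ' ')   # count regular files
    echo "$dir: $size in $files files"
}
```

Running `dir_footprint` with no argument inspects $HOME/.conda; pass a path such as an environment directory to inspect that instead. It complements, rather than replaces, my_quota.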
Use an interactive session on a compute node to install software with conda to avoid slowing down the login node for everyone, e.g.,
[user.name@ceres ~]$ salloc
[user.name@ceres14-compute-60 ~]$ module load miniconda
[user.name@ceres14-compute-60 ~]$ source activate my_env
(my_env) [user.name@ceres14-compute-60 ~]$ conda install <package_name>
...
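One way to avoid accidentally installing from the login node is to check for a Slurm job before calling conda. The sketch below relies on Slurm setting the SLURM_JOB_ID environment variable inside a job allocation; whether that variable is visible in your interactive shell on Ceres is an assumption worth verifying, and both function names are hypothetical.

```shell
# Hypothetical guard: Slurm sets SLURM_JOB_ID inside a job allocation,
# so an empty value suggests the shell is not running inside a job.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Hypothetical wrapper: refuse to run conda install outside a Slurm job.
conda_install_on_compute_node() {
    if ! in_slurm_job; then
        echo "Not inside a Slurm job; start an interactive session with salloc first." >&2
        return 1
    fi
    conda install "$@"
}
```

Inside an interactive session, `conda_install_on_compute_node <package_name>` behaves like plain conda install; elsewhere it prints a reminder and returns a non-zero status.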
Example 1: Installing Trinity into a home directory
Load the latest miniconda module if you haven’t already and create an environment called “trinityenv”:
[user.name@ceres ~]$ module load miniconda
[user.name@ceres ~]$ conda create --name trinityenv
Note that the conda create command used above without the --prefix option will create the environment in your home directory ($HOME/.conda/envs/).
To activate the environment (and update environment variables such as PATH that are required to use software installed into this environment):
[user.name@ceres ~]$ source activate trinityenv
(trinityenv) [user.name@ceres ~]$
Do not execute conda init
If you try to activate an environment with conda activate, you will be prompted to run conda init to modify your shell interactive startup script (e.g., ~/.bashrc):
[user.name@ceres ~]$ conda activate
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
    $ conda init <SHELL_NAME>
This will have the undesirable side effect of modifying your PATH to include that particular version of miniconda every time you log in (even without loading the miniconda environment module). If you accidentally run conda init, edit your shell interactive startup file ($HOME/.bashrc for the bash shell) and remove all lines between
>>> conda initialize >>>
and
<<< conda initialize <<<
conda deactivate may still be used safely. We will continue to monitor the GitHub issue that describes the problem and other workarounds.
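The cleanup after an accidental conda init can also be scripted rather than done in an editor. The following is a minimal sketch, assuming the block is delimited by lines containing the two markers shown above and that the marker lines themselves should be removed too; remove_conda_init_block is a hypothetical helper name, and it keeps a .bak copy of the startup file in case the result needs to be reverted.

```shell
# Hypothetical helper: delete the block that conda init writes to a shell
# startup file, keeping a backup copy next to the original.
remove_conda_init_block() {
    local rc="${1:-$HOME/.bashrc}"
    if [ ! -f "$rc" ]; then
        echo "no such file: $rc" >&2
        return 1
    fi
    cp "$rc" "$rc.bak" || return 1
    # Delete from the opening marker line through the closing marker line,
    # inclusive, reading the backup and rewriting the original file.
    sed '/>>> conda initialize >>>/,/<<< conda initialize <<</d' "$rc.bak" > "$rc"
}
```

Calling `remove_conda_init_block` with no argument edits $HOME/.bashrc; compare the result against the .bak copy, then log out and back in so the change takes effect.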
Using conda activate after source activate
If conda activate is needed for some advanced use cases like nested ("stacked") environments, it can be used after source activate:
[user.name@ceres ~]$ ml miniconda
[user.name@ceres ~]$ source activate
(base) [user.name@ceres ~]$ conda activate samtools
(samtools) [user.name@ceres ~]$
Now that you are inside the trinityenv environment, install software into this environment with:
(trinityenv) [user.name@ceres ~]$ conda install <package_name> <package_name> <package_name>
For example, install the Trinity transcriptome assembler and Kallisto RNA-Seq quantification application (an optional dependency that is not included with the default Trinity 2.8.4 installation). Note this step may take a few minutes:
(trinityenv) [user.name@ceres ~]$ conda install trinity kallisto
...
Proceed ([y]/n)? y
...
Afterwards, the Trinity and Kallisto executables are in your PATH:
(trinityenv) [user.name@ceres ~]$ type Trinity
Trinity is hashed (/home/user.name/.conda/envs/trinityenv/bin/Trinity)
(trinityenv) [user.name@ceres ~]$ Trinity --version
Trinity version: Trinity-v2.8.4
-currently using the latest production release of Trinity.
(trinityenv) [user.name@ceres ~]$ type kallisto
kallisto is hashed (/home/user.name/.conda/envs/trinityenv/bin/kallisto)
(trinityenv) [user.name@ceres ~]$ kallisto version
kallisto, version 0.44.
To exit the environment:
(trinityenv) [user.name@ceres ~]$ conda deactivate
[user.name@ceres ~]$
After deactivating the trinityenv environment, Trinity and kallisto are no longer in your PATH:
[user.name@ceres ~]$ type Trinity
-bash: type: Trinity: not found
Example 2: Installing Tensorflow into a /KEEP directory
Load the latest miniconda module if you haven’t already and create an environment in your /KEEP directory by using the --prefix option:
[user.name@ceres ~]$ module load miniconda
[user.name@ceres ~]$ conda create --prefix /KEEP/my_proj/tensorflow
...
[user.name@ceres ~]$ source activate /KEEP/my_proj/tensorflow
(/KEEP/my_proj/tensorflow) [user.name@ceres ~]$ conda install tensorflow
...
Note: conda first downloads packages into a package cache directory. By default, the package cache is in your home directory ($HOME/.conda/pkgs). If you are installing a large amount of software that may cause your home directory quota to be exceeded, you can configure another directory as the package cache by adding a pkgs_dirs list to the $HOME/.condarc file (YAML).
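As an illustration of the pkgs_dirs setting, the sketch below appends a one-entry pkgs_dirs list to a .condarc file from the shell. It is an assumption-laden example rather than the documented Ceres procedure: set_conda_pkgs_dir is a hypothetical helper name, the cache directory you pass is your own choice, and the function refuses to touch a file that already defines pkgs_dirs so an existing setting is never duplicated.

```shell
# Hypothetical helper: append a pkgs_dirs list to a .condarc file unless
# one is already present. The cache directory is whatever path you pass in;
# it is not a Ceres default.
set_conda_pkgs_dir() {
    local cache_dir="${1:-}"
    local condarc="${2:-$HOME/.condarc}"
    if [ -z "$cache_dir" ]; then
        echo "usage: set_conda_pkgs_dir CACHE_DIR [CONDARC]" >&2
        return 1
    fi
    if [ -f "$condarc" ] && grep -q '^pkgs_dirs:' "$condarc"; then
        echo "pkgs_dirs is already set in $condarc; edit it by hand" >&2
        return 1
    fi
    mkdir -p "$cache_dir" || return 1
    # Writes two YAML lines: the key, then a single list entry.
    printf 'pkgs_dirs:\n  - %s\n' "$cache_dir" >> "$condarc"
}
```

For instance, `set_conda_pkgs_dir /KEEP/my_proj/conda_pkgs` (a made-up path under the project directory used in Example 2) would leave a pkgs_dirs entry pointing there in $HOME/.condarc.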
See the official Conda documentation for managing environments for a complete list of commands.
To list environments that have been created in your home directory:
[user.name@ceres ~]$ conda env list
# conda environments:
#
trinityenv               /home/user.name/.conda/envs/trinityenv
root                  *  /software/7/apps/miniconda/4.7.12
To list software packages in an environment:
[user.name@ceres ~]$ conda list --name trinityenv
# packages in environment at /home/user.name/.conda/envs/trinityenv:
#
...
[user.name@ceres ~]$ conda list --prefix /KEEP/my_proj/tensorflow
# packages in environment at /KEEP/my_proj/tensorflow:
...
Tip: For reproducibility, a list of all packages/versions in an environment can be exported to an environment file, which can be used to recreate the environment (e.g., by another user, or on another system) or archived with analysis results. This makes it easy for you or anyone else to re-run your analysis on any system and is also a record of the exact software environment you used for your analysis.
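The export and re-creation described in this tip can be done with conda env export and conda env create. The sketch below wraps the two commands in hypothetical helper functions (export_env and recreate_env are illustrative names, and the default file name of ENV_NAME.yml is an assumption); it expects the miniconda module to be loaded.

```shell
# Hypothetical helper: write the package list of a named environment to a
# YAML environment file (default file name: ENV_NAME.yml).
export_env() {
    local name="${1:-}"
    local file="${2:-$name.yml}"
    if [ -z "$name" ]; then
        echo "usage: export_env ENV_NAME [FILE]" >&2
        return 1
    fi
    conda env export --name "$name" > "$file"
}

# Hypothetical helper: create an environment from such a file.
recreate_env() {
    local file="${1:-}"
    if [ ! -f "$file" ]; then
        echo "environment file not found: $file" >&2
        return 1
    fi
    conda env create --file "$file"
}
```

For example, `export_env trinityenv` would write trinityenv.yml in the current directory, which could be archived with analysis results or passed to `recreate_env trinityenv.yml` on another system.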
To remove an environment in your home directory:
[user.name@ceres ~]$ conda env remove --name trinityenv
To remove an environment in your /KEEP directory:
[user.name@ceres ~]$ rm -rf /KEEP/my_proj/tensorflow
To remove packages not used by any environment, as well as tarballs downloaded into the conda package cache ($HOME/.conda/pkgs):
[user.name@ceres ~]$ conda clean --all