SCINet Software, Tools, and Storage
Use the navgation options or select one of the guides below to find SCINet tools
Software
This guide includes information about command-line software, as well as information on graphical software such as Galaxy, CLC, Geneious, RStudio, and Juptyer.
Conda is a software package manager for data science that allows unprivileged (non-administrative) Linux or MacOS users to search, fetch, install, upgrade, use, and manage supported open-source software packages and programming languages/libraries/environments (primarily Python and R, but also others such as Perl, Java, and Julia) in a directory they have write access to. Conda allows SCINet users to create reproducible scientific software environments (including outside of Ceres) without requiring the submission of a SCINet software request form for new software, or contacting the VRSC to upgrade existing software.
The Environment Modules package provides dynamic modification of your shell environment. This also allows a single system to accommodate multiple versions of the same software application and for the user to select the version they want to use. Module commands set, change, or delete environment variables, typically in support of a particular application.
Some software packages may not be available for the version of Linux running on the HPC cluster. In this case, users may want to run containers. Containers are self-contained application execution environments that contain all necessary software to run an application or workflow, so users don’t need to worry about installing all the dependencies. There are many pre-built container images for scientific applications available for download and use.
Singularity https://sylabs.io/ is an application for running containers on an HPC cluster. Containers are self-contained application execution environments that contain all necessary software to run an application or workflow, so you don’t need to worry about installing all the dependencies. There are many pre-built container images for scientific applications available for download and use, see section Container Images.
Analysis
This document assumes that a licensed copy of CLC Genomics WorkBench 22 is installed locally and available to the user.
The popular R, Perl and Python languages have many packages/modules available. Some of the packages are installed on Ceres and are available with the r/perl/python_2/python_3 modules. To see the list of installed packages, visit the Preinstalled Software List page or use module help <module_name>
command. If users need packages that are not available, they can either request VRSC to add packages, or they can download and install packages in their home/project directories. We recommend installing packages in the project directories since collaborators on the same project most probably would need same packages. In addition, home quotas are much lower than quotas for project directories.
Data
This document describes recommended procedures (SOP) for managing data on ARS HPC and storage infrastructure.
Each file on a Linux system is associated with one user and one group. On Ceres, files in a user’s home directory by default are associated with the user’s primary group, which has the same name as user’s SCINet account. Files in the project directories by default are associated with the project groups. Group quotas that control the amount of data stored are enabled on both home and project directories.
At login, current usage and quotas are displayed for all groups that a user belongs to. The my_quotas
command provides the same output:
$ my_quotas
This document provides detailed information about the storage options provided by SCINet and how to use them. For a simpler overview of suggested procedures for managing data on SCINet, please see Managing Data on ARS HPC and Storage Infrastructure.
There are multiple places to store data on the Ceres and Atlas clusters that all serve different purposes.
Data Transfer best practices.
Globus Online is the recommended method for transferring data to and from the HPC clusters.
Rclone is already installed on the DTNS and all of the compute nodes. Please do not use rclone from the headnode. Attempting to do so will remind you to use the others.
The rclone home page is https://rclone.org.