Use the navigation options or select one of the guides below to find SCINet tools.
Software
This guide includes information about command-line software, as well as about graphical software such as Galaxy, CLC, Geneious, RStudio, and Jupyter.
Conda is a software package manager for data science that allows unprivileged (non-administrative) Linux or macOS users to search, fetch, install, upgrade, use, and manage supported open-source software packages and programming languages/libraries/environments (primarily Python and R, but also others such as Perl, Java, and Julia) in a directory they have write access to. Conda allows SCINet users to create reproducible scientific software environments (including outside of Ceres) without submitting a SCINet software request form for new software or contacting the VRSC to upgrade existing software.
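As a minimal sketch, the helper below assembles the `conda create --prefix` command that places an environment in a directory you can write to, such as a project directory shared with collaborators. The path `/project/my_project/envs/myenv` and the package list are hypothetical placeholders. On Ceres you may first need to load a Conda module (for example `module load miniconda`); that module name is an assumption and may differ.

```python
import shlex
import subprocess


def conda_create_cmd(prefix: str, packages: list[str]) -> list[str]:
    """Build a `conda create` command that installs into an explicit directory."""
    # --prefix places the environment at a path you choose instead of the default envs dir
    # --yes skips the interactive confirmation prompt
    return ["conda", "create", "--yes", "--prefix", prefix, *packages]


def create_env(prefix: str, packages: list[str], dry_run: bool = True) -> list[str]:
    """Print the command (dry run) or execute it; returns the command either way."""
    cmd = conda_create_cmd(prefix, packages)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if conda fails
    return cmd


if __name__ == "__main__":
    # Hypothetical project path: replace my_project with your project's directory name.
    create_env("/project/my_project/envs/myenv", ["python=3.11", "numpy"])
```

Once created, such an environment is activated by its full path, e.g. `conda activate /project/my_project/envs/myenv`.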
Open OnDemand is a web-based interactive interface to remote computing resources. The key benefit for SCINet users is that they can access Ceres from any web browser, including a browser on a mobile phone.
Several interactive apps can be run in Open OnDemand, including Jupyter, RStudio Server, Geneious, and CLC Genomics Workbench. The desktop app allows users to run any GUI software.
If you are using Atlas Open OnDemand, visit the Atlas Open OnDemand Guide for more information.
To access Open OnDemand on the Ceres cluster, go to Ceres Open OnDemand.
The Environment Modules package provides dynamic modification of your shell environment. It also allows a single system to offer multiple versions of the same software application and lets users select the version they want to use. Module commands set, change, or delete environment variables, typically in support of a particular application.
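Because modules work by editing environment variables, a program can inspect what is currently loaded. The sketch below assumes the module system maintains the colon-separated `LOADEDMODULES` variable (both Environment Modules and Lmod do); the helper names `loaded_modules` and `is_loaded` are hypothetical. The interactive equivalents are `module avail`, `module load <name>`, and `module list`.

```python
import os
from typing import Mapping, Optional


def loaded_modules(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return loaded modules as listed in LOADEDMODULES (empty list if unset)."""
    env = os.environ if env is None else env
    value = env.get("LOADEDMODULES", "")
    # Entries typically look like "name/version" and are separated by colons
    return [m for m in value.split(":") if m]


def is_loaded(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """True if a loaded module is exactly `name` or `name/<version>`."""
    return any(m == name or m.startswith(name + "/") for m in loaded_modules(env))


if __name__ == "__main__":
    print(loaded_modules())
```

Passing an explicit mapping instead of reading `os.environ` keeps the helpers easy to check outside a module-enabled shell.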
Some software packages may not be available for the version of Linux running on the HPC cluster. In this case, users may want to run containers. Containers are self-contained application execution environments that contain all necessary software to run an application or workflow, so users don’t need to worry about installing all the dependencies. There are many pre-built container images for scientific applications available for download and use.
Singularity (https://sylabs.io/) is an application for running containers on an HPC cluster. Many pre-built container images for scientific applications are available for download and use; see the Container Images section.
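As an illustration, the sketch below assembles the two commands typically used with a pre-built image: `singularity pull`, which downloads an image (here from Docker Hub via a `docker://` URI) into a SIF file, and `singularity exec`, which runs one command inside that image. The image `docker://ubuntu:22.04`, the file name `ubuntu_22.04.sif`, and the `/project` bind path are assumptions chosen for illustration.

```python
import shlex


def pull_cmd(sif_path: str, uri: str) -> list[str]:
    """`singularity pull <output.sif> <uri>` downloads and converts an image."""
    return ["singularity", "pull", sif_path, uri]


def exec_cmd(sif_path: str, command: list[str], binds: tuple[str, ...] = ()) -> list[str]:
    """`singularity exec` runs a single command inside the container image."""
    cmd = ["singularity", "exec"]
    for b in binds:
        # --bind makes a host directory visible inside the container
        cmd += ["--bind", b]
    return cmd + [sif_path, *command]


if __name__ == "__main__":
    print(shlex.join(pull_cmd("ubuntu_22.04.sif", "docker://ubuntu:22.04")))
    print(shlex.join(exec_cmd("ubuntu_22.04.sif", ["cat", "/etc/os-release"], binds=("/project",))))
```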
Analysis
Jupyter is an Integrated Development Environment (IDE) that provides an interactive and collaborative environment for scientific computing. Code runs immediately and its results, including plots, are displayed alongside it, which supports exploratory data analysis and visualization. Jupyter supports over 40 programming languages (including Python, R, Julia, Java, and Scala) and works with popular data science libraries.
The popular R, Perl, and Python languages have many packages/modules available. Some of these packages are installed on Ceres and are available through the r, perl, python_2, and python_3 modules. To see the list of installed packages, visit the Preinstalled Software List page or run the module help <module_name> command. If users need packages that are not available, they can either ask the VRSC to add them or download and install them in their home or project directories. We recommend installing packages in project directories, since collaborators on the same project will likely need the same packages, and home directory quotas are much lower than project directory quotas.
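One possible way to share Python packages through a project directory is `pip install --target`, which installs into a plain directory that each collaborator then adds to the import path (for example via `PYTHONPATH` or `sys.path`). The sketch below assumes that approach; the directory `/project/my_project/python_pkgs` and the helper names are hypothetical. For R, the analogous tools are `install.packages(..., lib = ...)` and `.libPaths()`.

```python
import sys
from pathlib import Path


def pip_target_cmd(target: Path, packages: list[str]) -> list[str]:
    """Build a pip command that installs `packages` into the directory `target`."""
    # Running pip through the current interpreter keeps packages matched to this Python
    return [sys.executable, "-m", "pip", "install", "--target", str(target), *packages]


def use_shared_packages(target: Path) -> None:
    """Make packages installed under `target` importable in this session."""
    path = str(target)
    if path not in sys.path:
        sys.path.insert(0, path)  # searched before site-packages


if __name__ == "__main__":
    shared = Path("/project/my_project/python_pkgs")  # hypothetical shared location
    print(pip_target_cmd(shared, ["requests"]))
    use_shared_packages(shared)
```

Packages installed this way are tied to the Python version that ran pip, so collaborators should use the same python_3 module or Conda environment.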
Data
This document describes recommended standard operating procedures (SOPs) for managing data on ARS HPC and storage infrastructure.
Each file on a Linux system is associated with one user and one group. On Ceres, files in a user’s home directory are by default associated with the user’s primary group, which has the same name as the user’s SCINet account. Files in project directories are by default associated with the project groups. Group quotas, which limit the amount of data stored, are enabled on both home and project directories.
At login, current usage and quotas are displayed for all groups that a user belongs to. The my_quotas command provides the same output:
$ my_quotas
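Because quotas are charged per group, it can help to see which group owns the data in a given directory. The sketch below walks a directory tree and totals regular-file sizes by owning group using Python's standard `os`, `stat`, and `grp` modules (Unix only); the function names are hypothetical, and the result only approximates what the quota system counts.

```python
import grp
import os
import stat
import sys
from collections import defaultdict


def group_name(gid: int) -> str:
    """Translate a numeric group ID to its name, falling back to the number."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def bytes_by_group(root: str) -> dict[str, int]:
    """Sum the sizes of regular files under `root`, keyed by owning group."""
    totals: dict[str, int] = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(st.st_mode):
                totals[group_name(st.st_gid)] += st.st_size
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    for group, size in sorted(bytes_by_group(sys.argv[1]).items()):
        print(f"{group}\t{size / 1e9:.2f} GB")
```

For example, running it on a home directory would show any files still carrying a project group, or vice versa; `chgrp` can reassign such files.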
This document provides detailed information about the storage options provided by SCINet and how to use them. For a simpler overview of suggested procedures for managing data on SCINet, please see Managing Data on ARS HPC and Storage Infrastructure.
The Ceres and Atlas clusters offer multiple places to store data, each serving a different purpose.
Data transfer best practices.
Globus Online is the recommended method for transferring data to and from the HPC clusters.
Rclone is already installed on the DTNs (data transfer nodes) and all compute nodes. Please do not use rclone from the head node; attempting to do so will display a reminder to use a DTN or compute node instead.
The rclone home page is https://rclone.org.
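A minimal sketch of an rclone transfer command, assuming a remote has already been set up with `rclone config`. The remote name `myremote`, the destination path, and the local directory are hypothetical. Following the guidance above, the resulting command should be run on a DTN or compute node rather than the head node.

```python
import shlex


def rclone_copy_cmd(src: str, dest: str, dry_run: bool = False) -> list[str]:
    """Build `rclone copy SRC DEST`; copies new or changed files and never deletes at DEST."""
    cmd = ["rclone", "copy", src, dest, "--progress"]
    if dry_run:
        # --dry-run reports what would be transferred without copying anything
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: a project results folder and a preconfigured remote
    print(shlex.join(rclone_copy_cmd("/project/my_project/results", "myremote:backup/results", dry_run=True)))
```

Starting with `--dry-run` is a cautious way to confirm the source and destination before moving large amounts of data.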