Use the navigation options or select one of the guides below.
Access Guides
How to access SCINet's web-based user interfaces.
- Open OnDemand
- Galaxy
- Globus (for file transfers)
This guide gives step-by-step instructions for connecting to SCINet systems via SSH from a command-line terminal.
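For example, a typical connection from a terminal looks like the following sketch (substitute your own SCINet user name; the hostname shown is the commonly documented Ceres login address, but check the access guide if it has changed):

```bash
# Connect to the Ceres login node over SSH.
ssh first.last@ceres.scinet.usda.gov
```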
Computing Resources Guides
This guide lists the differences between the Atlas and Ceres clusters to ease the transition from one cluster to the other.
Include the following sentence in manuscripts intended for publication to acknowledge the use of Ceres as a resource:
“This research used resources provided by the SCINet project and/or the AI Center of Excellence of the USDA Agricultural Research Service, ARS project numbers 0201-88888-003-000D and 0201-88888-002-000D.”
In addition to the Ceres and Atlas clusters, there are external computing resources available to the SCINet community, including Amazon Web Services, XSEDE, and the Open Science Grid. These resources may be of interest to SCINet users who require:
- very large jobs (either numerous small jobs, or many nodes in parallel)
- special computing hardware (e.g., GPUs, Xeon Phi, extremely large memory)
- software that isn’t supported on Ceres (e.g., web apps, relational databases, VMs, Hadoop, Spark, certain commercial software)
The software discussed and shown in these user guides is largely open source, can run on a desktop, HPC, or cloud environment, and can be installed with software management systems that support reproducibility (such as Conda, Singularity, and Docker). Below is a quick overview of some of the software, hardware, and potentially confusing nomenclature used throughout this site.
Data Management Guides
This document describes recommended standard operating procedures (SOPs) for managing data on ARS HPC and storage infrastructure.
Each file on a Linux system is associated with one user and one group. On Ceres, files in a user's home directory are by default associated with the user's primary group, which has the same name as the user's SCINet account. Files in the project directories are by default associated with the project groups. Group quotas that control the amount of data stored are enabled on both home and project directories.
At login, current usage and quotas are displayed for all groups that a user belongs to. The my_quotas command provides the same output:

```bash
$ my_quotas
```
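As an illustration, standard Linux commands can be used to check and change the group associated with files so that they count against the intended quota (the project path and group name below are hypothetical):

```bash
# Check which group a file is associated with.
ls -l /project/my_project/data.txt

# Re-associate files with the project group so they count against the
# project quota rather than your primary group's quota.
chgrp -R proj-my_project /project/my_project/data_dir
```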
This document provides detailed information about the storage options provided by SCINet and how to use them. For a simpler overview of suggested procedures for managing data on SCINet, please see Managing Data on ARS HPC and Storage Infrastructure.
There are multiple places to store data on the Ceres and Atlas clusters that all serve different purposes.
Best practices for data transfer.
Globus Online is the recommended method for transferring data to and from the HPC clusters.
Rclone is already installed on the DTNs (data transfer nodes) and all of the compute nodes. Please do not use rclone on the head node; attempting to do so will produce a reminder to use one of the other nodes instead.
The rclone home page is https://rclone.org.
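A typical rclone workflow from a DTN or compute node looks like the following sketch, where "myremote" is a placeholder for a remote you have configured and the paths are illustrative:

```bash
# One-time interactive setup of a remote (e.g., cloud storage).
rclone config

# Copy a project directory to the configured remote.
rclone copy /project/my_project/results myremote:results-backup
```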
Cluster Use Guides
Compute jobs are run on functional groups of nodes called partitions or queues. Each different partition has different capabilities (e.g. regular memory versus high memory nodes) and resource restrictions (e.g. time limits on jobs). Nodes may appear in several partitions.
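To see the partitions available on a cluster, along with their limits, you can use the standard SLURM sinfo command, for example:

```bash
# Partition name, time limit, node count, and memory per node
# (output columns vary by cluster configuration).
sinfo -o "%P %l %D %m"
```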
Open OnDemand is an intuitive, innovative, and interactive interface to remote computing resources. The key benefit for SCINet users is that they can use any web browser, including browsers on a mobile phone, to access Ceres.
There are several interactive apps that can be run in Open OnDemand including Jupyter, RStudio Server, Geneious, CLC Genomics Workbench, and more. The desktop app allows a user to run any GUI software.
If you are using Atlas Open OnDemand, visit the Atlas Open OnDemand Guide for more information.
To access Open OnDemand on the Ceres cluster, go to Ceres Open OnDemand.
Ceres uses the Simple Linux Utility for Resource Management (SLURM) to submit interactive and batch jobs to the compute nodes. Requested resources can be specified either within the job script or using options with the salloc, srun, or sbatch commands.
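A minimal batch script might look like the following sketch; the partition name, resource amounts, and script name are placeholders to adapt to your own job:

```bash
#!/bin/bash
#SBATCH --job-name=example      # job name shown in the queue
#SBATCH --partition=short       # hypothetical partition name; see sinfo
#SBATCH --ntasks=4              # number of tasks (cores)
#SBATCH --mem=8G                # memory for the job
#SBATCH --time=01:00:00         # wall-clock limit (HH:MM:SS)

module load python_3            # load software via Environment Modules
python my_analysis.py           # hypothetical analysis script
```

The script is submitted with sbatch; salloc and srun accept the same resource options directly on the command line.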
All compute nodes have 1.5 TB of fast local temporary data file storage space supported by SSDs. This local scratch space is significantly faster and supports more input/output operations per second (IOPS) than the mounted filesystems on which the home and project directories reside.
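A common pattern is to stage data onto the local scratch space inside a job, compute there, and copy results back. The sketch below assumes the TMPDIR environment variable points at the local scratch area; check the cluster documentation for the exact path on your system:

```bash
# Stage input onto the node-local SSD, run there, then retrieve results.
cp /project/my_project/input.dat "$TMPDIR"/
cd "$TMPDIR"
my_program input.dat -o output.dat        # hypothetical program
cp output.dat /project/my_project/results/
```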
Development Environment Guides
Jupyter is an Integrated Development Environment (IDE) that provides an interactive and collaborative environment for scientific computing. This interactive coding environment allows for immediate execution and visualization of code, facilitating on-the-fly data analysis and visualization. It supports over 40 programming languages (including Python, R, Julia, Java, and Scala) and seamlessly integrates with popular data science libraries.
Visual Studio Code (VS Code) is a source-code editor.
Software Installation and Access Guides
The popular R, Perl, and Python languages have many packages/modules available. Some of these packages are installed on Ceres and are available with the r/perl/python_2/python_3 modules. To see the list of installed packages, visit the Preinstalled Software List page or use the module help <module_name> command. If users need packages that are not available, they can either request that VRSC add the packages, or they can download and install the packages in their home or project directories. We recommend installing packages in the project directories, since collaborators on the same project will most likely need the same packages; in addition, home directory quotas are much lower than project directory quotas.
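As one illustrative approach, a Python package can be installed into a project directory with pip's --target option (the project path and package name are placeholders):

```bash
module load python_3
pip install --target=/project/my_project/python_pkgs biopython

# Make the project-local packages visible to Python.
export PYTHONPATH=/project/my_project/python_pkgs:$PYTHONPATH
```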
This guide includes information about command-line software, as well as information on graphical software such as Galaxy, CLC, Geneious, RStudio, and Jupyter.
Conda is a software package manager for data science that allows unprivileged (non-administrative) Linux or MacOS users to search, fetch, install, upgrade, use, and manage supported open-source software packages and programming languages/libraries/environments (primarily Python and R, but also others such as Perl, Java, and Julia) in a directory they have write access to. Conda allows SCINet users to create reproducible scientific software environments (including outside of Ceres) without requiring the submission of a SCINet software request form for new software, or contacting the VRSC to upgrade existing software.
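For example, a reproducible environment can be created under a project directory with conda's --prefix option, which keeps it out of the much smaller home quota. The module name and path below are illustrative; check module avail for the actual Conda module on your cluster:

```bash
module load miniconda                     # assumed module name
conda create --prefix /project/my_project/envs/rnaseq python=3.11 numpy
conda activate /project/my_project/envs/rnaseq
```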
The Environment Modules package provides dynamic modification of your shell environment. This also allows a single system to accommodate multiple versions of the same software application and for the user to select the version they want to use. Module commands set, change, or delete environment variables, typically in support of a particular application.
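The most common module commands look like this (the module name is an example; see the Preinstalled Software List for what is actually installed):

```bash
module avail          # list software available through modules
module load r         # add a module's software to your environment
module list           # show currently loaded modules
module unload r       # remove it from your environment
```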
Some software packages may not be available for the version of Linux running on the HPC cluster. In this case, users may want to run containers. Containers are self-contained application execution environments that contain all necessary software to run an application or workflow, so users don’t need to worry about installing all the dependencies. There are many pre-built container images for scientific applications available for download and use.
Apptainer (formerly Singularity; https://apptainer.org) is an application for running such containers on an HPC cluster; see the Container Images section for available pre-built images.
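A minimal example of pulling and running a pre-built image (the image shown is just an illustration):

```bash
# Pull a container image from a registry into a local .sif file.
apptainer pull ubuntu.sif docker://ubuntu:22.04

# Run a command inside the container.
apptainer exec ubuntu.sif cat /etc/os-release
```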
Application Guides
AWS Guides
FAQ about AWS resources available to ARS scientists.
The aws-saml tool can be used in conjunction with the Shibboleth SAML identity provider to retrieve time-limited API keys suitable for command-line use. It interactively prompts you for your password and, if you have multiple roles available, prompts you to choose one unless a role is specified on the command line. The resulting credentials are normally stored in your standard AWS credentials file, but a command-line flag can be provided to have the credentials written to standard output in Bash format for scripting. These credentials normally expire after one hour; by providing the refresh flag, the tool will fork into the background and keep the credentials refreshed for as long as your login cookie remains valid.
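A sketch of how a session might look; the flag name below is an assumption inferred from the description above, so consult the tool's built-in help for the actual options:

```bash
# Prompts for your password (and a role, if several are available),
# then stores temporary credentials in your AWS credentials file.
aws-saml

# Assumed flag: forks into the background and keeps the credentials
# refreshed while your login cookie remains valid.
aws-saml --refresh
```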