SCINet Guides List
Use the navigation options or select one of the guides below.
Access Guides
All users should have received their login credentials in an email. If you have not, please email the Virtual Research Support Core at scinet_vrsc@USDA.GOV.
Before accessing SCINet resources, new users need to ssh to either the Ceres or the Atlas cluster and change their temporary password. Note that home directories on Atlas are not created right away, so we recommend waiting a day after receiving the email with the credentials before logging in to the Atlas cluster.
For security, SCINet requires multifactor authentication. Use these instructions to get set up.
Software is usually required to access the VPN. We recommend Cisco AnyConnect if it is available, and OpenConnect if it is not.
Open OnDemand is an intuitive, innovative, and interactive interface to remote computing resources. The key benefit for SCINet users is that they can use any web browser, including browsers on a mobile phone, to access Ceres.
The software discussed and shown in these user guides is largely open source, can run on a desktop, HPC, or cloud environment, and can be installed with software management systems that support reproducibility (such as Conda, Singularity, and Docker). Below is a quick overview of some of the software, hardware, and confusing nomenclature that is used throughout this site.
Resources Guides
This guide lists differences between the Atlas and Ceres clusters to ease transition from one cluster to another.
Add the following sentence as an acknowledgment for using Ceres as a resource in manuscripts meant for publication:
“This research used resources provided by the SCINet project of the USDA Agricultural Research Service, ARS project number 0500-00093-001-00-D.”
In addition to the Ceres and Atlas clusters, there are external computing resources available to the SCINet community, including Amazon Web Services, XSEDE, and the Open Science Grid. These resources may be of interest to SCINet users who require:
- very large jobs (either numerous small jobs, or many nodes in parallel)
- special computing hardware (e.g., GPUs, Xeon Phi, extremely large memory)
- software that isn’t supported on Ceres (e.g., web apps, relational databases, VMs, Hadoop, Spark, certain commercial software)
Data Guides
This document describes the recommended standard operating procedures (SOP) for managing data on ARS HPC and storage infrastructure.
Each file on a Linux system is associated with one user and one group. On Ceres, files in a user’s home directory are by default associated with the user’s primary group, which has the same name as the user’s SCINet account. Files in the project directories are by default associated with the project groups. Group quotas that control the amount of data stored are enabled on both home and project directories.
At login, current usage and quotas are displayed for all groups that a user belongs to. The my_quotas command provides the same output:
$ my_quotas
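Because quotas are enforced per group, it can help to check which user and group a file actually belongs to. The following is a minimal sketch using standard GNU/Linux tools (`stat`, `id`); it creates a throwaway file with `mktemp` purely for illustration and is not part of the SCINet tooling.

```shell
# Create a scratch file and report the user and group that own it.
tmpfile="$(mktemp)"
owner="$(stat -c '%U' "$tmpfile")"   # owning user
group="$(stat -c '%G' "$tmpfile")"   # owning group (the one a group quota is charged to)
echo "owner=$owner group=$group"

# Primary group of the current user, followed by all groups the user belongs to.
id -gn
id -Gn

# Clean up the scratch file.
rm -f "$tmpfile"
```

The same `stat` call can be pointed at any existing file or directory, for example one under a project directory, to see which group it counts against.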
This document provides detailed information about the storage options provided by SCINet and how to use them. For a simpler overview of suggested procedures for managing data on SCINet, please see Managing Data on ARS HPC and Storage Infrastructure.
There are multiple places to store data on the Ceres and Atlas clusters that all serve different purposes.
Data Transfer best practices.
Globus Online is the recommended method for transferring data to and from the HPC clusters.
Rclone is already installed on the DTNs (data transfer nodes) and on all of the compute nodes. Please do not use rclone from the head node; attempting to do so will display a reminder to use the DTNs or compute nodes instead.
The rclone home page is https://rclone.org.
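As a rough illustration of a typical rclone transfer, the sketch below wraps `rclone copy` in a small helper function. The function name, the remote name `myremote`, and the paths are hypothetical; the remote is assumed to have been set up beforehand with `rclone config`.

```shell
# Copy a local directory to a remote (or back) with rclone.
# Intended to be run on a DTN or a compute node, not on the head node.
copy_with_rclone() {
    local src="$1" dest="$2"
    # --progress prints transfer statistics while the copy runs
    rclone copy --progress "$src" "$dest"
}

# Example (hypothetical remote and paths):
#   copy_with_rclone /project/my_project/results myremote:backups/results
```

`rclone copy` only transfers files that differ between source and destination, so rerunning the same command after an interruption picks up where it left off.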
Software Guides
This guide includes information about command-line software, as well as information on graphical software such as Galaxy, CLC, Geneious, RStudio, and Jupyter.
Conda is a software package manager for data science that allows unprivileged (non-administrative) Linux or MacOS users to search, fetch, install, upgrade, use, and manage supported open-source software packages and programming languages/libraries/environments (primarily Python and R, but also others such as Perl, Java, and Julia) in a directory they have write access to. Conda allows SCINet users to create reproducible scientific software environments (including outside of Ceres) without requiring the submission of a SCINet software request form for new software, or contacting the VRSC to upgrade existing software.
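A minimal sketch of that workflow is shown below: one helper creates an environment in a directory the user can write to, and another exports its package list so the environment can be rebuilt elsewhere. The function names, the project path, and the package choices are assumptions for illustration; only standard `conda create` and `conda env export` options are used.

```shell
# Create an environment at an explicit path, e.g. inside a project directory.
create_project_env() {
    local prefix="$1"; shift          # remaining arguments are package specs
    conda create --yes --prefix "$prefix" "$@"
}

# Record the installed packages so the environment can be reproduced.
export_project_env() {
    local prefix="$1" outfile="$2"
    conda env export --prefix "$prefix" > "$outfile"
}

# Example (hypothetical path and packages):
#   create_project_env /project/my_project/envs/analysis python=3.11 numpy
#   conda activate /project/my_project/envs/analysis
#   export_project_env /project/my_project/envs/analysis environment.yml
```

Using `--prefix` rather than a named environment keeps the environment out of the home directory, which may matter where home quotas are small.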
The Environment Modules package provides dynamic modification of your shell environment. This also allows a single system to accommodate multiple versions of the same software application and for the user to select the version they want to use. Module commands set, change, or delete environment variables, typically in support of a particular application.
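The usual sequence of module commands can be sketched as a small helper that starts from a clean environment, loads the requested modules, and lists the result. The function name is hypothetical, and the module names in the example are placeholders; `module avail` shows what is actually installed on a given cluster.

```shell
# Start from a clean environment, load the requested modules, and report them.
load_modules() {
    module purge                      # unload everything currently loaded
    local m
    for m in "$@"; do
        module load "$m" || return 1  # stop if a module cannot be loaded
    done
    module list                       # show what is now loaded
}

# Example (placeholder module names):
#   module avail
#   load_modules python_3 r
```

Purging first is a design choice that makes job scripts behave the same way regardless of what was loaded in the login session.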
Some software packages may not be available for the version of Linux running on the HPC cluster. In this case, users may want to run containers. Containers are self-contained application execution environments that contain all necessary software to run an application or workflow, so users don’t need to worry about installing all the dependencies. There are many pre-built container images for scientific applications available for download and use.
Singularity (https://sylabs.io/) is an application for running containers on an HPC cluster. Containers are self-contained application execution environments that contain all necessary software to run an application or workflow, so you don’t need to worry about installing all the dependencies. There are many pre-built container images for scientific applications available for download and use; see the section Container Images.
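As a hedged illustration of downloading and using a pre-built image, the sketch below wraps the standard `singularity pull` and `singularity exec` commands in two helper functions. The function names and the example image are assumptions, not part of the SCINet guide.

```shell
# Download an image from a registry and save it as a local .sif file.
fetch_image() {
    local sif="$1" uri="$2"
    singularity pull "$sif" "$uri"
}

# Run a command inside the container instead of on the host system.
run_in_container() {
    local sif="$1"; shift             # remaining arguments are the command to run
    singularity exec "$sif" "$@"
}

# Example (hypothetical image):
#   fetch_image ubuntu.sif docker://ubuntu:22.04
#   run_in_container ubuntu.sif cat /etc/os-release
```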
Analysis Guides
This document assumes that a licensed copy of CLC Genomics Workbench 22 is installed locally and available to the user.
The popular R, Perl, and Python languages have many packages/modules available. Some of the packages are installed on Ceres and are available with the r/perl/python_2/python_3 modules. To see the list of installed packages, visit the Preinstalled Software List page or use the module help <module_name> command. If users need packages that are not available, they can either ask the VRSC to add them or download and install the packages in their home or project directories. We recommend installing packages in the project directories, since collaborators on the same project will most likely need the same packages. In addition, home quotas are much lower than quotas for project directories.
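For Python, one way to install packages into a shared project location is `pip install --target` combined with `PYTHONPATH`. The sketch below assumes a hypothetical directory `/project/my_project/python_packages` and a hypothetical helper name; it is an illustration of the general approach rather than the procedure the full guide prescribes.

```shell
# Hypothetical shared package directory; replace with your own project path.
PKG_DIR="${PKG_DIR:-/project/my_project/python_packages}"

# Install one or more packages into the project directory.
install_project_pkg() {
    # --target installs into the given directory instead of site-packages
    python3 -m pip install --target "$PKG_DIR" "$@"
}

# Make the directory visible to Python, e.g. in ~/.bashrc or a job script.
export PYTHONPATH="$PKG_DIR${PYTHONPATH:+:$PYTHONPATH}"

# Example (any package name works here):
#   install_project_pkg requests
```

Collaborators who add the same `PYTHONPATH` line to their own environment can then import the packages without reinstalling them.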
AWS Guides
Guide to creating an AWS resource via the service catalog.
FAQ about AWS resources available to ARS scientists.
The aws-saml tool can be used in conjunction with the Shibboleth SAML identity provider to retrieve time-limited API keys suitable for command-line use. It prompts you interactively for your password and, if you have multiple roles available and did not specify one on the command line, asks which role to use. The resulting credentials are normally stored in your standard AWS credential file, but a command-line flag can be provided to have the credentials written to standard output in Bash format for scripting. These credentials normally expire after one hour; if you provide the refresh flag, the tool forks into the background and keeps the credentials refreshed as long as your login cookie remains valid.