Use the navigation options or select one of the guides below.
All users should have received their login credentials in an email. If you have not, please email the Virtual Research Support Core at scinet_vrsc@USDA.GOV.
If you have not received a LincPass or YubiKey, please see the Deprecated Login Procedures page for instructions on accessing the HPC.
If you prefer a graphical user interface, these are some of the options you have to access SCINet.
The SCINet VPN will no longer be offered as a service.
If you have been using the SCINet VPN, please explore Open OnDemand (OOD) for your use cases. OOD provides full access to a Linux desktop, all SCINet apps, and cluster controls.
If you believe the SCINet VPN is essential to your work, please contact firstname.lastname@example.org as soon as possible so we can help find a solution and your critical work is not delayed.
Note that this change applies only to the SCINet VPN; the USDA/ARS VPN is not affected.
The software discussed and shown in these user guides is largely open source, can run on a desktop, HPC, or cloud environment, and can be installed with software management systems that support reproducibility (such as Conda, Singularity, and Docker). Below is a quick overview of some of the software, hardware, and confusing nomenclature that is used throughout this site.
This guide lists differences between the Atlas and Ceres clusters to ease transition from one cluster to another.
Add the following sentence as an acknowledgment for using Ceres as a resource in your manuscripts meant for publication:
“This research used resources provided by the SCINet project of the USDA Agricultural Research Service, ARS project number 0500-00093-001-00-D.”
In addition to the Ceres and Atlas clusters, there are external computing resources available to the SCINet community, including Amazon Web Services, XSEDE, and the Open Science Grid. These resources may be of interest to SCINet users who require:
- very large jobs (either numerous small jobs, or many nodes in parallel)
- special computing hardware requirements (e.g., GPUs, Xeon Phi, extremely-large memory)
- software that isn’t supported on Ceres (e.g., web apps, relational databases, VMs, Hadoop, Spark, certain commercial software)
This document describes recommended procedures (SOP) for managing data on ARS HPC and storage infrastructure.
Each file on a Linux system is associated with one user and one group. On Ceres, files in a user’s home directory by default are associated with the user’s primary group, which has the same name as user’s SCINet account. Files in the project directories by default are associated with the project groups. Group quotas that control the amount of data stored are enabled on both home and project directories.
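Since quotas are enforced per group, it helps to know which group a given file is charged to. The commands below are a minimal sketch of checking and changing a file's group; the project group name proj_example is a placeholder, not an actual SCINet group.

```shell
# A newly created file inherits the user's current (primary) group;
# on Ceres this matches the SCINet account name.
touch demo_file.txt
stat -c %G demo_file.txt   # group that owns the new file
id -gn                     # the user's primary group -- the two match

# To have files count against a project group's quota instead, either
# change the group of an existing file:
#   chgrp proj_example demo_file.txt   # "proj_example" is a placeholder
# or start a shell whose default group is the project group, so new
# files are created with it:
#   newgrp proj_example
```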
At login, current usage and quotas are displayed for all groups that a user belongs to. The my_quotas command provides the same output.
This document provides detailed information about the storage options provided by SCINet and how to use them. For a simpler overview of suggested procedures for managing data on SCINet, please see Managing Data on ARS HPC and Storage Infrastructure.
There are multiple places to store data on the Ceres and Atlas clusters that all serve different purposes.
Data Transfer best practices.
Globus Online is the recommended method for transferring data to and from the HPC clusters.
Users run their applications on the cluster in either interactive mode or batch mode. Interactive mode (the srun command) is familiar to anyone using the command line: the user specifies an application by name and various arguments, hits Enter, and the application runs. However, in interactive mode on a cluster the user is automatically switched from a login node to a compute node. This keeps intense computation off the login nodes, so that the login nodes retain the resources necessary for managing the cluster. Never run applications directly on the login nodes; use interactive mode (or batch mode) instead.
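The interactive workflow above can be sketched with srun as follows; the partition name, time limit, and memory request are placeholders, so check sinfo for the partitions actually available on your cluster.

```shell
# Request an interactive shell on a compute node (a sketch; "short",
# the time limit, and the memory request are example values only).
srun --pty -p short -t 01:00:00 -n 1 --mem=8G bash -l

# Once the new prompt appears you are on a compute node and can run
# your application normally; "exit" ends the session and releases
# the node back to the scheduler.
```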
Compute jobs are run on functional groups of nodes called partitions or queues. Each different partition has different capabilities (e.g. regular memory versus high memory nodes) and resource restrictions (e.g. time limits on jobs). Nodes may appear in several partitions.
To provide better Ceres usage reporting, all Ceres users have been assigned Slurm accounts based on their project groups. If you don't have a project, your default and only Slurm account is sandbox.
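To see which Slurm accounts you can submit under, and to submit under a specific one, something like the following sketch applies; the account name proj_example is a placeholder, and output formatting varies by site.

```shell
# List the Slurm accounts associated with your user
# (parsable output, one account per line):
sacctmgr -nP show associations user=$USER format=account

# Submit a job under a specific account rather than your default:
sbatch --account=proj_example myjob.sh   # "proj_example" is a placeholder
```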
All compute nodes have 1.5 TB of fast local temporary data file storage space supported by SSDs. This local scratch space is significantly faster and supports more input/output operations per second (IOPS) than the mounted filesystems on which the home and project directories reside.
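A common pattern for exploiting the fast local scratch space is to stage data onto it at the start of a job and copy results back at the end. The sketch below assumes the scheduler exposes the node-local space via $TMPDIR and uses placeholder paths and a hypothetical application name (my_app); check your cluster's documentation for the actual scratch location.

```shell
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Stage input onto the node-local SSD scratch space ($TMPDIR is an
# assumption; paths and "my_app" are placeholders).
cp /project/proj_example/input.dat "$TMPDIR"/
cd "$TMPDIR"

# I/O-heavy work now runs against the fast local SSD.
my_app input.dat > output.dat

# Copy results back to the project directory before the job ends,
# since local scratch is cleaned up when the job finishes.
cp output.dat /project/proj_example/
```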
This guide includes information about command-line software, as well as information on graphical software such as Galaxy, CLC, Geneious, RStudio, and Jupyter.
Conda is a software package manager for data science that allows unprivileged (non-administrative) Linux or MacOS users to search, fetch, install, upgrade, use, and manage supported open-source software packages and programming languages/libraries/environments (primarily Python and R, but also others such as Perl, Java, and Julia) in a directory they have write access to. Conda allows SCINet users to create reproducible scientific software environments (including outside of Ceres) without requiring the submission of a SCINet software request form for new software, or contacting the VRSC to upgrade existing software.
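A typical Conda workflow looks like the sketch below. The module name, environment path, and package versions are illustrative assumptions, not guaranteed to match what is installed on Ceres.

```shell
# Load Conda (the module name is an assumption; check "module avail").
module load miniconda

# Create a reproducible environment in a project directory so
# collaborators can share it ("proj_example" is a placeholder).
conda create --prefix /project/proj_example/env_blast \
    -c bioconda -c conda-forge blast -y

# Activate the environment by its path and use the software.
conda activate /project/proj_example/env_blast
blastn -version
```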
Open OnDemand is an intuitive, innovative, and interactive interface to remote computing resources. The key benefit for SCINet users is that they can use any web browser, including browsers on a mobile phone, to access Ceres.
There are several interactive apps that can be run in Open OnDemand including Jupyter, RStudio Server, Geneious, CLC Genomics Workbench, and more. The desktop app allows a user to run any GUI software.
The Environment Modules package provides dynamic modification of your shell environment. This also allows a single system to accommodate multiple versions of the same software application and for the user to select the version they want to use. Module commands set, change, or delete environment variables, typically in support of a particular application.
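The most common module commands are sketched below; the package name samtools and the version number are examples, and the versions actually installed may differ.

```shell
module avail               # list software available through modules
module load samtools       # load the default version of a package
module load samtools/1.17  # or request a specific version (example)
module list                # show currently loaded modules
module unload samtools     # remove a package from your environment
module purge               # reset to a clean environment
```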
Some software packages may not be available for the version of Linux running on the HPC cluster. In this case, users may want to run containers. Containers are self-contained application execution environments that contain all necessary software to run an application or workflow, so users don’t need to worry about installing all the dependencies. There are many pre-built container images for scientific applications available for download and use.
Singularity (https://sylabs.io/) is an application for running containers on an HPC cluster. Containers are self-contained application execution environments that include all the software needed to run an application or workflow, so you don't need to worry about installing dependencies. Many pre-built container images for scientific applications are available for download and use; see the Container Images section.
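Pulling and running a pre-built image typically looks like this; the image URI is a generic example from Docker Hub, not a SCINet-specific image.

```shell
# Pull a pre-built image; this converts it to a local .sif file.
singularity pull docker://ubuntu:22.04        # creates ubuntu_22.04.sif

# Run a single command inside the container.
singularity exec ubuntu_22.04.sif cat /etc/os-release

# Or open an interactive shell inside the container.
singularity shell ubuntu_22.04.sif
```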
This document assumes that a licensed copy of CLC Genomics Workbench 22 is installed locally and available to the user.
Jupyter is an Integrated Development Environment (IDE) that provides an interactive and collaborative environment for scientific computing. This interactive coding environment allows for immediate execution and visualization of code, facilitating on-the-fly data analysis and visualization. It supports over 40 programming languages (including Python, R, Julia, Java, and Scala) and seamlessly integrates with popular data science libraries.
The popular R, Perl, and Python languages have many packages/modules available. Some of these packages are installed on Ceres and are available with the r, perl, python_2, and python_3 modules. To see the list of installed packages, visit the Preinstalled Software List page or use the module help <module_name> command. If users need packages that are not available, they can either request that the VRSC add them, or download and install the packages in their home or project directories. We recommend installing packages in project directories, since collaborators on the same project will likely need the same packages; in addition, home directory quotas are much lower than project directory quotas.
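Installing Python packages into a shared project directory can be sketched as follows; the paths and package name are placeholders, and the module name is an assumption.

```shell
# Load the system Python (module name is an assumption).
module load python_3

# Install a package into a shared project location instead of $HOME
# ("proj_example" and "biopython" are placeholders).
pip install --target=/project/proj_example/pylibs biopython

# Make the shared location visible to Python, e.g. in job scripts
# or in your shell startup file.
export PYTHONPATH=/project/proj_example/pylibs:$PYTHONPATH
```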
FAQ about AWS resources available to ARS scientists.
The aws-saml tool can be used in conjunction with the Shibboleth SAML identity provider to retrieve time-limited API keys suitable for command-line use. It interactively prompts you for your password and, if you have multiple roles available and none is specified on the command line, prompts you to choose one. The resulting credentials are normally stored in your standard AWS credentials file, but a command-line flag can be provided to have the credentials written to standard output in Bash format for scripting. These credentials normally expire after one hour; by providing the refresh flag, the tool will fork into the background and keep the credentials refreshed for as long as your login cookie remains valid.