SCINet Ceres

Onboarding Videos

Users who are new to the HPC environment may benefit from the following Ceres onboarding video, which covers much of the material in this guide as well as some Unix basics.

Ceres Onboarding (Intro to SCINet Ceres HPC) (length 42:13)

Note: /KEEP storage discussed in the video at 16:20 is no longer available. Instead, data that cannot be easily reproduced should be manually backed up to Juno. The instructional video at https://www.youtube.com/watch?v=I3lnsCAfx3Q demonstrates how to transfer files between a local computer, Ceres, Atlas, and Juno using Globus.

The video includes:

  • logging on to Ceres
  • changing your password
  • home and project directories
  • data transfer to/from SCINet clusters
  • basic SLURM job scheduler commands
  • computing in interactive mode with salloc
  • accessing Ceres software modules
  • computing in batch mode with a batch script

Technical Overview

Ceres is the dedicated high performance computing (HPC) infrastructure for ARS researchers on ARS SCINet. Ceres is designed to enable large-scale computing and large-scale storage. Currently, the following compute nodes are available on the Ceres cluster.

120 regular compute nodes, each having:

  • 72 logical cores on 2 x 18 core Intel Xeon Processors (6140 2.30GHz 25MB Cache or 6240 2.60GHz 25MB Cache) with hyper-threading turned ON
  • 384GB DDR3 ECC Memory
  • 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
  • 1.5TB SSD used for temporary local storage
  • Mellanox ConnectX®-3 VPI FDR InfiniBand

76 regular compute nodes, each having:

  • 96 logical cores on 2 x 24 core Intel Xeon Processors (6240R 2.40GHz 36MB Cache) with hyper-threading turned ON
  • 384GB DDR3 ECC Memory
  • 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
  • 1.5TB SSD used for temporary local storage
  • Mellanox ConnectX®-3 VPI FDR InfiniBand

4 large memory nodes, each having:

  • 80 logical cores on 2 x 20 core Intel Xeon Processors (6148 2.40GHz 27.5MB Cache or 6248 2.50GHz 27.5MB Cache) with hyper-threading turned ON
  • 768GB DDR3 ECC Memory
  • 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
  • 1.5TB SSD used for temporary local storage
  • Mellanox ConnectX®-3 VPI FDR InfiniBand

11 large memory nodes, each having:

  • 80 logical cores on 2 x 20 core Intel Xeon Processors (6148 2.40GHz 27.5MB Cache or 6248 2.50GHz 27.5MB Cache) with hyper-threading turned ON
  • 1,536GB DDR3 ECC Memory
  • 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
  • 1.5TB SSD used for temporary local storage
  • Mellanox ConnectX®-3 VPI FDR InfiniBand

11 large memory nodes, each having:

  • 96 logical cores on 2 x 24 core Intel Xeon Processors (6248R 3GHz 27.5MB Cache or 6248 2.50GHz 27.5MB Cache) with hyper-threading turned ON
  • 1,536GB DDR3 ECC Memory
  • 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
  • 1.5TB SSD used for temporary local storage
  • Mellanox ConnectX®-3 VPI FDR InfiniBand

1 GPU node that has:

  • 72 logical cores on 2 x 18 core Intel Xeon Processors (6140 2.30GHz 25MB Cache) with hyper-threading turned ON
  • 2 x NVIDIA Tesla V100 GPUs
  • 384GB DDR3 ECC Memory
  • 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
  • 1.5TB SSD used for temporary local storage
  • Mellanox ConnectX®-3 VPI FDR InfiniBand

In addition, the cluster includes a specialized data transfer node and several service nodes.

In aggregate, there are more than 9,000 compute cores (18,000 logical cores) with 110 terabytes (TB) of total RAM, 500TB of total local storage, and 3.7 petabytes (PB) of shared storage.

Shared storage consists of 2.3PB of high-performance Lustre space, 1.4PB of high-performance BeeGFS space, and 300TB of backed-up ZFS space.

System Configuration

Since most HPC compute nodes are dedicated to running HPC cluster jobs, direct access to the nodes is discouraged. The established HPC best practice is to provide login nodes. Users connect to a login node to submit jobs to the cluster's resource manager (SLURM) and to access other cluster console functions. All nodes run CentOS 7.8 Linux.

Software Environment

  • Operating System: CentOS
  • Scheduler: SLURM
  • Software: for the full list of installed scientific software, refer to the Preinstalled Software List page or issue the module spider command on the Ceres login node
  • Modeling: BeoPEST, EPIC, KINEROS2, MED-FOES, SWAT, h2o
  • Compilers: GNU (C, C++, Fortran), clang, llvm, Intel Parallel Studio
  • Languages: Java 6, Java 7, Java 8, Python, Python 3, R, Perl 5, Julia, Node
  • Tools and libraries: tmux, Eigen, Boost, GDAL, HDF5, NetCDF, TBB, Metis, PROJ4, OpenBLAS, jemalloc
  • MPI libraries: MPICH, OpenMPI
  • Profiling and debugging: PAPI

For more information on available software and software installs, refer to our guides on Modules, Singularity Containers, and Installing R, Python, and Perl Packages.

Additional Guides for Ceres:

  • Logging In

    No account? Sign up here.

    All users should have received their login credentials in an email. If you have not, please email the Virtual Research Support Core at scinet_vrsc@USDA.GOV.

    If you have not received a LincPass or YubiKey, please see the Deprecated Login Procedures page for instructions on accessing the HPC.
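
    Once you have credentials, logging in from a terminal is a standard SSH session. A minimal sketch, assuming an illustrative user name first.last and the commonly used Ceres login hostname (verify the exact address in the Logging In guide):

    $ ssh first.last@ceres.scinet.usda.gov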

  • Data Transfer

    Data Transfer best practices.

    Globus Online is the recommended method for transferring data to and from the HPC clusters.
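
    Globus is the right tool for anything sizable; for a small one-off file, plain scp from a local terminal also works. A minimal sketch, with an illustrative file name, user name, and destination path:

    $ scp results.tar.gz first.last@ceres.scinet.usda.gov:/project/my_project/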

  • Modules

    The Environment Modules package provides dynamic modification of your shell environment. It allows a single system to accommodate multiple versions of the same software application and lets users select the version they want to use. Module commands set, change, or delete environment variables, typically in support of a particular application.
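
    A short sketch of a typical module session on a login node (the package name below is illustrative; use module spider to see what is actually installed):

    $ module avail              # list the modules available on the system
    $ module spider samtools    # search for a package and its versions (name is illustrative)
    $ module load samtools      # add the package to your environment
    $ module list               # show currently loaded modules
    $ module unload samtools    # remove it again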

  • Quotas

    Each file on a Linux system is associated with one user and one group. On Ceres, files in a user's home directory are by default associated with the user's primary group, which has the same name as the user's SCINet account. Files in the project directories are by default associated with the project groups. Group quotas that control the amount of stored data are enabled on both home and project directories.
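
    To check or change the group a file is associated with, the standard Linux tools apply; a brief sketch, with an illustrative file name and project group name:

    $ ls -l results.csv                 # the group appears in the fourth column
    $ chgrp proj_example results.csv    # associate the file with a project group (name is illustrative)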

    At login, current usage and quotas are displayed for all groups that a user belongs to. The my_quotas command provides the same output:

    $ my_quotas
    
  • Running Application Jobs on Compute Nodes

    Users run their applications on the cluster in either interactive mode or batch mode. Interactive mode (the salloc or srun command) is familiar to anyone using the command line: the user specifies an application by name and various arguments, hits Enter, and the application runs. However, in interactive mode on a cluster the user is automatically switched from a login node to a compute node. This keeps all intensive computation off the login nodes, so the login nodes retain the resources needed to manage the cluster. Never run applications on the login nodes: if you are not submitting a batch job, use interactive mode.
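
    As an illustration, an interactive session might be requested as follows (the partition name and resource amounts are placeholders; check the Ceres documentation for the partitions and limits that actually apply):

    $ salloc -N 1 -n 4 -t 01:00:00 -p short

    Batch mode instead uses a short job script submitted with sbatch. A minimal sketch, with an illustrative application and module name:

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --nodes=1
    #SBATCH --ntasks=4
    #SBATCH --time=01:00:00
    #SBATCH --partition=short
    module load my_app            # module name is illustrative
    srun my_app input.txt         # replace with your application and its arguments

    Save the script (for example as example_job.sh), then submit and monitor it:

    $ sbatch example_job.sh
    $ squeue -u $USER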

  • Compiling Software

    The login node provides access to a wide variety of scientific software tools that users can load via the module system. These tools were compiled and optimized for use on SCINet by members of the Virtual Research Support Core (VRSC) team. Most users will find the software they need for their research among the provided packages and thus will not need to compile their own.
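
    For the minority who do need to compile their own code, a typical sketch is to load a compiler module and invoke it directly (the module name is illustrative; check module spider for what is installed):

    $ module load gcc                 # compiler module name is illustrative
    $ gcc -O2 -o my_tool my_tool.c    # build a small C program with optimization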

    To learn more about graphical software such as Galaxy, CSC, Geneious, RStudio, and Jupyter, please see the Software Preinstalled on Ceres guide.

  • Citation/Acknowledgment

    Include the following sentence in manuscripts intended for publication to acknowledge the use of Ceres as a resource:

    “This research used resources provided by the SCINet project and/or the AI Center of Excellence of the USDA Agricultural Research Service, ARS project numbers 0201-88888-003-000D and 0201-88888-002-000D.”