Onboarding Videos
Users who are new to the HPC environment may benefit from the following Ceres onboarding video, which covers much of the material contained in this guide plus some Unix basics.
Ceres Onboarding (Intro to SCINet Ceres HPC) (length 42:13)
Note: The /KEEP storage discussed in the video at 16:20 is no longer available. Instead, data that cannot be easily reproduced should be manually backed up to Juno. The instructional video at https://www.youtube.com/watch?v=I3lnsCAfx3Q demonstrates how to transfer files between a local computer, Ceres, Atlas, and Juno using Globus.
The video includes:
- logging on to Ceres
- changing your password
- home and project directories
- data transfer to/from SCINet clusters
- basic SLURM job scheduler commands
- computing in interactive mode with salloc
- accessing Ceres software modules
- computing in batch mode with a batch script (a minimal example follows this list)
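As a concrete illustration of the batch-mode workflow covered in the video, here is a minimal sketch of a SLURM batch script and its submission. The partition, account, and module names are placeholders rather than values taken from this guide; substitute the ones that apply to your project.

```bash
#!/bin/bash
#SBATCH --job-name=example          # name shown in the queue
#SBATCH --partition=short           # placeholder partition; use one available to you
#SBATCH --account=your_project      # placeholder project/account name
#SBATCH --nodes=1
#SBATCH --ntasks=4                  # number of tasks (logical cores)
#SBATCH --time=01:00:00             # walltime limit (HH:MM:SS)
#SBATCH --output=%x_%j.out          # output file named after job name and job ID

module load python_3                # placeholder module name; check `module spider`
python my_analysis.py               # your own analysis script
```

Submit the script from a login node with `sbatch my_job.sh` and monitor it with `squeue -u $USER`.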
Technical Overview
Ceres is the dedicated high performance computing (HPC) infrastructure for ARS researchers on ARS SCINet. Ceres is designed to enable large-scale computing and large-scale storage. Currently, the following compute nodes are available on the Ceres cluster.
100 regular compute nodes, each having:
- 72 logical cores on 2 x 18 core Intel Xeon Processors (6240 2.60GHz 25MB Cache) with hyper-threading turned ON
- 384GB DDR3 ECC Memory
- 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
- 1.5TB SSD used for temporary local storage
- Mellanox ConnectX®3 VPI FDR InfiniBand
76 regular compute nodes, each having:
- 96 logical cores on 2 x 24 core Intel Xeon Processors (6240R 2.40GHz 36MB Cache) with hyper-threading turned ON
- 384GB DDR3 ECC Memory
- 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
- 1.5TB SSD used for temporary local storage
- Mellanox ConnectX®3 VPI FDR InfiniBand
2 large memory nodes, each having:
- 80 logical cores on 2 x 20 core Intel Xeon Processors (6248 2.50GHz 27.5MB Cache) with hyper-threading turned ON
- 768GB DDR3 ECC Memory
- 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
- 1.5TB SSD used for temporary local storage
- Mellanox ConnectX®3 VPI FDR InfiniBand
6 large memory nodes, each having:
- 80 logical cores on 2 x 20 core Intel Xeon Processors (6248 2.50GHz 27.5MB Cache) with hyper-threading turned ON
- 1,536GB DDR3 ECC Memory
- 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
- 1.5TB SSD used for temporary local storage
- Mellanox ConnectX®3 VPI FDR InfiniBand
11 large memory nodes, each having:
- 96 logical cores on 2 x 24 core Intel Xeon Processors (6248R 3GHz 27.5MB Cache or 6248 2.50GHz 27.5MB Cache) with hyper-threading turned ON
- 1,536GB DDR3 ECC Memory
- 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
- 1.5TB SSD used for temporary local storage
- Mellanox ConnectX®3 VPI FDR InfiniBand
1 GPU node that has:
- 72 logical cores on 2 x 18 core Intel Xeon Processors (6140 2.30GHz 25MB Cache) with hyper-threading turned ON
- 2 x NVIDIA Tesla V100 GPUs
- 384GB DDR3 ECC Memory
- 250GB Intel DC S3500 Series 2.5” SATA 6.0Gb/s SSDs (used to host the OS and provide small local scratch storage)
- 1.5TB SSD used for temporary local storage
- Mellanox ConnectX®3 VPI FDR InfiniBand
In addition, there is a specialized data transfer node as well as several service nodes.
In aggregate, there are more than 9,000 compute cores (18,000 logical cores) with 110 terabytes (TB) of total RAM, 500TB of total local storage, and 3.7 petabytes (PB) of shared storage.
Shared storage consists of 2.3PB high-performance Lustre space, 1.4PB high-performance BeeGFS space and 300TB of backed-up ZFS space.
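To see how these nodes are grouped into partitions on the running system, standard SLURM query commands can be used from a login node; a brief sketch follows (the node name is a placeholder and actual output will vary):

```bash
# List partitions with node counts, CPUs per node, and memory per node (MB)
sinfo -o "%P %.6D %.6c %.8m"

# Show full details for a single node; replace the placeholder
# with a node name reported by sinfo
scontrol show node <node-name>
```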
System Configuration
Since most HPC compute nodes are dedicated to running cluster jobs, direct access to the nodes is discouraged. The established HPC best practice is to provide login nodes. Users access a login node to submit jobs to the cluster's resource manager (SLURM) and to use other cluster functions. All nodes run CentOS 7.8 Linux.
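For reference, a session typically begins by connecting to a login node over SSH; the hostname below is assumed from common SCINet documentation and should be confirmed against the current SCINet access instructions:

```bash
# Replace user.name with your SCINet account name; the hostname is an
# assumption and may differ from the current SCINet access instructions
ssh user.name@ceres.scinet.usda.gov
```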
Software Environment
Domain | Software |
---|---|
Operating System | CentOS |
Scheduler | SLURM |
Software | For the full list of installed scientific software, refer to the Preinstalled Software List page or issue the `module spider` command on the Ceres login node. |
Modeling | BeoPEST, EPIC, KINEROS2, MED-FOES, SWAT, h2o |
Compilers | GNU (C, C++, Fortran), clang, llvm, Intel Parallel Studio |
Languages | Java 6, Java 7, Java 8, Python, Python 3, R, Perl 5, Julia, Node |
Tools and Libraries | tmux, Eigen, Boost, GDAL, HDF5, NetCDF, TBB, Metis, PROJ4, OpenBLAS, jemalloc |
MPI libraries | MPICH, OpenMPI |
Profiling and debugging | PAPI |
For more information on available software and software installs refer to our guides on Modules, Singularity Containers and Installing R, Python, and Perl Packages.
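As a quick illustration, a typical module workflow on a login node looks like the sketch below; the package name is illustrative only, and `module spider` reports the exact names and versions installed on Ceres:

```bash
module avail                # list modules visible in the current environment
module spider samtools      # search all installed versions of a package (illustrative name)
module load samtools        # load the default version into your environment
module list                 # show currently loaded modules
module purge                # unload all loaded modules
```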
Additional Guides for Ceres:
- Data Transfer: covers data transfer best practices. Globus Online is the recommended method for transferring data to and from the HPC clusters.
- Modules: The Environment Modules package provides dynamic modification of your shell environment. This allows a single system to accommodate multiple versions of the same software application and lets users select the version they want to use. Module commands set, change, or delete environment variables, typically in support of a particular application.
- Quotas: Each file on a Linux system is associated with one user and one group. On Ceres, files in a user's home directory are by default associated with the user's primary group, which has the same name as the user's SCINet account. Files in the project directories are by default associated with the project groups. Group quotas that control the amount of data stored are enabled on both home and project directories. At login, current usage and quotas are displayed for all groups that a user belongs to; the `my_quotas` command provides the same output.
- SLURM Resource Manager: Ceres uses the Simple Linux Utility for Resource Management (SLURM) to submit interactive and batch jobs to the compute nodes. Requested resources can be specified either within the job script or using options with the `salloc`, `srun`, or `sbatch` commands (see the sketch after this list).
- Compiling Software: The login node provides access to a wide variety of scientific software tools via the module system. These tools were compiled and optimized for use on SCINet by members of the Virtual Research Support Core (VRSC) team. Most users will find the software they need for their research among the provided packages and thus will not need to compile their own. To learn more about graphical software such as Galaxy, CSC, Geneious, RStudio, and Jupyter, see the Software Preinstalled on Ceres guide.
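As referenced in the SLURM Resource Manager item above, the sketch below shows how the same resources can be requested either on the command line or inside a batch script; the partition name is a placeholder to be replaced with one valid for your project.

```bash
# Interactive session: request resources directly on the command line
salloc --partition=short --nodes=1 --ntasks=8 --mem=16G --time=02:00:00

# Batch job: pass the same options to sbatch ...
sbatch --partition=short --ntasks=8 --mem=16G --time=02:00:00 my_job.sh

# ... or embed them in the script itself as #SBATCH directives:
#   #SBATCH --partition=short
#   #SBATCH --ntasks=8
#   #SBATCH --mem=16G
#   #SBATCH --time=02:00:00
```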
Citation/Acknowledgment
Add the following sentence as an acknowledgment for using Ceres as a resource in your manuscripts intended for publication:
“This research used resources provided by the SCINet project and/or the AI Center of Excellence of the USDA Agricultural Research Service, ARS project numbers 0201-88888-003-000D and 0201-88888-002-000D.”