
Downtime Archive

SCINet Past Scheduled Outages

The list below contains information about past SCINet outages. See the SCINet Forum Announcements page (a SCINet account is required to access it) for communications about emergency outages.

  • Ceres Storage Updates · Ceres - All · 2024

    Ceres will be unavailable for maintenance starting at 4pm CDT on Friday, October 11. The final sync for the cutover to the new all-flash Vast storage appliance will start at that time. Below you will find information on the new storage, the progress of the transition so far, and the actions to take if you have queued jobs or would like to run jobs on the new storage.

    Highlights:

    Maintenance for the cutover to the new storage starts at 4pm CDT on 10/11/24 and is planned to run through 10/15/24.

    • Jobs held over the maintenance will not start until their owners release them with the scontrol release <JobID> command.
    • Retired storage will be available in a read-only state for a limited time.
      • /90daydata-old will be available, read-only, for 90 days while the data ages out.
      • /project-old will be available, read-only, until the final sync to the new /project is done.
    • If the final sync takes longer than the maintenance window, the new /project will become available when the sync finishes.
    • New /90daydata will be immediately available (and empty).

    What is happening:

    Ceres is transitioning to a new storage appliance. After the maintenance, the /project, /90daydata, and /home directories will all be served from the new Vast appliance, which has several performance and resilience advantages over the retiring storage. Most notable for users is the move to all-flash storage instead of the traditional spinning disks used by the retiring storage.

    A maintenance window has been dedicated to this cutover to ensure a smooth and complete transfer of data from the old /project to the new one. Because /project holds over 2PB of data spread across more than 1 billion individual files, the transfer takes a considerable amount of time. VRSC has been copying data from /project to the new storage for the last few weeks in preparation, limiting the transfer to about 100TB/day so as not to impact running jobs. Because files have been added, removed, and overwritten during the cluster's normal day-to-day operation while this initial sync was running, a final, complete sync is needed to capture the exact state of the retiring filesystem.
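    As a rough check on why the initial copy took weeks, the figures above give a lower bound on the transfer time. This minimal sketch assumes decimal units (1 PB = 1000 TB) and a constant, throttled rate; it ignores per-file overhead, which is significant with over a billion files, so a real transfer takes longer.

```python
def days_to_copy(total_tb: float, rate_tb_per_day: float) -> float:
    """Days needed to move total_tb of data at a steady rate_tb_per_day."""
    if rate_tb_per_day <= 0:
        raise ValueError("rate_tb_per_day must be positive")
    return total_tb / rate_tb_per_day


# Figures quoted above: "over 2PB", throttled to "about 100TB/day".
project_tb = 2 * 1000      # decimal units assumed; a lower bound since /project is larger
daily_limit_tb = 100
print(f"at least {days_to_copy(project_tb, daily_limit_tb):.0f} days")  # at least 20 days
```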

    What you may have to do:

    1. After the maintenance, check whether you have jobs waiting in the queue with the squeue --me command.
    2. Review your queued jobs to identify which storage they need. Run the scontrol show job <JobID> command to view information about a submitted job.
    3. Release your jobs only AFTER reading the following.

    Jobs will be placed in a held state during the maintenance. This prevents them from starting automatically, since the storage may not be in the state the jobs rely on. If the final sync for /project takes longer than the maintenance window, we will make the cluster available without it so that jobs can run in /90daydata. The retiring filesystems will still be available at /90daydata-old (for 90 days while it phases out) and /project-old (until the sync to the new /project completes). If you have held jobs and have confirmed that they will be able to access the directories and data they need, you can release them back into the queue with the scontrol release <JobID> command.

    If you have jobs that require data in its original location in /project and the sync hasn’t finished yet:
    You can either wait for the sync to finish before running scontrol release <JobID>, or cancel the jobs with scancel <JobID>, copy the data to /90daydata, and submit new jobs that work out of /90daydata. Most jobs are likely to fall into this case.

    If you have data in /90daydata-old that you would like to use:
    Transfer data from /90daydata-old to the new /90daydata to keep it for another 90 days.

    If the new /project isn’t available yet and you would like to run jobs with data from /project-old:
    Transfer data from /project-old to /90daydata and run your jobs from /90daydata. Directly referencing /project-old in jobs is NOT recommended, as that storage mount will be removed when the sync finishes.
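    Steps 2 and 3 can be partly automated. The sketch below is a hypothetical helper, not a VRSC-provided tool: it runs scontrol show job, picks out the paths Slurm records for the job (WorkDir, Command, StdOut, StdErr, StdIn), and flags any that sit on /project while the final sync is still running or on the retiring -old mounts. It assumes the usual whitespace-separated Key=Value output and does not look inside the job script, so paths referenced only in the script still need a manual check. Once the new /project is available, remove "/project/" from the prefix list.

```python
import subprocess
import sys

# Fields in `scontrol show job` output that hold file-system paths.
PATH_FIELDS = ("WorkDir", "Command", "StdOut", "StdErr", "StdIn")

# Storage that may be unavailable or going away (assumption: adjust as the
# cutover progresses, e.g. drop "/project/" once the new /project is live).
RISKY_PREFIXES = ("/project/", "/project-old/", "/90daydata-old/")


def parse_scontrol_job(text: str) -> dict[str, str]:
    """Turn `scontrol show job` output into a {Key: Value} dict.

    Splits on whitespace and on the first '=' of each token, so values
    such as TRES=cpu=4,mem=16G survive; values containing spaces do not.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def risky_paths(fields: dict[str, str],
                prefixes: tuple[str, ...] = RISKY_PREFIXES) -> list[tuple[str, str]]:
    """Return (field, path) pairs whose path lies under one of `prefixes`."""
    hits = []
    for name in PATH_FIELDS:
        path = fields.get(name, "")
        if any(path == p.rstrip("/") or path.startswith(p) for p in prefixes):
            hits.append((name, path))
    return hits


def check_job(job_id: str) -> list[tuple[str, str]]:
    """Query Slurm for one job and report its risky paths."""
    out = subprocess.run(["scontrol", "show", "job", job_id],
                         capture_output=True, text=True, check=True).stdout
    return risky_paths(parse_scontrol_job(out))


if __name__ == "__main__":
    for job_id in sys.argv[1:]:
        hits = check_job(job_id)
        if hits:
            for name, path in hits:
                print(f"{job_id}: {name}={path} -- keep held or move data first")
        else:
            print(f"{job_id}: no risky paths found; consider: scontrol release {job_id}")
```

    Run it as, for example, python check_held_jobs.py 12345 12346 (the script name is hypothetical). It only prints a suggested scontrol release command; releasing the job remains a manual step.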

    If you have any questions or concerns, or if you need help after the maintenance, please feel free to contact us at scinet_vrsc@usda.gov.

    • Friday, October 11 - Tuesday, October 15 · 2024
    • Outage time: 4 pm CDT
    • Affected Systems: Ceres
    • Affected Locations: All
    • Reason: Ceres Storage Updates
  • Maintenance · Ceres - All · 2024

    Ceres cluster maintenance is scheduled for June 17-21, 2024 (the week of Juneteenth).

    During the maintenance, the following major modifications to Ceres will take place in addition to the usual maintenance items:

    • System software updates:
      • Ceres will be transitioned from running AlmaLinux to Red Hat Enterprise Linux.
      • InfiniBand switches will be updated.
    • Storage:
      • A new Vast storage appliance will be added to the cluster.
      • The new storage will eventually replace existing storage hardware.
      • Data will not be moved to the new storage during the maintenance.
    • Hardware management:
      • Old Ethernet switches will be removed.
    • IPA migration:
      • The identity management system will be migrated to a new domain.
      • Some users will need to perform a one-time account migration action after the maintenance.

    Queued jobs will not start if they cannot complete by 6AM June 17. In the output of the squeue command, the reason for those jobs will be shown as (ReqNodeNotAvail, Reserved for maintenance). The jobs will start after the scheduled outage completes.
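    The rule above reflects how Slurm schedules around a maintenance reservation: a job is started only if its requested time limit (not its expected run time) would let it finish before the reservation begins. A simplified model of that check, using the dates from this announcement and assuming local time throughout:

```python
from datetime import datetime, timedelta


def fits_before(start: datetime, time_limit: timedelta,
                maintenance_start: datetime) -> bool:
    """True if a job starting at `start` would reach its time limit
    no later than `maintenance_start` (simplified model)."""
    return start + time_limit <= maintenance_start


maintenance = datetime(2024, 6, 17, 6, 0)   # 6AM June 17, local time
friday_9am = datetime(2024, 6, 14, 9, 0)

print(fits_before(friday_9am, timedelta(days=2), maintenance))   # True
print(fits_before(friday_9am, timedelta(days=7), maintenance))   # False
```

    In the second case the job stays pending with the reason (ReqNodeNotAvail, Reserved for maintenance) until the outage ends.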

    The Atlas cluster will be available during the Ceres maintenance. Make sure to copy data from Ceres to Atlas prior to the maintenance, if needed.

    Please submit any questions you may have via email to scinet_vrsc@usda.gov.

    • Monday, June 17 - Friday, June 21 · 2024
    • Outage time: 6 am CDT
    • Affected Systems: Ceres
    • Affected Locations: All
    • Reason: Maintenance
  • Maintenance · Atlas - All · 2024

    The Atlas compute cluster is scheduled for downtime/upgrade beginning April 30 at 6am Central and lasting through May 1. A downtime is required to repair a chilled water line for the cooling system. Lack of cooling during this repair will necessitate the shutdown of Atlas.

    Taking advantage of this shutdown, the operating system on Atlas will be upgraded from CentOS 7.8 to Rocky Linux 9.x. The upgrade also brings a newer software stack, so users may need to recompile their software.

    The Ceres system will not be affected by this maintenance.

    An announcement will be made once the system is returned to operational status.

    Please address any issues or problems to the help desk:
    scinet_vrsc@usda.gov
    help-usda@hpc.msstate.edu

    • Tuesday, April 30 - Wednesday, May 1 · 2024
    • Outage time: 6 am CDT
    • Affected Systems: Atlas
    • Affected Locations: All
    • Reason: Maintenance
  • OS and Network Update · Ceres - All · 2024

    Ceres cluster maintenance is scheduled for February 19-21, 2024 (Presidents’ Day, and the following two days).

    During the maintenance, the following major modifications to Ceres will take place in addition to the usual maintenance items:

    • Operating System:
      • Ceres will be transitioned from running AlmaLinux to Red Hat Enterprise Linux
    • Network:
      • Updates to existing switches
      • Installation of new switches
      • Recabling of Ceres to accommodate the new switches

    Queued jobs will not start if they cannot complete by 6AM February 19. In the output of the squeue command, the reason for those jobs will be shown as (ReqNodeNotAvail, Reserved for maintenance). The jobs will start after the scheduled outage completes.

    The Atlas cluster will be available during the Ceres maintenance. Make sure to copy data from Ceres to Atlas prior to the maintenance, if needed.

    Please submit any questions you may have via email to scinet_vrsc@usda.gov.

    • Monday, February 19 - Wednesday, February 21 · 2024
    • Outage time: 6 am
    • Affected Systems: Ceres
    • Affected Locations: All
    • Reason: OS and Network Update
  • Holiday · VRSC Support - All · 2023

    Due to the upcoming holiday, there will not be any VRSC support available from December 25-27.

    Please direct all questions to scinet_vrsc@usda.gov.

    • Monday, December 25 - Wednesday, December 27 · 2023
    • Outage time: Daily
    • Affected Systems: VRSC Support
    • Affected Locations: All
    • Reason: Holiday
  • Holiday · VRSC Support - All · 2023

    Due to the upcoming holiday, there will not be any VRSC support available from November 22-24.

    Please direct all questions to scinet_vrsc@usda.gov.

    • Wednesday, November 22 - Friday, November 24 · 2023
    • Outage time: Daily
    • Affected Systems: VRSC Support
    • Affected Locations: All
    • Reason: Holiday
  • Maintenance · Galaxy - Ceres - Tuesday, November 14 · 2023

    Galaxy will be unavailable from 9AM to 5PM on 11/14/2023.
    Downtime is required to change the location of Galaxy-related paths from /90daydata to /project.

    Background - Galaxy saves upload, output, and intermediate files in /90daydata on Ceres. The /90daydata file system is experiencing frequent performance issues that are causing job timeouts and, in some extreme cases, job failures.

    Changes - During the maintenance, the paths for upload, output, and intermediate files will be set to /project, because /project is performant and still under warranty. This is our current best option for Galaxy.

    Notes - Only files created after the maintenance will be saved in /project; existing files will remain in /90daydata. (Reminder: these files will be purged by the filesystem after 90 days, so please save them elsewhere.)
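    To find files that are close to being purged before copying them elsewhere, a small script can walk a directory and list files older than a given age. This is a minimal sketch that assumes the purge counts age from each file's last modification time; the exact criterion the filesystem uses is not stated here, so treat the result as a guide. The directory in the usage block is a placeholder.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def files_older_than(root: str, days: float,
                     now: Optional[float] = None) -> List[Path]:
    """List files under `root` whose modification time is more than `days` old."""
    cutoff = (time.time() if now is None else now) - days * 86400
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.lstat()          # do not follow symlinks
            except FileNotFoundError:
                continue                   # removed while we were walking
            if st.st_mtime < cutoff:
                old.append(path)
    return sorted(old)


if __name__ == "__main__":
    # Placeholder path: list files within ~10 days of a 90-day limit.
    for p in files_older_than("/90daydata/your_project", days=80):
        print(p)
```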

    • Tuesday, November 14 · 2023
    • Outage time: 9am-5pm
    • Affected Systems: Galaxy
    • Affected Locations: Ceres
    • Reason: Maintenance
  • Maintenance · Site Service - Ames - Monday, October 23 · 2023

    Site Service at Ames will be impacted by essential maintenance. Outages are expected and the entire window is reserved.

    • Monday, October 23 · 2023
    • Outage time: 6-11am UTC
    • Affected Systems: Site Service
    • Affected Locations: Ames
    • Reason: Maintenance
  • Software Update · Ceres - All - Monday, October 9 · 2023

    Ceres cluster maintenance is scheduled for October 9-10, 2023 (Indigenous Peoples Day, and the following day), to update system software.

    During the maintenance we will also upgrade Open OnDemand to version 3 and BeeGFS file system to version 7.4.

    Queued jobs will not start if they cannot complete by 6AM October 9. In the output of the squeue command, the reason for those jobs will be shown as (ReqNodeNotAvail, Reserved for maintenance). The jobs will start after the scheduled outage completes.

    The Atlas cluster will be available during the Ceres maintenance. Make sure to copy data from Ceres to Atlas prior to the maintenance, if needed.

    Please submit any questions you may have via email to scinet_vrsc@usda.gov.

    • Monday, October 9 · 2023
    • Outage time: 6 am
    • Affected Systems: Ceres
    • Affected Locations: All
    • Reason: Software Update
  • Maintenance · Site Service - Ames - Friday, September 29 · 2023

    ARS SCINet Site Service Ames will be unavailable while Internet2 circuit vendor Lumen performs circuit maintenance. The entire window is reserved.

    • Friday, September 29 · 2023
    • Outage time: 5-11 UTC
    • Affected Systems: Site Service
    • Affected Locations: Ames
    • Reason: Maintenance
  • Maintenance · Site Service NAL - Beltsville · 2023

    Site Service NAL (Beltsville) will be unavailable while Fiberlight engineers perform maintenance. Outages are expected. The entire maintenance window is reserved.

    • Monday, September 11 - Friday, September 15 · 2023
    • Outage time: 3:30-10 UTC
    • Affected Systems: Site Service NAL
    • Affected Locations: Beltsville
    • Reason: Maintenance
  • Maintenance · Backbone - NAL · 2023

    Backbone NAL-NAL will be unavailable while Fiberlight engineers perform maintenance. Outages are expected. The entire maintenance window is reserved.

    • Monday, September 11 - Wednesday, September 13 · 2023
    • Outage time: 10-4 UTC
    • Affected Systems: Backbone
    • Affected Locations: NAL
    • Reason: Maintenance
  • Maintenance · Site Service - AMES, NAL - Tuesday, August 29 · 2023

    Site Service at AMES and NAL will be impacted while Internet2 performs maintenance to upgrade core nodes. Outages are expected and the entire window is reserved.

    • Tuesday, August 29 · 2023
    • Outage time: 4-6 UTC
    • Affected Systems: Site Service
    • Affected Locations: AMES, NAL
    • Reason: Maintenance
  • Emergency Maintenance · Site Service - Ames - Friday, July 7 · 2023

    The listed asset will be unavailable while vendor Internet2 performs software maintenance and troubleshooting on core1.eqch. Multiple 20-minute hard-down events are expected. The entire window is reserved.

    This will not affect the Ceres cluster or its jobs.

    • Friday, July 7 · 2023
    • Outage time: 5-10 UTC
    • Affected Systems: Site Service
    • Affected Locations: Ames
    • Reason: Emergency Maintenance
  • Maintenance · Site Service - Beltsville - Friday, June 23 · 2023

    Site Service Beltsville will be unavailable while Fiberlight engineers perform maintenance. Outages are expected. The entire maintenance window is reserved.

    • Friday, June 23 · 2023
    • Outage time: 11:30 pm ET
    • Affected Systems: Site Service
    • Affected Locations: Beltsville
    • Reason: Maintenance
  • Maintenance · Site Service - Stoneville - Thursday, June 22 · 2023

    Site Service at Stoneville will be impacted while Internet2 performs maintenance to upgrade core nodes. Outages are expected and the entire window is reserved.

    • Thursday, June 22 · 2023
    • Outage time: 4-8 UTC
    • Affected Systems: Site Service
    • Affected Locations: Stoneville
    • Reason: Maintenance
  • Maintenance · Site Service - Multiple locations - Wednesday, June 21 · 2023

    Site Service at Fort Collins, Albany & Clay Center will be impacted while Internet2 performs maintenance to upgrade core nodes. Outages are expected and the entire window is reserved.

    • Wednesday, June 21 · 2023
    • Outage time: 4-8 UTC
    • Affected Systems: Site Service
    • Affected Locations: Multiple locations
    • Reason: Maintenance
    • Affected Assets:
      • Fort Collins
      • Albany
      • Clay Center
  • Maintenance · Site Service - Multiple - Tuesday, June 20 · 2023

    Site Service at Ames & Beltsville will be impacted while Internet2 performs maintenance to upgrade core nodes. Outages are expected and the entire window is reserved.

    • Tuesday, June 20 · 2023
    • Outage time: 4-8 UTC
    • Affected Systems: Site Service
    • Affected Locations: Multiple
    • Reason: Maintenance
    • Affected Assets:
      • Ames
      • Beltsville
  • System Update · Ceres - All · 2023

    Ceres cluster maintenance is scheduled for the week of June 19, to update system software. The cluster will be down for several days.

    • Monday, June 19 - Friday, June 23 · 2023
    • Outage time: Beginning 6 AM
    • Affected Systems: Ceres
    • Affected Locations: All
    • Reason: System Update
  • Maintenance · Site Service - Beltsville - Sunday, June 18 · 2023

    Site Service Beltsville will be unavailable while Fiberlight engineers perform maintenance. Outages are expected. The entire maintenance window is reserved.

    • Sunday, June 18 · 2023
    • Outage time: 11:30 pm ET
    • Affected Systems: Site Service
    • Affected Locations: Beltsville
    • Reason: Maintenance
  • Maintenance · Juno - All - Tuesday, June 13 · 2023

    A planned maintenance evolution will occur on Tuesday, June 13th, 2023, between 6am and 5pm ET at the National Agricultural Library (NAL).

    This maintenance is necessary to transfer core network equipment at NAL onto newer and more reliable backup power which will promote future stability and reliability for services at this site.

    During this time, access to Juno storage will be disrupted. We apologize in advance for any inconvenience this may cause.

    We will be working closely with our partners to minimize the impact of this maintenance and hope to complete the work early. We will provide updates on the status of the maintenance on the SCINet Forum (https://forum.scinet.usda.gov/t/access-to-juno-storage-disrupted-on-june-13-2023).

    • Tuesday, June 13 · 2023
    • Outage time: 10am-9pm UTC
    • Affected Systems: Juno
    • Affected Locations: All
    • Reason: Maintenance
  • Maintenance · Site Service - Ames - Tuesday, May 23 · 2023

    The listed assets may become unavailable due to scheduled maintenance being performed by Internet2 vendor Lumen. Outages are expected. The entire window is reserved.

    • Tuesday, May 23 · 2023
    • Outage time: 5-11 UTC
    • Affected Systems: Site Service
    • Affected Locations: Ames
    • Reason: Maintenance
  • Maintenance · Site Service - Ames - Friday, May 19 · 2023

    Site Service at Ames will be impacted while Lumen performs maintenance.

    Outages are expected and the entire window is reserved.

    • Friday, May 19 · 2023
    • Outage time: 5-11 UTC
    • Affected Systems: Site Service
    • Affected Locations: Ames
    • Reason: Maintenance
  • Maintenance · Site Service - Stoneville - Thursday, May 18 · 2023

    The listed assets will be unavailable while Internet2 engineers perform Core Node maintenance. Outages are expected. The entire window is reserved.

    • Thursday, May 18 · 2023
    • Outage time: 11pm-6am ET
    • Affected Systems: Site Service
    • Affected Locations: Stoneville
    • Reason: Maintenance
  • Maintenance · Juno - all - Wednesday, May 10 · 2023

    At 6:00 PM Eastern on May 10th, the Juno long term storage system at Beltsville will be unmounted from SCINet DTNs and become inaccessible.

    This is being done in preparation for network maintenance to be performed after hours.

    The storage will be remounted, and access restored, the following morning.

    • Wednesday, May 10 · 2023
    • Outage time: 10 PM UTC
    • Affected Systems: Juno
    • Affected Locations: all
    • Reason: Maintenance
  • Maintenance · Site Service - Fort Collins - Tuesday, May 2 · 2023

    Site Service Fort Collins will be unavailable while BISON engineers perform maintenance. Outages are expected. The entire maintenance window is reserved.

    • Tuesday, May 2 · 2023
    • Outage time: 12-14 UTC
    • Affected Systems: Site Service
    • Affected Locations: Fort Collins
    • Reason: Maintenance
  • Maintenance · Atlas - all - Monday, May 1 · 2023

    To replace a valve in the cooling loop supply for the Atlas cluster, a reservation has been made for Monday, May 1, beginning at 3:00am CDT.

    • No running jobs will be killed.
    • All jobs that cannot complete before the maintenance start time will be held and started once the system has returned to operation.

    • Monday, May 1 · 2023
    • Outage time: Beginning 3:00 am CDT
    • Affected Systems: Atlas
    • Affected Locations: all
    • Reason: Maintenance
  • Maintenance · Site Service - Ames - Thursday, April 27 · 2023

    Site Service at Ames will be impacted while Lumen performs maintenance. Outages are expected and the entire window is reserved.

    • Thursday, April 27 · 2023
    • Outage time: 4-11 UTC
    • Affected Systems: Site Service
    • Affected Locations: Ames
    • Reason: Maintenance
  • Maintenance · Ceres - All · 2023

    The data center that hosts the Ceres cluster will have reduced cooling capacity starting the morning of April 12 and lasting through the end of the week.

    To reduce the heat generated by Ceres compute nodes during this maintenance, a reservation has been created. New jobs will not start if they cannot complete by 6:00AM on April 12, 2023.

    In the output of the squeue command, the reason for those jobs will be shown as (ReqNodeNotAvail, Reserved for maintenance). The jobs will start after the scheduled outage completes.

    Idle nodes will be turned off. Running jobs that started before the reservation will be allowed to continue running as long as the temperature in the data center does not exceed the set threshold.

    The login and DTN nodes, as well as storage, are scheduled to stay up.

    More nodes may be turned back on and be available for jobs on Thursday and Friday.

    The Ceres cluster is expected to run at full capacity starting Tuesday, April 18.

    • Wednesday, April 12 - Tuesday, April 18 · 2023
    • Affected Systems: Ceres
    • Affected Locations: All
    • Reason: Maintenance
  • Maintenance · Atlas - All (Atlas offline) - Tuesday, April 4 · 2023

    The Mississippi State University High Performance Computing Collaboratory’s (MSU/HPC2) Computing Office has scheduled maintenance for the Atlas cluster.

    During this maintenance window, the compute nodes and all support nodes (login, devel, dtn, ood, etc.), along with their services, including cron, Globus, and login, will be shut down and unavailable.

    Helpdesk tickets should be submitted for any associated problems.

    • Tuesday, April 4 · 2023
    • Outage time: 8am-5pm CDT
    • Affected Systems: Atlas
    • Affected Locations: All (Atlas offline)
    • Reason: Maintenance
  • Maintenance · SCINet - Albany - Thursday, March 2 · 2023

    The Albany site location will experience loss of connectivity to SCINet intermittently during the hours of 4:00 pm to 6:00 pm EST on March 2, 2023.

    • Thursday, March 2 · 2023
    • Outage time: 9-11pm UTC
    • Affected Systems: SCINet
    • Affected Locations: Albany
    • Reason: Maintenance
  • Maintenance · Ceres - All (Ceres offline) · 2023

    • Monday, February 20 - Thursday, February 23 · 2023
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • Maintenance · Ceres - All (/project) - Thursday, October 27 · 2022

    Due to recent issues, Ceres’ /project storage hardware needs to be replaced. The replacement hardware is expected to be delivered by the end of the day on 10/26/2022, and the work will probably be done on 10/27/2022.

    Before replacing the hardware, we will post on the SCINet Forum and update the message of the day displayed at login to Ceres.

    While replacing the hardware, Ceres’ /project will not be accessible. We plan to suspend all running jobs before unmounting /project and resume the jobs once the maintenance completes.

    While we expect this will not affect running jobs, we recommend submitting new jobs to run on /90daydata to minimize the risk of jobs failing due to this maintenance.

    • Thursday, October 27 · 2022
    • Affected Systems: Ceres
    • Affected Locations: All (/project)
    • Reason: Maintenance
  • Maintenance · Ceres - All (Ceres offline) · 2022

    • Monday, October 10 - Tuesday, October 11 · 2022
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • Maintenance · Ceres - All (Ceres offline) · 2022

    • Monday, June 20 - Tuesday, June 21 · 2022
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • Maintenance · Atlas - All (connections to Atlas) - Tuesday, May 17 · 2022

    • Tuesday, May 17 · 2022
    • Affected Systems: Atlas
    • Affected Locations: All (connections to Atlas)
    • Reason: Maintenance
  • Maintenance · Ceres - All (Ceres offline) - Monday, February 21 · 2022

    • Monday, February 21 · 2022
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • Maintenance · SCINet - Stoneville - Thursday, January 20 · 2022

    The maintenance window is one (1) hour in duration. This will impact service to the Stoneville site only.

    • Thursday, January 20 · 2022
    • Outage time: 10:30-11:30 CST
    • Affected Systems: SCINet
    • Affected Locations: Stoneville
    • Reason: Maintenance
  • Full cluster Maintenance · Atlas - All (connections to Atlas) - Wednesday, December 8 · 2021

    The HPC2 Computing Office has scheduled maintenance for the Atlas compute cluster on Wednesday, December 8, beginning at 8am CST. During this maintenance window, the login, devel, dtn, ood, and compute nodes for Atlas will be unavailable and all associated cron jobs will be disabled.

    Downtime is expected to last most of the day. For any associated problems, submit a help desk ticket:

    • help-usda@hpc.msstate.edu - specific atlas issues
    • scinet_vrsc@usda.gov - general operational issues

    • Wednesday, December 8 · 2021
    • Outage time: Beginning 8:00am CST
    • Affected Systems: Atlas
    • Affected Locations: All (connections to Atlas)
    • Reason: Full cluster Maintenance
  • Network Maintenance in Ames · SCINet - All (connections to Ceres) - Thursday, November 18 · 2021

    SCINet network maintenance has been scheduled for Ames, IA. The maintenance window is from 8:30 to 10:30 Central Time (1430-1630 UTC) on 18 November 2021. Connectivity to SCINet will be sporadic during the maintenance window.
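    The UTC times in these notices follow from the local Central offset in effect on that date: CST (standard time) is UTC-6 and CDT (daylight saving time) is UTC-5. This minimal sketch does the conversion with fixed offsets and reproduces the 18 November 2021 window above and the 28 October 2021 window listed further down; with the system time zone database available, zoneinfo.ZoneInfo("America/Chicago") would choose the correct offset for each date automatically.

```python
from datetime import datetime, timedelta, timezone

# Fixed US Central offsets; which one applies depends on the date.
CST = timezone(timedelta(hours=-6), "CST")   # standard time
CDT = timezone(timedelta(hours=-5), "CDT")   # daylight saving time


def to_utc(local: datetime) -> datetime:
    """Convert an offset-aware local datetime to UTC."""
    if local.tzinfo is None:
        raise ValueError("local time needs a UTC offset (e.g. CST or CDT)")
    return local.astimezone(timezone.utc)


nov_window = to_utc(datetime(2021, 11, 18, 8, 30, tzinfo=CST))
oct_window = to_utc(datetime(2021, 10, 28, 10, 30, tzinfo=CDT))
print(nov_window.strftime("%H%M UTC"))   # 1430 UTC
print(oct_window.strftime("%H%M UTC"))   # 1530 UTC
```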

    • Thursday, November 18 · 2021
    • Outage time: 8:30 to 10:30 CST
    • Affected Systems: SCINet
    • Affected Locations: All (connections to Ceres)
    • Reason: Network Maintenance in Ames
  • Network Maintenance in Ames · SCINet - All (connections to Ceres) - Tuesday, November 16 · 2021

    Connectivity to SCINet will be sporadic during the maintenance window.

    • Tuesday, November 16 · 2021
    • Outage time: 8:30 to 10:30 CST
    • Affected Systems: SCINet
    • Affected Locations: All (connections to Ceres)
    • Reason: Network Maintenance in Ames
  • Network Maintenance in Albany · SCINet - Albany - Monday, November 15 · 2021

    Local connectivity to SCINet will be sporadic during the maintenance window.

    • Monday, November 15 · 2021
    • Outage time: 8:30 to 10:30 PST
    • Affected Systems: SCINet
    • Affected Locations: Albany
    • Reason: Network Maintenance in Albany
  • Maintenance · Ceres - All (Ceres offline) - Thursday, November 11 · 2021

    Ceres maintenance is scheduled for Thursday, November 11, 2021 to upgrade the internal cluster network.

    Queued jobs will not start if they cannot complete by 6AM November 11. These include jobs submitted to the long partition with the default 3-week time limit. In the output of the squeue command, the reason for those jobs will be shown as (ReqNodeNotAvail, Reserved for maintenance). The jobs will start after the scheduled outage completes.
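    A job in the long partition with the default 3-week limit cannot start within three weeks of a maintenance. If the work will genuinely finish sooner, requesting a shorter time limit lets it start before the outage. This hypothetical helper computes the longest limit that still ends before the maintenance, in the days-hours:minutes:seconds format that sbatch --time accepts. A job is terminated when it reaches the limit it requested, so leave some margin.

```python
from datetime import datetime, timedelta


def max_time_limit(now: datetime, maintenance_start: datetime) -> str:
    """Longest wall time, as days-hours:minutes:seconds, ending by maintenance_start."""
    remaining = maintenance_start - now
    if remaining <= timedelta(0):
        raise ValueError("the maintenance window has already started")
    total = int(remaining.total_seconds())      # drop fractional seconds
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


maintenance = datetime(2021, 11, 11, 6, 0)     # 6AM November 11, local time
print(max_time_limit(datetime(2021, 11, 1, 9, 30), maintenance))   # 9-20:30:00
```

    The result could then be passed as, for example, sbatch --time=9-20:30:00 job.sh (job.sh is a placeholder), ideally rounded down further for margin.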

    The Atlas cluster will stay up and running during the Ceres downtime. All Ceres users can run jobs on Atlas and use /90daydata, which has no quotas.

    • Thursday, November 11 · 2021
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • Fiber relocation · Ceres - All (connections to Ceres) · 2021

    The listed asset will be unavailable while Lumen engineers perform preventative fiber relocation work. Outage is expected to be two hours each day, but up to 5 hours is possible. The entire window is reserved.

    • Thursday, November 4 - Friday, November 5 · 2021
    • Outage time: Daily 05:00 to 11:00 UTC
    • Affected Systems: Ceres
    • Affected Locations: All (connections to Ceres)
    • Reason: Fiber relocation
  • Network update · Ceres, Juno - All (connections to Ceres, Juno) - Thursday, October 28 · 2021

    A maintenance window has been scheduled for 28 October 2021 from 1530 - 1730 UTC (10:30am to 12:30pm Central time) to stabilize a router (Albany MX480 RE downgrade).

    Periodic outages will be experienced as equipment is rebooted. Connectivity to Ceres and Juno cannot be guaranteed during the maintenance window.

    • Thursday, October 28 · 2021
    • Outage time: 1530 - 1730 UTC
    • Affected Systems: Ceres, Juno
    • Affected Locations: All (connections to Ceres, Juno)
    • Reason: Network update
  • Network update · Ceres, Juno - All (connections to Ceres, Juno) - Tuesday, October 26 · 2021

    A maintenance window has been scheduled for 26 October 2021 from 4:30pm to 8:30pm Central time to stabilize the SCINet Network. Periodic outages will be experienced as equipment is rebooted. Connectivity to Ceres and Juno cannot be guaranteed during the maintenance window.

    • Tuesday, October 26 · 2021
    • Outage time: 4:30pm to 8:30pm CDT
    • Affected Systems: Ceres, Juno
    • Affected Locations: All (connections to Ceres, Juno)
    • Reason: Network update
  • Router update · Ceres - All (connections to Ceres) - Tuesday, October 19 · 2021

    The router at Ames will be rebooted at or about 4:30 CT. The reboot should take about 15 minutes. After that, the router will be upgraded to the latest OS. Outages may occur during that process.

    • Tuesday, October 19 · 2021
    • Outage time: Beginning at 4:30 CDT
    • Affected Systems: Ceres
    • Affected Locations: All (connections to Ceres)
    • Reason: Router update
  • Router update · SCINet - various · 2021

    Additional OS updates for SCINet network hardware. Check the announcements page for more details.

    • Wednesday, September 22 - Friday, September 24 · 2021
    • Outage time: various
    • Affected Systems: SCINet
    • Affected Locations: various
    • Reason: Router update
  • OS Upgrade · SCINet - various · 2021

    GNOC plans to upgrade the OS on the SCINet equipment at six locations, which will interrupt connectivity during the upgrade. The upgrade schedule is as follows:

    • Albany - 9/16 8AM PDT
    • Clay Center - 9/16 4PM CDT
    • Ames - 9/17 8AM CDT
    • Stoneville - 9/20 8AM CDT
    • NAL - 9/20 3PM CDT
    • CSU - 9/21 9AM CDT

    • Thursday, September 16 - Tuesday, September 21 · 2021
    • Outage time: various
    • Affected Systems: SCINet
    • Affected Locations: various
    • Reason: OS Upgrade
  • Maintenance · Ceres - All (connections to Ceres) · 2021

    This maintenance window will be longer than normal because several important hardware upgrades will take place to enhance the overall power and capacity of the Ceres HPC cluster. These upgrades include the remaining new priority nodes, sixty-eight additional compute nodes, two additional high-memory compute nodes, six management nodes, and faster InfiniBand switching used by the HPC nodes to access storage. VRSC will re-rack and re-wire the whole cluster to accommodate the additional hardware while adhering to power and cooling limits.

    Queued jobs will not start if they cannot complete by 7AM August 23. These include jobs submitted to the long partition with the default 3-week time limit. In the output of the squeue command, the reason for those jobs will be shown as (ReqNodeNotAvail, Reserved for maintenance). The jobs will start after the scheduled outage completes.

    The Atlas cluster will stay up and running during the Ceres downtime. All Ceres users can run jobs on Atlas. If you don’t have a large enough project quota on Atlas, remember that you can use /90daydata on Atlas, which has no quotas.

    • Monday, August 23 - Friday, September 3 · 2021
    • Outage time: 7am 8/23 - 5pm 9/3
    • Affected Systems: Ceres
    • Affected Locations: All (connections to Ceres)
    • Reason: Maintenance
  • Outage · Ceres · 2021

    Connection restored on July 21, 2021.

    • Wednesday, July 7 - Wednesday, July 21 · 2021
    • Affected Systems: Ceres
    • Reason: Outage
  • Maintenance · Ceres - All (connections to Ceres) · 2021

    The listed assets will be unavailable while contractors test the electrical service switchgear, generators, and turbine. Outages are expected throughout the window, and the entire window is reserved.

    • Monday, May 24 - Thursday, May 27 · 2021
    • Affected Systems: Ceres
    • Affected Locations: All (connections to Ceres)
    • Reason: Maintenance
    • Affected Assets:
      • Site Service AMES
      • ARSS-AMES-AMES-10GE-01550
      • ARSS-AMES-AMES-10GE-01576
      • ARSS-AMES-AMES-10GE-01522
      • ARSS-AMES-NAL-VLAN-01508
      • ARSS-AMES-CSU-VLAN-01517
      • ARSS-AMES-AMES-VLAN-01525
      • ARSS-AMES-AMES-VLAN-01524
      • ARSS-AMES-AMES-10GE-01577
      • ARSS-AMES-AMES-40GE-01573
      • ARSS-AMES-AMES-10GE-01553
      • ARSS-AMES-STNVL-VLAN-01503
      • ARSS-AMES-AMES-10GE-1577
      • ARSS-AMES-AMES-10GE-01556
      • ARSS-AMES-AMES-10GE-ET-11
      • ARSS-AMES-AMES-10GE-01554
      • ARSS-AMES-AMES-10GE-01555
      • ARSS-ALB-AMES-VLAN-01520
      • ARSS-AMES-AMES-LAG-01526
      • ARSS-AMES-CLAY-VLAN-01507
      • ARSS-AMES-AMES-10GE-01557
      • ARSS-AMES-AMES-LAG-01552
      • ARSS-AMES-AMES-40GE-01575
      • sw.ames.scinet.science
      • rtrj.ames.scinet.science
      • fw.ames.scinet.science
  • Maintenance · Atlas - All (connections to Atlas) - Tuesday, February 23 · 2021

    The HPC2 Computing Office has scheduled maintenance for its core networking services. During this time, all network connectivity both inside and outside HPC2 will be unavailable, including access to the Atlas cluster systems.

    • Tuesday, February 23 · 2021
    • Outage time: 6:00am - 8:00am
    • Affected Systems: Atlas
    • Affected Locations: All (connections to Atlas)
    • Reason: Maintenance
  • Maintenance · Ceres - All (Ceres offline) - Tuesday, February 16 · 2021

    • Tuesday, February 16 · 2021
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • Maintenance · Ceres - All (Ceres offline) - Monday, February 15 · 2021

    • Monday, February 15 · 2021
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • Maintenance · Ceres - All (Ceres offline) - Monday, October 12 · 2020

    • Monday, October 12 · 2020
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • UPS Maintenance · SCINet - Stoneville - Tuesday, August 25 · 2020

    SCINet equipment will be shut down to perform maintenance on the UPS. SCINet connectivity at the Stoneville location will be impacted. The maintenance window is reserved from 0700 to 1600 Central Time.

    • Tuesday, August 25 · 2020
    • Outage time: 7:00 - 16:00 CST
    • Affected Systems: SCINet
    • Affected Locations: Stoneville
    • Reason: UPS Maintenance
  • Maintenance · Ceres - All (Ceres offline) · 2020

    • Tuesday, June 16 - Wednesday, June 17 · 2020
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • Planned power outage · SCINet & AWS - Multiple locations · 2020

    SCINet equipment at the National Agricultural Library will be powered down ahead of a planned power outage to the NAL building. The outage is expected to last 24 hours or less. We expect normal access to SCINet resources to be restored on or before Monday, April 20.

    Please check Basecamp during the outage period for updates.

    • Friday, April 17 - Sunday, April 19 · 2020
    • Outage time: Beginning 9:00pm EST
    • Affected Systems: SCINet & AWS
    • Affected Locations: Multiple locations
    • Reason: Planned power outage
    • Affected Assets:
      • Beltsville, MD (NAL, BARC East, BARC West) SCINet connectivity and local data transfer nodes/cafe machines sn-barc-east-dtn-0.scinet.ars.usda.gov, sn-barc-west-dtn-0.scinet.ars.usda.gov, sn-nal-dtn-0.scinet.ars.usda.gov
      • SDWAN connected equipment at Fargo and University Park
      • AWS Authentication for all SCINet AWS users
  • Router migration · SCINet - Ft Collins - Thursday, March 19 · 2020

    • Thursday, March 19 · 2020
    • Outage time: 9:30 - 11:30am MST
    • Affected Systems: SCINet
    • Affected Locations: Ft Collins
    • Reason: Router migration
  • Router replacement · SCINet - Ft Collins - Thursday, March 12 · 2020

    • Thursday, March 12 · 2020
    • Outage time: 10am - noon MST
    • Affected Systems: SCINet
    • Affected Locations: Ft Collins
    • Reason: Router replacement
  • Router replacement · SCINet - Clay Center - Monday, March 2 · 2020

    • Monday, March 2 · 2020
    • Outage time: 10:30am - 1:00pm EST
    • Affected Systems: SCINet
    • Affected Locations: Clay Center
    • Reason: Router replacement
  • Maintenance · Ceres - All (Ceres offline) - Monday, February 17 · 2020

    • Monday, February 17 · 2020
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Maintenance
  • Upgrades/expansion · Ceres - All (Ceres offline) · 2019

    Ceres downtime is scheduled for Monday, December 2 - Friday, December 6. This downtime is needed to rewire both power and networking on Ceres to accommodate additional compute nodes and to prepare it for storage expansion.

    We do not anticipate any further extended downtimes for rewiring, as this work should let us grow Ceres to its maximum size simply by adding compute nodes.

    Since this affects authentication for SCINet, it will also affect logins to the Data Transfer nodes at Ames, Stoneville, Fort Collins, Clay Center, Albany, CA, and Beltsville.

    GlobalNOC will also be upgrading software on the SCINet network infrastructure during this time.

    • Monday, December 2 - Friday, December 6 · 2019
    • Affected Systems: Ceres
    • Affected Locations: All (Ceres offline)
    • Reason: Upgrades/expansion