By: Harlan Svoboda, Herbarium Curator | 07/31/2024
Floral and Nursery Plants Research Unit, U.S. National Arboretum, Washington, D.C.
Across the USDA’s Agricultural Research Service (ARS) there are nearly 100 biological collections containing millions of preserved and viable specimens, including animal tissues, seeds, fungal cultures, plant accessions, pinned insects, and viral isolates. These specimens and the data about them document and support ARS research efforts and are an integral part of delivering on the Agency’s mission.
One of these collections is the U.S. National Arboretum Herbarium, an assemblage of nearly 700,000 preserved plant specimens housed in Washington, D.C. The Herbarium dates back to the earliest days of the USDA and is still actively growing, with expeditions, research projects, and collaborations all adding substantial new material annually. The collections not only provide a unique snapshot of what plants are growing in a particular place at a given time, but also contribute valuable data and biological material for a range of research that includes describing new species, genetic analyses, and tracking invasive taxa.
Recently, the National Arboretum committed to making its priceless preserved specimens, and their data, more widely accessible to researchers and the American public through digitization. In 2021, the Herbarium’s entire collection was digitized using a high-throughput conveyor system to image all specimens. The venture took only 15 months to complete, a remarkable achievement considering that manually digitizing the collection would have taken approximately 419 years.
The question then became: what to do with the nearly 300 TB of files created during the project? Specifically, the original, unprocessed image files needed special consideration to ensure their permanent safeguarding. This project highlights a sometimes-overlooked function of SCINet: long-term, permanent storage of critical USDA digital assets. Just as the physical specimens must be preserved and stored in perpetuity, so too must the digital files that represent them.
SCINet’s “Juno” storage system, located in Beltsville, Maryland, is designed for long-term storage of research data and is an ideal resource for this use case. Juno is a multi-petabyte storage device that is regularly backed up to tape drives located off-site at Mississippi State University, and it can be securely accessed via command line or the Globus application. Like SCINet’s supercomputers (Atlas and Ceres), Juno is operated and maintained by the SCINet Virtual Research Support Core (VRSC). Knowing that the complete digital complement of the Herbarium is safely stored on Juno means the National Arboretum can focus on better serving its stakeholders.
In the coming months, ARS will launch the ARS Biocollections Portal, an online endeavor led by the National Arboretum that will allow users from across the globe to access and use USDA collections like never before. This Agency-wide initiative already contains four large collections, with additional members slated for inclusion in the years to come. None of this would be possible, though, without first ensuring the original files are protected long-term.
This work showcases how SCINet can be incorporated into various types of projects throughout USDA, even ones that do not fall into the “typical” supercomputing or big data categories. As a partner in collections-based programs, SCINet can support (and catalyze) the safeguarding, management, and use of ARS’s treasured specimens.
Figure 1. Preserved plant specimens in the U.S. National Arboretum Herbarium (left) were digitized in 2021, producing almost 300 TB of data and image files (center). Those digital assets will soon be accessible through the ARS Biocollections Portal, with the permanent versions being stored on Juno for safekeeping (right).