
March 2026 Maintenance - Phoenix Network and Storage Upgrades

· 4 min read
Research Computing Team
RC Docs Maintainers

This post summarizes the updates and improvements deployed during the March 2026 maintenance window. Phoenix received major enhancements to its Ethernet networking and bug fixes to the Scratch filesystem.

Network Upgrades

Phoenix's core Ethernet network has been upgraded from 10/25 Gb/s to 40/100 Gb/s, quadrupling link speeds and bringing the aggregate bandwidth between racks up to 200 Gb/s. This provides significantly improved performance for data transfers and network communication.

Scratch Filesystem Improvements

Over the past few months, many users have noticed lag or delays while traversing scratch. This was caused by the client constantly trying to connect over the OmniPath network, even when nodes did not have OmniPath connectivity. This issue has been resolved by forcing the client to maintain connections over the existing network. Users should see improved performance when traversing and accessing scratch.

RDMA connections have also been enabled for scratch, which should further improve performance on nodes with RDMA-capable network interfaces. RDMA had previously been disabled due to a bug in the BeeGFS client that turned off RDMA whenever IPv6 was disabled, as it is on Phoenix.

We have also applied a custom patch to the BeeGFS client so that it properly honors relatime (relative access timestamps), which should resolve issues with files appearing to have incorrect timestamps when accessed from Phoenix. Under relatime, a read or write updates a file's access time only if the previous access time is older than the modification time, or older than 24 hours. Testing has shown that this does not affect scratch performance and provides more accurate access times for files on scratch, which is important for the accuracy of the scratch data retention policy. This patch will be applied to Sol at the next maintenance window.
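The relatime rule described above can be sketched as a small predicate (this is an illustration of the semantics, not the actual BeeGFS patch):

```python
from datetime import datetime, timedelta

def relatime_would_update(atime: datetime, mtime: datetime, now: datetime) -> bool:
    """Return True if a read at `now` would update the stored access time.

    Under relatime, the atime is rewritten only when the previous atime is
    older than the file's modification time, or more than 24 hours old.
    """
    return atime < mtime or (now - atime) > timedelta(hours=24)
```

For example, a file read an hour ago and not modified since would not get a new atime on the next read, while a file whose last recorded access predates its last modification would.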

Scratch Benchmarking

The IO500 benchmark suite, widely used for evaluating storage systems in high-performance computing environments, was run on Phoenix's scratch filesystem to measure the performance improvements from the recent updates. The results show that Phoenix's scratch filesystem delivers high performance across a variety of workloads, including small file access, large file access, and metadata operations. The results even outperform Sol's scratch filesystem in some tests, which is impressive given that Sol's scratch runs on newer hardware.

Benchmark Results:

| Test Name | Sol (March 2023) | Phoenix (March 2026) |
| --- | --- | --- |
| ior-easy-write | 12.49 GiB/s | 22.15 GiB/s |
| ior-hard-write | 1.03 GiB/s | 0.54 GiB/s |
| ior-easy-read | 17.15 GiB/s | 27.31 GiB/s |
| ior-hard-read | 1.69 GiB/s | 2.28 GiB/s |
| find | 1433.96 kIOPS | 1994.09 kIOPS |
| mdtest-easy-write | 86.99 kIOPS | 130.48 kIOPS |
| mdtest-hard-write | 8.63 kIOPS | 7.36 kIOPS |
| mdtest-easy-stat | 333.97 kIOPS | 961.19 kIOPS |
| mdtest-hard-stat | 86.93 kIOPS | 236.19 kIOPS |
| mdtest-easy-delete | 58.25 kIOPS | 129.70 kIOPS |
| mdtest-hard-delete | 10.21 kIOPS | 10.37 kIOPS |
| mdtest-hard-read | 11.38 kIOPS | 20.12 kIOPS |
| Overall Bandwidth | 4.40 GiB/s | 5.23 GiB/s |
| Overall IOPS | 61.76 kIOPS | 102.06 kIOPS |
| Overall Score | 16.48 | 23.09 |
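The overall score reported by IO500 is the geometric mean of the overall bandwidth score (in GiB/s) and the overall IOPS score (in kIOPS), which can be checked against the numbers above:

```python
from math import sqrt

def io500_score(bandwidth_gibps: float, iops_kiops: float) -> float:
    """IO500 overall score: geometric mean of bandwidth (GiB/s) and IOPS (kIOPS)."""
    return sqrt(bandwidth_gibps * iops_kiops)

io500_score(4.40, 61.76)    # Sol: ≈ 16.48
io500_score(5.23, 102.06)   # Phoenix: ≈ 23.10 (small difference from 23.09 comes from rounding of the inputs)
```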

Nvidia GPU Updates

The NVIDIA GPU driver has been updated to 595.45.04, which supports CUDA 13.2. However, NVIDIA has removed support for the Tesla V100 and GTX 1080 Ti GPUs in its latest driver releases. As a result, the V100 and GTX 1080 Ti GPUs on Phoenix currently run an older driver version (580.95.05) that supports CUDA 13.0.
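The split driver scheme amounts to a simple mapping from GPU model to driver branch; a minimal sketch, using the GPU names and versions from this post (the helper itself is illustrative, not the actual provisioning code):

```python
# GPUs dropped from NVIDIA's current driver branch, per this post.
LEGACY_GPUS = {"Tesla V100", "GTX 1080 Ti"}

def driver_branch(gpu_model: str) -> str:
    """Pick the driver branch a node should run based on its GPU model."""
    if gpu_model in LEGACY_GPUS:
        return "580.95.05"  # last branch supporting these GPUs (CUDA 13.0)
    return "595.45.04"      # current branch (CUDA 13.2)
```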

Technical Updates

  • Slurm updated to 25.11.3
  • Web portals updated to 4.0.10 on Phoenix
  • Mamba package manager updated to 2.5.0 on Phoenix
  • Jupyter updated on Phoenix
  • Obsolete VASP modules removed to clean up the module list
  • General security updates applied to all systems
  • InfiniBand cards installed in pcg085-pcg088
  • Home directories are now statically mounted on Phoenix, improving reliability (/data directories are still dynamically mounted on demand)

January 2026 Maintenance - Introducing Phoenix Scratch

· 3 min read
Research Computing Team
RC Docs Maintainers

This post summarizes the updates and improvements deployed during the January 2026 maintenance window. Changes to Sol were minimal, while Phoenix received a major enhancement with the introduction of Phoenix Scratch, a new high-performance scratch storage system. Together, these updates improve performance, security, and overall usability across the clusters.

Introducing Phoenix Scratch


Two racks housing the storage and networking infrastructure for Phoenix Scratch in the Iron Mountain Data Center.

We are pleased to announce the availability of Phoenix Scratch, a new high-performance parallel scratch filesystem designed to support data-intensive workloads on Phoenix.

  • 3 PiB of shared storage space
  • Directly mountable on InfiniBand, Omni-Path, and Ethernet fabrics
  • High throughput and low latency for I/O intensive applications
  • Parallel file system support for seamless integration with existing workflows
  • NVMe backed metadata for improved responsiveness
  • Policy based quotas for flexible and fair storage management

As with Sol Scratch, Phoenix Scratch is intended for temporary storage only. It is well suited for active job data, checkpoints, and intermediate results, but it should not be used for long-term data retention. The automatic 90-day data retention policy applies to both Phoenix and Sol. Users should ensure that important data is backed up to appropriate long-term storage systems.

Behind the Build

Bringing Phoenix Scratch online required significant infrastructure work, including over 800 meters of cabling and the installation of 507 HDDs, 60 SSDs, and 12 NVMe drives across 19 servers. The system is composed of three primary components:

Metadata Servers (MDS): Six high performance servers dedicated to metadata operations, each equipped with NVMe drives. These systems were built using nodes donated by Cirrus Logic and customized to support NVMe storage with drives donated by Intel.

Object Storage Servers (OSS): Thirteen storage servers providing the bulk data capacity and throughput. These systems use a mix of HDDs and SSDs and repurpose hardware previously deployed as the Cholla Storage System.

Networking: Each storage node is connected via 100 Gb Omni-Path, 100 Gb InfiniBand, and dual 40 Gb Ethernet links, ensuring high bandwidth and low latency access regardless of interconnect.

We extend our sincere thanks to our partners at Intel and Cirrus Logic for their generous hardware contributions, which made Phoenix Scratch possible. We also thank our researchers for their patience while Phoenix continued to operate during the deployment and integration of this new filesystem.

Transferring Data to Phoenix Scratch

Globus is the recommended method for transferring data to and from Phoenix Scratch. We have created a Globus collection specifically for Phoenix Scratch. For more information, see our documentation on transferring data between supercomputers.

Technical Updates

Infrastructure and Firmware

  • Duo 2FA enabled for password-based SSH. More information can be found in the Duo 2FA Documentation.
  • Slurm upgraded to 25.11.1 on Sol and Phoenix.
  • Web portals updated to 4.0.8 on Sol and Phoenix.
  • Swap enabled on Sol and Phoenix login nodes to improve stability during high memory usage.
  • InfiniBand fabrics separated for Sol and Phoenix to improve performance and stability.
  • OmniPath fabric managers updated to 12.0.1.
  • Horizon project storage updated with the latest patch release.
  • Hypervisors upgraded to the latest stable release.
  • Load-balancing infrastructure updated to improve both security and reliability.