September 2024 Phoenix Maintenance Changelog
This document outlines the latest updates and improvements deployed during the September 2024 maintenance. These enhancements are designed to improve system performance, security, and usability across the Phoenix cluster.
Notable Changes
Phoenix System and Security Updates
- Applied latest Rocky Linux OS security updates to ensure continued protection against known vulnerabilities.
- The Slurm workload manager was upgraded from 24.05.1 to 24.05.3, applying critical security patches and improving scheduler stability.
- Per-node energy tracking has been enabled, allowing for greater insight into system power usage.
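With energy tracking enabled, Slurm accounting can report energy consumed per job. The following is a minimal sketch, assuming that `sacct` on Phoenix exposes the standard `ConsumedEnergyRaw` field (reported in joules) to users; the helper names are hypothetical and only illustrate how the output could be parsed.

```python
import subprocess


def build_sacct_energy_cmd(job_id: str) -> list[str]:
    """Build a sacct command listing consumed energy for each step of a job."""
    return [
        "sacct", "-j", job_id,
        "--format=JobID,ConsumedEnergyRaw",
        "--parsable2",   # pipe-delimited output without trailing delimiter
        "--noheader",    # omit the column header line
    ]


def parse_energy(sacct_output: str) -> dict[str, int]:
    """Parse 'JobID|ConsumedEnergyRaw' lines into {job_step: joules}.

    Steps with an empty or non-numeric energy value are skipped.
    """
    energy = {}
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        step, _, raw = line.partition("|")
        if raw.strip().isdigit():
            energy[step] = int(raw)
    return energy


def job_energy(job_id: str) -> dict[str, int]:
    """Run sacct for a job and return its per-step energy (assumes sacct is on PATH)."""
    result = subprocess.run(
        build_sacct_energy_cmd(job_id),
        capture_output=True, text=True, check=True,
    )
    return parse_energy(result.stdout)
```

Calling `job_energy("19824107")` on a login node would return a dictionary keyed by job step, provided the accounting database recorded energy for that job.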
Phoenix Resource and Portal Enhancements
- Added 16 additional GPU MIG instances, increasing availability for GPU shard-based workloads.
- The web portal was upgraded from version 3.0.3 to 3.1.7.
- New Globus integration allows seamless access to the Globus file manager.
Jupyter and Environment Manager Updates
- Jupyter Lab updated to the latest stable version, offering improved performance and UI features.
- Mamba environment manager upgraded to 1.5.10 for better compatibility with modern Python packages.
Improved Job Submission Experience
The job_submit plugin was modernized to improve feedback for interactive job submissions. Jobs submitted with missing arguments will now print messages showing the default values that were assigned. For example:
$ salloc -t 240
salloc: QOS not specified; assigning "public" qos
salloc: cpus-per-task not specified; assigning 1 core
salloc: time_limit <= 240 and Partition not specified; assigning "htc" partition
salloc: Pending job allocation 19824107
salloc: job 19824107 queued and waiting for resources
$ salloc -p general -t 240 -q public -c 1
salloc: Pending job allocation 19824117
salloc: job 19824117 queued and waiting for resources
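The defaulting behavior shown in the output above can be modeled as a short sketch. This is not the plugin's actual code (Slurm job_submit plugins are written in C or Lua); it is a hypothetical Python model that reproduces only the three rules visible in the example. What the plugin does when the time limit exceeds 240 minutes and no partition is given is not shown in this changelog, so the sketch leaves the partition unset in that case.

```python
def apply_defaults(qos=None, cpus_per_task=None, time_limit=None, partition=None):
    """Model the default assignments seen in the salloc example.

    Returns the completed job settings and the messages that would be printed.
    time_limit is in minutes.
    """
    messages = []

    if qos is None:
        qos = "public"
        messages.append('QOS not specified; assigning "public" qos')

    if cpus_per_task is None:
        cpus_per_task = 1
        messages.append("cpus-per-task not specified; assigning 1 core")

    # Only the short-job rule is documented; longer jobs are left untouched here.
    if partition is None and time_limit is not None and time_limit <= 240:
        partition = "htc"
        messages.append(
            'time_limit <= 240 and Partition not specified; assigning "htc" partition'
        )

    job = {
        "qos": qos,
        "cpus_per_task": cpus_per_task,
        "time_limit": time_limit,
        "partition": partition,
    }
    return job, messages


if __name__ == "__main__":
    # Mirrors `salloc -t 240`: three defaults are assigned and reported.
    job, messages = apply_defaults(time_limit=240)
    for message in messages:
        print(f"salloc: {message}")
```

Passing every argument explicitly, as in the second `salloc` command, yields an empty message list, which matches the quieter output shown there.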
Technical Updates
Infrastructure and Firmware
- Warewulf updated to 4.5.8-1 to enhance node provisioning and cluster management.
- Grace Hopper firmware upgraded to version 3.17.0.
- Dell PowerStore firmware updated from 2.1.1.1 to 3.6.1.3.
- Firewall firmware received critical updates to ensure network security.
- An arbiter process was added to the soldtn node for improved coordination and fault tolerance.
Jupyter and Python Tooling
- The Jupyter Notebook environment has been updated to the most recent stable version.
- Mamba now at version 1.5.10, improving environment creation speed and dependency resolution.
MPI Performance Metric
We utilized the OSU Micro-Benchmarks (OMB) v7.4 from Ohio State University to validate the health of the Phoenix system before and after maintenance. These tests assess bandwidth and latency across randomly selected node pairs using all eight MPI modules on Sol.
This ensures:
- Proper node performance
- MPI module functionality
- Mamba module integrity
- Slurm scheduler behavior
A large number of test jobs were submitted to verify the overall system health.
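As an illustration of how such a test matrix might be assembled, the sketch below picks random node pairs and builds one point-to-point benchmark command per pair, MPI module, and benchmark. It is not the Research Computing team's actual test harness. The node names and module names are placeholders, and the choice of `osu_latency` and `osu_bw` (two standard OMB point-to-point tests) is an assumption.

```python
import random
from itertools import product


def pick_node_pairs(nodes, n_pairs, seed=None):
    """Randomly choose n_pairs pairs of two distinct nodes each."""
    rng = random.Random(seed)
    return [tuple(rng.sample(nodes, 2)) for _ in range(n_pairs)]


def build_commands(pairs, mpi_modules, benchmarks=("osu_latency", "osu_bw")):
    """Build one srun command per (node pair, MPI module, benchmark) combination."""
    commands = []
    for (node_a, node_b), module, bench in product(pairs, mpi_modules, benchmarks):
        commands.append(
            f"module load {module} && "
            f"srun --nodes=2 --ntasks=2 --ntasks-per-node=1 "
            f"--nodelist={node_a},{node_b} {bench}"
        )
    return commands


if __name__ == "__main__":
    # Placeholder node and module names; substitute real ones from the cluster.
    nodes = ["node001", "node002", "node003", "node004"]
    modules = ["example-mpi-a", "example-mpi-b"]
    for command in build_commands(pick_node_pairs(nodes, 2, seed=1), modules):
        print(command)
```

Running each pair with one task per node keeps the measurement on the interconnect rather than on shared memory within a single node.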
To view performance comparisons from before and after this maintenance, visit: OMB tests - Google Drive
Additional Help
If you need assistance or notice any issues following these changes, please contact the Research Computing Team:
- Submit a ticket via the RTO Request Help page
- Join the #rc-support Slack channel for quick questions
- Attend office hours for real-time support
For more information on our Educational Opportunities and Workshops, please visit our events page.