Using Containers on the Grace Hopper GH100
Overview
The Grace Hopper node (sgh100) on the Sol supercomputer is a unique piece of equipment featuring an ARM-based processor (rather than x86_64) and a GPU. This combination enables new capabilities, such as leveraging the node's large memory and the modern gh200 GPU.
For more information on what makes the Grace Hopper node unique, see NVIDIA's whitepaper.
ARM processors do not run x86_64 software, so a special set of software tools is made available to leverage this hardware. You will need to compile for aarch64 for software to run properly on this node.
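Before building or running anything, it can help to confirm that the current shell really is on an ARM machine. The following is a minimal sketch using only the Python standard library; the helper name `is_aarch64` is hypothetical and not part of any Sol tooling.

```python
import platform


def is_aarch64(machine=None):
    """Return True when the given (or current) machine string names 64-bit ARM."""
    if machine is None:
        machine = platform.machine()
    # Linux reports "aarch64"; some other platforms report "arm64" instead.
    return machine.lower() in {"aarch64", "arm64"}


if __name__ == "__main__":
    print(f"machine: {platform.machine()}, 64-bit ARM: {is_aarch64()}")
```

On an x86_64 login node the check should report False, which is a hint that binaries built there will not run on the Grace Hopper node.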
An alternative to compiling your own ARM software is to use Apptainer containers built for aarch64. On Sol, the following containers are known to work with this node:
[software@sgh001:~]$ ls -1 /packages/aarch64/simg/
autodock_2020.06.sif*
chroma_2021.04.sif*
gromacs_2023.2.sif*
julia_v2.4.1.sif*
lammps_patch_15Jun2023.sif*
nvhpc_24.5-devel-cuda_multi-ubuntu22.04.sif*
pytorch_24.05-py3.sif*
quantum_espresso_qe-7.1.sif*
relion_3.1.3.sif*
tensorflow_24.05-tf2-py3-igpu.sif*
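The same listing can be produced from a script, for example to pick an image programmatically. This is a small sketch that assumes the directory shown in the listing above; the function name `list_images` is hypothetical.

```python
from pathlib import Path

# Directory taken from the listing above.
IMAGE_DIR = Path("/packages/aarch64/simg")


def list_images(directory=IMAGE_DIR):
    """Return the sorted names of .sif images in `directory` (empty if it is missing)."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.sif"))


if __name__ == "__main__":
    for name in list_images():
        print(name)
```

Returning an empty list for a missing directory keeps the script harmless when it is run on a machine that does not mount the package tree.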
Requesting Grace Hopper from the Job Scheduler
Using the following commands, you can request an allocation on this node and run these containers:
$ salloc -p arm
OR, to also get the H100 GPU:
$ salloc -p arm -G 1
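If allocations are requested from a wrapper script, the two command variants above can be generated from a single helper. The sketch below only builds the argument list and does not call the scheduler; the function name `salloc_command` is hypothetical, and it uses only the `-p` and `-G` options shown above.

```python
import shlex


def salloc_command(gpus=0, partition="arm"):
    """Build the salloc argument list; -G is added only when GPUs are requested."""
    if gpus < 0:
        raise ValueError("gpus must be non-negative")
    args = ["salloc", "-p", partition]
    if gpus > 0:
        args += ["-G", str(gpus)]
    return args


if __name__ == "__main__":
    # Print both variants in copy-and-paste form.
    print(shlex.join(salloc_command()))
    print(shlex.join(salloc_command(gpus=1)))
```

Keeping the command as a list rather than a string makes it safe to pass to `subprocess.run` later without shell quoting issues.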
Running a container on the Grace Hopper
$ apptainer run /packages/aarch64/simg/pytorch_24.05-py3.sif # when CPU only, OR
$ apptainer run --nv /packages/aarch64/simg/pytorch_24.05-py3.sif # when GPU is requested
=============
== PyTorch ==
=============
NVIDIA Release 24.05 (build 91431256)
PyTorch Version 2.4.0a0+07cecf4
Apptainer> python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
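The interactive check above can also be run as a short script inside the container. The following is a minimal sketch; the function name `describe_environment` is hypothetical, and it assumes only the standard library plus, when present, the `torch` module that ships in the PyTorch container.

```python
import platform


def describe_environment():
    """Collect the architecture, Python version, and (if installed) PyTorch GPU status."""
    info = {
        "machine": platform.machine(),
        "python": platform.python_version(),
        "torch": None,
        "cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        # Outside the PyTorch container, torch may not be installed.
        return info
    info["torch"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False inside a GPU allocation, a likely cause is that the container was started without the `--nv` flag.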