Using Containers on the Grace Hopper GH100

Overview

The Grace Hopper node (sgh100) on the Sol supercomputer is a unique piece of equipment that pairs an ARM-based processor (rather than x86_64) with a GPU. This combination enables new capabilities, such as leveraging the node's large memory and the modern GH200 GPU.

For more information on what makes the Grace Hopper node unique, see NVIDIA's whitepaper.

Note: ARM processors do not run x86_64 software, so a special set of software tools is made available to leverage this hardware. Software must be compiled for aarch64 to run properly on this node.
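Before building or running anything, it can help to confirm which architecture the current session is on. The following is a minimal sketch, not a site-provided tool: the helper name `is_arm_node` is hypothetical, and it only inspects the string printed by `uname -m`.

```shell
# Hypothetical helper: succeeds when the given machine string names a 64-bit ARM CPU.
is_arm_node() {
    case "$1" in
        aarch64|arm64) return 0 ;;
        *) return 1 ;;
    esac
}

# `uname -m` prints the machine hardware name of the current host.
machine="$(uname -m)"
if is_arm_node "$machine"; then
    echo "$machine: ARM node, aarch64 builds should run here"
else
    echo "$machine: not an ARM node, aarch64 builds will not run here"
fi
```

Running this on a login node and again inside an allocation on the Grace Hopper node should show the difference between the two environments.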

An alternative to compiling ARM-based software yourself is to use Apptainer containers built for ARM. On Sol, the following containers are known to work on this node:

[software@sgh001:~]$ ls -1 /packages/aarch64/simg/
autodock_2020.06.sif*
chroma_2021.04.sif*
gromacs_2023.2.sif*
julia_v2.4.1.sif*
lammps_patch_15Jun2023.sif*
nvhpc_24.5-devel-cuda_multi-ubuntu22.04.sif*
pytorch_24.05-py3.sif*
quantum_espresso_qe-7.1.sif*
relion_3.1.3.sif*
tensorflow_24.05-tf2-py3-igpu.sif*
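To look up an image by name rather than scanning the full listing, a small filter along these lines may be useful. This is a sketch under assumptions: `list_images` is a hypothetical helper, and it simply matches `.sif` files by filename prefix in the directory shown above.

```shell
# Hypothetical helper: print the names of .sif images in a directory
# whose filenames start with the given prefix.
list_images() {
    dir="$1"
    prefix="$2"
    for f in "$dir"/"$prefix"*.sif; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

# Example: show the PyTorch images available for the ARM node.
list_images /packages/aarch64/simg pytorch
```

On a system where the directory does not exist, the function prints nothing.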

Requesting Grace Hopper from the Job Scheduler

Use the following commands to request an allocation on this node, from which you can run these containers:

$ salloc -p arm

OR, to also get the H100 GPU:

$ salloc -p arm -G 1
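Once the allocation starts, one way to check whether a GPU was actually granted is to count the devices reported by `nvidia-smi -L`. The sketch below assumes `nvidia-smi` is on the PATH when a GPU is allocated; `count_gpus` is a hypothetical helper that counts the `GPU N: ...` lines in that output.

```shell
# Hypothetical helper: count lines of the form "GPU N: ..." read from stdin.
count_gpus() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c '^GPU ' || true
}

if command -v nvidia-smi >/dev/null 2>&1; then
    n="$(nvidia-smi -L 2>/dev/null | count_gpus)"
else
    n=0
fi
echo "GPUs visible in this session: $n"
```

A count of 0 in a session requested with `-G 1` suggests the job did not land where expected.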

Running a container on the Grace Hopper

$ apptainer run /packages/aarch64/simg/pytorch_24.05-py3.sif         # when CPU only, OR

$ apptainer run --nv /packages/aarch64/simg/pytorch_24.05-py3.sif    # when GPU is requested


=============
== PyTorch ==
=============

NVIDIA Release 24.05 (build 91431256)
PyTorch Version 2.4.0a0+07cecf4
Apptainer> python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
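The same check can be run non-interactively as a batch job. The sketch below is an assumption-laden example rather than a site-endorsed template: the script name `gh_pytorch.sbatch` is hypothetical, the `#SBATCH` options mirror the `salloc` flags shown earlier, and your site may require additional options such as a time limit or account.

```shell
# Write a hypothetical batch script that repeats the interactive GPU check.
cat > gh_pytorch.sbatch <<'EOF'
#!/bin/bash
#SBATCH -p arm
#SBATCH -G 1

# --nv exposes the host NVIDIA driver and GPU inside the container.
apptainer exec --nv /packages/aarch64/simg/pytorch_24.05-py3.sif \
    python -c 'import torch; print(torch.cuda.is_available())'
EOF

# Submit only where Slurm is installed; otherwise just leave the file in place.
if command -v sbatch >/dev/null 2>&1; then
    sbatch gh_pytorch.sbatch
else
    echo "sbatch not found; script written to gh_pytorch.sbatch"
fi
```

Here `apptainer exec` is used instead of `apptainer run` so that a single command executes inside the container and the job exits when it finishes.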
