Partition and QoS List
Partitions are separate queues that allocate jobs to different subsets of the supercomputer's hardware. They exist both to distinguish Research Computing-owned nodes from privately-owned nodes and to separate classes of nodes with unique hardware.
Quality-of-Service ("QoS") refers to the length of time jobs may run, their preemption status, and any limitations on resources that may exist (e.g., for class accounts).
Reference this page to help select the best partition and QoS options for your job. As a general rule, the public partition will satisfy most users' needs: it contains both CPU and GPU nodes and supports jobs up to 7 days long.
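If you are unsure what is available to you, Slurm's own query tools can list the partitions and QoS limits directly on the cluster. A sketch using standard Slurm commands (the partitions and limits shown will vary by system):

```shell
# List each partition with its time limit and node count.
# %P = partition name, %l = time limit, %D = node count
sinfo -o "%P %l %D"

# Show the walltime and per-user limits attached to each QoS,
# pipe-separated for easy reading or scripting.
sacctmgr show qos format=Name,MaxWall,MaxTRESPerUser -P
```

These are read-only queries and are safe to run at any time on a login node.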
Partitions
public
The public-use partition comprises all Research Computing-owned nodes. This partition has a wall time limit of 7 days. CPU and GPU jobs can be allocated in this partition using the public, long, debug, and class QoS.
- sbatch
- interactive
#SBATCH -p public
salloc -p public
When no QoS is specified, jobs adopt the default public QoS (which is distinct from the public partition, despite sharing its name).
general
The general-use partition comprises all privately-owned nodes. This partition has a wall time limit of 30 days for users within the owning group. Owner-privileged CPU-only and GPU-accelerated jobs will typically be submitted here.
Users who are not part of any privileged group can use this partition as well, but with the possibility of job preemption (cancellation) if the node your job is allocated to is requested by its owning group.
Users with privileged access to these nodes will access them with a grp_-prefixed QoS. Other users should submit using the general partition and the private QoS.
- owner-sbatch
- owner-salloc
- preemptable-sbatch
- preemptable-salloc
#SBATCH -p general
#SBATCH -q grp_<mygrpname>
#SBATCH -t 30-0
salloc -p general -q grp_<mygrpname> -t 30-0
#SBATCH -p general
#SBATCH -q private
#SBATCH -t 7-0
salloc -p general -q private -t 7-0
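If your workload can tolerate being restarted, a preemptable job can ask Slurm to requeue it rather than leave it cancelled when an owner reclaims the node. A minimal sketch (the --requeue flag is standard Slurm; whether preempted jobs are actually requeued also depends on the cluster's preemption configuration, and my_program is a placeholder for your own application):

```shell
#!/bin/bash
#SBATCH -p general
#SBATCH -q private
#SBATCH -t 7-0
#SBATCH --requeue        # ask Slurm to requeue this job if it is preempted

# Resume from the last saved state so a requeued job does not start over.
# Both the program name and the flag are illustrative placeholders.
my_program --resume-from checkpoint.dat
```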
htc
The High-Throughput Computing (htc) partition is aimed at jobs that can complete within a four-hour walltime. Jobs that fit within this window have a scheduling advantage, as this partition includes not only Research Computing-owned nodes but also privately-owned nodes. As part of the arrangement with the private node owners, htc jobs run without risk of preemption when submitted using the public QoS. This partition is suitable for both CPU and GPU-accelerated jobs.
- sbatch
- interactive
#SBATCH -p htc -q public -t 0-4
salloc -p htc -q public -t 0-4
highmem
The highmem partition is aimed at jobs that require more memory than the regular compute nodes, which typically have 512 GB, can provide. Its computing power is roughly identical to the regular compute nodes, just with greater upper-end memory capacity (up to 2 TB). The highmem partition is currently capped at a 48-hour (2-day) walltime for any given job. If more than two days is required, a lengthier QoS is available that extends the limit to 7 days; to use it, reach out to the Research Computing staff, who can help evaluate your job and its fitness for these nodes.
- sbatch
- interactive
#SBATCH -p highmem -q public -t 2-0
salloc -p highmem -q public -t 2-0
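A highmem job should also state how much memory it actually needs so the scheduler can place it on an appropriately sized node. A sketch (the 1500G figure is illustrative; any requirement above the 512 GB of a regular compute node is what justifies highmem):

```shell
#SBATCH -p highmem
#SBATCH -q public
#SBATCH -t 2-0
#SBATCH --mem=1500G   # total memory for the job; illustrative value
```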
lightwork
The lightwork partition is ideal for jobs that require less computing power than typical supercomputing jobs and/or may sit idle for long periods. Good examples include creating mamba environments, compiling software, VSCode tunnels, and bulk file operations.
- sbatch
- interactive
#SBATCH -p lightwork -q public -t 0-24
salloc -p lightwork -q public -t 0-24
The maximum job time is 24 hours, and the maximum is 8 CPU cores per node. Jobs that utilize cores to their full potential are better suited to the htc, public, or general partitions, where cores are not shared/oversubscribed. Jobs that drive full cores above 99% for a sustained duration, or that request excessive resources and so prevent other users from using this partition, are subject to cancellation. Repeated misuse of this partition will result in ineligibility to use lightwork going forward.
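Since lightwork cores are shared, request only what a light task needs. A sketch for an interactive software-build session (the core, memory, and time values are illustrative, within the limits above):

```shell
salloc -p lightwork -q public -c 4 --mem=8G -t 0-4
```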
QOS
public
The public QoS should be used for any jobs that use public Research Computing resources. This includes any job submitted to the above-listed partitions (except general). Under most circumstances, public is the preferred QoS.
- public
- htc
- lightwork
#SBATCH -p public -q public -t 7-0
#SBATCH -p htc -q public -t 0-4
#SBATCH -p lightwork -q public -t 0-12
debug
debug is a special QoS for testing your sbatch jobs for syntax errors and for pre-compute testing.
When setting up your workflows, rather than submitting to the normal compute partitions, submitting to debug offers the following advantages:
- Provides a much quicker turnaround when troubleshooting your scripts. Are there syntax errors as your script progresses? Are all paths and modules properly set to ensure a successful run?
- Has shorter expected start times for jobs.
- Using the debug QoS with a limited (small) dataset means you can quickly confirm the validity of your pipelines; you can then switch to your full dataset with greater confidence that it will complete with the desired output.
- public
- general
- htc
- lightwork
#SBATCH -p public -q debug -t 15
#SBATCH -p general -q debug -t 15
#SBATCH -p htc -q debug -t 15
#SBATCH -p lightwork -q debug -t 15
The debug QoS works with the public, general, htc, and lightwork partitions for walltimes up to 15 minutes.
private
The private QoS signifies a willingness to have your job flagged as preemptable (cancellable by node owners) as a trade-off for being able to use privately-owned resources for longer than the protected four-hour htc offering. This can sometimes greatly reduce the time-to-start for jobs, but that is heavily dependent on the aggregate supercomputer load. Preemptable jobs will be canceled if and only if the owning group submits a job to the resources your job is running on.
A typical example is using privately-owned GPUs, especially where checkpointing is used.
- sbatch
- salloc
#SBATCH -p general
#SBATCH -q private
salloc -p general -q private
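When preemption does occur, Slurm sends the job a termination signal before killing it, which a batch script can trap to save state. A minimal sketch (the checkpoint logic and my_program are placeholders; the grace period between the signal and the final kill is a cluster-side setting):

```shell
#!/bin/bash
#SBATCH -p general
#SBATCH -q private
#SBATCH -t 7-0

# Save state when Slurm signals that the job is being preempted.
save_checkpoint() {
    echo "caught SIGTERM, writing checkpoint" >&2
    # placeholder for your application's checkpoint command
}
trap save_checkpoint SIGTERM

# Run the real work in the background so the trap can fire promptly;
# 'wait' is interruptible by signals, a foreground command is not.
my_program &       # placeholder application
wait
```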
To recap, job preemption will occur in the following circumstance:
- a member of the hardware-owning group schedules a job (grp_labname) that cannot be satisfied because a private job allocation is currently running on their lab's node.
grp_labname
Any resources purchased by labs will be given a grp_labname QoS. For users in the group, there is no fairshare impact for using that hardware.
- sbatch
- salloc
#SBATCH -p general
#SBATCH -q grp_<labname>
#SBATCH -t 30-0
salloc -p general -q grp_<labname> -t 30-0
grp_labname jobs will not preempt public jobs of 4 hours or shorter in duration, but will preempt private jobs.
long
The long QoS permits runtimes on Research Computing-owned hardware to exceed the 7-day maximum, extending it to 14 days. It cannot be used interactively (salloc), only with sbatch.
- public
- highmem
#SBATCH -p public
#SBATCH -q long
#SBATCH -t 14-0
#SBATCH -p highmem
#SBATCH -q long
#SBATCH -t 14-0
This is a special-case QoS that is not available by default; it is granted to users on a case-by-case basis. If you are interested in using it, please be ready to share a job ID demonstrating the need for, and effective use of, your existing core allocations. If you have any questions, feel free to ask staff, and also explore the Slurm Efficiency reporter.
class
The class QoS is a special QoS for users who have access to Sol as part of an academic course. The class QoS has additional resource limitations to ensure that jobs start sooner and resources are utilized effectively. These resource limits are:
- Job Resource Limits:
- Maximum of 32 CPU cores, 320 GB memory, and 4 GPUs per job
- Maximum wall time of 24 hours per job
- User-Level Limits:
- Maximum of 2 jobs running concurrently per user
- Maximum of 10 jobs in the queue per user
- Maximum of 960 GPU running minutes per user (equivalent to 1 GPU for 16 hours or 4 GPUs for 4 hours, shared across running jobs)
- sbatch
- salloc
#SBATCH -p public
#SBATCH -q class
#SBATCH -t 1-0
salloc -p public -q class -t 1-0
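The 960 GPU-minute cap is simply the product of GPUs requested and wall minutes, summed across your running jobs. A quick shell check of whether a single request fits (a sketch; the 960 figure comes from the limits above):

```shell
# Verify a class-QoS request fits the 960 GPU-minute running cap.
gpus=4
minutes=240          # 4 hours requested
gpu_minutes=$((gpus * minutes))
if [ "$gpu_minutes" -le 960 ]; then
    echo "request fits the class GPU budget"
else
    echo "request exceeds the class GPU budget"
fi
```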
Users who have access to Sol through both an academic course and a research account may need to specify which account a job is submitted under. This is done with the -A flag.
- class account
- research account
#SBATCH -p public
#SBATCH -q class
#SBATCH -t 1-0
#SBATCH -A class_asu101spring2025
#SBATCH -p public
#SBATCH -q public
#SBATCH -t 7-0
#SBATCH -A grp_<mygroup>
To see which accounts you have, run the command myfairshare or myaccounts:
$ myfairshare
Account User RawUsage_CHE RawFairShare TargetFairShare RealFairShare
class_asu101spring2025 jeburks2 0.0 1.000000 1.0000000 1.0000000
grp_mygroup jeburks2 61.1 0.043506 0.9957724 0.0435060
$ myaccounts
User Def Acct Account QOS
---------- ---------------- ---------------- --------------------------------
jeburks2 grp_rcadmins class_asu101s2025 class
jeburks2 grp_mylab grp_mylab public,private,debug