Why Isn't My Code Running Faster?

Overview

The main benefit of a supercomputer is using many resources at one time for a single calculation, or running the same small calculation thousands of times. These computing modes are called high-performance computing (HPC) and high-throughput computing (HTC), respectively. A script can only take advantage of the supercomputer if that capability is written into the code.

My Code Runs Slower On the Supercomputer Than on My Laptop

This can happen when you take a script that ran on your laptop and run it directly on the supercomputer. Many programs, including Python and R scripts, aren't written to use multiple cores, so they will only use one core no matter how many cores you request. When it comes to cores, the supercomputer favors quantity over quality: individual supercomputer cores are slower than the cores in your laptop, so it's expected that one supercomputer core will be slower than one laptop core. Slower cores are cheaper, which allows more of them to be purchased.

tip

The speed of individual cores is called clock speed or CPU frequency. The clock speed of supercomputer cores is about 2 GHz, while most laptop CPUs have clock speeds of 3-4 GHz.

Serial vs. Parallel Computing

Serial computing means running commands one by one; this is what a program is doing when it only uses a single core. Parallel computing means spreading those commands over multiple cores so they can run at the same time. It's like checkout lanes at the grocery store: the more lanes that are open, the faster all the customers can check out. Some programs do this automatically, such as most MATLAB functions. Some programs have options to enable parallel computing, such as many SAS functions. Many programs need to be recoded by hand to run in parallel, such as Python and R scripts. Creating parallel scripts differs greatly between programming languages and will not be discussed here, but the shell sketch below illustrates the idea.
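As a rough illustration only, here is a minimal shell sketch of the difference. my_task is a hypothetical placeholder for your own program, and the parallel version assumes the job requested at least four cores:

# Serial: the four tasks run one after another on a single core
for i in 1 2 3 4; do
    ./my_task "$i"
done

# Parallel: the four tasks are launched together and can run on four cores
for i in 1 2 3 4; do
    ./my_task "$i" &
done
wait    # wait for all of the background tasks to finish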

Not everything can be completely parallelized and sped up. If commands depend on the results of previous commands, those commands are inherently serial. This comes up a lot in programs with time steps, such as agent-based models and genome evolution models. As a result, there is a point where adding more cores does not speed up the calculation much: for example, if 90% of a program's run time can be parallelized, no number of cores can make it more than ten times faster, because the remaining 10% still runs serially (a relationship known as Amdahl's law).

tip

The process of measuring code speedup vs. the number of cores is called profiling. To learn more about profiling and parallel computing, we recommend the book Parallel and High Performance Computing, available in eBook format from the ASU library.

Use the seff Command to See if Your Code is Using Resources Efficiently

seff is short for "Slurm efficiency" and displays the percentage of the CPU time and memory allocated to a job that the job actually used. The goal is high efficiency, so that jobs are not allocating resources they do not use.

Example of seff for an inefficient job:

[jeburks2@sol-login02:~]$ seff 11273084
Job ID: 11273084
Cluster: sol
User/Group: jeburks2/grp_rcadmins
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:59
CPU Efficiency: 12.55% of 00:07:50 core-walltime
Job Wall-clock time: 00:07:50
Memory Utilized: 337.73 MB
Memory Efficiency: 16.85% of 2.00 GB
[jeburks2@sol-login02:~]$

This shows the job held a CPU for almost 8 minutes but only used it for 59 seconds, resulting in about 12% CPU efficiency; it did, however, use a portion of the memory it requested (about 17%).

Example of seff for a CPU efficient job:

[jeburks2@sol-login02:~]$ seff 11273083
Job ID: 11273083
Cluster: sol
User/Group: jeburks2/grp_rcadmins
State: TIMEOUT (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:59:39
CPU Efficiency: 98.98% of 01:00:16 core-walltime
Job Wall-clock time: 00:15:04
Memory Utilized: 337.73 MB
Memory Efficiency: 4.12% of 8.00 GB

In this example, the job used all four cores it was allocated for about 99% of the time the job ran. Core-walltime is the number of CPU cores multiplied by the length of the job: this 15-minute job with 4 CPUs had a core-walltime of about an hour (00:15:04 × 4 = 01:00:16). However, the memory efficiency is rather low. This lets us know that if we run this job in the future, we can allocate less memory, which will reduce the impact on our fair share and use the system more efficiently.
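For example, a resubmission of this job could request less memory in its batch script. The values below are only an illustrative sketch based on the seff output above, not a recommendation for any particular workload, and my_program is a hypothetical placeholder for your own command:

#!/bin/bash
#SBATCH --cpus-per-task=4    # keep the four cores, since they were used efficiently
#SBATCH --mem=1G             # request less memory, since only ~340 MB was used
#SBATCH --time=00:30:00      # allow more walltime, since the previous run hit TIMEOUT

./my_program                 # hypothetical placeholder for the actual command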

warning

Note: seff does not display statistics for GPUs, so a GPU-heavy job will likely have inaccurate seff results.

Computational Research Accelerator

The Computational Research Accelerator is a team within Research Computing that can help with speeding up, or accelerating, your code. This can include optimization and parallelization of the code as well as experimenting with novel hardware. They offer short consultation sessions or long-term embedded consultations on a project. To request their services, create a support ticket; see our RTO Request Help page.

Additional Help

If you require further assistance, contact the Research Computing Team.

We also offer Educational Opportunities and Workshops.