
A Brief Example

This tutorial uses two example packages: multiqc and tensorflow. Step 1 shows how to find multiqc, and Step 4 covers tensorflow.

Step 1 - Find

Go to anaconda.org and search for multiqc in the search bar:

Search Packages

Examine the search results and click the card with the bioconda tag. This tag is a "channel" name: an online location where mamba can find and download the correct packages. The most popular channels are conda-forge and bioconda; avoid the main and defaults channels.

Search Results

Examine the details on the next page, and check the version and the home website information.

Package Details

In the screenshot above, the installation command needs one change: replace conda (circled in red) with mamba. Do not run any conda commands on the supercomputers. A channel can be specified either as bioconda::multiqc or as -c bioconda multiqc.
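Both channel syntaxes request the same package from the same channel. The sketch below makes that equivalence concrete with a hypothetical helper, `to_flag_form` (not part of mamba or conda). It rewrites a `channel::package` spec into the `-c channel package` form and refuses the main and defaults channels that this guide advises against.

```python
from __future__ import annotations

# Channels this guide advises against.
AVOID_CHANNELS = {"main", "defaults"}


def to_flag_form(spec: str) -> list[str]:
    """Hypothetical helper: turn 'channel::package' into ['-c', channel, package].

    A spec without '::' names no channel and is returned as [spec].
    """
    if "::" not in spec:
        return [spec]
    channel, package = spec.split("::", 1)
    if channel in AVOID_CHANNELS:
        raise ValueError(f"avoid the '{channel}' channel; use conda-forge or bioconda")
    return ["-c", channel, package]


if __name__ == "__main__":
    # Two spellings of the same request (printed only, nothing is installed).
    print("mamba install bioconda::multiqc")
    print(" ".join(["mamba", "install", *to_flag_form("bioconda::multiqc")]))
```

A version pin in a spec such as `conda-forge::python=3` passes through unchanged as part of the package argument.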

Step 2 - Install

Connect to the Cisco VPN. Then open a command-line interface on Sol, either by navigating to sol.asu.edu in your browser and selecting "Sol Shell Access" from the "System" menu, or by SSHing into Sol with the command ssh <asurite>@sol.asu.edu. Start an interactive session on a compute node and build the environment there:

[rcsparky@login01:~]$ interactive -p htc -c 4 -t 30
[rcsparky@sc001:~]$ module load mamba/latest
[rcsparky@sc001:~]$ mamba create -n myENV -c conda-forge -c bioconda python=3 multiqc
[rcsparky@sc001:~]$ source activate myENV
(myENV) [rcsparky@sc001:~]$ pip install tensorflow

For more information, see Managing Python Modules Through the Mamba Environment Manager.

Step 3 - Use / Test

Once the myENV environment is ready, multiqc and tensorflow can be used directly or within a Python session or script. The example below imports them in an interactive Python session, which is also a good way to test a newly built environment.

interactive -p htc -c 4 -t 30
module load mamba/latest
source activate myENV
python
>>> import multiqc
>>> import tensorflow
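The interactive check above can also be kept as a small script and rerun after every change to myENV. The following is a minimal sketch that uses only the Python standard library. It imports each package and reports its version when the installed distribution publishes metadata under the same name. The package list matches this example. The function name `report` is illustrative.

```python
from __future__ import annotations

import importlib
from importlib import metadata


def report(packages: list[str]) -> dict[str, str | None]:
    """Import each package; map it to its version, 'unknown', or None if the import fails."""
    results = {}
    for name in packages:
        try:
            importlib.import_module(name)
        except ImportError:
            results[name] = None  # not installed (or not importable) in this environment
            continue
        try:
            results[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            results[name] = "unknown"  # importable, but no distribution metadata under this name
    return results


if __name__ == "__main__":
    for name, version in report(["multiqc", "tensorflow"]).items():
        print(f"{name}: {'MISSING' if version is None else version}")
```

Save it in a file, for example a hypothetical check_env.py, and run `python check_env.py` inside the activated myENV.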

Step 4 - What about pip?

As explained in Python Package Installation Method Comparison, the only correct way to use pip on the ASU supercomputers at the moment is inside an activated mamba env.

Some packages are available only on pypi.org and not on anaconda.org. The only way to install them is with pip, inside an activated mamba env. Notably, the current official installation guides for tensorflow and some other packages prefer pip.

Mamba Module Diagram
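Because pip is only safe inside an activated mamba env, a quick guard before any `pip install` is to confirm that the `pip` found on `PATH` lives inside that env. The following is a minimal sketch. It assumes activation sets `CONDA_PREFIX`, as conda-style activation normally does. `belongs_to` and `check_pip` are hypothetical names.

```python
from __future__ import annotations

import os
import shutil


def belongs_to(path: str, prefix: str) -> bool:
    """True if `path` lies inside the directory `prefix` (symlinks resolved)."""
    path, prefix = os.path.realpath(path), os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def check_pip(environ=os.environ) -> str | None:
    """Return None if `pip` on PATH belongs to the active env, else the reason to stop."""
    prefix = environ.get("CONDA_PREFIX")  # assumed to be set by activation
    if not prefix:
        return "no environment is active; activate myENV first"
    pip = shutil.which("pip", path=environ.get("PATH", os.defpath))
    if pip is None:
        return "no pip found on PATH"
    if not belongs_to(pip, prefix):
        return f"{pip} is outside the active env {prefix}"
    return None


if __name__ == "__main__":
    problem = check_pip()
    print("OK to run pip" if problem is None else f"STOP: {problem}")
```

Invoking pip as `python -m pip install tensorflow` from the activated env is another way to make sure pip belongs to the env's own Python.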

Step 5 - Jupyter Notebook

After multiqc and tensorflow have been installed in myENV, we want to use this mamba env in a Jupyter Notebook session on the Sol web portal. To do so, we need to make a Jupyter kernel from the env. More details are covered in Preparing Python Environments for Jupyter; here are the example steps:

interactive -p htc -c 4 -t 30
module load mamba/latest
mkjupy myENV "myENV_kernel"

Note that you do not need to activate any environment before running mkjupy.

To find and use myENV_kernel:

  1. Log in to the Sol web portal
  2. On the top bar: Interactive Apps > Jupyter > Fill out request form > Connect to Jupyter
  3. Inside the Jupyter Notebook: Open a Launcher page > Click the myENV_kernel icon. It usually appears as the first cube, ahead of the public kernels:
Jupyter Launcher Page
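With a notebook open on myENV_kernel, a first cell can confirm which environment the kernel actually runs in. The following is a minimal sketch. It assumes the kernel created by mkjupy launches the Python interpreter inside myENV, in which case `sys.prefix` is the env directory. `kernel_env` is a hypothetical name.

```python
import sys
from pathlib import Path


def kernel_env(prefix: str = sys.prefix) -> str:
    """Name of the environment directory this Python (the kernel) runs from."""
    return Path(prefix).name


# Run this in the first notebook cell.
env = kernel_env()
print(f"Kernel Python: {sys.executable}")
print("Running in myENV" if env == "myENV" else f"Running in {env}, not myENV; pick myENV_kernel")
```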

Once a Jupyter kernel is made, it cannot be modified. If you need to add more packages later, follow these steps:

  1. Open a shell/terminal to access Sol or Phx
  2. Add the packages to the existing mamba env
  3. Recreate the Jupyter kernel using the mkjupy command. You can use a new name if you want to keep the old kernel.
  4. Launch a new Jupyter session on the Web Portal and look for the new kernel.