MaSIF on the HPC Cluster

MaSIF (Molecular Surface Interaction Fingerprints) predicts protein interaction patterns using geometric deep learning on molecular surfaces.

| Application | Task | Output |
| --- | --- | --- |
| MaSIF-site | Predict protein–protein interaction sites | Per-vertex interaction probability |
| MaSIF-ligand | Classify ligand binding pocket type | Ligand class (7 types) |
| MaSIF-search | Scan for structural binding partners | Ranked binding configurations |

Reference: Gainza et al., Nature Methods 17, 184–192 (2020).


Getting Started

1. Load the module

module load masif/1.0

This sets MASIF_ROOT and MASIF_SIF, and provides the masif-exec, masif-shell, masif_data, and copy_data shell functions.

2. Copy the working files to scratch

The installed scripts and data under /packages/apps/masif/1.0/ are read-only. Use the copy_data shell function to get a personal writable copy on scratch:

copy_data

This copies everything to /scratch/$USER/masif/cpu/ and changes into that directory. You only need to do this once. After that, run all jobs from your copy.

warning

The data directories contain only scripts, lists, and pre-trained model weights — not preprocessed surface data. Preprocessing writes to data_preparation/ inside each application directory. A full dataset (all proteins from the paper) requires ~400 GB. Plan accordingly.

3. Bind scratch into the container

Apptainer does not automatically bind /scratch. Set MASIF_BINDS before running any commands or submitting jobs — add this to your ~/.bashrc so it persists across sessions:

echo 'export MASIF_BINDS=/scratch/$USER/masif/cpu:/scratch/$USER/masif/cpu' >> ~/.bashrc
source ~/.bashrc

All Slurm scripts read MASIF_BINDS automatically — no changes needed in the scripts. If your data lives elsewhere, point MASIF_BINDS to that path instead.
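
To confirm the bind took effect, you can list your scratch copy from inside the container (a quick sanity check; masif-exec is provided by the module):

```shell
# Should show your writable copy; an error or empty listing means
# MASIF_BINDS points at the wrong path or copy_data has not been run yet
masif-exec ls /scratch/$USER/masif/cpu
```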

4. (Optional) Browse the read-only install

masif_data          # cd to /packages/apps/masif/1.0/userdata

Interactive Use

Two shell functions are available for running commands inside the MaSIF container:

# Run a single command
masif-exec bash -c "cd $PWD && ./data_prepare_one.sh 4ZQK_A"

# Open an interactive shell
masif-shell

Both automatically bind MASIF_ROOT into the container. To bind additional paths (e.g. a project directory outside your scratch working directory), set MASIF_BINDS; note that exporting it again replaces any earlier value:

export MASIF_BINDS=/scratch/myproject:/scratch/myproject
masif-exec bash -c "cd $PWD && ./data_prepare_one.sh 4ZQK_A"

MaSIF-site — PPI Site Prediction

Predicts which surface residues are likely to participate in protein–protein interactions.

Quick start — single protein

cd /scratch/$USER/masif/cpu/masif_site

# Preprocess: download PDB, compute surface mesh + electrostatics + patches (~1–2 min)
masif-exec bash -c "cd $PWD && ./data_prepare_one.sh 4ZQK_A"

# Predict interaction sites
masif-exec bash -c "cd $PWD && ./predict_site.sh 4ZQK_A"

# Colour surface by predicted score (writes a .ply file)
masif-exec bash -c "cd $PWD && ./color_site.sh 4ZQK_A"

Output:

  • output/all_feat_3l/pred_data/pred_4ZQK_A.npy — per-vertex scores
  • output/all_feat_3l/pred_surfaces/4ZQK_A.ply — coloured surface mesh (open in PyMOL)
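
The scores file can be inspected directly with numpy (a minimal sketch; assumes numpy is available on the login node, otherwise run the same command through masif-exec):

```shell
# Print the number of surface vertices and the score range;
# scores near 1.0 mark likely interaction sites
PRED=output/all_feat_3l/pred_data/pred_4ZQK_A.npy
python -c "import sys, numpy as np; p = np.load(sys.argv[1]); print(p.shape, float(p.min()), float(p.max()))" "$PRED"
```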

Using your own PDB file

masif-exec bash -c "cd $PWD && ./data_prepare_one.sh --file /path/to/protein.pdb 4ZQK_A"

Multi-chain input

  • Single chain: 4ZQK_A
  • Complex where chains A and B interact: 1AKJ_A_B
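
Both forms work with the same helper scripts, so a handful of structures can be processed in a serial loop (a sketch; for long lists use the Slurm array scripts below instead):

```shell
# Preprocess and predict for a few IDs back to back
for id in 4ZQK_A 1AKJ_A_B; do
  masif-exec bash -c "cd $PWD && ./data_prepare_one.sh $id && ./predict_site.sh $id"
done
```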

Slurm resource reference

MaSIF-site Slurm script resources
| Script | Resources | Input | Logs |
| --- | --- | --- | --- |
| data_prepare.slurm | CPU array, 2 cores, 16 GB, 3 h/task | lists/full_list.txt | exelogs/data_prepare.<jobid>_<taskid>.{out,err} |
| masif_site_train.slurm | 1 GPU, 4 cores, 32 GB, 40 h | preprocessed patches | exelogs/masif_site_train.<jobid>.{out,err} |
| masif_site_eval.slurm | 1 GPU, 4 cores, 32 GB, 40 h | trained model | exelogs/masif_site_eval.<jobid>.{out,err} |
| predict_site.slurm | CPU array, 2 cores, 16 GB, 3 h/task | lists/full_list.txt | exelogs/predict_site.<jobid>_<taskid>.{out,err} |

tip

Adjust #SBATCH --partition= and #SBATCH --array= to match your protein list length and cluster partition names before submitting.
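
The array bound can also be set at submission time instead of editing the script (assuming one protein per line in lists/full_list.txt; a command-line --array overrides the in-script #SBATCH directive):

```shell
# One array task per entry in the protein list
N=$(wc -l < lists/full_list.txt)
sbatch --array=1-"$N" data_prepare.slurm
```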

danger

The current container uses TensorFlow 1.12, which requires CUDA ≤10. A100/Ampere GPUs (CUDA 11+) are not compatible. A GPU-enabled container will be provided separately. Training and evaluation will fall back to CPU in the meantime, but will be significantly slower.


MaSIF-ligand — Ligand Pocket Classification

Classifies binding pockets into 7 ligand categories using 12 Å geodesic patches.

cd /scratch/$USER/masif/cpu/masif_ligand

# 1. Preprocess proteins (CPU array job)
sbatch data_prepare.slurm

# 2. Generate TFRecords for training
sbatch make_tfrecord.slurm

# 3. Train the classifier (GPU)
sbatch train_model.slurm

# 4. Evaluate on test set (GPU)
sbatch evaluate_test.slurm

Protein lists are numpy arrays in lists/:

  • train_pdbs_sequence.npy
  • val_pdbs_sequence.npy
  • test_pdbs_sequence.npy

Output in test_set_predictions/:

  • <PDB>_<chains>_labels.npy — ground truth labels
  • <PDB>_<chains>_logits.npy — predicted logits
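
From those two files, overall accuracy can be computed with a short numpy script (a sketch, assuming the predicted class is the argmax over the 7 logits):

```shell
python - <<'EOF'
# Top-1 accuracy across all predicted complexes in test_set_predictions/
import glob
import numpy as np

correct = total = 0
for lab_f in glob.glob('test_set_predictions/*_labels.npy'):
    labels = np.load(lab_f)
    logits = np.load(lab_f.replace('_labels.npy', '_logits.npy'))
    pred = np.argmax(logits, axis=-1)
    correct += int(np.sum(pred == labels))
    total += labels.size
print(f'top-1 accuracy: {correct}/{total}')
EOF
```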

MaSIF-ligand Slurm script resources

| Script | Resources |
| --- | --- |
| data_prepare.slurm | CPU array, 1 core, 16 GB, 2 h/task |
| make_tfrecord.slurm | CPU, 1 core, 8 GB, 48 h |
| train_model.slurm | 1 GPU, 1 core, 16 GB, 48 h |
| evaluate_test.slurm | 1 GPU, 1 core, 16 GB, 24 h |

MaSIF-search — PPI Surface Scanning

Scans a database of protein surfaces for structural binding partners of a query patch.

cd /scratch/$USER/masif/cpu/masif_ppi_search

# 1. Preprocess proteins (CPU array job)
sbatch data_prepare.slurm

# 2. Cache training patch pairs (shape-complementarity filtered)
masif-exec bash -c "cd $PWD && ./cache_nn.sh nn_models.sc05.custom_params"

# 3. Train the descriptor network (GPU)
sbatch masif_ppi_search_train.slurm

# 4. Compute descriptors for search (GPU)
sbatch masif_ppi_search_comp_desc.slurm

# 5. Compute GIF descriptors (optional, for GIF-based search)
masif-exec bash -c "cd $PWD && ./compute_gif_descriptors.sh"

MaSIF-search Slurm script resources

| Script | Resources |
| --- | --- |
| data_prepare.slurm | CPU array, 1 core, 8 GB, 1 h/task |
| masif_ppi_search_train.slurm | 1 GPU, 1 core, 32 GB, 40 h |
| masif_ppi_search_comp_desc.slurm | 1 GPU, 1 core, 32 GB, 20 h |

Unbound benchmark variant

Scripts for the unbound docking benchmark are in /scratch/$USER/masif/cpu/masif_ppi_search_ub/:

cd /scratch/$USER/masif/cpu/masif_ppi_search_ub
sbatch data_prepare.slurm                 # processes lists/benchmark_list_ub.txt
sbatch masif_ppi_search_comp_desc.slurm   # compute descriptors

MaSIF-pdl1 Benchmark

Reproduces the PDL1 benchmark from the paper.

cd /scratch/$USER/masif/cpu/masif_pdl1_benchmark
sbatch data_prepare.slurm # CPU array, lists/full_list.txt
masif-exec bash -c "cd $PWD && ./run_benchmark_nn.sh"

MaSIF-peptides

Evaluates MaSIF-site and MaSIF-search on peptide–protein interactions.

cd /scratch/$USER/masif/cpu/masif_peptides

# Extract helix data (CPU array, lists/bc-100-list.txt)
sbatch data_extract_helix.slurm

# Precompute patches (CPU array, lists/all_peptides.txt)
sbatch data_precompute_patches.slurm

# Evaluate (CPU array, reads from in/x<task_id> split files)
sbatch masif_site_masif_search_eval.slurm
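
If the in/x<task_id> split files need to be regenerated, the standard split utility can produce them (a sketch; the chunk size and suffix format here are assumptions, so check the resulting names against what the eval script expects):

```shell
# Split the peptide list into numbered chunks: in/x1, in/x2, ...
mkdir -p in
split -l 100 --numeric-suffixes=1 --suffix-length=1 lists/all_peptides.txt in/x
```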

Adjusting Slurm Scripts

Before submitting, review the partition and array directives in each script:

#SBATCH --partition=short    # change to your cluster's CPU partition name
#SBATCH --partition=gpu      # change to your cluster's GPU partition name
#SBATCH --array=1-1000       # adjust upper bound to match your list length

The SIF image path defaults to /packages/apps/simg/masif.sif. Override per-job:

MASIF_SIF=/other/path/masif.sif sbatch data_prepare.slurm

Or export for a batch of submissions:

export MASIF_SIF=/other/path/masif.sif
sbatch data_prepare.slurm
sbatch masif_site_train.slurm
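
Submissions can also be chained so training starts only after preprocessing succeeds (a sketch using Slurm job dependencies; --parsable makes sbatch print just the job ID, and afterok waits for every array task to finish successfully):

```shell
# Queue training behind the preprocessing array job
prep_id=$(sbatch --parsable data_prepare.slurm)
sbatch --dependency=afterok:"$prep_id" masif_site_train.slurm
```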

Extra bind mounts for data outside /scratch/$USER/masif/cpu:

export MASIF_BINDS=/scratch/myproject/data:/scratch/myproject/data
sbatch data_prepare.slurm

Visualising Results in PyMOL

.ply surface files are produced by color_site.sh. To view them:

  1. Install the MaSIF PyMOL plugin on your local machine: see /packages/apps/masif/1.0/pymol_plugin_installation.md
  2. Copy the .ply file from the cluster to your machine
  3. In PyMOL:
    loadply 4ZQK_A.ply
  4. Hide all objects except those containing iface to show the predicted interaction site coloured by score.
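
For step 2, scp works from your local machine (the hostname and username here are placeholders; substitute your cluster's login node and account):

```shell
# Run on your local machine, not on the cluster
scp <user>@cluster.example.edu:/scratch/<user>/masif/cpu/masif_site/output/all_feat_3l/pred_surfaces/4ZQK_A.ply .
```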

Checking Job Status

squeue -u $USER                                         # running/pending jobs
sacct -j <jobid> --format=JobID,State,ExitCode,Elapsed  # completed job summary
cat exelogs/data_prepare.<jobid>_1.out                  # inspect one task's log

Troubleshooting

MASIF_ROOT not set / python can't find source files

Make sure you loaded the module before submitting: module load masif. The Slurm scripts fall back to /packages/apps/masif/1.0 if the variable is unset, but loading the module is the safer option.

Permission denied writing output files

You are running from the read-only install at /packages/apps/masif/1.0/. Run copy_data and submit jobs from /scratch/$USER/masif/cpu/ instead.

fatal: not a git repository

This warning can appear if a script's git rev-parse fallback fires. It is harmless — the scripts use MASIF_ROOT when set, which takes precedence over git.

Container can't find my data files

Files outside MASIF_ROOT and your working directory are not visible inside the container. Set MASIF_BINDS to bind your data location, or use --bind directly with apptainer exec.

GPU jobs run on CPU / TensorFlow doesn't see the GPU

TF 1.12 requires CUDA ≤10. Cluster GPUs with CUDA 11+ (e.g. A100) are not supported by this container. A GPU-compatible container will be provided separately.


Known Limitations

  • TensorFlow 1.12 only — no TF2 / eager mode. CUDA ≤10 required for GPU acceleration.
  • Preprocessing takes ~1–2 min/protein (bottleneck: MDS geodesic coordinates and APBS electrostatics). Use array jobs for large datasets.
  • Results may differ slightly from the published paper because the MATLAB preprocessing pipeline has been replaced with Python equivalents. To reproduce exact paper results, see masif_paper on GitHub.
  • DSSP is not included in the container and is not required for any of the three main applications.