MaSIF on the HPC Cluster
MaSIF (Molecular Surface Interaction Fingerprints) predicts protein interaction patterns using geometric deep learning on molecular surfaces.
| Application | Task | Output |
|---|---|---|
| MaSIF-site | Predict protein–protein interaction sites | Per-vertex interaction probability |
| MaSIF-ligand | Classify ligand binding pocket type | Ligand class (7 types) |
| MaSIF-search | Scan for structural binding partners | Ranked binding configurations |
Reference: Gainza et al., Nature Methods 17, 184–192 (2020).
Getting Started
1. Load the module
module load masif/1.0
This sets MASIF_ROOT and MASIF_SIF, and provides the masif-exec, masif-shell,
masif_data, and copy_data shell functions.
2. Copy the working files to scratch
The installed scripts and data under /packages/apps/masif/1.0/ are read-only.
Use the copy_data shell function to get a personal, writable copy on scratch:
copy_data
This copies everything to /scratch/$USER/masif/cpu/ and changes into that directory. You only need
to do this once. After that, run all jobs from your copy.
The data directories contain only scripts, lists, and pre-trained model
weights — not preprocessed surface data. Preprocessing writes to data_preparation/
inside each application directory. A full dataset (all proteins from the paper) requires
~400 GB. Plan accordingly.
3. Bind scratch into the container
Apptainer does not automatically bind /scratch. Set MASIF_BINDS before running any
commands or submitting jobs — add this to your ~/.bashrc so it persists across sessions:
echo 'export MASIF_BINDS=/scratch/$USER/masif/cpu:/scratch/$USER/masif/cpu' >> ~/.bashrc
source ~/.bashrc
All Slurm scripts read MASIF_BINDS automatically — no changes needed in the scripts.
If your data lives elsewhere, point MASIF_BINDS to that path instead.
4. (Optional) Browse the read-only install
masif_data # cd to /packages/apps/masif/1.0/userdata
Interactive Use
Two shell functions are available for running commands inside the MaSIF container:
# Run a single command
masif-exec bash -c "cd $PWD && ./data_prepare_one.sh 4ZQK_A"
# Open an interactive shell
masif-shell
Both automatically bind MASIF_ROOT into the container. To bind additional paths
(e.g. a project directory outside your scratch working directory):
export MASIF_BINDS=/scratch/myproject:/scratch/myproject
masif-exec bash -c "cd $PWD && ./data_prepare_one.sh 4ZQK_A"
MaSIF-site — PPI Site Prediction
Predicts which surface residues are likely to participate in protein–protein interactions.
Quick start — single protein
Interactive:
cd /scratch/$USER/masif/cpu/masif_site
# Preprocess: download PDB, compute surface mesh + electrostatics + patches (~1–2 min)
masif-exec bash -c "cd $PWD && ./data_prepare_one.sh 4ZQK_A"
# Predict interaction sites
masif-exec bash -c "cd $PWD && ./predict_site.sh 4ZQK_A"
# Colour surface by predicted score (writes a .ply file)
masif-exec bash -c "cd $PWD && ./color_site.sh 4ZQK_A"
Slurm batch:
cd /scratch/$USER/masif/cpu/masif_site
# 1. Preprocess all proteins in lists/full_list.txt (CPU array job)
sbatch data_prepare.slurm
# 2. Train the neural network (GPU, ~40 h for full dataset)
sbatch masif_site_train.slurm
# 3. Evaluate on benchmark set (GPU)
sbatch masif_site_eval.slurm
# 4. Predict on a custom list (CPU array job)
sbatch predict_site.slurm
The pre-trained model weights are already included in nn_models/all_feat_3l/model_data/ —
you can skip training and run predictions directly.
Output:
- output/all_feat_3l/pred_data/pred_4ZQK_A.npy — per-vertex scores
- output/all_feat_3l/pred_surfaces/4ZQK_A.ply — coloured surface mesh (open in PyMOL)
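The scores file can be checked without PyMOL. A minimal sketch, assuming pred_4ZQK_A.npy holds a 1-D float array of per-vertex interface probabilities (summarise_site_scores and the 0.7 cutoff are illustrative, not part of MaSIF):

```python
import numpy as np

def summarise_site_scores(path, cutoff=0.7):
    """Load per-vertex interface scores and report how many vertices
    meet the cutoff (0.7 is a common choice, not an official threshold)."""
    scores = np.load(path)
    n_hot = int((scores >= cutoff).sum())
    return {
        "n_vertices": scores.size,
        "max_score": float(scores.max()),
        "n_above_cutoff": n_hot,
        "fraction_above_cutoff": n_hot / scores.size,
    }

# e.g. summarise_site_scores("output/all_feat_3l/pred_data/pred_4ZQK_A.npy")
```

A protein with essentially no vertices above the cutoff is likely predicted to have no interface on the modelled surface.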
Using your own PDB file
masif-exec bash -c "cd $PWD && ./data_prepare_one.sh --file /path/to/protein.pdb 4ZQK_A"
Multi-chain input
For a single chain: 4ZQK_A
For a complex where chains A and B interact: 1AKJ_A_B
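The convention is the PDB code followed by one underscore-separated chain group per interacting side. A hypothetical helper (parse_masif_id is not shipped with MaSIF, it just illustrates the format):

```python
def parse_masif_id(masif_id):
    """Split a MaSIF identifier like '4ZQK_A' or '1AKJ_A_B' into
    the PDB code and the list of chain groups."""
    pdb_id, *chain_groups = masif_id.split("_")
    if not chain_groups:
        raise ValueError(f"expected PDBID_CHAIN[...], got {masif_id!r}")
    return pdb_id, chain_groups

# parse_masif_id("1AKJ_A_B") -> ("1AKJ", ["A", "B"])
```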
Slurm resource reference
MaSIF-site Slurm script resources
| Script | Resources | Input | Logs |
|---|---|---|---|
| data_prepare.slurm | CPU array, 2 cores, 16 GB, 3 h/task | lists/full_list.txt | exelogs/data_prepare.<jobid>_<taskid>.{out,err} |
| masif_site_train.slurm | 1 GPU, 4 cores, 32 GB, 40 h | preprocessed patches | exelogs/masif_site_train.<jobid>.{out,err} |
| masif_site_eval.slurm | 1 GPU, 4 cores, 32 GB, 40 h | trained model | exelogs/masif_site_eval.<jobid>.{out,err} |
| predict_site.slurm | CPU array, 2 cores, 16 GB, 3 h/task | lists/full_list.txt | exelogs/predict_site.<jobid>_<taskid>.{out,err} |
Adjust #SBATCH --partition= and #SBATCH --array= to match your protein list length
and cluster partition names before submitting.
The current container uses TensorFlow 1.12, which requires CUDA ≤10. A100/Ampere GPUs (CUDA 11+) are not compatible. A GPU-enabled container will be provided separately. Training and evaluation will fall back to CPU in the meantime, but will be significantly slower.
MaSIF-ligand — Ligand Pocket Classification
Classifies binding pockets into 7 ligand categories using 12 Å geodesic patches.
cd /scratch/$USER/masif/cpu/masif_ligand
# 1. Preprocess proteins (CPU array job)
sbatch data_prepare.slurm
# 2. Generate TFRecords for training
sbatch make_tfrecord.slurm
# 3. Train the classifier (GPU)
sbatch train_model.slurm
# 4. Evaluate on test set (GPU)
sbatch evaluate_test.slurm
Protein lists are numpy arrays in lists/:
- train_pdbs_sequence.npy
- val_pdbs_sequence.npy
- test_pdbs_sequence.npy
Output in test_set_predictions/:
- <PDB>_<chains>_labels.npy — ground-truth labels
- <PDB>_<chains>_logits.npy — predicted logits
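With those two files per structure, overall test accuracy is a short script. A sketch assuming labels are integer class indices and logits carry one score per ligand class (check the shapes of your own files first; score_predictions is illustrative):

```python
import glob
import os

import numpy as np

def score_predictions(pred_dir):
    """Pair each <PDB>_<chains>_labels.npy with its _logits.npy partner
    and report overall top-1 accuracy over the ligand classes."""
    total, correct = 0, 0
    for label_file in glob.glob(os.path.join(pred_dir, "*_labels.npy")):
        logit_file = label_file.replace("_labels.npy", "_logits.npy")
        labels = np.load(label_file)
        preds = np.argmax(np.load(logit_file), axis=-1)
        correct += int((preds == labels).sum())
        total += labels.size
    return correct / total

# e.g. score_predictions("test_set_predictions")
```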
MaSIF-ligand Slurm script resources
| Script | Resources |
|---|---|
| data_prepare.slurm | CPU array, 1 core, 16 GB, 2 h/task |
| make_tfrecord.slurm | CPU, 1 core, 8 GB, 48 h |
| train_model.slurm | 1 GPU, 1 core, 16 GB, 48 h |
| evaluate_test.slurm | 1 GPU, 1 core, 16 GB, 24 h |
MaSIF-search — PPI Surface Scanning
Scans a database of protein surfaces for structural binding partners of a query patch.
cd /scratch/$USER/masif/cpu/masif_ppi_search
# 1. Preprocess proteins (CPU array job)
sbatch data_prepare.slurm
# 2. Cache training patch pairs (shape-complementarity filtered)
masif-exec bash -c "cd $PWD && ./cache_nn.sh nn_models.sc05.custom_params"
# 3. Train the descriptor network (GPU)
sbatch masif_ppi_search_train.slurm
# 4. Compute descriptors for search (GPU)
sbatch masif_ppi_search_comp_desc.slurm
# 5. Compute GIF descriptors (optional, for GIF-based search)
masif-exec bash -c "cd $PWD && ./compute_gif_descriptors.sh"
MaSIF-search Slurm script resources
| Script | Resources |
|---|---|
| data_prepare.slurm | CPU array, 1 core, 8 GB, 1 h/task |
| masif_ppi_search_train.slurm | 1 GPU, 1 core, 32 GB, 40 h |
| masif_ppi_search_comp_desc.slurm | 1 GPU, 1 core, 32 GB, 20 h |
Unbound benchmark variant
Scripts for the unbound docking benchmark are in /scratch/$USER/masif/cpu/masif_ppi_search_ub/:
cd /scratch/$USER/masif/cpu/masif_ppi_search_ub
sbatch data_prepare.slurm # processes lists/benchmark_list_ub.txt
sbatch masif_ppi_search_comp_desc.slurm # compute descriptors
MaSIF-pdl1 Benchmark
Reproduces the PD-L1 benchmark from the paper.
cd /scratch/$USER/masif/cpu/masif_pdl1_benchmark
sbatch data_prepare.slurm # CPU array, lists/full_list.txt
masif-exec bash -c "cd $PWD && ./run_benchmark_nn.sh"
MaSIF-peptides
Evaluates MaSIF-site and MaSIF-search on peptide–protein interactions.
cd /scratch/$USER/masif/cpu/masif_peptides
# Extract helix data (CPU array, lists/bc-100-list.txt)
sbatch data_extract_helix.slurm
# Precompute patches (CPU array, lists/all_peptides.txt)
sbatch data_precompute_patches.slurm
# Evaluate (CPU array, reads from in/x<task_id> split files)
sbatch masif_site_masif_search_eval.slurm
Adjusting Slurm Scripts
Every script has two lines to review before submitting:
#SBATCH --partition=short # change to your cluster's CPU partition name
#SBATCH --partition=gpu # change to your cluster's GPU partition name
#SBATCH --array=1-1000 # adjust upper bound to match your list length
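Rather than editing --array by hand, the range can be derived from the list length and passed on the command line, where sbatch options override the #SBATCH lines in the script. A hypothetical helper:

```python
import subprocess

def array_option(list_file):
    """Build an sbatch --array option matching the number of
    non-empty lines in a protein list, e.g. '--array=1-137'."""
    with open(list_file) as fh:
        n = sum(1 for line in fh if line.strip())
    return f"--array=1-{n}"

def submit_array(script, list_file):
    """Submit the script with the array range overridden on the command line."""
    subprocess.run(["sbatch", array_option(list_file), script], check=True)

# submit_array("data_prepare.slurm", "lists/full_list.txt")
```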
The SIF image path defaults to /packages/apps/simg/masif.sif. Override per-job:
MASIF_SIF=/other/path/masif.sif sbatch data_prepare.slurm
Or export for a batch of submissions:
export MASIF_SIF=/other/path/masif.sif
sbatch data_prepare.slurm
sbatch masif_site_train.slurm
Extra bind mounts for data outside /scratch/$USER/masif/cpu:
export MASIF_BINDS=/scratch/myproject/data:/scratch/myproject/data
sbatch data_prepare.slurm
Visualising Results in PyMOL
.ply surface files are produced by color_site.sh. To view them:
- Install the MaSIF PyMOL plugin on your local machine: see /packages/apps/masif/1.0/pymol_plugin_installation.md
- Copy the .ply file from the cluster to your machine
- In PyMOL: loadply 4ZQK_A.ply
- Hide all objects except those containing iface to show the predicted interaction site coloured by score.
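To sanity-check a .ply before copying it off the cluster, the header (which is ASCII even when the body is binary) can be read with the standard library alone. ply_header_counts is an illustrative helper, not part of MaSIF:

```python
def ply_header_counts(path):
    """Read the ASCII PLY header and return element counts,
    e.g. {'vertex': ..., 'face': ...}. Stops at end_header, so it is
    safe on binary-body files too."""
    counts = {}
    with open(path, "rb") as fh:
        for raw in fh:
            line = raw.decode("ascii", errors="replace").strip()
            if line.startswith("element "):
                _, name, num = line.split()[:3]
                counts[name] = int(num)
            elif line == "end_header":
                break
    return counts
```

A vertex count of zero (or a missing vertex element) indicates the colouring step did not produce a usable mesh.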
Checking Job Status
squeue -u $USER # running/pending jobs
sacct -j <jobid> --format=JobID,State,ExitCode,Elapsed # completed job summary
cat exelogs/data_prepare.<jobid>_1.out # inspect one task's log
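For array jobs it helps to tabulate which tasks failed. A small parser for the machine-readable variant of the sacct call above (adds --parsable2 --noheader; parse_sacct and failed_tasks are illustrative helpers):

```python
def parse_sacct(text):
    """Parse 'sacct --parsable2 --noheader
    --format=JobID,State,ExitCode,Elapsed' output into
    {jobid: (state, exitcode)}."""
    rows = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        jobid, state, exitcode = line.split("|")[:3]
        rows[jobid] = (state, exitcode)
    return rows

def failed_tasks(text):
    """Job IDs whose state is anything other than COMPLETED."""
    return [j for j, (state, _) in parse_sacct(text).items()
            if state != "COMPLETED"]
```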
Troubleshooting
MASIF_ROOT not set / Python can't find source files
Make sure you loaded the module before submitting: module load masif. The Slurm scripts
fall back to /packages/apps/masif/1.0 if the variable is unset, but it is safer to have
the module loaded.
Permission denied when writing output
You are running from the read-only install at /packages/apps/masif/1.0/. Run copy_data
and submit jobs from /scratch/$USER/masif/cpu/ instead.
fatal: not a git repository
This warning can appear if a script's git rev-parse fallback fires. It is harmless:
the scripts use MASIF_ROOT when set, which takes precedence over git.
File not found inside the container
Files outside MASIF_ROOT and your working directory are not visible inside the container.
Set MASIF_BINDS to bind your data location, or use --bind directly with apptainer exec.
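To check in advance whether a path will be visible, the MASIF_BINDS value can be tested directly. Apptainer bind specs are comma-separated src[:dest] pairs; the sketch below ignores locations the container may bind by default (home, MASIF_ROOT), and visible_in_container is a hypothetical helper:

```python
import os

def visible_in_container(path, binds=None):
    """Return True if 'path' falls under the destination of any bind pair
    in a comma-separated 'src[:dest]' Apptainer bind list."""
    binds = binds if binds is not None else os.environ.get("MASIF_BINDS", "")
    path = os.path.abspath(path)
    for spec in filter(None, binds.split(",")):
        parts = spec.split(":")
        # A bare 'src' entry binds at the same path inside the container.
        dest = parts[1] if len(parts) > 1 else parts[0]
        if path == dest or path.startswith(dest.rstrip("/") + "/"):
            return True
    return False
```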
GPU jobs fail or fall back to CPU
TF 1.12 requires CUDA ≤10. Cluster GPUs with CUDA 11+ (e.g. A100) are not supported by this
container. A GPU-compatible container will be provided separately.
Known Limitations
- TensorFlow 1.12 only — no TF2 / eager mode. CUDA ≤10 required for GPU acceleration.
- Preprocessing takes ~1–2 min/protein (bottleneck: MDS geodesic coordinates and APBS electrostatics). Use array jobs for large datasets.
- Results may differ slightly from the published paper because the MATLAB preprocessing pipeline has been replaced with Python equivalents. To reproduce exact paper results, see masif_paper on GitHub.
- DSSP is not included in the container and is not required for any of the three main applications.