RC Community Datasets

Overview

Researchers rely on a growing number of large public datasets to conduct their work. Some of these datasets are used by several research groups, or even several research fields, and as a result the same data is downloaded and stored in multiple file system locations on the supercomputer.

To reduce the overall load on the supercomputer's shared file system and to foster data collaboration, we are pleased to consider hosting such datasets in a shared location under /data/datasets.

The table below lists the datasets currently hosted for public use on the system. If you are interested in contributing to this community collection, please review our RTO Request Help page and contact us with your request.

Current Community Datasets

| Name | Path | Short Description |
| --- | --- | --- |
| HuggingFace | /data/datasets/community/huggingface | Popular Hugging Face models and datasets. |
| ImageNet | /data/datasets/community/deeplearning/imagenet | Image database organized by the WordNet hierarchy (see: ImageNet). |
| LaSOT | /data/datasets/community/compvis/LaSOT_categories | Large-scale Single Object Tracking (LaSOT) provides a collection of thousands of sequences with millions of frames across 70 categories. Each sequence is manually annotated (see: LaSOT - Large-scale Single Object Tracking). Benchmark results are given here: GitHub - HengLan/LaSOT_Evaluation_Toolkit |
| - | /data/datasets/community/compvis/OTB100 | Data from the benchmark evaluation of online visual tracking algorithms (see: http://cvlab.hanyang.ac.kr/tracker_benchmark). Raw zips and resulting images are hosted. |
| - | /data/datasets/community/compvis/TrackingNet | Work-in-progress |
| BLAST | /data/datasets/community/blast | The complete BLAST databases, currently at version 5, updated on 2025-01-31. They will be updated regularly. Please check the database names here: https://ftp.ncbi.nlm.nih.gov/blast/db/v5/ |
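
Before pointing a job at one of these locations, it can help to confirm that the directory is visible and readable from the node you are on. The following is a minimal sketch, not an official tool: the paths are copied from the table above, while the dictionary labels (including "OTB100" and "TrackingNet" for the unnamed rows) and the helper name `find_available` are assumptions made for illustration.

```python
import os
from pathlib import Path

# Paths copied from the table above. The keys are illustrative labels only;
# "OTB100" and "TrackingNet" are assumed names for the rows listed as "-".
COMMUNITY_DATASETS = {
    "HuggingFace": "/data/datasets/community/huggingface",
    "ImageNet": "/data/datasets/community/deeplearning/imagenet",
    "LaSOT": "/data/datasets/community/compvis/LaSOT_categories",
    "OTB100": "/data/datasets/community/compvis/OTB100",
    "TrackingNet": "/data/datasets/community/compvis/TrackingNet",
    "BLAST": "/data/datasets/community/blast",
}


def find_available(datasets):
    """Return {label: Path} for each entry that is an existing, readable directory."""
    available = {}
    for label, location in datasets.items():
        path = Path(location)
        if path.is_dir() and os.access(path, os.R_OK):
            available[label] = path
    return available


if __name__ == "__main__":
    found = find_available(COMMUNITY_DATASETS)
    for label, location in COMMUNITY_DATASETS.items():
        status = "available" if label in found else "not found on this node"
        print(f"{label:12s} {location}  [{status}]")
```

Reading the data in place from the shared path, rather than copying it into a personal or group directory, is what keeps duplicate copies off the shared file system.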

Additional Help

If you require further assistance, contact the Research Computing Team:

We also offer Educational Opportunities and Workshops.