Sanity Check Report: DSS5104 Assignment Descriptions

Verification date: 2026-03-07. Three categories checked: (1) papers and references, (2) dataset availability, (3) libraries, method names, and technical claims.

Summary

  • Papers/references: all 11 items verified correct (titles, authors, years, venues, arXiv IDs, BibTeX entry).
  • Datasets: all 47 datasets are accessible. 8 require free signup or a usage agreement (Kaggle account, academic registration, research use agreement, or web form). None are unavailable.
  • Libraries: 18 libraries checked: 15 actively maintained, 2 maintenance-only (Surprise, implicit), and 1 archived (Albumentations). Two library issues found (Issues 1 and 2 below).
  • Method names: all 13 model names verified correct; Issue 3 flags a minor wording imprecision (D-MPNN vs ChemProp).
  • Technical claims: all 5 factual claims verified correct.

Issues Requiring Action

ISSUE 1: Albumentations is archived (HIGH)

File: image-segmentation.qmd

The original albumentations library was archived on July 10, 2025. All development has moved to AlbumentationsX (pip install albumentationsx), which is a drop-in replacement with the same API. The assignment currently links to albumentations.ai and recommends Albumentations by name.

Fix: update the reference to note that AlbumentationsX is the actively maintained successor, or verify that the archived version still installs and works for the assignment’s purposes.
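For the "verify" option, a short environment check may be enough. The sketch below uses only the standard library to report which of the two distributions is installed, then runs a one-transform smoke test on an image/mask pair, which is the input shape a segmentation assignment needs. It assumes, as the drop-in claim implies, that AlbumentationsX is still imported as `albumentations`; the helper names `installed_version`, `albumentations_status`, and `smoke_test` are hypothetical.

```python
# Sketch: check which Albumentations distribution is installed and whether it works on image/mask pairs.
# Assumption: AlbumentationsX keeps the `albumentations` import name (implied by "drop-in replacement").
from importlib import metadata
from typing import Dict, Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a PyPI distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def albumentations_status() -> Dict[str, Optional[str]]:
    """Map the archived original and its successor to their installed versions (or None)."""
    return {name: installed_version(name) for name in ("albumentations", "albumentationsx")}


def smoke_test() -> bool:
    """Apply a deterministic flip to a tiny image and mask; False if the import fails."""
    try:
        import albumentations as A
        import numpy as np
    except ImportError:
        return False
    transform = A.Compose([A.HorizontalFlip(p=1.0)])
    image = np.zeros((8, 8, 3), dtype=np.uint8)
    mask = np.zeros((8, 8), dtype=np.uint8)
    out = transform(image=image, mask=mask)
    # Shapes must survive the transform for both the image and its segmentation mask.
    return out["image"].shape == (8, 8, 3) and out["mask"].shape == (8, 8)


if __name__ == "__main__":
    for name, version in albumentations_status().items():
        print(f"{name}: {version or 'not installed'}")
    print("image/mask smoke test passed:", smoke_test())
```

If only `albumentations` is reported and the smoke test passes, the archived release still covers the assignment's needs; otherwise `pip install albumentationsx` is the suggested path.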

ISSUE 2: DeiT not in torchvision (MEDIUM)

File: CV-transfer-learning.qmd

DeiT is listed under “Vision Transformers” alongside ViT and Swin. The text says these are “available through libraries such as torchvision, timm, or HuggingFace.” This is technically correct (DeiT is in timm and HuggingFace), but students may expect all listed models to be in torchvision. Torchvision’s ViT uses DeiT’s training recipe but does not expose DeiT as a separate model class (no distillation token).

Fix: either remove DeiT from the list, or add a note that DeiT specifically requires timm or HuggingFace Transformers.

ISSUE 3: D-MPNN vs ChemProp (LOW)

File: molecular-prediction.qmd

The text says “D-MPNN (ChemProp)”, implying that the two names refer to the same thing. D-MPNN is the architecture; ChemProp is the software package that implements it. This is a minor inaccuracy, but it could confuse students searching for documentation.

Fix: change to “D-MPNN (implemented in ChemProp)” for clarity.

Papers and References: Detailed Results

| Citation | File | Status | Notes |
| --- | --- | --- | --- |
| Dacrema et al. (2019) “Are We Really Making Much Progress?” | recommender-systems.qmd | CORRECT | RecSys 2019. BibTeX entry in ref.bib is correct. |
| He et al. (2017) “Neural Collaborative Filtering” | recommender-systems.qmd | CORRECT | WWW 2017 |
| Grinsztajn et al. (2022) “Why do tree-based models still outperform…” | DL-for-tabular-data.qmd | CORRECT | NeurIPS 2022 (Datasets and Benchmarks) |
| Gorishniy et al. (2021) “Revisiting deep learning models for tabular data” | DL-for-tabular-data.qmd | CORRECT | NeurIPS 2021 |
| Zeng et al. (2023) “Are Transformers Effective for Time Series Forecasting?” | DL-for-time-series.qmd | CORRECT | AAAI 2023 (oral) |
| TabNet (arXiv 1908.07442) | DL-for-tabular-data.qmd | CORRECT | Arik and Pfister |
| FT-Transformer (arXiv 2106.11959) | DL-for-tabular-data.qmd | CORRECT | Same paper as Gorishniy et al. 2021 |
| NODE (arXiv 1909.06312) | DL-for-tabular-data.qmd | CORRECT | Popov, Morozov, Babenko |
| TabTransformer (arXiv 2012.06678) | DL-for-tabular-data.qmd | CORRECT | Huang, Khetan, Cvitkovic, Karnin |
| SAINT (arXiv 2106.01342) | DL-for-tabular-data.qmd | CORRECT | Somepalli et al. |

Dataset Availability: Detailed Results

Freely available (no signup)

| Dataset | File | Source |
| --- | --- | --- |
| MedMNIST variants | CV-transfer-learning | pip install medmnist, HuggingFace |
| ISIC skin lesion | CV-transfer-learning | isic-archive.com |
| EuroSAT | CV-transfer-learning | torchvision built-in |
| UC Merced Land Use | CV-transfer-learning | Official site, HuggingFace |
| Food-101 | CV-transfer-learning | torchvision built-in |
| FGVC-Aircraft | CV-transfer-learning | Oxford VGG, HuggingFace |
| Oxford Flowers | CV-transfer-learning | torchvision built-in |
| CUB-200 Birds | CV-transfer-learning | Kaggle, GitHub mirrors |
| MovieLens 1M/100K | recommender-systems | grouplens.org, direct download |
| Last.fm | recommender-systems | GroupLens HetRec 2011 |
| Amazon Reviews 2023 | recommender-systems | HuggingFace (McAuley-Lab) |
| CICIDS2017 | fraud-detection | UNB official site |
| IMDb | text-classification | HuggingFace, torchtext |
| SST-2 | text-classification | HuggingFace (GLUE) |
| AG News | text-classification | HuggingFace, torchtext |
| 20 Newsgroups | text-classification | sklearn built-in |
| Banking77 | text-classification | HuggingFace |
| TweetEval | text-classification | HuggingFace (cardiffnlp) |
| Financial PhraseBank | text-classification | HuggingFace |
| GoEmotions | text-classification | HuggingFace |
| ATIS | text-classification | Kaggle, GitHub mirrors |
| Kvasir-SEG | image-segmentation | Simula, Kaggle |
| ISBI Cell Segmentation | image-segmentation | Cell Tracking Challenge |
| LandCover.ai | image-segmentation | Official site, Kaggle |
| Massachusetts Buildings/Roads | image-segmentation | Official site, Academic Torrents |
| Pascal VOC 2012 | image-segmentation | torchvision built-in |
| CamVid | image-segmentation | Official site, Kaggle |
| Oxford-IIIT Pet | image-segmentation | torchvision built-in |
| MoleculeNet (BBBP, BACE) | molecular-prediction | moleculenet.org, DeepChem, PyG |
| OGB ogbg-molhiv | molecular-prediction | ogb Python package |
| California Housing | DL-for-tabular-data | sklearn built-in |
| Adult Income | DL-for-tabular-data | UCI ML Repository |
| Covertype | DL-for-tabular-data | UCI ML Repository |
| HIGGS | DL-for-tabular-data | UCI ML Repository (~8 GB) |
| ETTh1/ETTh2 | DL-for-time-series | GitHub (zhouhaoyi/ETDataset) |
| Electricity (UCI) | DL-for-time-series | UCI ML Repository |
| M4 competition | DL-for-time-series | GitHub (Mcompetitions) |
| M5 competition | DL-for-time-series | GitHub (Mcompetitions) |
| Weather | DL-for-time-series | Autoformer benchmark repo |
| Traffic | DL-for-time-series | Autoformer benchmark repo |

Require free signup or agreement

| Dataset | File | Requirement |
| --- | --- | --- |
| CheXpert-small | CV-transfer-learning | Stanford research use agreement |
| IEEE-CIS Fraud Detection | fraud-detection | Kaggle account + competition rules |
| Credit Card Fraud | fraud-detection | Kaggle account |
| PaySim | fraud-detection | Kaggle account |
| GlaS | image-segmentation | Warwick registration for credentials |
| Inria Aerial Image Labeling | image-segmentation | Web form |
| Cityscapes | image-segmentation | Academic registration (may take days) |
| Porto Seguro Safe Driver | DL-for-tabular-data | Kaggle account + competition rules |

Libraries: Detailed Results

| Library | Status | Notes |
| --- | --- | --- |
| torchvision | MAINTAINED | Includes ResNet, EfficientNet, ConvNeXt, ViT, Swin. DeiT is not exposed as a separate model (see Issue 2). |
| timm | MAINTAINED | v1.0.25 (Feb 2026). 300+ architectures including DeiT. |
| segmentation_models_pytorch | MAINTAINED | Supports U-Net, DeepLabV3, FPN. Now under qubvel-org. |
| Albumentations | ARCHIVED | See Issue 1. Superseded by AlbumentationsX. |
| RecBole | MAINTAINED | 94 algorithms including NCF, GRU4Rec, SASRec, MultVAE. |
| LensKit | MAINTAINED | 2025.x series; v2026 in development. |
| Surprise | MAINTENANCE-ONLY | v1.1.3 works. No new features since 2019. Fine for teaching. |
| implicit | MAINTENANCE-ONLY | ALS, BPR, GPU support. Reduced activity, but works. |
| PyOD | MAINTAINED | PyOD 2 (2025). 50+ algorithms. |
| SetFit | MAINTAINED | HuggingFace. Contrastive pair generation confirmed slow at scale. |
| PyTorch Geometric | MAINTAINED | Supports GCN, MPNN, AttentiveFP. |
| DGL | MAINTAINED | v2.0.0 released. NVIDIA continues releases. |
| DeepChem | MAINTAINED | MoleculeNet datasets included. |
| statsforecast | MAINTAINED | AutoARIMA, AutoETS, AutoTheta. Part of Nixtla. |
| Darts | MAINTAINED | Unit8. Classical and DL models with a unified API. |
| GluonTS | MAINTAINED | AWS Labs. v0.16.2 (June 2025). |
| Nixtla ecosystem | MAINTAINED | StatsForecast, NeuralForecast, MLForecast, HierarchicalForecast. |
| fvcore / ptflops | MAINTAINED | Both work for FLOP counting. |

Method and Model Names: Detailed Results

| Name | Status | Full name / paper |
| --- | --- | --- |
| GRU4Rec | CORRECT | Hidasi et al., ICLR 2016 |
| SASRec | CORRECT | Kang and McAuley, ICDM 2018 |
| MultVAE | CORRECT | Mult-VAE^PR, Liang et al., WWW 2018 |
| PatchTST | CORRECT | “A Time Series is Worth 64 Words”, ICLR 2023 |
| N-BEATS | CORRECT | Neural Basis Expansion Analysis for Time Series, ICLR 2020 |
| TiDE | CORRECT | Time-series Dense Encoder, Das et al. |
| DeepAR | CORRECT | Amazon autoregressive probabilistic model |
| DLinear | CORRECT | From Zeng et al., AAAI 2023 |
| Deep SVDD | CORRECT | Deep Support Vector Data Description, Ruff et al., ICML 2018 |
| AttentiveFP | CORRECT | Xiong et al., J. Med. Chem. 2020. In PyG. |
| FCN | CORRECT | Long, Shelhamer, Darrell, CVPR 2015 |
| SegFormer | CORRECT | Xie et al., NeurIPS 2021, NVlabs |

Technical Claims: Detailed Results

| Claim | File | Status |
| --- | --- | --- |
| ImageNet mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] | CV-transfer-learning | CORRECT |
| BERT has a 512-token limit | text-classification | CORRECT |
| Dice coefficient = F1-score for binary segmentation | image-segmentation | CORRECT (both = 2TP/(2TP+FP+FN)) |
| ECFP4 = Morgan fingerprint with radius 2 | molecular-prediction | CORRECT (diameter 4 = radius 2) |
| SetFit is slow at full dataset scale because of contrastive pairs | text-classification | CORRECT (combinatorial pair generation) |
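The table records verdicts only. As an illustration, the plain-Python sketch below checks four of the claims numerically: ImageNet per-channel normalization, BERT's 512-position budget, Dice versus F1 on a toy mask, and how the SetFit pair pool grows. The BERT helper assumes a single sequence with one [CLS] and one [SEP]. The ids 101/102 are those of the bert-base-uncased vocabulary and are used only for illustration. All function names are hypothetical. SetFit does not necessarily enumerate every pair, so `unique_pairs` only sizes the candidate pool.

```python
# Sketch: numeric sanity checks for the technical claims (standard library only).
from math import comb, isclose

# Constants as quoted in the assignment text.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

# Illustrative ids from the bert-base-uncased vocabulary; other BERT vocabularies may differ.
CLS_ID, SEP_ID = 101, 102
BERT_MAX_LEN = 512


def normalize_pixel(rgb):
    """Per-channel (x - mean) / std for one RGB pixel already scaled to [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def truncate_for_bert(token_ids, max_len=BERT_MAX_LEN):
    """Single sequence: [CLS] + at most max_len - 2 tokens + [SEP] fills at most max_len positions."""
    return [CLS_ID] + list(token_ids[: max_len - 2]) + [SEP_ID]


def confusion_counts(pred, target):
    """TP, FP, FN for two binary masks given as flat sequences of 0/1."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if t and not p)
    return tp, fp, fn


def dice(pred, target):
    """Dice = 2|A and B| / (|A| + |B|), computed from the mask sizes."""
    tp, _, _ = confusion_counts(pred, target)
    return 2 * tp / (sum(pred) + sum(target))


def f1(pred, target):
    """F1 = harmonic mean of precision and recall (assumes at least one true positive)."""
    tp, fp, fn = confusion_counts(pred, target)
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def unique_pairs(n_examples):
    """Distinct unordered pairs among n examples: n(n-1)/2, so the pool grows quadratically."""
    return comb(n_examples, 2)


if __name__ == "__main__":
    print(normalize_pixel(IMAGENET_MEAN))  # the mean pixel maps to (0.0, 0.0, 0.0)
    print(len(truncate_for_bert(list(range(1000)))))  # 512
    pred, target = [1, 1, 0, 0, 1], [1, 0, 0, 1, 1]
    print(isclose(dice(pred, target), f1(pred, target)))  # True
    print(unique_pairs(64), unique_pairs(25_000))  # 2016 312487500
```

On the toy masks (TP = 2, FP = 1, FN = 1), both metrics equal 4/6 = 2/3, matching 2TP/(2TP+FP+FN). Going from 64 to 25,000 labelled examples expands the pair pool from about two thousand to over 300 million.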