Sanity Check Report: DSS5104 Assignment Descriptions

Verification date: 2026-03-07. Three categories checked: (1) papers and references, (2) dataset availability, (3) libraries, method names, and technical claims.

Summary

Papers/references: all 11 items verified correct (titles, authors, years, venues, arXiv IDs, BibTeX entry).
Datasets: all 47 datasets are accessible. Eight require a free signup or usage agreement (Kaggle account, web form, or academic/research registration). None are unavailable.
Libraries: all 18 libraries checked. Of these, 15 are maintained, 2 are maintenance-only (Surprise, implicit), and 1 is archived (Albumentations). Two issues found (see Issues 1 and 2 below).
Method names: all 13 model names verified correct.
Technical claims: all 5 factual claims verified correct.

Issues Requiring Action

ISSUE 1: Albumentations is archived (HIGH)

File: image-segmentation.qmd

The original albumentations library was archived on July 10, 2025. All development has moved to AlbumentationsX (pip install albumentationsx), which is a drop-in replacement with the same API. The assignment currently links to albumentations.ai and recommends Albumentations by name.

Fix: update the reference to note that AlbumentationsX is the actively maintained successor, or verify that the archived version still installs and works for the assignment’s purposes.
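
One way to carry out the second option (checking what a student's environment actually has installed) is sketched below. It only queries installed distribution metadata from the standard library. The distribution names `albumentations` and `albumentationsx` are taken from the pip names mentioned above; the helper names `installed_version` and `augmentation_status` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def augmentation_status():
    """Report which of the two augmentation distributions are installed.

    Assumption: the pip distribution names match those given in the report.
    """
    names = ("albumentations", "albumentationsx")
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, ver in augmentation_status().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

This confirms only that a package is installed, not that every transform used in the assignment still behaves as expected; a short end-to-end run of the assignment's augmentation pipeline would still be needed.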

ISSUE 2: DeiT not in torchvision (MEDIUM)

File: CV-transfer-learning.qmd

DeiT is listed under “Vision Transformers” alongside ViT and Swin. The text says these are “available through libraries such as torchvision, timm, or HuggingFace.” This is technically correct (DeiT is in timm and HuggingFace), but students may expect all listed models to be in torchvision. Torchvision’s ViT uses DeiT’s training recipe but does not expose DeiT as a separate model class (no distillation token).

Fix: either remove DeiT from the list, or add a note that DeiT specifically requires timm or HuggingFace Transformers.
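
If the note is added, the finding can be re-checked against whatever library versions are installed. The sketch below is a minimal check, assuming `torchvision.models.list_models()` (available in recent torchvision releases) and `timm.list_models()` with a wildcard filter; the function name `deit_availability` is hypothetical, and a missing library is reported as `None` rather than raising.

```python
import importlib


def deit_availability():
    """Map each library to the DeiT model names it registers.

    A value of None means the library is not installed, or is too old
    to offer a model listing function.
    """
    report = {}

    try:
        tv_models = importlib.import_module("torchvision.models")
        names = tv_models.list_models()
        report["torchvision"] = [n for n in names if "deit" in n.lower()]
    except (ImportError, AttributeError):
        report["torchvision"] = None

    try:
        timm = importlib.import_module("timm")
        # The wildcard pattern filters registered model names.
        report["timm"] = list(timm.list_models("deit*"))
    except (ImportError, AttributeError):
        report["timm"] = None

    return report


if __name__ == "__main__":
    for library, models in deit_availability().items():
        print(library, "->", models)
```

Per the finding above, the torchvision list is expected to be empty and the timm list non-empty, but the output depends on the installed versions.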

ISSUE 3: D-MPNN vs ChemProp (LOW)

File: molecular-prediction.qmd

The text says “D-MPNN (ChemProp)”, implying they are the same thing. D-MPNN is the architecture; ChemProp is the software package that implements it. This is a minor inaccuracy, but it could confuse students searching for documentation.

Fix: change to “D-MPNN (implemented in ChemProp)” for clarity.

Papers and References: Detailed Results

Citation	File	Status	Notes
Dacrema et al. (2019) “Are We Really Making Much Progress?”	recommender-systems.qmd	CORRECT	RecSys 2019. BibTeX entry in ref.bib is correct.
He et al. (2017) Neural Collaborative Filtering	recommender-systems.qmd	CORRECT	WWW 2017
Grinsztajn et al. (2022) “Why do tree-based models still outperform…”	DL-for-tabular-data.qmd	CORRECT	NeurIPS 2022 (Datasets and Benchmarks)
Gorishniy et al. (2021) “Revisiting deep learning models for tabular data”	DL-for-tabular-data.qmd	CORRECT	NeurIPS 2021
Zeng et al. (2023) “Are Transformers Effective for Time Series Forecasting?”	DL-for-time-series.qmd	CORRECT	AAAI 2023 (oral)
TabNet arXiv 1908.07442	DL-for-tabular-data.qmd	CORRECT	Arik and Pfister
FT-Transformer arXiv 2106.11959	DL-for-tabular-data.qmd	CORRECT	Same paper as Gorishniy et al. 2021
NODE arXiv 1909.06312	DL-for-tabular-data.qmd	CORRECT	Popov, Morozov, Babenko
TabTransformer arXiv 2012.06678	DL-for-tabular-data.qmd	CORRECT	Huang, Khetan, Cvitkovic, Karnin
SAINT arXiv 2106.01342	DL-for-tabular-data.qmd	CORRECT	Somepalli et al.

Dataset Availability: Detailed Results

Freely available (no signup)

Dataset	File	Source
MedMNIST variants	CV-transfer-learning	`pip install medmnist`, HuggingFace
ISIC skin lesion	CV-transfer-learning	isic-archive.com
EuroSAT	CV-transfer-learning	torchvision built-in
UC Merced Land Use	CV-transfer-learning	Official site, HuggingFace
Food-101	CV-transfer-learning	torchvision built-in
FGVC-Aircraft	CV-transfer-learning	Oxford VGG, HuggingFace
Oxford Flowers	CV-transfer-learning	torchvision built-in
CUB-200 Birds	CV-transfer-learning	Kaggle, GitHub mirrors
MovieLens 1M/100K	recommender-systems	grouplens.org, direct download
Last.fm	recommender-systems	GroupLens HetRec 2011
Amazon Reviews 2023	recommender-systems	HuggingFace (McAuley-Lab)
CICIDS2017	fraud-detection	UNB official site
IMDb	text-classification	HuggingFace, torchtext
SST-2	text-classification	HuggingFace (GLUE)
AG News	text-classification	HuggingFace, torchtext
20 Newsgroups	text-classification	sklearn built-in
Banking77	text-classification	HuggingFace
TweetEval	text-classification	HuggingFace (cardiffnlp)
Financial PhraseBank	text-classification	HuggingFace
GoEmotions	text-classification	HuggingFace
ATIS	text-classification	Kaggle, GitHub mirrors
Kvasir-SEG	image-segmentation	Simula, Kaggle
ISBI Cell Segmentation	image-segmentation	Cell Tracking Challenge
LandCover.ai	image-segmentation	Official site, Kaggle
Massachusetts Buildings/Roads	image-segmentation	Official site, Academic Torrents
Pascal VOC 2012	image-segmentation	torchvision built-in
CamVid	image-segmentation	Official site, Kaggle
Oxford-IIIT Pet	image-segmentation	torchvision built-in
MoleculeNet (BBBP, BACE)	molecular-prediction	moleculenet.org, DeepChem, PyG
OGB ogbg-molhiv	molecular-prediction	ogb Python package
California Housing	DL-for-tabular-data	sklearn built-in
Adult Income	DL-for-tabular-data	UCI ML Repository
Covertype	DL-for-tabular-data	UCI ML Repository
HIGGS	DL-for-tabular-data	UCI ML Repository (~8GB)
ETTh1/ETTh2	DL-for-time-series	GitHub (zhouhaoyi/ETDataset)
Electricity (UCI)	DL-for-time-series	UCI ML Repository
M4 competition	DL-for-time-series	GitHub (Mcompetitions)
M5 competition	DL-for-time-series	GitHub (Mcompetitions)
Weather	DL-for-time-series	Autoformer benchmark repo
Traffic	DL-for-time-series	Autoformer benchmark repo

Require free signup or agreement

Dataset	File	Requirement
CheXpert-small	CV-transfer-learning	Stanford research use agreement
IEEE-CIS Fraud Detection	fraud-detection	Kaggle account + competition rules
Credit Card Fraud	fraud-detection	Kaggle account
PaySim	fraud-detection	Kaggle account
GlaS	image-segmentation	Warwick registration for credentials
Inria Aerial Image Labeling	image-segmentation	Web form
Cityscapes	image-segmentation	Academic registration (may take days)
Porto Seguro Safe Driver	DL-for-tabular-data	Kaggle account + competition rules

Libraries: Detailed Results

Library	Status	Notes
torchvision	MAINTAINED	Includes ResNet, EfficientNet, ConvNeXt, ViT, Swin. DeiT not separate (see Issue 2).
timm	MAINTAINED	v1.0.25 (Feb 2026). 300+ architectures including DeiT.
segmentation_models_pytorch	MAINTAINED	Supports U-Net, DeepLabV3, FPN. Now under qubvel-org.
Albumentations	ARCHIVED	See Issue 1. Superseded by AlbumentationsX.
RecBole	MAINTAINED	94 algorithms including NCF, GRU4Rec, SASRec, MultVAE.
LensKit	MAINTAINED	2025.x series, v2026 in development.
Surprise	MAINTENANCE-ONLY	v1.1.3 works. No new features since 2019. Fine for teaching.
implicit	MAINTENANCE-ONLY	ALS, BPR, GPU support. Reduced activity. Works.
PyOD	MAINTAINED	PyOD 2 (2025). 50+ algorithms.
SetFit	MAINTAINED	HuggingFace. Contrastive pair generation confirmed slow at scale.
PyTorch Geometric	MAINTAINED	Supports GCN, MPNN, AttentiveFP.
DGL	MAINTAINED	v2.0.0 released. NVIDIA continues releases.
DeepChem	MAINTAINED	MoleculeNet datasets included.
statsforecast	MAINTAINED	AutoARIMA, AutoETS, AutoTheta. Part of Nixtla.
Darts	MAINTAINED	Unit8. Classical + DL models with unified API.
GluonTS	MAINTAINED	AWS Labs. v0.16.2 (June 2025).
Nixtla ecosystem	MAINTAINED	StatsForecast, NeuralForecast, MLForecast, HierarchicalForecast.
fvcore / ptflops	MAINTAINED	Both work for FLOP counting.
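
The SetFit row notes that contrastive pair generation is slow at scale. A rough way to see why is to count how many unordered pairs a training set can produce. The sketch below is only an upper-bound illustration, assuming every pair of examples is a candidate; SetFit's actual sampling strategy may generate fewer pairs, and the function name `max_unordered_pairs` is hypothetical.

```python
from math import comb


def max_unordered_pairs(n_examples):
    """Upper bound on distinct example pairs: n choose 2 = n * (n - 1) / 2."""
    return comb(n_examples, 2)


if __name__ == "__main__":
    # Pair count grows quadratically with the number of training examples.
    for n in (8, 1_000, 100_000):
        print(f"{n:>7} examples -> up to {max_unordered_pairs(n):,} pairs")
```

Because the bound grows quadratically, a few-shot subset stays cheap while a full-size dataset does not, which is consistent with the note in the table.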

Method and Model Names: Detailed Results

Name	Status	Full name / paper
GRU4Rec	CORRECT	Hidasi et al., ICLR 2016
SASRec	CORRECT	Kang and McAuley, ICDM 2018
MultVAE	CORRECT	Mult-VAE^PR, Liang et al., WWW 2018
PatchTST	CORRECT	“A Time Series is Worth 64 Words”, ICLR 2023
N-BEATS	CORRECT	Neural Basis Expansion Analysis for Time Series, ICLR 2020
TiDE	CORRECT	Time-series Dense Encoder, Das et al.
DeepAR	CORRECT	Amazon autoregressive probabilistic model
DLinear	CORRECT	From Zeng et al., AAAI 2023
Deep SVDD	CORRECT	Deep Support Vector Data Description, Ruff et al., ICML 2018
AttentiveFP	CORRECT	Xiong et al., J. Med. Chem. 2020. In PyG.
FCN	CORRECT	Long, Shelhamer, Darrell, CVPR 2015
SegFormer	CORRECT	Xie et al., NeurIPS 2021, NVlabs

Technical Claims: Detailed Results

Claim	File	Status
ImageNet mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]	CV-transfer-learning	CORRECT
BERT has 512 token limit	text-classification	CORRECT
Dice coefficient = F1-score for binary segmentation	image-segmentation	CORRECT (both = 2TP/(2TP+FP+FN))
ECFP4 = Morgan fingerprint with radius 2	molecular-prediction	CORRECT (diameter 4 = radius 2)
SetFit slow at full dataset scale due to contrastive pairs	text-classification	CORRECT (combinatorial pair generation)
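
The Dice/F1 claim in the table above can be checked on a small example. The sketch below computes Dice from the set-overlap definition and F1 from precision and recall, then compares both with the closed form 2TP/(2TP+FP+FN). The masks are made-up illustrative values, not data from any assignment, and the function names are hypothetical.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives for 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def dice(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) on flattened binary masks.

    Undefined (division by zero) when both masks are empty.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    return 2 * intersection / (sum(pred) + sum(target))


def f1(pred, target):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp, fp, fn = confusion_counts(pred, target)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def closed_form(pred, target):
    """The shared closed form quoted in the table: 2TP / (2TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    return 2 * tp / (2 * tp + fp + fn)


# Illustrative flattened masks (1 = foreground pixel, 0 = background).
pred = [1, 1, 0, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0, 1, 0]

if __name__ == "__main__":
    print("Dice:       ", dice(pred, target))
    print("F1:         ", f1(pred, target))
    print("Closed form:", closed_form(pred, target))
```

Here TP = 3, FP = 1, and FN = 1, so all three expressions evaluate to 6/8 = 0.75. The equivalence holds for the binary, single-class case; averaged multi-class variants can differ depending on how the averaging is done.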