Sanity Check Report: DSS5104 Assignment Descriptions
Verification date: 2026-03-07. Three categories checked: (1) papers and references, (2) dataset availability, (3) libraries, method names, and technical claims.
Summary
- Papers/references: all 11 items verified correct (titles, authors, years, venues, arXiv IDs, BibTeX entry)
- Datasets: all 47 datasets are accessible. 8 require free signup (Kaggle account or academic registration). None are unavailable.
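- Libraries: all 18 libraries checked. 15 are maintained, 2 are maintenance-only (Surprise, implicit), and 1 is archived (Albumentations). Two issues found (see Issues 1 and 2 below).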
- Method names: all 13 model names verified correct. One labeling imprecision found (see Issue 3 below).
- Technical claims: all 5 factual claims verified correct.
Issues Requiring Action
ISSUE 1: Albumentations is archived (HIGH)
File: image-segmentation.qmd
The original albumentations library was archived on July 10, 2025. All development has moved to AlbumentationsX (pip install albumentationsx), which is a drop-in replacement with the same API. The assignment currently links to albumentations.ai and recommends Albumentations by name.
Fix: update the reference to note that AlbumentationsX is the actively maintained successor, or verify that the archived version still installs and works for the assignment’s purposes.
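For the "verify that it still installs" option, one starting point is to check which of the two distributions is present in the course environment. The sketch below uses only the Python standard library. The distribution names `albumentations` and `albumentationsx` are the pip names mentioned above; the helper names `installed_version` and `report_albumentations` are hypothetical and exist only for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report_albumentations() -> Dict[str, Optional[str]]:
    """Look up both the archived package and its successor by pip name."""
    names = ("albumentations", "albumentationsx")
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, ver in report_albumentations().items():
        print(f"{name}: {ver or 'not installed'}")
```

This only confirms what is installed. Whether the archived version still works for the assignment would need an actual run of the augmentation pipeline.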
ISSUE 2: DeiT not in torchvision (MEDIUM)
File: CV-transfer-learning.qmd
DeiT is listed under “Vision Transformers” alongside ViT and Swin. The text says these are “available through libraries such as torchvision, timm, or HuggingFace.” This is technically correct (DeiT is in timm and HuggingFace), but students may expect all listed models to be in torchvision. Torchvision’s ViT uses DeiT’s training recipe but does not expose DeiT as a separate model class (no distillation token).
Fix: either remove DeiT from the list, or add a note that DeiT specifically requires timm or HuggingFace Transformers.
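If DeiT stays in the list, the added note could point students at a loading route that works. A minimal sketch, assuming timm is installed: the model name `deit_small_patch16_224` is one illustrative choice (the assignment does not name a variant), and the helper `list_deit_models` is hypothetical. The import is guarded so the snippet degrades gracefully when timm is missing.

```python
import importlib.util


def list_deit_models() -> list:
    """Return timm's DeiT model names, or an empty list if timm is not installed."""
    if importlib.util.find_spec("timm") is None:
        return []
    import timm

    # The wildcard filter matches every registered model whose name starts with "deit".
    return timm.list_models("deit*")


if __name__ == "__main__":
    names = list_deit_models()
    if not names:
        print("timm not available; install it to load DeiT")
    else:
        import timm

        # pretrained=False avoids a weight download; students would set it to True.
        model = timm.create_model("deit_small_patch16_224", pretrained=False)
        print(f"{len(names)} DeiT variants registered; built {type(model).__name__}")
```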
ISSUE 3: D-MPNN vs ChemProp (LOW)
File: molecular-prediction.qmd
The text says “D-MPNN (ChemProp)”, implying that the two are the same thing. D-MPNN is the architecture; ChemProp is the software package that implements it. This is a minor inaccuracy, but it could confuse students searching for documentation.
Fix: change to “D-MPNN (implemented in ChemProp)” for clarity.
Papers and References: Detailed Results
| Citation | File | Status | Notes |
|---|---|---|---|
| Dacrema et al. (2019) “Are We Really Making Much Progress?” | recommender-systems.qmd | CORRECT | RecSys 2019. BibTeX entry in ref.bib is correct. |
| He et al. (2017) Neural Collaborative Filtering | recommender-systems.qmd | CORRECT | WWW 2017 |
| Grinsztajn et al. (2022) “Why do tree-based models still outperform…” | DL-for-tabular-data.qmd | CORRECT | NeurIPS 2022 (Datasets and Benchmarks) |
| Gorishniy et al. (2021) “Revisiting deep learning models for tabular data” | DL-for-tabular-data.qmd | CORRECT | NeurIPS 2021 |
| Zeng et al. (2023) “Are Transformers Effective for Time Series Forecasting?” | DL-for-time-series.qmd | CORRECT | AAAI 2023 (oral) |
| TabNet arXiv 1908.07442 | DL-for-tabular-data.qmd | CORRECT | Arik and Pfister |
| FT-Transformer arXiv 2106.11959 | DL-for-tabular-data.qmd | CORRECT | Same paper as Gorishniy et al. (2021) |
| NODE arXiv 1909.06312 | DL-for-tabular-data.qmd | CORRECT | Popov, Morozov, Babenko |
| TabTransformer arXiv 2012.06678 | DL-for-tabular-data.qmd | CORRECT | Huang, Khetan, Cvitkovic, Karnin |
| SAINT arXiv 2106.01342 | DL-for-tabular-data.qmd | CORRECT | Somepalli et al. |
Dataset Availability: Detailed Results
Freely available (no signup)
| Dataset | File | Source |
|---|---|---|
| MedMNIST variants | CV-transfer-learning | pip install medmnist, HuggingFace |
| ISIC skin lesion | CV-transfer-learning | isic-archive.com |
| EuroSAT | CV-transfer-learning | torchvision built-in |
| UC Merced Land Use | CV-transfer-learning | Official site, HuggingFace |
| Food-101 | CV-transfer-learning | torchvision built-in |
| FGVC-Aircraft | CV-transfer-learning | Oxford VGG, HuggingFace |
| Oxford Flowers | CV-transfer-learning | torchvision built-in |
| CUB-200 Birds | CV-transfer-learning | Kaggle, GitHub mirrors |
| MovieLens 1M/100K | recommender-systems | grouplens.org, direct download |
| Last.fm | recommender-systems | GroupLens HetRec 2011 |
| Amazon Reviews 2023 | recommender-systems | HuggingFace (McAuley-Lab) |
| CICIDS2017 | fraud-detection | UNB official site |
| IMDb | text-classification | HuggingFace, torchtext |
| SST-2 | text-classification | HuggingFace (GLUE) |
| AG News | text-classification | HuggingFace, torchtext |
| 20 Newsgroups | text-classification | sklearn built-in |
| Banking77 | text-classification | HuggingFace |
| TweetEval | text-classification | HuggingFace (cardiffnlp) |
| Financial PhraseBank | text-classification | HuggingFace |
| GoEmotions | text-classification | HuggingFace |
| ATIS | text-classification | Kaggle, GitHub mirrors |
| Kvasir-SEG | image-segmentation | Simula, Kaggle |
| ISBI Cell Segmentation | image-segmentation | Cell Tracking Challenge |
| LandCover.ai | image-segmentation | Official site, Kaggle |
| Massachusetts Buildings/Roads | image-segmentation | Official site, Academic Torrents |
| Pascal VOC 2012 | image-segmentation | torchvision built-in |
| CamVid | image-segmentation | Official site, Kaggle |
| Oxford-IIIT Pet | image-segmentation | torchvision built-in |
| MoleculeNet (BBBP, BACE) | molecular-prediction | moleculenet.org, DeepChem, PyG |
| OGB ogbg-molhiv | molecular-prediction | ogb Python package |
| California Housing | DL-for-tabular-data | sklearn built-in |
| Adult Income | DL-for-tabular-data | UCI ML Repository |
| Covertype | DL-for-tabular-data | UCI ML Repository |
| HIGGS | DL-for-tabular-data | UCI ML Repository (~8GB) |
| ETTh1/ETTh2 | DL-for-time-series | GitHub (zhouhaoyi/ETDataset) |
| Electricity (UCI) | DL-for-time-series | UCI ML Repository |
| M4 competition | DL-for-time-series | GitHub (Mcompetitions) |
| M5 competition | DL-for-time-series | GitHub (Mcompetitions) |
| Weather | DL-for-time-series | Autoformer benchmark repo |
| Traffic | DL-for-time-series | Autoformer benchmark repo |
Require free signup or agreement
| Dataset | File | Requirement |
|---|---|---|
| CheXpert-small | CV-transfer-learning | Stanford research use agreement |
| IEEE-CIS Fraud Detection | fraud-detection | Kaggle account + competition rules |
| Credit Card Fraud | fraud-detection | Kaggle account |
| PaySim | fraud-detection | Kaggle account |
| GlaS | image-segmentation | Warwick registration for credentials |
| Inria Aerial Image Labeling | image-segmentation | Web form |
| Cityscapes | image-segmentation | Academic registration (may take days) |
| Porto Seguro Safe Driver | DL-for-tabular-data | Kaggle account + competition rules |
Libraries: Detailed Results
| Library | Status | Notes |
|---|---|---|
| torchvision | MAINTAINED | Includes ResNet, EfficientNet, ConvNeXt, ViT, Swin. DeiT not separate (see Issue 2). |
| timm | MAINTAINED | v1.0.25 (Feb 2026). 300+ architectures including DeiT. |
| segmentation_models_pytorch | MAINTAINED | Supports U-Net, DeepLabV3, FPN. Now under qubvel-org. |
| Albumentations | ARCHIVED | See Issue 1. Superseded by AlbumentationsX. |
| RecBole | MAINTAINED | 94 algorithms including NCF, GRU4Rec, SASRec, MultVAE. |
| LensKit | MAINTAINED | 2025.x series, v2026 in development. |
| Surprise | MAINTENANCE-ONLY | v1.1.3 works. No new features since 2019. Fine for teaching. |
| implicit | MAINTENANCE-ONLY | ALS, BPR, GPU support. Reduced activity. Works. |
| PyOD | MAINTAINED | PyOD 2 (2025). 50+ algorithms. |
| SetFit | MAINTAINED | HuggingFace. Contrastive pair generation confirmed slow at scale. |
| PyTorch Geometric | MAINTAINED | Supports GCN, MPNN, AttentiveFP. |
| DGL | MAINTAINED | v2.0.0 released. NVIDIA continues releases. |
| DeepChem | MAINTAINED | MoleculeNet datasets included. |
| statsforecast | MAINTAINED | AutoARIMA, AutoETS, AutoTheta. Part of Nixtla. |
| Darts | MAINTAINED | Unit8. Classical + DL models with unified API. |
| GluonTS | MAINTAINED | AWS Labs. v0.16.2 (June 2025). |
| Nixtla ecosystem | MAINTAINED | StatsForecast, NeuralForecast, MLForecast, HierarchicalForecast. |
| fvcore / ptflops | MAINTAINED | Both work for FLOP counting. |
Method and Model Names: Detailed Results
| Name | Status | Full name / paper |
|---|---|---|
| GRU4Rec | CORRECT | Hidasi et al., ICLR 2016 |
| SASRec | CORRECT | Kang and McAuley, ICDM 2018 |
| MultVAE | CORRECT | Mult-VAE^PR, Liang et al., WWW 2018 |
| PatchTST | CORRECT | “A Time Series is Worth 64 Words”, ICLR 2023 |
| N-BEATS | CORRECT | Neural Basis Expansion Analysis for Time Series, ICLR 2020 |
| TiDE | CORRECT | Time-series Dense Encoder, Das et al. |
| DeepAR | CORRECT | Amazon autoregressive probabilistic model |
| DLinear | CORRECT | From Zeng et al., AAAI 2023 |
| Deep SVDD | CORRECT | Deep Support Vector Data Description, Ruff et al., ICML 2018 |
| AttentiveFP | CORRECT | Xiong et al., J. Med. Chem. 2020. In PyG. |
| FCN | CORRECT | Long, Shelhamer, Darrell, CVPR 2015 |
| SegFormer | CORRECT | Xie et al., NeurIPS 2021, NVlabs |
Technical Claims: Detailed Results
| Claim | File | Status |
|---|---|---|
| ImageNet mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] | CV-transfer-learning | CORRECT |
| BERT has 512 token limit | text-classification | CORRECT |
| Dice coefficient = F1-score for binary segmentation | image-segmentation | CORRECT (both = 2TP/(2TP+FP+FN)) |
| ECFP4 = Morgan fingerprint with radius 2 | molecular-prediction | CORRECT (diameter 4 = radius 2) |
| SetFit slow at full dataset scale due to contrastive pairs | text-classification | CORRECT (combinatorial pair generation) |
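Two of the claims above can be spot-checked with plain arithmetic. The sketch below compares Dice in its set-overlap form with F1 computed from precision and recall, and shows how quickly the number of candidate pairs grows with dataset size. The masks are made-up illustration data, and the pair count n(n-1)/2 is an upper bound on distinct unordered pairs, not a description of how SetFit actually samples pairs.

```python
from math import comb, isclose


def dice(pred, target):
    """Dice coefficient in set-overlap form: 2|A ∩ B| / (|A| + |B|)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return 2 * intersection / (sum(pred) + sum(target))


def f1(pred, target):
    """F1 for the positive class as the harmonic mean of precision and recall.

    Assumes at least one predicted positive and one true positive,
    otherwise the divisions below are undefined.
    """
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def max_unordered_pairs(n_examples):
    """Number of distinct unordered pairs among n examples: n(n-1)/2."""
    return comb(n_examples, 2)


# Hypothetical flattened binary masks (1 = foreground), for illustration only.
pred = [1, 1, 0, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0, 1, 0]

# Here TP=3, FP=1, FN=1, so both metrics equal 2*3 / (2*3 + 1 + 1) = 0.75.
assert isclose(dice(pred, target), f1(pred, target))
print(dice(pred, target), f1(pred, target))

# Pair counts grow quadratically with the number of training examples.
for n in (100, 1_000, 10_000):
    print(n, max_unordered_pairs(n))
```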