This repository implements machine learning kernel methods from scratch for an image classification Kaggle data challenge.
We use uv for minimal, reproducible dependency management. The solution complies with the challenge's strict constraints on the use of external machine learning libraries.
- Install dependencies:

  ```shell
  uv sync
  ```

- Kaggle authentication: ensure your Kaggle API token is configured at `~/.kaggle/kaggle.json`.

- Code formatting and linting: install the pre-commit hooks to run automated linting (Ruff for Python, mdformat for Markdown) before every commit:

  ```shell
  uv run pre-commit install
  ```
Ensure the dataset structure in the `data/` directory mirrors the following before running: `Xtr.csv`, `Xte.csv`, `Ytr.csv`.

To download the required data directly via the Kaggle CLI within the environment:

```shell
uv run kaggle competitions download -c data-challenge-kernel-methods-2025-2026 -p data/
cd data && unzip data-challenge-kernel-methods-2025-2026.zip && rm data-challenge-kernel-methods-2025-2026.zip && cd ..
```

The repository now uses one config-driven path for both local evaluation and Kaggle submission export. Submission files are written with the required `Id` and `Prediction` columns.
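The expected submission layout can be sketched as follows. This is an illustrative snippet only: the `write_submission` helper and the 1-based `Id` convention are assumptions for the example, not the repository's actual API.

```python
import csv

def write_submission(predictions, path, start_id=1):
    """Write class predictions in the two-column Id/Prediction format.

    `start_id` is an assumption here; check the competition's sample
    submission for the actual Id numbering.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Prediction"])
        for i, label in enumerate(predictions, start=start_id):
            writer.writerow([i, label])

write_submission([3, 0, 7], "Yte_example.csv")
```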
To train one validated config on the full training set and produce `Yte.csv`, use the evaluation script in full-data mode:

```shell
uv run python scripts/evaluate.py \
    --config configs/experiments/hog_scat_j3_ckn_cov_concat.yaml \
    --full-data \
    --output data/Yte.csv
```

The start script remains available as a thin wrapper around the same full-data path:

```shell
uv run start --config configs/experiments/hog_scat_j3_ckn_cov_concat.yaml
```

To submit the generated solution directly to Kaggle via the CLI:

```shell
uv run kaggle competitions submit -c data-challenge-kernel-methods-2025-2026 -f data/Yte.csv -m "Message describing the submission logic"
```

To run the full test suite with coverage:

```shell
uv run pytest --cov=kernel_methods tests/
```

To iterate on and evaluate specific kernels locally without burning Kaggle submissions, use the `evaluate.py` script. Experiments are configured as separate `.yaml` files in the `configs/experiments/` directory.
To evaluate one or multiple specific experiments, you can either:

```shell
# Evaluate the default experiment (linear)
make evaluate

# Evaluate multiple specific experiments by name
make evaluate EXPS="linear rbf"

# Or natively via Python
uv run python scripts/evaluate.py --experiments linear rbf
```

Augmentation is now resolved from the experiment config itself, with optional CLI overrides. For example, the current best config declares flip augmentation in YAML, and you can temporarily override it for a comparison run:
```shell
uv run python scripts/evaluate.py \
    --config configs/experiments/hog_scat_j3_ckn_cov_concat.yaml \
    --no-augment
```

To evaluate a config locally and then export a submission from the same entrypoint:
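As a rough illustration of what flip augmentation does to the training set — a hedged sketch assuming images are held as `(n, height, width, channels)` NumPy arrays; the repository's own augmentation code may differ:

```python
import numpy as np

def augment_with_flips(X, y):
    """Append a horizontally flipped copy of each image, doubling the set.

    Assumes X has shape (n, height, width, channels); this layout is an
    assumption for the example, not necessarily the repository's.
    """
    X_flipped = X[:, :, ::-1, :]  # mirror along the width axis
    X_aug = np.concatenate([X, X_flipped], axis=0)
    y_aug = np.concatenate([y, y], axis=0)
    return X_aug, y_aug

X = np.random.rand(8, 32, 32, 3)
y = np.arange(8)
X_aug, y_aug = augment_with_flips(X, y)
print(X_aug.shape, y_aug.shape)  # (16, 32, 32, 3) (16,)
```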
```shell
# Local validation
uv run python scripts/evaluate.py \
    --config configs/experiments/hog_scat_j3_ckn_cov_concat.yaml

# Full-data retrain + export
uv run python scripts/evaluate.py \
    --config configs/experiments/hog_scat_j3_ckn_cov_concat.yaml \
    --full-data \
    --output data/Yte.csv
```

To explore the new scalable lane that avoids dense Gram-matrix training, run the RFF-based configs on top of explicit handcrafted feature stacks:
```shell
# Scalable handcrafted baseline
uv run python scripts/evaluate.py \
    --config configs/experiments/rff_hog_scat_cov.yaml

# Scalable handcrafted + CKN feature stack
uv run python scripts/evaluate.py \
    --config configs/experiments/rff_hog_scat_cov_ckn.yaml
```

These configs use primal ridge regression on Random Fourier Features rather than dense kernel ridge, which makes them the intended path for testing whether heavier augmentation and larger effective sample counts unlock additional score.
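The scaling argument is that dense kernel ridge solves an n × n system in the number of training points, while primal ridge on Random Fourier Features solves a D × D system in the feature dimension. A minimal sketch of the RFF approximation to an RBF kernel followed by primal ridge — function names and hyperparameters here are illustrative, not the repository's implementation:

```python
import numpy as np

def rff_features(X, n_features=512, gamma=0.1, seed=0):
    """Map X to random Fourier features approximating exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # For the RBF kernel with parameter gamma, frequencies are N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def primal_ridge_fit(Z, y, lam=1e-3):
    """Solve the D x D ridge system instead of the n x n kernel system."""
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

# Tiny regression example on synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0])
Z = rff_features(X)
w = primal_ridge_fit(Z, y)
pred = Z @ w
```

The D × D solve is independent of the sample count, which is what lets this lane absorb more augmented data than a dense Gram matrix would.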
There is also an explicit experimental ZCA branch for raw-pixel RFF work:

```shell
uv run python scripts/evaluate.py \
    --config configs/experiments/rff_raw_zca.yaml
```

This branch is intentionally isolated: it is not part of the main HOG, scattering, covariance, or CKN pipeline and should be treated as a side experiment rather than a default preprocessing step.
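For reference, ZCA whitening of raw features can be sketched as below — a generic sketch with an illustrative epsilon, not the branch's exact code. Unlike PCA whitening, ZCA rotates back into the original basis after rescaling, so whitened pixels stay visually interpretable.

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """Whiten X so its empirical covariance is approximately the identity.

    eps regularizes near-zero eigenvalues; its value here is illustrative.
    """
    X_centered = X - X.mean(axis=0)
    cov = X_centered.T @ X_centered / X_centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA = rotate to eigenbasis, rescale, rotate back to the original basis
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return X_centered @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
X[:, 1] += 2.0 * X[:, 0]  # inject correlation so whitening has work to do
Xw = zca_whiten(X)
```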
The BibTeX entry below corresponds to the Kaggle competition used for this project:

```bibtex
@misc{data-challenge-kernel-methods-2025-2026,
  author       = {Michael Arbel},
  title        = {Data Challenge - Kernel methods (2025-2026)},
  year         = {2026},
  howpublished = {\url{https://kaggle.com/competitions/data-challenge-kernel-methods-2025-2026}},
  note         = {Kaggle}
}
```