SmellNet: A Large-scale Dataset for Real-world Smell Recognition

SmellNet is the first large-scale open-source dataset that captures how real-world substances smell, digitized using portable gas and chemical sensors. It includes 50 hours of data from 50 substances (nuts, spices, herbs, fruits, and vegetables), totaling over 180,000 time steps of multichannel sensor data, accompanied by chemical composition (GC-MS) and textual descriptions.

SmellNet enables research into:

🧠 Real-time substance classification with supervised learning
🔁 Cross-modal learning with sensor + GC-MS alignment
📈 Time-series modeling using LSTMs and Transformers
📊 Signal preprocessing such as first-order temporal difference (FOTD)
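
As an illustration of the FOTD preprocessing mentioned above, here is a minimal sketch that subtracts each multichannel reading from the one that follows it. The function name, the `lag` parameter, and the sample values are assumptions made for this example; the exact preprocessing used in the SmellNet code may differ.

```python
from typing import List, Sequence


def first_order_temporal_difference(
    readings: Sequence[Sequence[float]], lag: int = 1
) -> List[List[float]]:
    """Return readings[t] - readings[t - lag], computed per channel.

    `readings` is a list of time steps, each holding one value per sensor
    channel. The output has `lag` fewer time steps than the input.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [
        [cur - prev for cur, prev in zip(readings[t], readings[t - lag])]
        for t in range(lag, len(readings))
    ]


# Three time steps from two hypothetical sensor channels.
recording = [[10.0, 200.0], [12.0, 198.0], [15.0, 198.0]]
diffs = first_order_temporal_difference(recording)
print(diffs)  # [[2.0, -2.0], [3.0, 0.0]]
```

Differencing removes each channel's constant offset, so a model sees how the reading changes over time rather than its absolute level.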

📂 Dataset Access

The full dataset is hosted on Hugging Face: 👉 SmellNet on Hugging Face

Each ingredient has multiple time-series recordings in CSV format, plus paired metadata and chemical information to support multimodal learning tasks.

By digitizing a diverse range of natural smells, SmellNet lets AI models predict substances from sensor data using supervised learning, contrastive learning, and other approaches.

🧪 Applications

SmellNet is designed to support machine learning for:

Allergen detection (e.g., peanut traces)
Food and beverage quality control
Digital olfaction and human-AI interaction
Health diagnostics (e.g., stress, hormones, early disease)

🔗 Resources

Paper

📂 Folder Structure

offline_training: Contains folders of smell data (CSV) for training; each folder represents a substance.
offline_testing: Contains folders of smell data (CSV) for testing; each folder represents a substance.
online_nuts: Contains smell data (CSV) of nuts for testing; collected at a different time than the offline data.
online_spices: Contains smell data (CSV) of spices for testing; collected at a different time than the offline data.
text_description.json: Text descriptions of all substances, generated by a large language model.
gcms_dataframe.csv: High-resolution GC-MS data paired with each substance.
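
The one-folder-per-substance layout described above can be read with a short loader. The sketch below uses only the standard library and assumes each CSV starts with a header row; `load_split` is a hypothetical helper, not part of the SmellNet code, and values are left as strings because the column names and types are not documented here.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_split(split_dir: Path) -> Dict[str, List[List[Dict[str, str]]]]:
    """Map each substance folder name to its recordings.

    Each recording is a list of rows, and each row is a dict keyed by the
    CSV header (assumed to be present in every file).
    """
    data: Dict[str, List[List[Dict[str, str]]]] = {}
    for substance_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        recordings = []
        for csv_path in sorted(substance_dir.glob("*.csv")):
            with csv_path.open(newline="") as f:
                recordings.append(list(csv.DictReader(f)))
        data[substance_dir.name] = recordings
    return data


# Assumed local path to a downloaded split; skipped if it is not present.
root = Path("offline_training")
if root.is_dir():
    train = load_split(root)
    for substance, recordings in train.items():
        print(substance, len(recordings))
```

The folder name doubles as the class label, so the same function should work for offline_testing as long as it follows the same layout.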

