1 | 1 | # Spatial Awareness through Ambient Wireless Signals |
2 | 2 |
3 | 3 | WiFi Channel State Information (CSI) can double as a privacy-preserving motion sensor. |
4 | | -This repo contains everything we used to turn the WiAR dataset (Intel 5300 CSI captures) into a binary **presence detector** with visualizations, CLI utilities, and a reproducible notebook. The code sits in the `_archive/` directory, while the root keeps the student-facing assets (notebook, setup helpers, trained models). |
| 4 | +This repo contains everything we used to turn the WiAR dataset (Intel 5300 CSI captures) into a binary **presence detector** with visualizations, an interactive dashboard, and a reproducible notebook. |
5 | 5 |
6 | 6 | ## Highlights |
7 | | -- Parses raw 802.11n CSI traces from the WiAR dataset with `csiread` |
8 | | -- Generates fixed-length CSI windows, extracts 14 statistical features, and fuses them into a binary activity dataset |
9 | | -- Trains and tunes a Random Forest presence detector, saving joblib artifacts plus metrics |
10 | | -- Provides live/recorded visualizations (heatmaps, probability curves) and HTML exports for presentations |
11 | | -- Includes a single notebook that walks through data loading, synthetic empty-room generation, training, and evaluation |
| 7 | +- **Two Models**: |
| 8 | + - **Random Forest**: Fast, feature-based (96% accuracy). |
| 9 | + - **1D-CNN (Deep Learning)**: End-to-end learning on raw CSI (91%+ accuracy). |
| 10 | +- **Interactive Dashboard**: A Streamlit app for live demonstrations, featuring real-time simulation and model switching. |
| 11 | +- **End-to-End Notebook**: `Spatial_Awareness_Project.ipynb` walks through the entire pipeline (Data Loading -> Preprocessing -> Training -> Evaluation). |
| 12 | +- **Synthetic Data**: Generates "Empty Room" samples to handle class imbalance. |
12 | 13 |
13 | 14 | ## Repository Tour |
14 | 15 | ``` |
15 | 16 | . |
16 | | -├── Spatial_Awareness_Project.ipynb # End-to-end, commented walkthrough |
17 | | -├── _archive/ # Source modules and CLI utilities |
18 | | -│ ├── scripts/ # Data prep + pipeline CLIs |
19 | | -│ ├── model_tools/ # Training + visualization scripts |
20 | | -│ └── src/ # Library code (preprocess/models/train) |
21 | | -├── models/ # Saved Random Forest + scaler + pipeline |
| 17 | +├── Spatial_Awareness_Project.ipynb # Main Project Notebook (Report) |
| 18 | +├── app.py # Interactive Streamlit Dashboard |
| 19 | +├── models/ # Saved Models |
| 20 | +│ ├── presence_detector_rf.joblib # Random Forest Model |
| 21 | +│ ├── presence_detector_scaler.joblib # Scaler for RF |
| 22 | +│ └── presence_detector_cnn.pth # CNN Model (PyTorch) |
| 23 | +├── model_tools/ # Model Utilities |
| 24 | +│ ├── train_cnn.py # Script to train/retrain the CNN |
| 25 | +│ └── predict_from_raw.py # CLI tool for single-file prediction |
| 26 | +├── data/ # Dataset (Git-ignored) |
22 | 27 | ├── requirements.txt # Python dependencies |
23 | | -├── setup.sh # Student-friendly environment bootstrap |
24 | | -├── Makefile # Convenience targets (setup/install/clean) |
25 | | -└── pyproject.toml # Packaging metadata (setuptools) |
| 28 | +└── _archive/ # Legacy/Helper scripts |
26 | 29 | ``` |
27 | 30 |
28 | | -> The `data/` directory (raw WiAR captures and processed artifacts) is git-ignored. Create the structure described below before running the pipeline. |
29 | | - |
30 | | -### Key Components |
31 | | -- `Spatial_Awareness_Project.ipynb`: Runs the full workflow in one place—loading CSI, preprocessing, generating synthetic no-activity samples, training the model, plotting metrics, and exporting artifacts. |
32 | | -- `_archive/scripts/`: Small CLIs for dataset download (`fetch_wiar.sh`), window generation, feature extraction, binary fusion, validation, and a `run_pipeline.py` orchestrator. |
33 | | -- `_archive/model_tools/`: Training + visualization entrypoints (`train_presence_detector.py`, `tune_presence_detector.py`, `visualize_activity_heatmap.py`, `visualize_samples.py`, `visualize_live_session.py`, `live_predict.py`, `predict_from_raw.py`, `view_data.py`). `model_tools/html/` stores executed notebook exports for quick demos. |
34 | | -- `_archive/src/`: Reusable modules |
35 | | - - `preprocess/`: CSI loaders (`csi_loader.py`, `dat_loader.py`), windowing + normalization (`preprocess.py`), feature engineering (`features.py`), and WiAR inspection helpers. |
36 | | - - `models/motion_detector.py`: Runtime convenience wrapper that loads the scaler + model + metadata to score new CSI windows. |
37 | | - - `train/dataset.py`: PyTorch Dataset scaffold intended for future CNN/RNN work. |
38 | | -- `models/`: `presence_detector_rf.joblib`, `presence_detector_scaler.joblib`, and `presence_detector_pipeline.joblib` generated by the training scripts. |
39 | | -- `setup.sh` / `Makefile`: Lightweight automation to create a virtualenv and install requirements without digging into tooling details. |
40 | | - |
41 | 31 | ## Getting Started |
42 | | -### Prerequisites |
43 | | -- Python 3.10+ (3.11 works best) |
44 | | -- `pip`, `venv`, and (for `.dat` parsing) `libpcap` headers if you plan to compile `csiread` |
45 | | -- Optional: GNU Make, tmux, and JupyterLab |
46 | 32 |
47 | | -### Option A — one-shot setup |
| 33 | +### 1. Setup Environment |
48 | 34 | ```bash |
49 | | -cd spatialaw |
50 | | -chmod +x setup.sh |
51 | | -./setup.sh |
| 35 | +# Create virtual environment |
| 36 | +python3 -m venv .venv |
| 37 | +source .venv/bin/activate |
| 38 | + |
| 39 | +# Install dependencies |
| 40 | +pip install -r requirements.txt |
52 | 41 | ``` |
53 | 42 |
54 | | -### Option B — manual steps |
| 43 | +### 2. Run the Dashboard (Demo) |
| 44 | +The dashboard allows you to visualize the system in action. |
55 | 45 | ```bash |
56 | | -python3 -m venv venv |
57 | | -source venv/bin/activate |
58 | | -pip install --upgrade pip |
59 | | -pip install -r requirements.txt |
| 46 | +streamlit run app.py |
60 | 47 | ``` |
| 48 | +**Features:** |
| 49 | +- **Model Selector**: Switch between Random Forest and CNN. |
| 50 | +- **Simulate Live Mode**: Replays a file as if it were a live stream. |
| 51 | +- **Stability Filter**: Smooths predictions over time. |
61 | 52 |
62 | | -You can achieve the same with `make setup`, and rerun `make clean` to drop stray `__pycache__` or `.pyc` files. |
| 53 | +### 3. Run the Notebook |
| 54 | +Open `Spatial_Awareness_Project.ipynb` in Jupyter to see the full training and evaluation report. |
63 | 55 |
64 | | -## Data Layout & Requirements |
65 | | -All data lives under `data/` (ignored by git). Create the following folders before running the scripts: |
| 56 | +### 4. Train the CNN (Optional) |
| 57 | +If you want to retrain the Deep Learning model: |
| 58 | +```bash |
| 59 | +python model_tools/train_cnn.py |
| 60 | +``` |
| 61 | +This will train the model on `data/processed/windows` and save it to `models/presence_detector_cnn.pth`. |
| 62 | + |
| 63 | +## Data Layout |
| 64 | +All data lives under `data/` (ignored by git). |
66 | 65 | ``` |
67 | 66 | data/ |
68 | | -├── raw/ |
69 | | -│ └── WiAR/ # WiAR repository clone or downloaded archive |
70 | | -└── processed/ |
71 | | - ├── windows/ |
72 | | - ├── features/ |
73 | | - ├── binary/ |
74 | | - └── synthetic_empty/ |
| 67 | +├── raw/WiAR/ # Original Dataset |
| 68 | +└── processed/ # Generated Windows & Features |
75 | 69 | ``` |
76 | 70 |
77 | | -**Dataset:** WiAR (16 motion classes captured with an Intel 5300 NIC). Use `_archive/scripts/fetch_wiar.sh` to download and unpack the official release into `data/raw/WiAR`. |
78 | | - |
79 | | -## Processing & Training Pipeline |
80 | | -Each stage is a CLI that can be run independently or via `_archive/scripts/run_pipeline.py`. |
81 | | - |
82 | | -1. **Generate CSI windows** |
83 | | - ```bash |
84 | | - python _archive/scripts/generate_windows.py \ |
85 | | - --input-dir data/raw/WiAR \ |
86 | | - --out-dir data/processed/windows \ |
87 | | - --T 256 \ |
88 | | - --stride 64 |
89 | | - ``` |
90 | | - - Loads `.dat`, `.txt`, `.csv`, or `.npy` CSI files |
91 | | - - Applies denoising + z-score normalization (`preprocess.py`) |
92 | | - - Saves `window_*.npy`, `labels.csv`, and `window_generation_summary.json` |
93 | | - |
94 | | -2. **Extract 14 CSI features per window** |
95 | | - ```bash |
96 | | - python _archive/scripts/extract_features.py \ |
97 | | - --windows-dir data/processed/windows \ |
98 | | - --output-dir data/processed/features |
99 | | - ``` |
100 | | - - Features include variance, entropy, Hilbert envelope stats, motion period, MAD, etc. (see `_archive/src/preprocess/features.py`) |
101 | | - |
102 | | -3. **Create the binary presence dataset** |
103 | | - ```bash |
104 | | - python _archive/scripts/process_binary_dataset.py \ |
105 | | - --features-dir data/processed/features \ |
106 | | - --output-dir data/processed/binary |
107 | | - ``` |
108 | | - - Combines active WiAR activities into label `1` |
109 | | - - Selects low-motion segments as label `0` (with motion quantile filters) |
110 | | - - Writes `features.npy`, `labels.csv`, and `feature_names.json` |
111 | | - |
112 | | -4. **(Optional) Generate synthetic “empty room” samples** |
113 | | - ```bash |
114 | | - python _archive/scripts/generate_synthetic_empty.py \ |
115 | | - --output-dir data/processed/synthetic_empty \ |
116 | | - --n-samples 500 |
117 | | - ``` |
118 | | - - Matches the notebook’s synthetic-noise generator to balance classes when real idle captures are scarce. |
119 | | - |
120 | | -5. **Validate the binary dataset** |
121 | | - ```bash |
122 | | - python _archive/scripts/validate_binary_dataset.py \ |
123 | | - --binary-dir data/processed/binary |
124 | | - ``` |
125 | | - - Produces `validation_report.json` (label histograms, motion percentiles, leakage checks). |
126 | | - |
127 | | -6. **Run the whole pipeline (any subset of steps)** |
128 | | - ```bash |
129 | | - python _archive/scripts/run_pipeline.py --steps windows features binary validate |
130 | | - ``` |
131 | | - |
132 | | -## Modeling & Visualization Tools |
133 | | -Located in `_archive/model_tools/`: |
134 | | - |
135 | | -| Script | Purpose | Outputs | |
136 | | -| --- | --- | --- | |
137 | | -| `train_presence_detector.py` | Train + evaluate RandomForestClassifier on the binary dataset; optionally reloads existing scaler/model. | Saves `.joblib` models, `presence_detector_metrics.json`, confusion matrix, ROC curve, feature importance plots. | |
138 | | -| `tune_presence_detector.py` | Grid search with GroupKFold cross-validation. | `models/tuning_results.json`, best model artifacts. | |
139 | | -| `visualize_activity_heatmap.py` | Draws CSI heatmaps with predicted probabilities overlayed (great for demos). | PNG/interactive matplotlib windows. | |
140 | | -| `visualize_samples.py` | Random window explorer for debugging feature quality. | Matplotlib figures. | |
141 | | -| `visualize_live_session.py` | Streams pre-recorded sessions as if live, scoring each window. | Live matplotlib updates. | |
142 | | -| `live_predict.py` | Connects to an incoming CSI stream (or directory) and prints rolling predictions. | Terminal output + optional plots. | |
143 | | -| `predict_from_raw.py` | Convenience wrapper to score a single raw CSI file end-to-end. | Prints probability + label. | |
144 | | -| `view_data.py` | Dumps `.npy` feature data, label counts, and metadata. | Terminal table summaries. | |
145 | | - |
146 | | -Executed HTML exports of the major scripts live in `_archive/model_tools/html/` so you can skim outputs without running the code. |
147 | | - |
148 | | -## Notebook Walkthrough (`Spatial_Awareness_Project.ipynb`) |
149 | | -The notebook mirrors the CLI pipeline but keeps everything in one place for reports: |
150 | | -- **Sections 1–2**: Data loading utilities for Intel 5300 `.dat` files, including a fallback parser and visual sanity checks (packet counts, amplitude heatmaps). |
151 | | -- **Sections 3–4**: Preprocessing helpers (Butterworth filters, Hilbert envelope, z-score normalization) and window visualization. |
152 | | -- **Section 5**: Feature extraction (the same 14 statistical descriptors used by the CLIs) with pandas summaries. |
153 | | -- **Section 6**: Synthetic “empty room” generator + SMOTE oversampling to handle class imbalance when genuine idle captures are missing. |
154 | | -- **Section 7**: Train/test split with `GroupShuffleSplit`, Random Forest training (balanced class weights), metrics (accuracy, precision, recall, F1, ROC-AUC), confusion matrix, ROC curve, and distribution plots. |
155 | | -- **Section 8**: Model export via `joblib`, along with helper functions to reload the scaler/model for downstream scripts. |
156 | | -- **Appendix**: Utility cells for plotting CSI heatmaps, inspecting feature importances, and sandboxing experimental architectures (there is an `add_cnn_to_notebook.py` helper in `_archive/scripts/create_notebook.py` for future deep learning work). |
157 | | - |
158 | | -Run it with JupyterLab (`jupyter lab Spatial_Awareness_Project.ipynb`) after setting up the environment. |
159 | | - |
160 | | -## Core Library Modules (`_archive/src/`) |
161 | | -- `preprocess/csi_loader.py`: Recursively lists CSI recordings, infers labels from filenames, and converts `.dat`, `.txt`, `.csv`, or `.npy` files into numpy arrays. |
162 | | -- `preprocess/dat_loader.py`: Thin wrapper over `csiread` + legacy parsing helpers for Intel 5300 `.dat` files, handling antenna permutations and amplitude extraction. |
163 | | -- `preprocess/preprocess.py`: Windowing (`window_csi`), denoising (`denoise_window`), normalization (`normalize_window`), and serialization (`save_windows`). |
164 | | -- `preprocess/features.py`: Defines the 14 handcrafted features used throughout the project. |
165 | | -- `preprocess/inspect_wiar.py`: Quickly inspects WiAR metadata, packet counts, motion scores, and activity mappings. |
166 | | -- `models/motion_detector.py`: Provides a simple `MotionDetector` class with `predict_proba` / `predict` methods that internally load the saved scaler + Random Forest. |
167 | | -- `train/dataset.py`: PyTorch Dataset/Loader utilities for window tensors—handy if you want to extend the work with CNNs or transformers later. |
168 | | - |
169 | | -## Saved Artifacts & Reports |
170 | | -- `models/presence_detector_rf.joblib`: Fitted RandomForestClassifier. |
171 | | -- `models/presence_detector_scaler.joblib`: StandardScaler trained on the binary dataset. |
172 | | -- `models/presence_detector_pipeline.joblib`: End-to-end pipeline object (scaler + model). |
173 | | -- `_archive/model_tools/html/*.html`: Frozen notebook exports with visualizations for the detector, heatmaps, and random samples. |
174 | | -- `data/processed/**`: (Not tracked) holds windows, features, synthetic data, binary dataset, plus validation reports. |
175 | | - |
176 | | -## Testing & Validation |
177 | | -- Dataset sanity checks live inside `_archive/scripts/validate_binary_dataset.py`. |
178 | | -- Group-aware train/test splits and cross-validation are enforced in the notebook and tuning script to avoid subject leakage. |
179 | | -- There are no standalone pytest suites checked in; when extending the project consider wrapping the CLI scripts with regression tests. |
180 | | - |
181 | | -## References |
182 | | -- **WiAR Dataset**: L. Guo et al., *“A Novel Benchmark on Human Activity Recognition Using WiFi Signals,”* IEEE Healthcom, 2017. |
183 | | -- **Intel 5300 CSI Tool**: <http://dhalperi.github.io/linux-80211n-csitool/> |
184 | | -- **csiread Library**: <https://github.com/citywu/csiread> |
185 | | - |
186 | 71 | ## Authors |
187 | 72 | - Rishabh (230178) |
188 | 73 | - Shivansh (230054) |
189 | | -Newton School of Technology — Computer Networks + AI/ML capstone |
| 74 | +Newton School of Technology — Computer Networks + AI/ML Capstone |
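
Note on the windowing step: the pipeline section removed in this diff generates fixed-length CSI windows with `--T 256` and `--stride 64`, followed by z-score normalization. The sketch below illustrates that idea under stated assumptions: the input is a 2-D amplitude array of shape `(packets, subcarriers)`, and the helper names `sliding_windows` and `zscore` are hypothetical, not the functions in `_archive/src/preprocess/preprocess.py`. The random array is placeholder data, not a WiAR capture.

```python
import numpy as np


def sliding_windows(csi, T=256, stride=64):
    """Split a (packets, subcarriers) array into overlapping windows of T packets."""
    n_packets = csi.shape[0]
    if n_packets < T:
        # Not enough packets for even one window: return an empty batch.
        return np.empty((0, T) + csi.shape[1:], dtype=csi.dtype)
    starts = range(0, n_packets - T + 1, stride)
    return np.stack([csi[s:s + T] for s in starts])


def zscore(window, eps=1e-8):
    """Normalize each subcarrier of one window to zero mean and unit variance."""
    mean = window.mean(axis=0, keepdims=True)
    std = window.std(axis=0, keepdims=True)
    return (window - mean) / (std + eps)


# Placeholder input: 1000 packets x 30 subcarriers of random amplitudes.
rng = np.random.default_rng(0)
csi = rng.normal(size=(1000, 30))

windows = sliding_windows(csi)                        # default T=256, stride=64
normalized = np.stack([zscore(w) for w in windows])
print(windows.shape)
```

With 1000 packets, `T=256`, and `stride=64`, window starts run from 0 to 704, giving 12 windows. A trailing remainder shorter than `T` is dropped in this sketch; the real script may handle it differently.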
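
Note on the Stability Filter: the new README describes it only as smoothing predictions over time. One common way to do that is a moving average over the most recent per-window probabilities followed by a threshold. This is shown as an assumption, not as what `app.py` actually implements. The class name, window size, and threshold below are hypothetical.

```python
from collections import deque


class MovingAverageFilter:
    """Average the last `size` presence probabilities before thresholding."""

    def __init__(self, size=5, threshold=0.5):
        self.history = deque(maxlen=size)  # oldest values fall off automatically
        self.threshold = threshold

    def update(self, prob):
        """Add one per-window probability; return (smoothed_prob, present)."""
        self.history.append(float(prob))
        smoothed = sum(self.history) / len(self.history)
        return smoothed, smoothed >= self.threshold


# Hypothetical probabilities from a classifier, one per CSI window.
filt = MovingAverageFilter(size=3)
raw = [0.1, 0.8, 0.2, 0.8, 0.9, 0.95]
labels = [filt.update(p)[1] for p in raw]
print(labels)
```

In this run the isolated spike at the second window is suppressed, and the label only switches to "present" once several recent windows agree. A larger `size` gives steadier output at the cost of slower reaction to real changes.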