# Spatial Awareness through Ambient Wireless Signals

WiFi Channel State Information (CSI) can double as a privacy-preserving motion sensor.
This repo contains everything we used to turn the WiAR dataset (Intel 5300 CSI captures) into a binary **presence detector**, with visualizations, CLI utilities, and a reproducible notebook. The code sits in the `_archive/` directory, while the root keeps the student-facing assets (notebook, setup helpers, trained models).

## Highlights
- Parses raw 802.11n CSI traces from the WiAR dataset with `csiread`
- Generates fixed-length CSI windows, extracts 14 statistical features, and fuses them into a binary activity dataset
- Trains and tunes a Random Forest presence detector, saving joblib artifacts plus metrics
- Provides live/recorded visualizations (heatmaps, probability curves) and HTML exports for presentations
- Includes a single notebook that walks through data loading, synthetic empty-room generation, training, and evaluation

## Repository Tour
```
.
├── Spatial_Awareness_Project.ipynb   # End-to-end, commented walkthrough
├── _archive/                         # Source modules and CLI utilities
│   ├── scripts/                      # Data prep + pipeline CLIs
│   ├── model_tools/                  # Training + visualization scripts
│   └── src/                          # Library code (preprocess/models/train)
├── models/                           # Saved Random Forest + scaler + pipeline
├── requirements.txt                  # Python dependencies
├── setup.sh                          # Student-friendly environment bootstrap
├── Makefile                          # Convenience targets (setup/install/clean)
└── pyproject.toml                    # Packaging metadata (setuptools)
```

> The `data/` directory (raw WiAR captures and processed artifacts) is git-ignored. Create the structure described below before running the pipeline.

### Key Components
- `Spatial_Awareness_Project.ipynb`: Runs the full workflow in one place—loading CSI, preprocessing, generating synthetic no-activity samples, training the model, plotting metrics, and exporting artifacts.
- `_archive/scripts/`: Small CLIs for dataset download (`fetch_wiar.sh`), window generation, feature extraction, binary fusion, validation, and a `run_pipeline.py` orchestrator.
- `_archive/model_tools/`: Training + visualization entrypoints (`train_presence_detector.py`, `tune_presence_detector.py`, `visualize_activity_heatmap.py`, `visualize_samples.py`, `visualize_live_session.py`, `live_predict.py`, `predict_from_raw.py`, `view_data.py`). `model_tools/html/` stores executed notebook exports for quick demos.
- `_archive/src/`: Reusable modules:
  - `preprocess/`: CSI loaders (`csi_loader.py`, `dat_loader.py`), windowing + normalization (`preprocess.py`), feature engineering (`features.py`), and WiAR inspection helpers.
  - `models/motion_detector.py`: Runtime convenience wrapper that loads the scaler + model + metadata to score new CSI windows.
  - `train/dataset.py`: PyTorch Dataset scaffold intended for future CNN/RNN work.
- `models/`: `presence_detector_rf.joblib`, `presence_detector_scaler.joblib`, and `presence_detector_pipeline.joblib` generated by the training scripts.
- `setup.sh` / `Makefile`: Lightweight automation to create a virtualenv and install requirements without digging into tooling details.

## Getting Started
### Prerequisites
- Python 3.10+ (3.11 works best)
- `pip`, `venv`, and (for `.dat` parsing) `libpcap` headers if you plan to compile `csiread`
- Optional: GNU Make, tmux, and JupyterLab

### Option A — one-shot setup
```bash
cd spatialaw
chmod +x setup.sh
./setup.sh
```

### Option B — manual steps
```bash
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

You can achieve the same with `make setup`, and run `make clean` to drop stray `__pycache__` directories or `.pyc` files.

## Data Layout & Requirements
All data lives under `data/` (ignored by git). Create the following folders before running the scripts:
```
data/
├── raw/
│   └── WiAR/                # WiAR repository clone or downloaded archive
└── processed/
    ├── windows/
    ├── features/
    ├── binary/
    └── synthetic_empty/
```
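
To create this layout in one go (plain `mkdir`, nothing project-specific):
```bash
mkdir -p data/raw/WiAR \
         data/processed/{windows,features,binary,synthetic_empty}
```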

**Dataset:** WiAR (16 motion classes captured with an Intel 5300 NIC). Use `_archive/scripts/fetch_wiar.sh` to download and unpack the official release into `data/raw/WiAR`.

## Processing & Training Pipeline
Each stage is a CLI that can be run independently or via `_archive/scripts/run_pipeline.py`.

1. **Generate CSI windows**
   ```bash
   python _archive/scripts/generate_windows.py \
       --input-dir data/raw/WiAR \
       --out-dir data/processed/windows \
       --T 256 \
       --stride 64
   ```
   - Loads `.dat`, `.txt`, `.csv`, or `.npy` CSI files
   - Applies denoising + z-score normalization (`preprocess.py`)
   - Saves `window_*.npy`, `labels.csv`, and `window_generation_summary.json`

2. **Extract 14 CSI features per window**
   ```bash
   python _archive/scripts/extract_features.py \
       --windows-dir data/processed/windows \
       --output-dir data/processed/features
   ```
   - Features include variance, entropy, Hilbert envelope stats, motion period, MAD, etc. (see `_archive/src/preprocess/features.py`); a compact sketch of steps 1 and 2 follows this list.

3. **Create the binary presence dataset**
   ```bash
   python _archive/scripts/process_binary_dataset.py \
       --features-dir data/processed/features \
       --output-dir data/processed/binary
   ```
   - Combines active WiAR activities into label `1`
   - Selects low-motion segments as label `0` (with motion quantile filters)
   - Writes `features.npy`, `labels.csv`, and `feature_names.json`

4. **(Optional) Generate synthetic “empty room” samples**
   ```bash
   python _archive/scripts/generate_synthetic_empty.py \
       --output-dir data/processed/synthetic_empty \
       --n-samples 500
   ```
   - Matches the notebook’s synthetic-noise generator to balance classes when real idle captures are scarce.

5. **Validate the binary dataset**
   ```bash
   python _archive/scripts/validate_binary_dataset.py \
       --binary-dir data/processed/binary
   ```
   - Produces `validation_report.json` (label histograms, motion percentiles, leakage checks).

6. **Run the whole pipeline (any subset of steps)**
   ```bash
   python _archive/scripts/run_pipeline.py --steps windows features binary validate
   ```
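
For orientation, here is a minimal sketch of what steps 1 and 2 do conceptually — windowing, per-window z-scoring, and a handful of the statistical features — assuming a CSI amplitude matrix of shape `(packets, subcarriers)`. The function names (`make_windows`, `basic_features`) are illustrative only; the real implementations live in `_archive/src/preprocess/preprocess.py` and `features.py`.

```python
import numpy as np
from scipy.signal import hilbert
from scipy.stats import entropy

def make_windows(amplitude, T=256, stride=64):
    """Slide a fixed-length window over a (packets, subcarriers) CSI amplitude matrix."""
    windows = []
    for start in range(0, amplitude.shape[0] - T + 1, stride):
        win = amplitude[start:start + T]
        # z-score each subcarrier within the window (epsilon avoids divide-by-zero)
        win = (win - win.mean(axis=0)) / (win.std(axis=0) + 1e-8)
        windows.append(win)
    if not windows:
        raise ValueError("trace is shorter than one window")
    return np.stack(windows)

def basic_features(window):
    """A few of the 14 descriptors mentioned above: variance, MAD, entropy, envelope stats."""
    sig = window.mean(axis=1)            # collapse subcarriers into one time series
    envelope = np.abs(hilbert(sig))      # Hilbert envelope of the aggregate signal
    hist, _ = np.histogram(sig, bins=32, density=True)
    return {
        "variance": float(sig.var()),
        "mad": float(np.median(np.abs(sig - np.median(sig)))),
        "entropy": float(entropy(hist + 1e-12)),
        "envelope_mean": float(envelope.mean()),
        "envelope_std": float(envelope.std()),
    }
```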

## Modeling & Visualization Tools
Located in `_archive/model_tools/`:

| Script | Purpose | Outputs |
| --- | --- | --- |
| `train_presence_detector.py` | Train + evaluate RandomForestClassifier on the binary dataset; optionally reloads existing scaler/model. | Saves `.joblib` models, `presence_detector_metrics.json`, confusion matrix, ROC curve, feature importance plots. |
| `tune_presence_detector.py` | Grid search with GroupKFold cross-validation. | `models/tuning_results.json`, best model artifacts. |
| `visualize_activity_heatmap.py` | Draws CSI heatmaps with predicted probabilities overlaid (great for demos). | PNG/interactive matplotlib windows. |
| `visualize_samples.py` | Random window explorer for debugging feature quality. | Matplotlib figures. |
| `visualize_live_session.py` | Streams pre-recorded sessions as if live, scoring each window. | Live matplotlib updates. |
| `live_predict.py` | Connects to an incoming CSI stream (or directory) and prints rolling predictions. | Terminal output + optional plots. |
| `predict_from_raw.py` | Convenience wrapper to score a single raw CSI file end-to-end. | Prints probability + label. |
| `view_data.py` | Dumps `.npy` feature data, label counts, and metadata. | Terminal table summaries. |

Executed HTML exports of the major scripts live in `_archive/model_tools/html/` so you can skim outputs without running the code.
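
At its core, the training step amounts to fitting a scaled Random Forest on the binary dataset and saving the artifacts. The sketch below is a simplified stand-in for `train_presence_detector.py`; the `label` column name and the hyperparameters are assumptions, so check the script for the real details.

```python
import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Binary dataset produced by process_binary_dataset.py
X = np.load("data/processed/binary/features.npy")
y = pd.read_csv("data/processed/binary/labels.csv")["label"].to_numpy()  # column name is an assumption

# Scaler + balanced Random Forest, mirroring the saved pipeline artifact
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)),
])
pipeline.fit(X, y)

joblib.dump(pipeline, "models/presence_detector_pipeline.joblib")
```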

## Notebook Walkthrough (`Spatial_Awareness_Project.ipynb`)
The notebook mirrors the CLI pipeline but keeps everything in one place for reports:
- **Sections 1–2**: Data loading utilities for Intel 5300 `.dat` files, including a fallback parser and visual sanity checks (packet counts, amplitude heatmaps).
- **Sections 3–4**: Preprocessing helpers (Butterworth filters, Hilbert envelope, z-score normalization) and window visualization.
- **Section 5**: Feature extraction (the same 14 statistical descriptors used by the CLIs) with pandas summaries.
- **Section 6**: Synthetic “empty room” generator + SMOTE oversampling to handle class imbalance when genuine idle captures are missing.
- **Section 7**: Train/test split with `GroupShuffleSplit`, Random Forest training (balanced class weights), metrics (accuracy, precision, recall, F1, ROC-AUC), confusion matrix, ROC curve, and distribution plots.
- **Section 8**: Model export via `joblib`, along with helper functions to reload the scaler/model for downstream scripts.
- **Appendix**: Utility cells for plotting CSI heatmaps, inspecting feature importances, and sandboxing experimental architectures (there is an `add_cnn_to_notebook.py` helper in `_archive/scripts/create_notebook.py` for future deep learning work).

Run it with JupyterLab (`jupyter lab Spatial_Awareness_Project.ipynb`) after setting up the environment.
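
The group-aware evaluation in Section 7 boils down to something like the following sketch; `groups` stands for per-subject or per-session IDs, which are assumed to be available alongside the labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GroupShuffleSplit

def evaluate_presence(X, y, groups):
    """Hold out whole subjects/sessions so the same group never appears in both splits."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
    train_idx, test_idx = next(splitter.split(X, y, groups))

    clf = RandomForestClassifier(class_weight="balanced", random_state=42)
    clf.fit(X[train_idx], y[train_idx])

    proba = clf.predict_proba(X[test_idx])[:, 1]
    print(classification_report(y[test_idx], clf.predict(X[test_idx])))
    print("ROC-AUC:", roc_auc_score(y[test_idx], proba))
```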

## Core Library Modules (`_archive/src/`)
- `preprocess/csi_loader.py`: Recursively lists CSI recordings, infers labels from filenames, and converts `.dat`, `.txt`, `.csv`, or `.npy` files into numpy arrays.
- `preprocess/dat_loader.py`: Thin wrapper over `csiread` + legacy parsing helpers for Intel 5300 `.dat` files, handling antenna permutations and amplitude extraction.
- `preprocess/preprocess.py`: Windowing (`window_csi`), denoising (`denoise_window`), normalization (`normalize_window`), and serialization (`save_windows`).
- `preprocess/features.py`: Defines the 14 handcrafted features used throughout the project.
- `preprocess/inspect_wiar.py`: Quickly inspects WiAR metadata, packet counts, motion scores, and activity mappings.
- `models/motion_detector.py`: Provides a simple `MotionDetector` class with `predict_proba` / `predict` methods that internally load the saved scaler + Random Forest (usage sketch after this list).
- `train/dataset.py`: PyTorch Dataset/Loader utilities for window tensors—handy if you want to extend the work with CNNs or transformers later.
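
As a quick orientation for `MotionDetector`, here is a usage sketch; the import path and constructor argument are assumptions, so check `_archive/src/models/motion_detector.py` for the actual signature.

```python
import numpy as np
# Import path is an assumption; adjust to however _archive/src is exposed on your PYTHONPATH.
from src.models.motion_detector import MotionDetector

# Constructor argument is illustrative; the class loads the saved scaler + Random Forest internally.
detector = MotionDetector(model_dir="models")

features = np.load("data/processed/binary/features.npy")[:1]  # one 14-feature vector
print(detector.predict_proba(features))  # e.g. [[p_empty, p_presence]]
print(detector.predict(features))        # 0 = no presence, 1 = presence
```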

## Saved Artifacts & Reports
- `models/presence_detector_rf.joblib`: Fitted RandomForestClassifier.
- `models/presence_detector_scaler.joblib`: StandardScaler trained on the binary dataset.
- `models/presence_detector_pipeline.joblib`: End-to-end pipeline object (scaler + model).
- `_archive/model_tools/html/*.html`: Frozen notebook exports with visualizations for the detector, heatmaps, and random samples.
- `data/processed/**`: (Not tracked) holds windows, features, synthetic data, binary dataset, plus validation reports.

## Testing & Validation
- Dataset sanity checks live inside `_archive/scripts/validate_binary_dataset.py`.
- Group-aware train/test splits and cross-validation are enforced in the notebook and tuning script to avoid subject leakage.
- There are no standalone pytest suites checked in; when extending the project, consider wrapping the CLI scripts with regression tests (a minimal sketch follows).
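
If you do add tests, a smoke test in that spirit might look like the sketch below. It assumes the binary dataset already exists under `data/processed/binary` and that the validation report is written into that directory.

```python
# test_pipeline_smoke.py -- illustrative regression test, not checked into the repo
import subprocess
import sys
from pathlib import Path

def test_validate_binary_dataset_writes_report():
    binary_dir = Path("data/processed/binary")
    result = subprocess.run(
        [sys.executable, "_archive/scripts/validate_binary_dataset.py",
         "--binary-dir", str(binary_dir)],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr
    # Report location is an assumption; adjust if the script writes elsewhere.
    assert (binary_dir / "validation_report.json").exists()
```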

## References
- **WiAR Dataset**: L. Guo et al., *“A Novel Benchmark on Human Activity Recognition Using WiFi Signals,”* IEEE Healthcom, 2017.
- **Intel 5300 CSI Tool**: <http://dhalperi.github.io/linux-80211n-csitool/>
- **csiread Library**: <https://github.com/citysu/csiread>

## Authors
- Rishabh (230178)
- Shivansh (230054)

Newton School of Technology — Computer Networks + AI/ML capstone