Commit 0965534 ("random forest working well")
1 parent 55f926f

13 files changed: +1622 −194 lines

README.md (188 additions, 2 deletions)
# Spatial Awareness through Ambient Wireless Signals

WiFi Channel State Information (CSI) can double as a privacy-preserving motion sensor.

This repo contains everything we used to turn the WiAR dataset (Intel 5300 CSI captures) into a binary **presence detector** with visualizations, CLI utilities, and a reproducible notebook. The code sits in the `_archive/` directory, while the root keeps the student-facing assets (notebook, setup helpers, trained models).

## Highlights
- Parses raw 802.11n CSI traces from the WiAR dataset with `csiread`
- Generates fixed-length CSI windows, extracts 14 statistical features, and fuses them into a binary activity dataset
- Trains and tunes a Random Forest presence detector, saving joblib artifacts plus metrics
- Provides live/recorded visualizations (heatmaps, probability curves) and HTML exports for presentations
- Includes a single notebook that walks through data loading, synthetic empty-room generation, training, and evaluation

## Repository Tour
```
.
├── Spatial_Awareness_Project.ipynb   # End-to-end, commented walkthrough
├── _archive/                         # Source modules and CLI utilities
│   ├── scripts/                      # Data prep + pipeline CLIs
│   ├── model_tools/                  # Training + visualization scripts
│   └── src/                          # Library code (preprocess/models/train)
├── models/                           # Saved Random Forest + scaler + pipeline
├── requirements.txt                  # Python dependencies
├── setup.sh                          # Student-friendly environment bootstrap
├── Makefile                          # Convenience targets (setup/install/clean)
└── pyproject.toml                    # Packaging metadata (setuptools)
```

> The `data/` directory (raw WiAR captures and processed artifacts) is git-ignored. Create the structure described below before running the pipeline.

### Key Components
- `Spatial_Awareness_Project.ipynb`: Runs the full workflow in one place—loading CSI, preprocessing, generating synthetic no-activity samples, training the model, plotting metrics, and exporting artifacts.
- `_archive/scripts/`: Small CLIs for dataset download (`fetch_wiar.sh`), window generation, feature extraction, binary fusion, validation, and a `run_pipeline.py` orchestrator.
- `_archive/model_tools/`: Training + visualization entrypoints (`train_presence_detector.py`, `tune_presence_detector.py`, `visualize_activity_heatmap.py`, `visualize_samples.py`, `visualize_live_session.py`, `live_predict.py`, `predict_from_raw.py`, `view_data.py`). `model_tools/html/` stores executed notebook exports for quick demos.
- `_archive/src/`: Reusable modules
  - `preprocess/`: CSI loaders (`csi_loader.py`, `dat_loader.py`), windowing + normalization (`preprocess.py`), feature engineering (`features.py`), and WiAR inspection helpers.
  - `models/motion_detector.py`: Runtime convenience wrapper that loads the scaler + model + metadata to score new CSI windows.
  - `train/dataset.py`: PyTorch Dataset scaffold intended for future CNN/RNN work.
- `models/`: `presence_detector_rf.joblib`, `presence_detector_scaler.joblib`, and `presence_detector_pipeline.joblib`, generated by the training scripts.
- `setup.sh` / `Makefile`: Lightweight automation to create a virtualenv and install requirements without digging into tooling details.

## Getting Started

### Prerequisites
- Python 3.10+ (3.11 works best)
- `pip`, `venv`, and (for `.dat` parsing) `libpcap` headers if you plan to compile `csiread`
- Optional: GNU Make, tmux, and JupyterLab

### Option A — one-shot setup
```bash
cd spatialaw
chmod +x setup.sh
./setup.sh
```

### Option B — manual steps
```bash
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

You can achieve the same with `make setup`, and run `make clean` to drop stray `__pycache__` or `.pyc` files.

## Data Layout & Requirements
All data lives under `data/` (ignored by git). Create the following folders before running the scripts:
```
data/
├── raw/
│   └── WiAR/              # WiAR repository clone or downloaded archive
└── processed/
    ├── windows/
    ├── features/
    ├── binary/
    └── synthetic_empty/
```

**Dataset:** WiAR (16 motion classes captured with an Intel 5300 NIC). Use `_archive/scripts/fetch_wiar.sh` to download and unpack the official release into `data/raw/WiAR`.
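The layout above can be created in one go; a convenience sketch (not a script that ships with the repo):

```python
from pathlib import Path

# Mirrors the data/ tree described above; safe to re-run (exist_ok).
ROOT = Path("data")
for sub in [
    "raw/WiAR",
    "processed/windows",
    "processed/features",
    "processed/binary",
    "processed/synthetic_empty",
]:
    (ROOT / sub).mkdir(parents=True, exist_ok=True)
```

Equivalently, from a shell: `mkdir -p data/raw/WiAR data/processed/{windows,features,binary,synthetic_empty}`.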

## Processing & Training Pipeline
Each stage is a CLI that can be run independently or via `_archive/scripts/run_pipeline.py`.

1. **Generate CSI windows**
   ```bash
   python _archive/scripts/generate_windows.py \
       --input-dir data/raw/WiAR \
       --out-dir data/processed/windows \
       --T 256 \
       --stride 64
   ```
   - Loads `.dat`, `.txt`, `.csv`, or `.npy` CSI files
   - Applies denoising + z-score normalization (`preprocess.py`)
   - Saves `window_*.npy`, `labels.csv`, and `window_generation_summary.json`

2. **Extract 14 CSI features per window**
   ```bash
   python _archive/scripts/extract_features.py \
       --windows-dir data/processed/windows \
       --output-dir data/processed/features
   ```
   - Features include variance, entropy, Hilbert envelope stats, motion period, MAD, etc. (see `_archive/src/preprocess/features.py`)

3. **Create the binary presence dataset**
   ```bash
   python _archive/scripts/process_binary_dataset.py \
       --features-dir data/processed/features \
       --output-dir data/processed/binary
   ```
   - Combines active WiAR activities into label `1`
   - Selects low-motion segments as label `0` (with motion quantile filters)
   - Writes `features.npy`, `labels.csv`, and `feature_names.json`

4. **(Optional) Generate synthetic “empty room” samples**
   ```bash
   python _archive/scripts/generate_synthetic_empty.py \
       --output-dir data/processed/synthetic_empty \
       --n-samples 500
   ```
   - Matches the notebook’s synthetic-noise generator to balance classes when real idle captures are scarce.

5. **Validate the binary dataset**
   ```bash
   python _archive/scripts/validate_binary_dataset.py \
       --binary-dir data/processed/binary
   ```
   - Produces `validation_report.json` (label histograms, motion percentiles, leakage checks).

6. **Run the whole pipeline (any subset of steps)**
   ```bash
   python _archive/scripts/run_pipeline.py --steps windows features binary validate
   ```

## Modeling & Visualization Tools
Located in `_archive/model_tools/`:

| Script | Purpose | Outputs |
| --- | --- | --- |
| `train_presence_detector.py` | Train + evaluate a RandomForestClassifier on the binary dataset; optionally reloads an existing scaler/model. | Saves `.joblib` models, `presence_detector_metrics.json`, confusion matrix, ROC curve, feature importance plots. |
| `tune_presence_detector.py` | Grid search with GroupKFold cross-validation. | `models/tuning_results.json`, best model artifacts. |
| `visualize_activity_heatmap.py` | Draws CSI heatmaps with predicted probabilities overlaid (great for demos). | PNG/interactive matplotlib windows. |
| `visualize_samples.py` | Random window explorer for debugging feature quality. | Matplotlib figures. |
| `visualize_live_session.py` | Streams pre-recorded sessions as if live, scoring each window. | Live matplotlib updates. |
| `live_predict.py` | Connects to an incoming CSI stream (or directory) and prints rolling predictions. | Terminal output + optional plots. |
| `predict_from_raw.py` | Convenience wrapper to score a single raw CSI file end-to-end. | Prints probability + label. |
| `view_data.py` | Dumps `.npy` feature data, label counts, and metadata. | Terminal table summaries. |

Executed HTML exports of the major scripts live in `_archive/model_tools/html/` so you can skim outputs without running the code.

## Notebook Walkthrough (`Spatial_Awareness_Project.ipynb`)
The notebook mirrors the CLI pipeline but keeps everything in one place for reports:
- **Sections 1–2**: Data loading utilities for Intel 5300 `.dat` files, including a fallback parser and visual sanity checks (packet counts, amplitude heatmaps).
- **Sections 3–4**: Preprocessing helpers (Butterworth filters, Hilbert envelope, z-score normalization) and window visualization.
- **Section 5**: Feature extraction (the same 14 statistical descriptors used by the CLIs) with pandas summaries.
- **Section 6**: Synthetic “empty room” generator + SMOTE oversampling to handle class imbalance when genuine idle captures are missing.
- **Section 7**: Train/test split with `GroupShuffleSplit`, Random Forest training (balanced class weights), metrics (accuracy, precision, recall, F1, ROC-AUC), confusion matrix, ROC curve, and distribution plots.
- **Section 8**: Model export via `joblib`, along with helper functions to reload the scaler/model for downstream scripts.
- **Appendix**: Utility cells for plotting CSI heatmaps, inspecting feature importances, and sandboxing experimental architectures (there is an `add_cnn_to_notebook.py` helper in `_archive/scripts/create_notebook.py` for future deep learning work).

Run it with JupyterLab (`jupyter lab Spatial_Awareness_Project.ipynb`) after setting up the environment.
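Section 7's group-aware split plus balanced Random Forest can be sketched on toy data (the shapes, feature count, and hyperparameters below are illustrative, not the notebook's exact configuration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: feature matrix X, presence labels y, and per-subject
# groups so no subject appears in both train and test splits.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 14))           # 14 features, as in this project
y = (X[:, 0] > 0).astype(int)                # toy presence labels
groups = np.repeat(np.arange(10), 20)        # 10 "subjects", 20 windows each

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X[train_idx], y[train_idx])
acc = accuracy_score(y[test_idx], clf.predict(X[test_idx]))
```

The group split is what prevents the subject-leakage problem called out in the Testing & Validation section below.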

## Core Library Modules (`_archive/src/`)
- `preprocess/csi_loader.py`: Recursively lists CSI recordings, infers labels from filenames, and converts `.dat`, `.txt`, `.csv`, or `.npy` files into numpy arrays.
- `preprocess/dat_loader.py`: Thin wrapper over `csiread` + legacy parsing helpers for Intel 5300 `.dat` files, handling antenna permutations and amplitude extraction.
- `preprocess/preprocess.py`: Windowing (`window_csi`), denoising (`denoise_window`), normalization (`normalize_window`), and serialization (`save_windows`).
- `preprocess/features.py`: Defines the 14 handcrafted features used throughout the project.
- `preprocess/inspect_wiar.py`: Quickly inspects WiAR metadata, packet counts, motion scores, and activity mappings.
- `models/motion_detector.py`: Provides a simple `MotionDetector` class with `predict_proba` / `predict` methods that internally load the saved scaler + Random Forest.
- `train/dataset.py`: PyTorch Dataset/Loader utilities for window tensors—handy if you want to extend the work with CNNs or transformers later.
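To give a flavor of `preprocess/features.py`, here are three descriptors in the same spirit (variance, median absolute deviation, and spectral entropy); the module's actual 14 features and exact formulas may differ:

```python
import numpy as np

def basic_csi_features(window: np.ndarray) -> dict:
    """Three illustrative descriptors for a (time, subcarriers) CSI window.

    Sketch only; see _archive/src/preprocess/features.py for the real set.
    """
    amp = np.abs(window).mean(axis=1)                 # mean amplitude per packet
    psd = np.abs(np.fft.rfft(amp - amp.mean())) ** 2  # power spectrum of the trace
    p = psd / psd.sum()                               # normalize to a distribution
    return {
        "variance": float(amp.var()),
        "mad": float(np.median(np.abs(amp - np.median(amp)))),
        "spectral_entropy": float(-(p * np.log(p + 1e-12)).sum()),
    }

feats = basic_csi_features(np.random.default_rng(0).standard_normal((256, 30)))
```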
## Saved Artifacts & Reports
- `models/presence_detector_rf.joblib`: Fitted RandomForestClassifier.
- `models/presence_detector_scaler.joblib`: StandardScaler trained on the binary dataset.
- `models/presence_detector_pipeline.joblib`: End-to-end pipeline object (scaler + model).
- `_archive/model_tools/html/*.html`: Frozen notebook exports with visualizations for the detector, heatmaps, and random samples.
- `data/processed/**`: (Not tracked) holds windows, features, synthetic data, the binary dataset, plus validation reports.
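A round-trip sketch of how such artifacts are produced and reloaded downstream (fitted on toy data here; the real `presence_detector_pipeline.joblib` was trained on CSI features):

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Train a toy scaler+forest pipeline, persist it, and reload it for scoring.
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 14))           # 14 features, as in this project
y = (X[:, 0] > 0).astype(int)                # toy presence labels

pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))
pipe.fit(X, y)

path = Path(tempfile.mkdtemp()) / "presence_detector_pipeline.joblib"
joblib.dump(pipe, path)

reloaded = joblib.load(path)
proba = reloaded.predict_proba(X[:1])[0]     # [P(no presence), P(presence)]
```

Bundling the scaler and model into one pipeline object means downstream scripts only need a single `joblib.load` call.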

## Testing & Validation
- Dataset sanity checks live inside `_archive/scripts/validate_binary_dataset.py`.
- Group-aware train/test splits and cross-validation are enforced in the notebook and tuning script to avoid subject leakage.
- There are no standalone pytest suites checked in; when extending the project, consider wrapping the CLI scripts with regression tests.
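One cheap way to add such regression tests is to invoke the CLIs via `subprocess` and assert on their JSON outputs. A self-contained sketch (the inline `-c` command stands in for a real script such as `validate_binary_dataset.py`):

```python
import json
import subprocess
import sys

def run_cli(argv):
    """Run a Python CLI and capture its output; in practice, pass a real
    script path like _archive/scripts/validate_binary_dataset.py."""
    return subprocess.run([sys.executable, *argv], capture_output=True, text=True)

# Stand-in command emitting a report shaped like validation_report.json
# (the real report's schema is defined by the validation script itself).
result = run_cli(["-c", "import json; print(json.dumps({'labels': {'0': 10, '1': 12}}))"])
report = json.loads(result.stdout)
```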

## References
- **WiAR Dataset**: L. Guo et al., *“A Novel Benchmark on Human Activity Recognition Using WiFi Signals,”* IEEE Healthcom, 2017.
- **Intel 5300 CSI Tool**: <http://dhalperi.github.io/linux-80211n-csitool/>
- **csiread Library**: <https://github.com/citywu/csiread>

## Authors
- Rishabh (230178)
- Shivansh (230054)

Newton School of Technology — Computer Networks + AI/ML capstone
