Commit 84d6309

committed
readme update
1 parent c1218f7 commit 84d6309

6 files changed: +46 -557 lines changed

README.md

Lines changed: 46 additions & 161 deletions
Original file line number | Diff line number | Diff line change
@@ -1,189 +1,74 @@
11
# Spatial Awareness through Ambient Wireless Signals
22

33
WiFi Channel State Information (CSI) can double as a privacy-preserving motion sensor.
4-
This repo contains everything we used to turn the WiAR dataset (Intel 5300 CSI captures) into a binary **presence detector** with visualizations, CLI utilities, and a reproducible notebook. The code sits in the `_archive/` directory, while the root keeps the student-facing assets (notebook, setup helpers, trained models).
4+
This repo contains everything we used to turn the WiAR dataset (Intel 5300 CSI captures) into a binary **presence detector** with visualizations, an interactive dashboard, and a reproducible notebook.
55

66
## Highlights
7-
- Parses raw 802.11n CSI traces from the WiAR dataset with `csiread`
8-
- Generates fixed-length CSI windows, extracts 14 statistical features, and fuses them into a binary activity dataset
9-
- Trains and tunes a Random Forest presence detector, saving joblib artifacts plus metrics
10-
- Provides live/recorded visualizations (heatmaps, probability curves) and HTML exports for presentations
11-
- Includes a single notebook that walks through data loading, synthetic empty-room generation, training, and evaluation
7+
- **Two Models**:
8+
- **Random Forest**: Fast, feature-based (96% accuracy).
9+
- **1D-CNN (Deep Learning)**: End-to-end learning on raw CSI (91%+ accuracy).
10+
- **Interactive Dashboard**: A Streamlit app for live demonstrations, featuring real-time simulation and model switching.
11+
- **End-to-End Notebook**: `Spatial_Awareness_Project.ipynb` walks through the entire pipeline (Data Loading -> Preprocessing -> Training -> Evaluation).
12+
- **Synthetic Data**: Generates "Empty Room" samples to handle class imbalance.
1213

1314
## Repository Tour
1415
```
1516
.
16-
├── Spatial_Awareness_Project.ipynb # End-to-end, commented walkthrough
17-
├── _archive/ # Source modules and CLI utilities
18-
│ ├── scripts/ # Data prep + pipeline CLIs
19-
│ ├── model_tools/ # Training + visualization scripts
20-
│ └── src/ # Library code (preprocess/models/train)
21-
├── models/ # Saved Random Forest + scaler + pipeline
17+
├── Spatial_Awareness_Project.ipynb # Main Project Notebook (Report)
18+
├── app.py # Interactive Streamlit Dashboard
19+
├── models/ # Saved Models
20+
│ ├── presence_detector_rf.joblib # Random Forest Model
21+
│ ├── presence_detector_scaler.joblib # Scaler for RF
22+
│ └── presence_detector_cnn.pth # CNN Model (PyTorch)
23+
├── model_tools/ # Model Utilities
24+
│ ├── train_cnn.py # Script to train/retrain the CNN
25+
│ └── predict_from_raw.py # CLI tool for single-file prediction
26+
├── data/ # Dataset (Git-ignored)
2227
├── requirements.txt # Python dependencies
23-
├── setup.sh # Student-friendly environment bootstrap
24-
├── Makefile # Convenience targets (setup/install/clean)
25-
└── pyproject.toml # Packaging metadata (setuptools)
28+
└── _archive/ # Legacy/Helper scripts
2629
```
2730

28-
> The `data/` directory (raw WiAR captures and processed artifacts) is git-ignored. Create the structure described below before running the pipeline.
29-
30-
### Key Components
31-
- `Spatial_Awareness_Project.ipynb`: Runs the full workflow in one place—loading CSI, preprocessing, generating synthetic no-activity samples, training the model, plotting metrics, and exporting artifacts.
32-
- `_archive/scripts/`: Small CLIs for dataset download (`fetch_wiar.sh`), window generation, feature extraction, binary fusion, validation, and a `run_pipeline.py` orchestrator.
33-
- `_archive/model_tools/`: Training + visualization entrypoints (`train_presence_detector.py`, `tune_presence_detector.py`, `visualize_activity_heatmap.py`, `visualize_samples.py`, `visualize_live_session.py`, `live_predict.py`, `predict_from_raw.py`, `view_data.py`). `model_tools/html/` stores executed notebook exports for quick demos.
34-
- `_archive/src/`: Reusable modules
35-
- `preprocess/`: CSI loaders (`csi_loader.py`, `dat_loader.py`), windowing + normalization (`preprocess.py`), feature engineering (`features.py`), and WiAR inspection helpers.
36-
- `models/motion_detector.py`: Runtime convenience wrapper that loads the scaler + model + metadata to score new CSI windows.
37-
- `train/dataset.py`: PyTorch Dataset scaffold intended for future CNN/RNN work.
38-
- `models/`: `presence_detector_rf.joblib`, `presence_detector_scaler.joblib`, and `presence_detector_pipeline.joblib` generated by the training scripts.
39-
- `setup.sh` / `Makefile`: Lightweight automation to create a virtualenv and install requirements without digging into tooling details.
40-
4131
## Getting Started
42-
### Prerequisites
43-
- Python 3.10+ (3.11 works best)
44-
- `pip`, `venv`, and (for `.dat` parsing) `libpcap` headers if you plan to compile `csiread`
45-
- Optional: GNU Make, tmux, and JupyterLab
4632

47-
### Option A — one-shot setup
33+
### 1. Setup Environment
4834
```bash
49-
cd spatialaw
50-
chmod +x setup.sh
51-
./setup.sh
35+
# Create virtual environment
36+
python3 -m venv .venv
37+
source .venv/bin/activate
38+
39+
# Install dependencies
40+
pip install -r requirements.txt
5241
```
5342

54-
### Option B — manual steps
43+
### 2. Run the Dashboard (Demo)
44+
The dashboard allows you to visualize the system in action.
5545
```bash
56-
python3 -m venv venv
57-
source venv/bin/activate
58-
pip install --upgrade pip
59-
pip install -r requirements.txt
46+
streamlit run app.py
6047
```
48+
**Features:**
49+
- **Model Selector**: Switch between Random Forest and CNN.
50+
- **Simulate Live Mode**: Replays a file as if it were a live stream.
51+
- **Stability Filter**: Smooths predictions over time.
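
The README does not say how the Stability Filter smooths predictions. One common approach is a sliding majority vote over the most recent window-level predictions. The sketch below assumes that approach; the class name `StabilityFilter` and the `size` parameter are hypothetical and are not taken from `app.py`.

```python
from collections import deque


class StabilityFilter:
    """Majority vote over the last `size` binary predictions (hypothetical helper)."""

    def __init__(self, size: int = 5) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        # deque(maxlen=...) discards the oldest prediction automatically.
        self._recent = deque(maxlen=size)

    def update(self, prediction: int) -> int:
        """Record a raw 0/1 prediction and return the smoothed label."""
        self._recent.append(int(prediction))
        # Presence wins only when it holds a strict majority of the buffer.
        return 1 if 2 * sum(self._recent) > len(self._recent) else 0


if __name__ == "__main__":
    filt = StabilityFilter(size=3)
    raw = [0, 1, 0, 1, 1, 1, 0, 1]
    print([filt.update(p) for p in raw])
```

With a buffer of three, an isolated flip in the raw stream does not change the reported label; a larger `size` trades responsiveness for stability.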
6152

62-
`make setup` achieves the same result, and `make clean` removes stray `__pycache__` directories and `.pyc` files.
53+
### 3. Run the Notebook
54+
Open `Spatial_Awareness_Project.ipynb` in Jupyter to see the full training and evaluation report.
6355

64-
## Data Layout & Requirements
65-
All data lives under `data/` (ignored by git). Create the following folders before running the scripts:
56+
### 4. Train the CNN (Optional)
57+
If you want to retrain the Deep Learning model:
58+
```bash
59+
python model_tools/train_cnn.py
60+
```
61+
This trains the model on the windows in `data/processed/windows` and saves it to `models/presence_detector_cnn.pth`.
62+
63+
## Data Layout
64+
All data lives under `data/` (ignored by git).
6665
```
6766
data/
68-
├── raw/
69-
│ └── WiAR/ # WiAR repository clone or downloaded archive
70-
└── processed/
71-
├── windows/
72-
├── features/
73-
├── binary/
74-
└── synthetic_empty/
67+
├── raw/WiAR/ # Original Dataset
68+
└── processed/ # Generated Windows & Features
7569
```
7670

77-
**Dataset:** WiAR (16 motion classes captured with an Intel 5300 NIC). Use `_archive/scripts/fetch_wiar.sh` to download and unpack the official release into `data/raw/WiAR`.
78-
79-
## Processing & Training Pipeline
80-
Each stage is a CLI that can be run independently or via `_archive/scripts/run_pipeline.py`.
81-
82-
1. **Generate CSI windows**
83-
```bash
84-
python _archive/scripts/generate_windows.py \
85-
--input-dir data/raw/WiAR \
86-
--out-dir data/processed/windows \
87-
--T 256 \
88-
--stride 64
89-
```
90-
- Loads `.dat`, `.txt`, `.csv`, or `.npy` CSI files
91-
- Applies denoising + z-score normalization (`preprocess.py`)
92-
- Saves `window_*.npy`, `labels.csv`, and `window_generation_summary.json`
93-
94-
2. **Extract 14 CSI features per window**
95-
```bash
96-
python _archive/scripts/extract_features.py \
97-
--windows-dir data/processed/windows \
98-
--output-dir data/processed/features
99-
```
100-
- Features include variance, entropy, Hilbert envelope stats, motion period, MAD, etc. (see `_archive/src/preprocess/features.py`)
101-
102-
3. **Create the binary presence dataset**
103-
```bash
104-
python _archive/scripts/process_binary_dataset.py \
105-
--features-dir data/processed/features \
106-
--output-dir data/processed/binary
107-
```
108-
- Combines active WiAR activities into label `1`
109-
- Selects low-motion segments as label `0` (with motion quantile filters)
110-
- Writes `features.npy`, `labels.csv`, and `feature_names.json`
111-
112-
4. **(Optional) Generate synthetic “empty room” samples**
113-
```bash
114-
python _archive/scripts/generate_synthetic_empty.py \
115-
--output-dir data/processed/synthetic_empty \
116-
--n-samples 500
117-
```
118-
- Matches the notebook’s synthetic-noise generator to balance classes when real idle captures are scarce.
119-
120-
5. **Validate the binary dataset**
121-
```bash
122-
python _archive/scripts/validate_binary_dataset.py \
123-
--binary-dir data/processed/binary
124-
```
125-
- Produces `validation_report.json` (label histograms, motion percentiles, leakage checks).
126-
127-
6. **Run the whole pipeline (any subset of steps)**
128-
```bash
129-
python _archive/scripts/run_pipeline.py --steps windows features binary validate
130-
```
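
The `--T 256` and `--stride 64` flags in step 1 suggest a sliding window. Assuming `--T` is the window length in packets, each window holds 256 consecutive CSI packets and each new window starts 64 packets after the previous one. A minimal sketch of that indexing follows. The function name `window_bounds` is hypothetical, and dropping the trailing partial window is an assumption, since the README does not say how `generate_windows.py` handles it.

```python
def window_bounds(n_packets: int, T: int = 256, stride: int = 64) -> list[tuple[int, int]]:
    """Return (start, end) packet indices of each full window, end exclusive."""
    if T < 1 or stride < 1:
        raise ValueError("T and stride must be positive")
    # The last valid start is n_packets - T; shorter tails are dropped (assumption).
    return [(start, start + T) for start in range(0, n_packets - T + 1, stride)]


if __name__ == "__main__":
    bounds = window_bounds(n_packets=512)
    print(len(bounds), bounds[0], bounds[-1])
```

Because the stride is a quarter of the window length, consecutive windows share 192 packets, which is one reason group-aware splitting matters later in the pipeline.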
131-
132-
## Modeling & Visualization Tools
133-
Located in `_archive/model_tools/`:
134-
135-
| Script | Purpose | Outputs |
136-
| --- | --- | --- |
137-
| `train_presence_detector.py` | Train + evaluate RandomForestClassifier on the binary dataset; optionally reloads existing scaler/model. | Saves `.joblib` models, `presence_detector_metrics.json`, confusion matrix, ROC curve, feature importance plots. |
138-
| `tune_presence_detector.py` | Grid search with GroupKFold cross-validation. | `models/tuning_results.json`, best model artifacts. |
139-
| `visualize_activity_heatmap.py` | Draws CSI heatmaps with predicted probabilities overlaid (great for demos). | PNG/interactive matplotlib windows. |
140-
| `visualize_samples.py` | Random window explorer for debugging feature quality. | Matplotlib figures. |
141-
| `visualize_live_session.py` | Streams pre-recorded sessions as if live, scoring each window. | Live matplotlib updates. |
142-
| `live_predict.py` | Connects to an incoming CSI stream (or directory) and prints rolling predictions. | Terminal output + optional plots. |
143-
| `predict_from_raw.py` | Convenience wrapper to score a single raw CSI file end-to-end. | Prints probability + label. |
144-
| `view_data.py` | Dumps `.npy` feature data, label counts, and metadata. | Terminal table summaries. |
145-
146-
Executed HTML exports of the major scripts live in `_archive/model_tools/html/` so you can skim outputs without running the code.
147-
148-
## Notebook Walkthrough (`Spatial_Awareness_Project.ipynb`)
149-
The notebook mirrors the CLI pipeline but keeps everything in one place for reports:
150-
- **Sections 1–2**: Data loading utilities for Intel 5300 `.dat` files, including a fallback parser and visual sanity checks (packet counts, amplitude heatmaps).
151-
- **Sections 3–4**: Preprocessing helpers (Butterworth filters, Hilbert envelope, z-score normalization) and window visualization.
152-
- **Section 5**: Feature extraction (the same 14 statistical descriptors used by the CLIs) with pandas summaries.
153-
- **Section 6**: Synthetic “empty room” generator + SMOTE oversampling to handle class imbalance when genuine idle captures are missing.
154-
- **Section 7**: Train/test split with `GroupShuffleSplit`, Random Forest training (balanced class weights), metrics (accuracy, precision, recall, F1, ROC-AUC), confusion matrix, ROC curve, and distribution plots.
155-
- **Section 8**: Model export via `joblib`, along with helper functions to reload the scaler/model for downstream scripts.
156-
- **Appendix**: Utility cells for plotting CSI heatmaps, inspecting feature importances, and sandboxing experimental architectures (there is an `add_cnn_to_notebook.py` helper in `_archive/scripts/create_notebook.py` for future deep learning work).
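
Section 7's `GroupShuffleSplit` keeps every window from one group (for example, one recording or subject) on the same side of the train/test split. The notebook relies on scikit-learn for this. The standard-library sketch below only illustrates the idea; `group_split`, `test_fraction`, and the sample group labels are hypothetical.

```python
import random


def group_split(groups: list[str], test_fraction: float = 0.25, seed: int = 0):
    """Split sample indices so that no group appears in both train and test."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    # Hold out whole groups (at least one), never individual samples.
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


if __name__ == "__main__":
    groups = ["rec_a", "rec_a", "rec_b", "rec_b", "rec_c", "rec_d"]
    train_idx, test_idx = group_split(groups)
    print(train_idx, test_idx)
```

Splitting by group rather than by window prevents overlapping windows from the same recording from landing in both sets, which would inflate test accuracy.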
157-
158-
Run it with JupyterLab (`jupyter lab Spatial_Awareness_Project.ipynb`) after setting up the environment.
159-
160-
## Core Library Modules (`_archive/src/`)
161-
- `preprocess/csi_loader.py`: Recursively lists CSI recordings, infers labels from filenames, and converts `.dat`, `.txt`, `.csv`, or `.npy` files into numpy arrays.
162-
- `preprocess/dat_loader.py`: Thin wrapper over `csiread` + legacy parsing helpers for Intel 5300 `.dat` files, handling antenna permutations and amplitude extraction.
163-
- `preprocess/preprocess.py`: Windowing (`window_csi`), denoising (`denoise_window`), normalization (`normalize_window`), and serialization (`save_windows`).
164-
- `preprocess/features.py`: Defines the 14 handcrafted features used throughout the project.
165-
- `preprocess/inspect_wiar.py`: Quickly inspects WiAR metadata, packet counts, motion scores, and activity mappings.
166-
- `models/motion_detector.py`: Provides a simple `MotionDetector` class with `predict_proba` / `predict` methods that internally load the saved scaler + Random Forest.
167-
- `train/dataset.py`: PyTorch Dataset/Loader utilities for window tensors—handy if you want to extend the work with CNNs or transformers later.
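
The 14 features themselves are defined in `features.py`, which this diff does not show. As a rough illustration of the kind of descriptor the README names (variance, MAD, entropy), the sketch below computes three of them for a 1-D amplitude series. The function name `window_features`, the histogram bin count, and the reading of MAD as median absolute deviation are all assumptions.

```python
import math
import statistics


def window_features(amplitudes: list[float], bins: int = 8) -> dict[str, float]:
    """Compute variance, median absolute deviation, and histogram entropy."""
    median = statistics.median(amplitudes)
    mad = statistics.median([abs(a - median) for a in amplitudes])

    lo, hi = min(amplitudes), max(amplitudes)
    counts = [0] * bins
    for a in amplitudes:
        # Map each value to a bin; the maximum falls into the last bin.
        idx = min(int((a - lo) / (hi - lo) * bins), bins - 1) if hi > lo else 0
        counts[idx] += 1
    # Shannon entropy (in bits) of the non-empty histogram bins.
    probs = [c / len(amplitudes) for c in counts if c]
    entropy = -sum(p * math.log2(p) for p in probs)

    return {
        "variance": statistics.pvariance(amplitudes),
        "mad": mad,
        "entropy": entropy,
    }


if __name__ == "__main__":
    print(window_features([0.0, 0.0, 1.0, 1.0], bins=2))
```

A static channel would be expected to give low variance and entropy, while motion spreads the amplitudes across more bins.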
168-
169-
## Saved Artifacts & Reports
170-
- `models/presence_detector_rf.joblib`: Fitted RandomForestClassifier.
171-
- `models/presence_detector_scaler.joblib`: StandardScaler trained on the binary dataset.
172-
- `models/presence_detector_pipeline.joblib`: End-to-end pipeline object (scaler + model).
173-
- `_archive/model_tools/html/*.html`: Frozen notebook exports with visualizations for the detector, heatmaps, and random samples.
174-
- `data/processed/**`: (Not tracked) holds windows, features, synthetic data, binary dataset, plus validation reports.
175-
176-
## Testing & Validation
177-
- Dataset sanity checks live inside `_archive/scripts/validate_binary_dataset.py`.
178-
- Group-aware train/test splits and cross-validation are enforced in the notebook and tuning script to avoid subject leakage.
179-
- There are no standalone pytest suites checked in; when extending the project consider wrapping the CLI scripts with regression tests.
180-
181-
## References
182-
- **WiAR Dataset**: L. Guo et al., *“A Novel Benchmark on Human Activity Recognition Using WiFi Signals,”* IEEE Healthcom, 2017.
183-
- **Intel 5300 CSI Tool**: <http://dhalperi.github.io/linux-80211n-csitool/>
184-
- **csiread Library**: <https://github.com/citywu/csiread>
185-
18671
## Authors
18772
- Rishabh (230178)
18873
- Shivansh (230054)
189-
Newton School of Technology — Computer Networks + AI/ML capstone
74+
Newton School of Technology — Computer Networks + AI/ML Capstone
