Commit 0c5961d

doc: restructure and reorganize docs (#143)
Create a dedicated CONTRIBUTING.md as a landing point for anyone who wants to contribute to the codebase. Cross-reference design documents. Explain the high-level data flow and how the major components in the codebase tie together. Update code example and architecture docs. Add DeepWiki badge.
1 parent 6238c7d commit 0c5961d

8 files changed

+188
-101
lines changed

CONTRIBUTING.md

Lines changed: 121 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,121 @@
1+
# Contributing
2+
3+
## Development Environment
4+
5+
We use [uv](https://astral.sh/uv) as a replacement for several Python repository
6+
management tools such as `pip`, `poetry`, etc.
7+
8+
### Installing uv
9+
10+
On Ubuntu:
11+
12+
```bash
13+
sudo snap install astral-uv --classic
14+
```
15+
16+
On macOS:
17+
18+
```bash
19+
brew install uv
20+
```
21+
22+
On other platforms, see the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/).
23+
24+
### Quick Start
25+
26+
```bash
27+
# Clone the repository
28+
git clone git@github.com:m-lab/iqb.git
29+
cd iqb
30+
31+
# Sync all dependencies (creates .venv automatically)
32+
uv sync --dev
33+
34+
# Run the Streamlit prototype
35+
cd prototype
36+
uv run streamlit run Home.py
37+
```
38+
39+
### Using VSCode
40+
41+
This repository is configured for VSCode with selected Python
42+
development tools (Ruff, Pyright, pytest).
43+
44+
When you first open this repository with VSCode, it will prompt you
45+
to install the required extensions for Python development.
46+
47+
Make sure you also read the following section to avoid `uv`
48+
issues: there is no official `uv` extension for VSCode yet and
49+
it seems more prudent to avoid using unofficial ones.
50+
51+
#### First-time uv setup
52+
53+
Running `uv sync --dev` creates the required `.venv` directory
54+
that VSCode needs to find the proper python version and the proper
55+
development tools.
56+
57+
If you open the repository using VSCode *before* running
58+
`uv sync --dev`, you see the following error:
59+
60+
```
61+
Unexpected error while trying to find the Ruff binary
62+
```
63+
64+
To fix this, either run `uv sync --dev` from the command line or
65+
use VSCode directly to run `uv` and reload:
66+
67+
1. Run the setup task:
68+
69+
- Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on macOS)
70+
71+
- Type "Tasks: Run Task"
72+
73+
- Select **"IQB: Setup Development Environment"**
74+
75+
- This runs `uv sync --dev` to install all development dependencies
76+
77+
2. After setup completes, reload VSCode:
78+
79+
- Press `Ctrl+Shift+P` → "Developer: Reload Window"
80+
81+
- The Ruff error should disappear
82+
83+
#### Available Tasks
84+
85+
Access them via `Ctrl+Shift+P` → "Tasks: Run Task":
86+
87+
- **IQB: Setup Development Environment** - Run `uv sync --dev` to install/update dependencies
88+
89+
- **IQB: Run Tests** - Run the pytest test suite
90+
91+
- **IQB: Run Ruff Check** - Check code style and quality
92+
93+
- **IQB: Run Pyright** - Run type checking
94+
95+
#### Extensions
96+
97+
VSCode will prompt to install these extensions:
98+
99+
- Python (`ms-python.python`)
100+
101+
- Pylance (`ms-python.vscode-pylance`)
102+
103+
- Ruff (`charliermarsh.ruff`)
104+
105+
## Component Workflows
106+
107+
Each component has its own README with specific development instructions:
108+
109+
- [library/README.md](library/README.md) — testing, linting, type checking, coding style
110+
111+
- [prototype/README.md](prototype/README.md) — running locally, Docker, deployment
112+
113+
- [data/README.md](data/README.md) — running the pipeline, cache management
114+
115+
- [analysis/README.md](analysis/README.md) — notebooks, testing notebooks
116+
117+
## Understanding the Codebase
118+
119+
- [docs/internals/](docs/internals/README.md) — sequential guide to how the data pipeline works
120+
121+
- [docs/design/](docs/design/README.md) — architecture decision records explaining why things were built this way

README.md

Lines changed: 25 additions & 81 deletions
@@ -1,6 +1,6 @@
11
# Internet Quality Barometer (IQB)
22

3-
[![Build Status](https://github.com/m-lab/iqb/actions/workflows/ci.yml/badge.svg)](https://github.com/m-lab/iqb/actions) [![codecov](https://codecov.io/gh/m-lab/iqb/branch/main/graph/badge.svg)](https://codecov.io/gh/m-lab/iqb)
3+
[![Build Status](https://github.com/m-lab/iqb/actions/workflows/ci.yml/badge.svg)](https://github.com/m-lab/iqb/actions) [![codecov](https://codecov.io/gh/m-lab/iqb/branch/main/graph/badge.svg)](https://codecov.io/gh/m-lab/iqb) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/m-lab/iqb)
44

55
This repository contains the source code for the Internet Quality Barometer (IQB)
66
library, and related applications and notebooks.
@@ -68,28 +68,35 @@ See [data/README.md](data/README.md) for details.
6868

6969
Symbolic link to [data](data) that simplifies running the pipeline on Unix.
7070

71-
## Development Environment
71+
## Data Flow
7272

73-
We use [uv](https://astral.sh/uv) as a replacement for several Python repository
74-
management tools such as `pip`, `poetry`, etc.
73+
The components above connect as follows:
7574

76-
### Installing uv
77-
78-
On Ubuntu:
79-
80-
```bash
81-
sudo snap install astral-uv --classic
75+
```
76+
BigQuery → [iqb pipeline run] → local cache/ → [IQBCache] → [IQBCalculator] → scores
77+
78+
[iqb cache pull/push] ↔ GCS
8279
```
8380

84-
On macOS:
81+
The **pipeline** queries BigQuery for M-Lab NDT measurements and stores
82+
percentile summaries as Parquet files in the local cache. To avoid expensive
83+
re-queries, **`iqb cache pull`** can download pre-computed results from GCS
84+
instead. The **`IQBCache`** API reads cached data, and **`IQBCalculator`**
85+
applies quality thresholds and weights to produce IQB scores. The
86+
**prototype** and **analysis notebooks** both consume scores through
87+
these library APIs.
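
The threshold-and-weight step described above can be illustrated with a minimal sketch. The metric names, thresholds, weights, and the `Requirement` and `weighted_score` helpers are hypothetical and are not part of the `iqb` library; the actual `IQBCalculator` configuration and formula may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One quality requirement: a metric, a threshold, and a weight (all hypothetical)."""

    metric: str
    threshold: float
    higher_is_better: bool
    weight: float


def weighted_score(measurements: dict[str, float], requirements: list[Requirement]) -> float:
    """Return the weighted fraction of requirements that the measurements satisfy."""
    total_weight = sum(r.weight for r in requirements)
    if total_weight <= 0:
        raise ValueError("requirements must have a positive total weight")
    met_weight = 0.0
    for r in requirements:
        value = measurements[r.metric]
        # A requirement is met when the value is on the good side of its threshold.
        satisfied = value >= r.threshold if r.higher_is_better else value <= r.threshold
        if satisfied:
            met_weight += r.weight
    return met_weight / total_weight


# Illustrative numbers only; they do not come from the IQB configuration.
requirements = [
    Requirement("download_mbps", 25.0, higher_is_better=True, weight=2.0),
    Requirement("upload_mbps", 10.0, higher_is_better=True, weight=1.0),
    Requirement("latency_ms", 100.0, higher_is_better=False, weight=1.0),
]
measurements = {"download_mbps": 40.0, "upload_mbps": 5.0, "latency_ms": 60.0}

score = weighted_score(measurements, requirements)
print(f"score: {score:.2f}")
```

In this toy input the download and latency requirements are met and the upload requirement is not, so three of the four weight units count toward the score.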
8588

86-
```bash
87-
brew install uv
88-
```
89+
## Understanding the Codebase
90+
91+
- To learn **how the data pipeline works**, read the
92+
[internals guide](docs/internals/README.md) — it walks through queries,
93+
the pipeline, the remote cache, and the researcher API in sequence.
8994

90-
On other platforms, see the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/).
95+
- To understand **why specific technical decisions were made**, see the
96+
[design documents](docs/design/README.md) — architecture decision records
97+
covering cache design, data distribution, and more.
9198

92-
### Quick Start
99+
## Quick Start
93100

94101
```bash
95102
# Clone the repository
@@ -104,68 +111,5 @@ cd prototype
104111
uv run streamlit run Home.py
105112
```
106113

107-
### Using VSCode
108-
109-
This repository is configured for VSCode with selected Python
110-
development tools (Ruff, Pyright, pytest).
111-
112-
When you first open this repository with VSCode, it will prompt you
113-
to install the required extensions for Python development.
114-
115-
Make sure you also read the following section to avoid `uv`
116-
issues: there is no official `uv` extension for VSCode yet and
117-
it seems more prudent to avoid using unofficial ones.
118-
119-
#### First-time uv setup
120-
121-
Running `uv sync --dev` creates the required `.venv` directory
122-
that VSCode needs to find the proper python version and the proper
123-
development tools.
124-
125-
If you open the repository using VSCode *before* running
126-
`uv sync --dev`, you see the following error:
127-
128-
```
129-
Unexpected error while trying to find the Ruff binary
130-
```
131-
132-
To fix this, either run `uv sync --dev` from the command line or
133-
use VSCode directly to run `uv` and reload:
134-
135-
1. Run the setup task:
136-
137-
- Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on macOS)
138-
139-
- Type "Tasks: Run Task"
140-
141-
- Select **"IQB: Setup Development Environment"**
142-
143-
- This runs `uv sync --dev` to install all development dependencies
144-
145-
2. After setup completes, reload VSCode:
146-
147-
- Press `Ctrl+Shift+P` → "Developer: Reload Window"
148-
149-
- The Ruff error should disappear
150-
151-
#### Available Tasks
152-
153-
Access them via `Ctrl+Shift+P` → "Tasks: Run Task":
154-
155-
- **IQB: Setup Development Environment** - Run `uv sync --dev` to install/update dependencies
156-
157-
- **IQB: Run Tests** - Run the pytest test suite
158-
159-
- **IQB: Run Ruff Check** - Check code style and quality
160-
161-
- **IQB: Run Pyright** - Run type checking
162-
163-
#### Extensions
164-
165-
VSCode will prompt to install these extensions:
166-
167-
- Python (`ms-python.python`)
168-
169-
- Pylance (`ms-python.vscode-pylance`)
170-
171-
- Ruff (`charliermarsh.ruff`)
114+
See [CONTRIBUTING.md](CONTRIBUTING.md) for full development environment
115+
setup, VSCode configuration, and component-specific workflows.

docs/README.md

Lines changed: 5 additions & 7 deletions
@@ -1,13 +1,11 @@
11
## Documentation and Presentations
22

3-
This directory contains documentations and presentations.
3+
- [design/](design/): Architecture decision records explaining **why**
4+
the system was built this way — cache format, remote cache strategy,
5+
data distribution, parallelization.
46

5-
- [design/](design/): design documents capturing architectural decisions,
6-
requirements analyses, technical evaluations, etc.
7-
8-
- [internals/](internals/): internal architecture documentation for
9-
the IQB data pipeline, organized as a sequence of chapters covering
10-
BigQuery queries, the `IQBPipeline`, and the remote cache.
7+
- [internals/](internals/): Sequential guide explaining **how** the
8+
data pipeline works — start here if you are new to the codebase.
119

1210
- [2025-12-11-pulse-slides.pdf](2025-12-11-pulse-slides.pdf): slides
1311
of [@sermpezis](https://github.com/sermpezis) during the [ISOC

docs/design/2025-12-21-remote.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
22

33
**Date:** 2025-12-21
44
**Status:** Implemented (PRs #85--92, v0.5.0)
5+
**Builds on:** [2025-11-24-cache.md](2025-11-24-cache.md) (local cache design)
56

67
## Problem
78

docs/design/2026-01-20-distribution.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
22

33
**Date:** 2026-01-20
44
**Status:** Implemented (GCS bucket `mlab-sandbox-iqb-us-central1`, PR #131)
5+
**Builds on:** [2025-12-21-remote.md](2025-12-21-remote.md) (layered cache lookup)
56

67
## Requirements
78

docs/design/2026-01-29-sync.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
22

33
**Date:** 2026-01-29
44
**Status:** Implemented (ThreadPoolExecutor + JSONL metrics in `iqb cache pull`, PR #141)
5+
**Builds on:** [2026-01-20-distribution.md](2026-01-20-distribution.md) (data distribution)
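
The status line above mentions a `ThreadPoolExecutor` combined with JSONL metrics. A minimal sketch of that general pattern follows, assuming a hypothetical `fake_download` stand-in and hypothetical metric field names; it is not the code behind `iqb cache pull`.

```python
import io
import json
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, TextIO


def fake_download(name: str) -> int:
    """Hypothetical stand-in for a real download; returns a byte count."""
    return 1000 + len(name)


def pull_in_parallel(
    names: Iterable[str],
    download: Callable[[str], int],
    metrics: TextIO,
    max_workers: int = 4,
) -> None:
    """Run downloads on a thread pool and write one JSON object per line."""

    def timed(name: str) -> dict:
        started = time.monotonic()
        size = download(name)
        return {"file": name, "bytes": size, "elapsed_s": time.monotonic() - started}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(timed, name) for name in names]
        # Only the calling thread writes, so JSONL lines are never interleaved.
        for future in as_completed(futures):
            metrics.write(json.dumps(future.result()) + "\n")


names = ["a.parquet", "b.parquet", "c.parquet"]
metrics = io.StringIO()
pull_in_parallel(names, fake_download, metrics)
records = [json.loads(line) for line in metrics.getvalue().splitlines()]
print(len(records), "metric records written")
```

Records are written in completion order, so consumers of the metrics file should not rely on the order of the input list.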
56

67
## Problem
78

library/README.md

Lines changed: 26 additions & 13 deletions
@@ -22,22 +22,36 @@ uv sync --dev
2222
## Usage
2323

2424
```python
25-
from iqb import IQB
25+
from iqb import IQBCache, IQBCalculator, IQBDatasetGranularity, IQBRemoteCache
2626

27-
# Create an IQB instance
28-
iqb = IQB(name='my_analysis')
27+
# Initialize cache (downloads data from GCS if not available locally)
28+
cache = IQBCache(remote_cache=IQBRemoteCache())
2929

30-
# Calculate IQB score with default data
31-
score = iqb.calculate_iqb_score()
32-
print(f'IQB score: {score}')
30+
# Initialize calculator with default IQB configuration
31+
calculator = IQBCalculator()
3332

34-
# Calculate with detailed output
35-
score = iqb.calculate_iqb_score(print_details=True)
33+
# Load cached measurement data for US, October 2025
34+
entry = cache.get_cache_entry(
35+
    start_date="2025-10-01",
36+
    end_date="2025-11-01",
37+
    granularity=IQBDatasetGranularity.COUNTRY,
38+
)
39+
40+
# Read M-Lab data filtered to the US
41+
df_pair = entry.mlab.read_data_frame_pair(country_code="US")
3642

37-
# Print configuration
38-
iqb.print_config()
43+
# Extract the 50th percentile and convert for the calculator
44+
p50 = df_pair.to_iqb_data(percentile=50)
45+
data = {"m-lab": p50.to_dict()}
46+
47+
# Calculate IQB score
48+
score = calculator.calculate_iqb_score(data=data)
49+
print(f"IQB score: {score:.3f}")
3950
```
4051

52+
See [analysis/00-template.ipynb](../analysis/00-template.ipynb) for a
53+
complete walkthrough with step-by-step explanations.
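
The cache comment in the usage example says data is downloaded from GCS when it is not available locally. The local-first idea can be sketched as follows; `layered_lookup`, `fake_remote`, and the file name are hypothetical illustrations and do not reflect how `IQBCache` or `IQBRemoteCache` are implemented.

```python
import tempfile
from pathlib import Path
from typing import Callable


def layered_lookup(name: str, local_dir: Path, fetch_remote: Callable[[str], bytes]) -> bytes:
    """Return the bytes for `name`, consulting the local layer before the remote one."""
    local_path = local_dir / name
    if local_path.exists():
        # Local hit: no network access needed.
        return local_path.read_bytes()
    # Local miss: fetch from the remote layer and store a copy for next time.
    payload = fetch_remote(name)
    local_path.parent.mkdir(parents=True, exist_ok=True)
    local_path.write_bytes(payload)
    return payload


remote_calls: list[str] = []


def fake_remote(name: str) -> bytes:
    """Hypothetical remote layer that records every request it receives."""
    remote_calls.append(name)
    return b"payload-for-" + name.encode()


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    first = layered_lookup("us/2025-10.parquet", cache_dir, fake_remote)
    second = layered_lookup("us/2025-10.parquet", cache_dir, fake_remote)

print(f"remote requests: {len(remote_calls)}")
```

The second lookup is served from the temporary local directory, so the remote layer is contacted only once.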
54+
4155
## Command-Line Interface
4256

4357
The library provides an `iqb` command-line tool. Run `uv run iqb --help`
@@ -190,12 +204,11 @@ naming pattern `*_test.py`:
190204

191205
```python
192206
"""tests/my_feature_test.py"""
193-
import pytest
194-
from iqb import IQB
207+
from iqb import IQBCalculator
195208

196209
class TestMyFeature:
197210
    def test_something(self):
198-
        iqb = IQB()
211+
        calculator = IQBCalculator()
199212
        # Your test code here
200213
        assert True
201214
```

prototype/README.md

Lines changed: 8 additions & 0 deletions
@@ -58,7 +58,15 @@ this directory, Streamlit will reload on save.
5858
```
5959
prototype/
6060
├── Home.py # Main Streamlit entry point
61+
├── app_state.py # Application state management
62+
├── session_state.py # Session state management
63+
├── pages/ # Streamlit multi-page app pages
64+
├── cache/ # Static data cache (JSON files per country)
65+
├── utils/ # Helpers (data loading, calculations, constants)
66+
├── visualizations/ # Chart and UI components (sunburst, etc.)
67+
├── natural_earth/ # GeoJSON extraction for map visualizations
6168
├── pyproject.toml # Dependencies (streamlit, pandas, mlab-iqb)
69+
├── Dockerfile # Container image for Cloud Run
6270
└── README.md # This file
6371
```
6472
