Commit 2ecdf3e

Add telemetry documentation for HEDit
- Add telemetry.md explaining data collection, privacy, and opt-out options
- Add telemetry page to mkdocs navigation
1 parent 715e6cb commit 2ecdf3e

2 files changed: +105 -0 lines changed

docs/projects/hedit/telemetry.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
# Telemetry

HEDit collects anonymous telemetry data to help improve the service and support research on annotation quality. This page explains what data we collect, how it's used, and how you can opt out.

## Why We Collect Telemetry

Telemetry helps us:

1. **Improve annotation quality**: By analyzing successful and failed annotations, we can identify patterns and improve the underlying models
2. **Train better models**: Anonymized annotation pairs (description + HED string) can be used to fine-tune models for better HED annotation generation
3. **Track service health**: Understanding usage patterns helps us maintain and scale the service appropriately
4. **Support research**: Aggregated, anonymized data may be used for academic research on translating natural language into HED annotations

## What We Collect

When telemetry is enabled, we collect:

| Data | Description |
|------|-------------|
| Input description | The natural language event description you provided |
| Generated HED string | The HED annotation that was generated |
| Schema version | Which HED schema version was used (e.g., 8.3.0) |
| Validation iterations | How many validation attempts were needed |
| Validation errors | Any validation errors encountered (for debugging) |
| Model configuration | Which LLM model was used and its settings |
| Latency | How long the request took (for performance monitoring) |
| Source | Whether the request came from CLI, API, or web interface |
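
As a rough illustration of the table above, one telemetry event could be modeled as a small record type. This is only a sketch: the class name, field names, types, and defaults are assumptions made for this example and are not taken from the HEDit schema (see the linked `schema.py` for the actual definition).

```python
from dataclasses import asdict, dataclass, field
from typing import Literal


@dataclass(frozen=True)
class TelemetryEvent:
    """Hypothetical record with one field per row of the table (names are illustrative)."""

    description: str                    # input description
    hed_string: str                     # generated HED string
    schema_version: str                 # e.g. "8.3.0"
    validation_iterations: int          # number of validation attempts
    validation_errors: list[str] = field(default_factory=list)
    model: str = ""                     # which LLM was used
    model_settings: dict = field(default_factory=dict)
    latency_ms: float = 0.0             # request duration (unit is an assumption)
    source: Literal["cli", "api", "web"] = "api"


# Example usage with made-up values.
example_event = TelemetryEvent(
    description="A red circle appears",
    hed_string="Sensory-event, Visual-presentation, (Red, Circle)",
    schema_version="8.3.0",
    validation_iterations=1,
    source="cli",
)
example_record = asdict(example_event)  # plain dict, ready to serialize
```

Note that the record has no field for IP address, user ID, or API key, which matches the exclusions listed on this page.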

### Input Hashing for Deduplication

To avoid storing duplicate data, we hash each input description using SHA-256 and store only the first 16 characters of the resulting hash. This truncated value is enough to detect duplicates without storing the full hash.
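
The truncated hash described above can be sketched with Python's standard `hashlib`. The function name `input_hash` is hypothetical, and the sketch assumes the description is UTF-8 encoded, the digest is hex-encoded, and no normalization (such as trimming or lowercasing) is applied first; the actual collector may differ.

```python
import hashlib


def input_hash(description: str, length: int = 16) -> str:
    """Return the first `length` hex characters of the SHA-256 digest.

    Assumptions: UTF-8 encoding, hex digest, no input normalization.
    """
    digest = hashlib.sha256(description.encode("utf-8")).hexdigest()
    return digest[:length]


# Identical descriptions always map to the same 16-character key.
key_a = input_hash("A red circle appears")
key_b = input_hash("A red circle appears")
```

Under these assumptions, 16 hex characters keep 64 bits of the 256-bit digest, so accidental collisions between different descriptions are unlikely at modest data volumes but not impossible.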

## What We Do NOT Collect

We are committed to user privacy. We explicitly **do not** collect:

- **IP addresses**: Your network location is never logged or stored
- **User identifiers**: We do not track individual users across requests
- **API keys**: Your OpenRouter or other API keys are never logged
- **Personal information**: No names, emails, or identifying information
- **Session tracking**: We do not use cookies or track sessions
- **Geographic data**: No location information is collected
- **Device fingerprints**: No browser or device identification

Each telemetry event is independent and cannot be linked to previous requests or to you personally.

## Data Storage and Security

- Telemetry data is stored in Cloudflare Workers KV (production) or local files (development)
- Data is encrypted in transit using HTTPS
- Access to telemetry data is restricted to project maintainers
- Data may be aggregated and anonymized for public research publications

## How to Opt Out

### Web Interface

Click the "Allow" checkbox in the footer of the HEDit web interface to disable telemetry. Your preference is saved in your browser's local storage.

### CLI

Use the `--no-telemetry` flag with any command:

```bash
hedit annotate "A red circle appears" --no-telemetry
```

Or disable telemetry permanently in your configuration:

```bash
hedit config set telemetry_enabled false
```

### API

Include `telemetry_enabled: false` in your request body:

```json
{
  "description": "A red circle appears on the screen",
  "telemetry_enabled": false
}
```
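
From Python, the same request body could be built as shown below. This is a minimal sketch: the helper name `build_annotation_request` is hypothetical, and only the two fields shown on this page are used. The endpoint URL and any authentication headers are not given here, so sending the request is left out; see the API Reference for those details.

```python
import json


def build_annotation_request(description: str, telemetry_enabled: bool = False) -> bytes:
    """Serialize a request body that opts out of telemetry by default.

    Hypothetical helper: only `description` and `telemetry_enabled`
    come from this page; everything else is an assumption.
    """
    payload = {
        "description": description,
        "telemetry_enabled": telemetry_enabled,
    }
    return json.dumps(payload).encode("utf-8")


body = build_annotation_request("A red circle appears on the screen")
```

Defaulting `telemetry_enabled` to `False` in the helper is a design choice for this sketch, so a caller has to opt in explicitly rather than remember to opt out.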

## Data Retention

- Raw telemetry data is retained for up to 12 months
- Aggregated statistics may be retained indefinitely
- You may request deletion of any data associated with your inputs by contacting us
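
A retention check for raw events along the lines of the 12-month limit above might look like the following. This is an illustrative sketch only: the function and constant names are hypothetical, "12 months" is approximated as 365 days, and how or when HEDit actually purges data is not specified on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "12 months" is approximated as 365 days for this sketch.
RAW_RETENTION = timedelta(days=365)


def is_past_retention(recorded_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a raw event is older than the retention window.

    Both datetimes are expected to be timezone-aware.
    """
    current = now or datetime.now(timezone.utc)
    return current - recorded_at > RAW_RETENTION


reference_time = datetime(2025, 1, 1, tzinfo=timezone.utc)
old_event = datetime(2023, 12, 1, tzinfo=timezone.utc)
recent_event = datetime(2024, 6, 1, tzinfo=timezone.utc)
```

With these example dates, `is_past_retention(old_event, reference_time)` is true and `is_past_retention(recent_event, reference_time)` is false.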

## Open Source

The telemetry implementation is fully open source. You can review the code at:

- [Telemetry Schema](https://github.com/Annotation-Garden/HEDit/blob/main/src/telemetry/schema.py)
- [Telemetry Collector](https://github.com/Annotation-Garden/HEDit/blob/main/src/telemetry/collector.py)
- [Storage Backends](https://github.com/Annotation-Garden/HEDit/blob/main/src/telemetry/storage.py)

## Questions?

If you have questions about telemetry or data privacy, please:

- Open an issue on [GitHub](https://github.com/Annotation-Garden/HEDit/issues)
- Contact the maintainers at info@annotation.garden

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -157,6 +157,7 @@ nav:
 - CLI Reference: projects/hedit/cli-reference.md
 - API Reference: projects/hedit/api-reference.md
 - Python API: projects/hedit/python-api.md
+- Telemetry: projects/hedit/telemetry.md
 - Image Annotation:
 - projects/image-annotation/index.md
 - API Reference: projects/image-annotation/api-reference.md
