Commit f1d4388

docs: restructure README with ToC, how-it-works, and official IAB links
1 parent fc9281d commit f1d4388

README.md

Lines changed: 59 additions & 4 deletions
@@ -10,10 +10,30 @@
1010
<a href="https://github.com/mixpeek/iab-mapper/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/license-MIT-blue.svg"></a>
1111
</p>
1212

13-
Map **IAB Content Taxonomy 2.x** labels/codes to **IAB 3.0** locally with a deterministic → fuzzy → (optional) local-embeddings pipeline.
14-
Outputs are **IAB-3.0–compatible IDs** suitable for OpenRTB/VAST, with optional **vector attributes** (Channel, Type, Format, Language, Source, Environment) and **SCD** awareness.
13+
Map **IAB Content Taxonomy 2.x** labels/codes to **IAB 3.0** locally with a deterministic → fuzzy → (optional) semantic pipeline.
14+
Outputs are **IAB 3.0-compatible IDs** for OpenRTB/VAST, with optional **vector attributes** (Channel, Type, Format, Language, Source, Environment) and **SCD** awareness.
1515

16-
> No external APIs. Runs fully local. LLMs are **not required**. You can enable local embeddings for tougher matches.
16+
> Local-first by default. No external APIs are required; LLM re‑rank is optional.
17+
18+
---
19+
20+
## 📚 Table of Contents
21+
22+
- [✨ Features](#-features)
23+
- [Why migrate to IAB 3.0?](#-why-migrate-to-iab-30)
24+
- [How it works](#-how-it-works)
25+
- [🔧 Install](#-install)
26+
- [🚀 Quick Start](#-quick-start)
27+
- [🐍 Python API](#-python-api-alternative-to-cli)
28+
- [📥 Input Formats](#-input-formats)
29+
- [📤 Output Formats](#-output-formats)
30+
- [⚙️ Useful Flags](#️-useful-flags)
31+
- [🧩 Vectors](#-vectors-orthogonal-attributes)
32+
- [✅ IAB 3.0 Conformance Notes](#-iab-30-conformance-notes)
33+
- [📎 Official IAB References](#-official-iab-references)
34+
- [🧯 Troubleshooting](#-troubleshooting)
35+
- [📦 Example Commands](#-example-commands)
36+
- [📜 License](#-license)
1737

1838
---
1939

@@ -27,6 +47,27 @@ Outputs are **IAB-3.0–compatible IDs** suitable for OpenRTB/VAST, with optiona
2747

2848
---
2949

50+
## 🔎 Why migrate to IAB 3.0?
51+
52+
- IAB 3.0 introduces a clearer separation between primary-topic “aboutness” and orthogonal vectors (e.g., news vs. opinion, formats, channels).
53+
- Better support for CTV/video, podcasts, games, and app stores.
54+
- Not backward compatible in areas such as News/Opinion and entertainment genres, so migration needs care.
55+
56+
This tool makes migration practical: it emits valid 3.0 IDs and helps curate edge cases with overrides, synonyms, thresholds, and audit outputs.
57+
58+
---
59+
60+
## 🧠 How it works
61+
62+
1) Normalize text and apply alias/exact matches via synonyms.
63+
2) Fuzzy retrieval (rapidfuzz | TF‑IDF | BM25) with configurable thresholds.
64+
3) Optional semantic augmentation with local embeddings (Sentence‑Transformers or TF‑IDF KNN).
65+
4) Optional local LLM re‑ranking (Ollama) for ordering only.
66+
5) Assemble outputs: topic IDs + vector IDs → OpenRTB `content.cat` with configurable `cattax`.
67+
6) SCD flags are surfaced and can be excluded with `--drop-scd`.
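
The steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the tool's actual implementation: the catalog IDs, the `SYNONYMS` and `SCD_IDS` tables, and the function names are all hypothetical, and the standard-library `difflib` stands in for rapidfuzz/TF‑IDF/BM25 so the snippet runs without extra dependencies. The default `cattax=7` is assumed to denote Content Taxonomy 3.0 in the AdCOM category-taxonomy list; verify it against the spec before relying on it.

```python
import difflib
import re

# Hypothetical miniature catalog: made-up IDs -> IAB 3.0-style labels.
CATALOG = {"X1": "Automotive", "X2": "Auto Racing", "X3": "Personal Finance"}
# Hypothetical alias table: normalized alias -> canonical label.
SYNONYMS = {"cars": "Automotive"}
# Hypothetical set of IDs treated as SCD-flagged for this sketch.
SCD_IDS = {"X3"}


def normalize(text):
    """Lowercase and collapse every non-alphanumeric run to a single space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def map_label(label, threshold=0.8):
    """Return (id, score, method) for the best match, or None below threshold."""
    by_label = {normalize(name): cid for cid, name in CATALOG.items()}
    key = normalize(label)
    # Step 1: alias lookup, then exact match on the normalized label.
    key = normalize(SYNONYMS.get(key, key))
    if key in by_label:
        return by_label[key], 1.0, "exact"
    # Step 2: fuzzy retrieval (difflib here as a stand-in scorer).
    best_id, best_score = None, 0.0
    for name, cid in by_label.items():
        score = difflib.SequenceMatcher(None, key, name).ratio()
        if score > best_score:
            best_id, best_score = cid, score
    if best_score >= threshold:
        return best_id, best_score, "fuzzy"
    return None


def to_openrtb(ids, cattax=7, drop_scd=False):
    """Steps 5-6: assemble content.cat, optionally dropping SCD-flagged IDs."""
    cats = [cid for cid in ids if not (drop_scd and cid in SCD_IDS)]
    return {"content": {"cat": cats, "cattax": cattax}}


matches = [map_label(text) for text in ("Cars", "Automotiv", "Personal Finance")]
ids = [m[0] for m in matches if m is not None]
print(to_openrtb(ids, drop_scd=True))
```

Keeping the exact/alias pass ahead of the fuzzy pass means curated synonyms always win over similarity scores, which is what makes the first stage deterministic; the threshold only governs the fallback.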
68+
69+
---
70+
3071
## 🔧 Install
3172

3273
### From PyPI (recommended)
@@ -251,6 +292,18 @@ Each value maps to a **stable IAB 3.0 ID** that is appended to the `cat` array.
251292
252293
---
253294

295+
## 📎 Official IAB References
296+
297+
- Content Taxonomy 3.0 Implementation Guide (PDF): `https://iabtechlab.com/wp-content/uploads/2021/09/Implementation-Guide-Content-Taxonomy-3-0-pc-Sept2021.pdf`
298+
- IAB Tech Lab Content Taxonomy page: `https://iabtechlab.com/standards/content-taxonomy/`
299+
- Implementation guidance (historic mappings and migration notes):
300+
  - `https://github.com/InteractiveAdvertisingBureau/Taxonomies/blob/develop/implementation.md#content-21-to-ad-product-20-taxonomy-mapping-implementation-guidance`
301+
  - `https://github.com/InteractiveAdvertisingBureau/Taxonomies/blob/develop/Taxonomy%20Mappings/Ad%20Product%202.0%20to%20Content%202.1.tsv`
302+
  - `https://github.com/katieshell/Taxonomies/blob/main/implementation.md#implementation-guidance-for-content-1--content-2-mapping`
303+
  - `https://github.com/katieshell/Taxonomies/blob/main/implementation.md#migrating-from-content-taxonomy-10`
304+
305+
---
306+
254307
## 🔬 Evaluation (recommended)
255308
Create a small gold set for your domain and run periodic checks:
256309
```bash
@@ -293,5 +346,7 @@ mixpeek-iab-mapper sample_2x_codes.csv -o mapped.json --use-embeddings --drop-sc
293346
---
294347

295348
## 📜 License
296-
TBD by Mixpeek. Include IAB attribution in your deployed UI/footer:
349+
MIT. See `LICENSE`.
350+
351+
Include IAB attribution in your deployed UI/footer:
297352
> “IAB is a registered trademark of the Interactive Advertising Bureau. This tool is an independent utility built by Mixpeek for interoperability with IAB Content Taxonomy standards.”
