Doc B: Example Claims — Quality-Optimized Worked Examples (v2)
This version uses the Doc A v7 schema (Properties table with controlled vocabulary, value_squiggle for distributions, qualifier_key as a first-class field, Phase 1 five-table model).
Each example is graded on six quality dimensions: Precision · Resolvability · Uniqueness · Importance · Categorization · Clarity. ✓ = passes. ⚠ = marginal. ✗ = fails.
The goal: a large body of claims that score ✓ on every dimension. Examples that score ✗ on any dimension show what the ingest pipeline should reject or transform.
Quality Dimension Reference

| Dimension | Question | Failure triggers rejection |
| --- | --- | --- |
| Precision | Is the value atomic and fully quantified (not a range disguised as a number, not vague)? | "around $40M", "significant growth" |
| Resolvability | Can a human verify this from the cited sources in under 5 minutes? | Unpublished data, dead links, paywalled with no abstract |
| Uniqueness | Is this claim distinct from all other claims currently in the system? | Exact duplicate, near-duplicate with wrong identity |
| Importance | Does this fact matter for understanding the subject? | Trivial detail, only relevant inside a footnote |
| Categorization | Does property_id exist in the properties vocabulary with the correct value_type? | Free-text property, wrong value_type |
| Clarity | Is claim_text self-contained and unambiguous to a reader without context? | "It was $300B at that time", missing entity/date |
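The rubric above can be carried through an ingest pipeline as a small scorecard. A minimal sketch; the class, field names, and grade strings are illustrative, not part of the v7 schema:

```python
from dataclasses import dataclass

# Illustrative scorecard for the six dimensions; grade strings are
# "pass" (check), "marginal" (warning), "fail" (cross).
DIMENSIONS = ("precision", "resolvability", "uniqueness",
              "importance", "categorization", "clarity")

@dataclass
class QualityScore:
    precision: str = "pass"
    resolvability: str = "pass"
    uniqueness: str = "pass"
    importance: str = "pass"
    categorization: str = "pass"
    clarity: str = "pass"

    def accept(self) -> bool:
        # Any outright failure rejects the claim; marginal grades pass
        # but should carry a note (cf. Example 4's R = marginal).
        return all(getattr(self, d) != "fail" for d in DIMENSIONS)
```

Under this sketch, a claim like Transform 1's input would carry precision="fail" and be transformed or rejected before storage.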
Part 1: High-Scoring Examples — What We Want
Example 1 — Funding Round (Anthropic Series E)
claim_text: "Anthropic raised $7.3B in its Series E funding round in March 2024"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | anthropic |
| claims | claim_variety | structured |
| claims | claim_text | "Anthropic raised $7.3B in its Series E funding round in March 2024" |
| claims | valid_start | 2024-03-01 |
| claims | temporal_granularity | month |
| claims | status | active |
| claim_structured | property_id | funding-round-amount |
| claim_structured | qualifier_key | Series-E |
| claim_structured | value_numeric | 7300000000 |
| claim_structured | value_squiggle | null |
| claim_structured | value_unit | USD |
| resources | url | … |
| claim_sources | source_quote | … |
| claim_sources | is_primary | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT

qualifier_key = Series-E distinguishes this round from Series B, Series C, etc.
funding-round-amount is a defined property (Financial category, numeric, USD)

Example 2 — Atemporal Biographical Fact

claim_text: "Anthropic was founded in May 2021"

| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | anthropic |
| claims | claim_variety | structured |
| claims | valid_start | null |
| claims | valid_end | null |
| claim_structured | property_id | founding-date |
| claim_structured | qualifier_key | null |
| claim_structured | value_date | 2021-05 |
| claim_structured | value_squiggle | null |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT

P ✓: Month-precision — "May 2021" is the verifiable grain (day unknown from public sources)
R ✓: SEC EDGAR is public and machine-readable; verification < 1 min
U ✓: Only one founding date per organization; identity is trivially unique
I ✓: Canonical fact, appears in virtually every Anthropic overview
C ✓: founding-date defined (Organizational, date type)
Cl ✓: Entity and date explicit in claim_text; no context required

Note on granularity: temporal_granularity is null because this claim is atemporal — the founding date is itself a date value, not a time window during which the claim is true.
Example 3 — Technical Spec (Exact, Documented)
claim_text: "Claude 3 Opus has a context window of 200,000 tokens, as of its March 2024 release"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | claude-3-opus |
| claims | claim_variety | structured |
| claims | valid_start | 2024-03-04 |
| claims | temporal_granularity | day |
| claim_structured | property_id | context-window-length |
| claim_structured | value_numeric | 200000 |
| claim_structured | value_unit | tokens |
| claim_structured | value_squiggle | null |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT

Example 4 — Uncertain Estimate with Squiggle Distribution

claim_text: "Anthropic had approximately 1,200–1,500 employees as of Q1 2025"

| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | anthropic |
| claims | claim_variety | structured |
| claims | valid_start | 2025-01-01 |
| claims | temporal_granularity | quarter |
| claim_structured | property_id | employee-count |
| claim_structured | value_numeric | 1350 |
| claim_structured | value_squiggle | "900 to 1800" (log-normal 90% CI) |
| claim_structured | value_unit | FTE |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=⚠ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT (with note)

P ✓: The distribution IS the precise representation — Squiggle captures the uncertainty faithfully. Storing 1350 alone would be false precision.
R ⚠: Primary source is paywalled (The Information). Resolvability is reduced. Mitigated by the source_quote field — verifiable via quote even without full access.
I ✓: Headcount is a key organizational signal — indicative of growth trajectory, resource allocation, and safety capacity

Why Squiggle instead of a point estimate: The source explicitly says "around 1,200 to 1,500" — the uncertainty is in the source, not in our reading of it. Storing 1350 as a point estimate would fabricate precision. value_squiggle = "900 to 1800" represents a wider 90% CI: even the source's range is itself an estimate.
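Squiggle's "a to b" syntax denotes a lognormal distribution whose 90% CI is [a, b]. A rough sketch of recovering the log-space parameters from a stored value_squiggle string (the helper is hypothetical, not the real Squiggle runtime):

```python
import math
import re

Z95 = 1.6448536269514722  # standard-normal 95th percentile

def lognormal_from_ci(squiggle: str) -> tuple[float, float]:
    # Parse "a to b" and treat a, b as the 5th and 95th percentiles
    # of a lognormal, per Squiggle's convention.
    m = re.fullmatch(r"\s*([\d.eE+]+)\s+to\s+([\d.eE+]+)\s*", squiggle)
    if m is None:
        raise ValueError(f"not an 'a to b' range: {squiggle!r}")
    lo, hi = math.log(float(m.group(1))), math.log(float(m.group(2)))
    mu = (lo + hi) / 2             # log-space mean
    sigma = (hi - lo) / (2 * Z95)  # log-space standard deviation
    return mu, sigma

mu, sigma = lognormal_from_ci("900 to 1800")
median = math.exp(mu)  # geometric midpoint, about 1273
```

Note that the implied median (about 1273) is not the arithmetic midpoint 1350, one more reason to store the distribution rather than a single number.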
Example 5 — Order-of-Magnitude Estimate (AI Compute)
claim_text: "GPT-4's training required approximately 2×10²⁴ floating point operations (FLOP)"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | gpt-4 |
| claims | claim_variety | structured |
| claims | valid_start | 2023-03-14 (release date; compute is fixed post-training) |
| claims | temporal_granularity | day |
| claim_structured | property_id | training-compute |
| claim_structured | value_numeric | 2e24 |
| claim_structured | value_squiggle | "lognormal({p5: 5e23, p95: 1e25})" |
| claim_structured | value_unit | FLOP |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT
P ✓: A lognormal distribution IS precision here — "1e24–1e25" is approximately right; a point estimate of 2e24 would imply false exactness
R ✓: Epoch AI publishes methodology and shows their work; verifiable via the dataset
I ✓: Compute is the most important input proxy for AI capability; cited in policy documents, scaling law papers, and safety arguments
Squiggle note: Compute estimates span an order of magnitude. Lognormal is the right distribution shape (multiplicative uncertainty, always positive). The 90% CI represents genuine epistemic uncertainty about a private number inferred from model characteristics.
Example 6 — Community Forecast as Structured Claim
claim_text: "Metaculus community forecast for P(transformative AI by 2030): 18% as of January 2025"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | agi-arrival (concept entity) |
| claims | claim_variety | structured |
| claims | valid_start | 2025-01-15 |
| claims | temporal_granularity | day |
| claim_structured | property_id | p-agi-by-year |
| claim_structured | qualifier_key | 2030 |
| claim_structured | value_numeric | 0.18 |
| claim_structured | value_squiggle | "0.12 to 0.26" (Metaculus 80% CI on the community estimate) |
| claim_structured | value_unit | probability |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT
P ✓: The value IS a probability; the Squiggle CI represents the community's own stated uncertainty, not our imprecision
qualifier_key = "2030" distinguishes "P(AGI by 2030)" from "P(AGI by 2035)" — both valid distinct claims
I ✓: P(AGI by year) is the most-cited summary statistic in AI safety discussions; appears in policy documents, forecasting reviews, and public communications
Example 7 — Attributed Statement (Opinion/Prediction)
claim_text: "Dario Amodei stated in October 2024 that AI systems could make Nobel Prize-level scientific discoveries within 1–2 years"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | dario-amodei |
| claims | claim_variety | attributed |
| claims | attributed_to | dario-amodei |
| claims | claim_text | "Dario Amodei stated in October 2024 that AI systems could make Nobel Prize-level scientific discoveries within 1–2 years" |
| claims | valid_start | 2024-10-01 |
| claims | temporal_granularity | month |
| resources | url | … |
| claim_sources | source_quote | … |
| claim_sources | is_primary | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT

No claim_structured row — attributed statements don't have structured values.

Example 8 — Leadership Role (with temporal boundary on prior)

Two claims together: the current fact and the historical record.

claim_text A (active): "Sam Altman is CEO of OpenAI (reinstated November 22, 2023)"

| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | sam-altman |
| claims | claim_variety | structured |
| claims | valid_start | 2023-11-22 |
| claims | valid_end | null |
| claim_structured | property_id | leadership-role |
| claim_structured | qualifier_key | openai-ceo |
| claim_structured | value_entity_id | openai (the organization they lead) |

claim_text B (archived): "Emmett Shear was interim CEO of OpenAI from November 18–22, 2023"

| Field | Value |
| --- | --- |
| valid_start | 2023-11-18 |
| valid_end | 2023-11-22 |
| status | archived |
| archive_reason | superseded |
| qualifier_key | openai-ceo |

Quality (for A): P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT
qualifier_key = "openai-ceo" is what makes A and B uniquely identified — without it, the claims identity (sam-altman, leadership-role, 2023-11-22) would collide with other leadership roles at other organizations
The archived B claim is historically accurate — its archive_reason = superseded documents that it ended, not that it was wrong
Example 9 — Research Finding (Paper Result, Asserted)
claim_text: "InstructGPT (1.3B parameters) was preferred over GPT-3 (175B parameters) by human raters on 85% of prompts, per Ouyang et al. 2022"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | rlhf (concept entity — claim is about RLHF as a technique) |
| claims | claim_variety | structured |
| claims | attributed_to | null (asserted: wiki vouches for correctly reading the paper) |
| claims | valid_start | … |
| claim_structured | property_id | research-finding-pct |
| claim_structured | qualifier_key | instructgpt-vs-gpt3-human-preference |
| claim_structured | value_numeric | 85 |
| claim_structured | value_unit | percent |
| resources | url | … |
| claim_sources | source_quote | "Labelers significantly prefer InstructGPT outputs over outputs from GPT-3 with an average win rate of 85 ± 2%" |
Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT
attributed_to = null means we assert this is an accurate reading of the paper. If instead we stored "InstructGPT demonstrates that RLHF dramatically improves capability" — a gloss, not the number — that would require attributed_to = ouyang-et-al because we wouldn't be vouching for it ourselves.
I ✓: This finding is foundational to the entire RLHF literature and cited in hundreds of subsequent papers
Example 10 — Time Series as Inline Series
Instead of 5 separate claims, Anthropic's full funding history fits as one series claim:
claim_text: "Anthropic's funding rounds from 2021 to 2024"

| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | anthropic |
| claims | claim_variety | structured |
| claim_structured | property_id | funding-round-amount |
| claim_structured | value_series | (JSON array below) |

value_series:
  [ {"qualifier_key": "Seed",     "date": "2021-05-01", "value_numeric": 124000000,  "value_squiggle": null},
    {"qualifier_key": "Series-B", "date": "2022-04-01", "value_numeric": 580000000,  "value_squiggle": null},
    {"qualifier_key": "Series-C", "date": "2023-05-01", "value_numeric": 7500000000, "value_squiggle": null},
    {"qualifier_key": "Series-D", "date": "2023-08-01", "value_numeric": 1250000000, "value_squiggle": null},
    {"qualifier_key": "Series-E", "date": "2024-03-01", "value_numeric": 7300000000, "value_squiggle": null} ]
When to use series vs. separate claims: Use separate claims when rounds will be individually cited, updated, or verified at different times. Use value_series when the whole history is a single artifact from a single source (e.g., Crunchbase profile). For Anthropic, separate claims are better — different rounds have different primary sources and staleness dates.
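One reason the two storage shapes can coexist: each value_series row can be projected onto the same identity tuple a stand-alone claim would occupy, so the uniqueness check covers both. A sketch under that assumption (the helper is illustrative; the data is a two-row excerpt of the funding history):

```python
import json

series_json = """[
  {"qualifier_key": "Series-D", "date": "2023-08-01", "value_numeric": 1250000000},
  {"qualifier_key": "Series-E", "date": "2024-03-01", "value_numeric": 7300000000}
]"""

def identities(entity_id: str, property_id: str, series: str):
    # Each series row claims the same (entity, property, qualifier)
    # slot that a separate claim for that round would claim.
    return [(entity_id, property_id, row["qualifier_key"])
            for row in json.loads(series)]

ids = identities("anthropic", "funding-round-amount", series_json)
```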
Part 2: Transformations — Bad Input → Good Claim
These show what the ingest pipeline does when proposed claims fail quality checks.
Transform 1 — Vague Quantity
Input: "OpenAI will make around $40M in early 2025"
| Dimension | Score | Problem |
| --- | --- | --- |
| Precision | ✗ | "around $40M" and "early 2025" are both vague |
| Clarity | ⚠ | "early 2025" is ambiguous — Q1? H1? |
Pipeline action:
Identify source: Bloomberg analyst estimate from January 2025
The source gives "$3.4B annualized" for H1 2025 → implies roughly $1.7B for H1 → ~$850M for Q1
If the source has a specific figure: transform to structured claim: property=revenue-quarterly, value_numeric=850000000, value_squiggle="650000000 to 1100000000", qualifier_key=Q1-2025
If source is genuinely vague: store as attributed statement: "Bloomberg reported in January 2025 that OpenAI expected several hundred million dollars in Q1 2025 revenue"
Key principle: don't store vague quantities in value_numeric. Either decompose to a specific (possibly uncertain) value or route to attributed statement.
Transform 2 — Comparative Claim
Input: "OpenAI has a much higher valuation than Anthropic"
| Dimension | Score | Problem |
| --- | --- | --- |
| Precision | ✗ | No value, no date, no quantification |
| Categorization | ✗ | No property maps to "relative standing" |
Pipeline action: Reject as a structured claim. Create or verify two independent structured claims:
"OpenAI's post-money valuation: $157B, October 2024" → market-valuation, value_numeric=157000000000
"Anthropic's post-money valuation: $61.5B, March 2025" → market-valuation, value_numeric=61500000000
The wiki page prose then says: "As of early 2025, OpenAI's valuation ($157B) is roughly 2.5× Anthropic's ($61.5B)." The comparison is derived at display time from two precise facts, not stored.
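A sketch of the display-time derivation; the dict stands in for a query against claim_structured:

```python
# Two stored structured claims; the comparison itself is never stored.
claims = {
    ("openai", "market-valuation"): 157_000_000_000,
    ("anthropic", "market-valuation"): 61_500_000_000,
}

def ratio(a: str, b: str, prop: str = "market-valuation") -> float:
    # Computed at render time from the two precise facts.
    return claims[(a, prop)] / claims[(b, prop)]

r = ratio("openai", "anthropic")  # about 2.55, "roughly 2.5x" in prose
```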
Transform 3 — Attributed Prediction Masquerading as Fact
Input: "AI will reach human-level performance on most cognitive tasks by 2030"
| Dimension | Score | Problem |
| --- | --- | --- |
| Precision | ✗ | "human-level" and "most cognitive tasks" are undefined |
| Resolvability | ✗ | No source — and even with a source, the claim is an assertion about the future |
Pipeline action:
If from a specific person/org: store as attributed statement ("Researcher X stated that AI will reach human-level performance...")
If meant as the wiki's own assertion: reject — we don't make forward-looking assertions without quantification
If meant as a community aggregate: find the Metaculus or Polymarket question and store as: property=p-hlai-by-year, qualifier_key=2030, value_squiggle="0.20 to 0.55" (community CI)
Transform 4 — Overly Specific Metadata
Input: "Anthropic's San Francisco office is at 760 Market Street"
| Dimension | Score | Problem |
| --- | --- | --- |
| Importance | ✗ | Office address is not meaningful for understanding Anthropic |
| Resolvability | ⚠ | Easily verifiable, but only because the fact itself is trivial |
Pipeline action: Reject. Store headquarters-location = "San Francisco, CA" at city-level precision instead, if not already stored. The street address is below the importance threshold for the properties vocabulary.
Part 3: Rejections — What Should Not Be Stored
Rejection 1 — Normative Claim
Input: "Anthropic is the most safety-focused major AI lab"
Why rejected: Opinion/evaluation. No property maps to "relative safety-focus." Cannot be verified from sources — any source that says this is expressing a position, not a fact. If Anthropic or a credible analyst says this: store as attributed statement. If wiki editors believe it: express in prose with a footnote to evidence, not as a structured claim.
Rejection 2 — Inference Chain
Input: "Anthropic's $7.3B Series E makes it likely to remain a leading AI lab through 2026"
Why rejected: Causal inference from a fact to a prediction. Even if true, this involves a chain of reasoning: funding → operational runway → capacity to compete → "leading." Each link is contestable. The underlying fact ("raised $7.3B") should be stored (and is — Example 1). The inference stays in wiki prose.
Rejection 3 — Aggregate Characterization
Input: "The AI industry is experiencing rapid growth"
Why rejected: Too broad, no entity, no property, no value. If there's a specific measure: e.g., "Global AI investment was $91.9B in 2022 (Stanford HAI)" → that IS storable. The characterization is derivable from specific facts; the characterization itself isn't.
Rejection 4 — Duplicate
Input: "Anthropic's employee count is around 1,500 in early 2025" (when Example 4 already exists)
Why rejected: Near-duplicate. Identity (anthropic, employee-count, Q1-2025) already exists. The ingest pipeline's uniqueness check should catch this before attempting insertion. The only action is to verify whether the new source (if different) should be added as an additional claim_sources entry to the existing claim.
Cross-Cutting Observations
1. Squiggle is appropriate for ~20% of numeric structured claims
Of the numeric examples above: funding rounds, context windows, research findings, and founding dates have no meaningful uncertainty → value_squiggle = null. Employee counts, compute estimates, and probability forecasts have genuine uncertainty → value_squiggle populated. The 80/20 rule applies: most structured facts are well-sourced point estimates; the distribution is for the genuinely uncertain minority.
2. qualifier_key solves identity for repeated properties
Without qualifier_key, identity for (anthropic, funding-round-amount) is ambiguous — there are five distinct funding rounds. qualifier_key = "Series-E" disambiguates. This is a pattern for any property that can occur multiple times: leadership roles at different orgs, model releases, funding rounds, policy positions.
3. The attributed/asserted distinction drives verification routing
Examples 7 vs. 9 demonstrate the fork: Example 9 uses attributed_to = null (wiki asserts 85% is the correct reading of the paper). Example 7 uses attributed_to = dario-amodei (wiki only asserts Dario said it). A future verification pipeline uses this field to route: asserted → fact-check; attributed → quote-check. This is the most important single field for verification correctness.
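The fork can be sketched directly; attributed_to is the schema field, while the claim dict shape and route labels are illustrative:

```python
def verification_route(claim: dict) -> str:
    if claim.get("attributed_to") is None:
        return "fact-check"   # asserted: wiki vouches for the content itself
    return "quote-check"      # attributed: wiki vouches only that it was said

assert verification_route({"attributed_to": None}) == "fact-check"             # Example 9
assert verification_route({"attributed_to": "dario-amodei"}) == "quote-check"  # Example 7
```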
4. High-importance claims are the only ones worth automating
The quality dimension "Importance" is the most subjective and the hardest for LLMs to assess. The properties vocabulary implicitly encodes importance: if a property exists in the vocabulary (funding-round-amount, training-compute, employee-count), it's already been judged important enough to track. Claims using undefined properties are evidence that either the property should be added to the vocabulary, or the claim isn't important enough to store.
5. Rejection is the default; acceptance is the exception
The transformation and rejection examples show that most candidate claims from extraction need work. The ingest pipeline's job is not to pass claims through — it's to either make them precise enough to store, route them to attributed statements, or reject them. A well-running pipeline might accept 30% of raw candidate claims as structured facts on first pass.