Doc B: Example Claims — Quality-Optimized Worked Examples (v2)
This version uses the Doc A v7 schema (Properties table with controlled vocabulary, value_squiggle for distributions, qualifier_key as a first-class field, Phase 1 five-table model).
Each example is graded on six quality dimensions: Precision · Resolvability · Uniqueness · Importance · Categorization · Clarity. ✓ = passes. ⚠ = marginal. ✗ = fails.
The goal: a large body of claims that score ✓ on every dimension. Examples that score ✗ on any dimension show what the ingest pipeline should reject or transform.
Quality Dimension Reference

| Dimension | Question | Failure triggers rejection |
| --- | --- | --- |
| Precision | Is the value atomic and fully quantified (not a range disguised as a number, not vague)? | "around $40M", "significant growth" |
| Resolvability | Can a human verify this from the cited sources in under 5 minutes? | Unpublished data, dead links, paywalled with no abstract |
| Uniqueness | Is this claim distinct from all other claims currently in the system? | Exact duplicate, near-duplicate with wrong identity |
| Importance | Does this fact matter for understanding the subject? | Trivial detail, only relevant inside a footnote |
| Categorization | Does property_id exist in the properties vocabulary with the correct value_type? | Free-text property, wrong value_type |
| Clarity | Is claim_text self-contained and unambiguous to a reader without context? | "It was $300B at that time", missing entity/date |
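The rubric above can be carried through an ingest pipeline as a small scorecard. A minimal sketch; the class, field names, and grade strings are illustrative, not part of the v7 schema:

```python
from dataclasses import dataclass

# Illustrative scorecard for the six dimensions; grade strings are
# "pass" (check), "marginal" (warning), "fail" (cross).
DIMENSIONS = ("precision", "resolvability", "uniqueness",
              "importance", "categorization", "clarity")

@dataclass
class QualityScore:
    precision: str = "pass"
    resolvability: str = "pass"
    uniqueness: str = "pass"
    importance: str = "pass"
    categorization: str = "pass"
    clarity: str = "pass"

    def accept(self) -> bool:
        # Any outright failure rejects the claim; marginal grades pass
        # but should carry a note (cf. Example 4's R = marginal).
        return all(getattr(self, d) != "fail" for d in DIMENSIONS)
```

Under this sketch, a claim like Transform 1's input would carry precision="fail" and be transformed or rejected before storage.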
Part 1: High-Scoring Examples — What We Want
Example 1 — Funding Round (Anthropic Series E)
claim_text: "Anthropic raised $7.3B in its Series E funding round in March 2024"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | anthropic |
| claims | claim_variety | structured |
| claims | claim_text | "Anthropic raised $7.3B in its Series E funding round in March 2024" |
| claims | valid_start | 2024-03-01 |
| claims | temporal_granularity | month |
| claims | status | active |
| claim_structured | property_id | funding-round-amount |
| claim_structured | qualifier_key | Series-E |
| claim_structured | value_numeric | 7300000000 |
| claim_structured | value_squiggle | null |
| claim_structured | value_unit | USD |
| resources | url | … |
| claim_sources | source_quote | … |
| claim_sources | is_primary | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT

qualifier_key = Series-E distinguishes this round from Series B, Series C, etc.
funding-round-amount is a defined property (Financial category, numeric, USD)

Example 2 — Atemporal Biographical Fact

claim_text: "Anthropic was founded in May 2021"

| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | anthropic |
| claims | claim_variety | structured |
| claims | valid_start | null |
| claims | valid_end | null |
| claim_structured | property_id | founding-date |
| claim_structured | qualifier_key | null |
| claim_structured | value_date | 2021-05 |
| claim_structured | value_squiggle | null |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT

P ✓: Month-precision — "May 2021" is the verifiable grain (day unknown from public sources)
R ✓: SEC EDGAR is public and machine-readable; verification < 1 min
U ✓: Only one founding date per organization; identity is trivially unique
I ✓: Canonical fact, appears in virtually every Anthropic overview
C ✓: founding-date defined (Organizational, date type)
Cl ✓: Entity and date explicit in claim_text; no context required

Note on granularity: temporal_granularity is null because this claim is atemporal — the founding date is itself a date value, not a time window during which the claim is true.
Example 3 — Technical Spec (Exact, Documented)
claim_text: "Claude 3 Opus has a context window of 200,000 tokens, as of its March 2024 release"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | claude-3-opus |
| claims | claim_variety | structured |
| claims | valid_start | 2024-03-04 |
| claims | temporal_granularity | day |
| claim_structured | property_id | context-window-length |
| claim_structured | value_numeric | 200000 |
| claim_structured | value_unit | tokens |
| claim_structured | value_squiggle | null |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT

Example 4 — Uncertain Estimate with Squiggle Distribution

claim_text: "Anthropic had approximately 1,200–1,500 employees as of Q1 2025"

| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | anthropic |
| claims | claim_variety | structured |
| claims | valid_start | 2025-01-01 |
| claims | temporal_granularity | quarter |
| claim_structured | property_id | employee-count |
| claim_structured | value_numeric | 1350 |
| claim_structured | value_squiggle | "900 to 1800" (log-normal 90% CI) |
| claim_structured | value_unit | FTE |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=⚠ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT (with note)

P ✓: The distribution IS the precise representation — Squiggle captures the uncertainty faithfully. Storing 1350 alone would be false precision.
R ⚠: Primary source is paywalled (The Information). Resolvability is reduced. Mitigated by the source_quote field — verifiable via quote even without full access.
I ✓: Headcount is a key organizational signal — indicative of growth trajectory, resource allocation, and safety capacity

Why Squiggle instead of a point estimate: The source explicitly says "around 1,200 to 1,500" — the uncertainty is in the source, not in our reading of it. Storing 1350 as a point estimate would fabricate precision. value_squiggle = "900 to 1800" represents a wider 90% CI: even the source's range is itself an estimate.
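Squiggle's "a to b" syntax denotes a lognormal distribution whose 90% CI is [a, b]. A rough sketch of recovering the log-space parameters from a stored value_squiggle string (the helper is hypothetical, not the real Squiggle runtime):

```python
import math
import re

Z95 = 1.6448536269514722  # standard-normal 95th percentile

def lognormal_from_ci(squiggle: str) -> tuple[float, float]:
    # Parse "a to b" and treat a, b as the 5th and 95th percentiles
    # of a lognormal, per Squiggle's convention.
    m = re.fullmatch(r"\s*([\d.eE+]+)\s+to\s+([\d.eE+]+)\s*", squiggle)
    if m is None:
        raise ValueError(f"not an 'a to b' range: {squiggle!r}")
    lo, hi = math.log(float(m.group(1))), math.log(float(m.group(2)))
    mu = (lo + hi) / 2             # log-space mean
    sigma = (hi - lo) / (2 * Z95)  # log-space standard deviation
    return mu, sigma

mu, sigma = lognormal_from_ci("900 to 1800")
median = math.exp(mu)  # geometric midpoint, about 1273
```

Note that the implied median (about 1273) is not the arithmetic midpoint 1350, one more reason to store the distribution rather than a single number.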
Example 5 — Order-of-Magnitude Estimate (AI Compute)
claim_text: "GPT-4's training required approximately 2×10²⁴ floating point operations (FLOP)"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | gpt-4 |
| claims | claim_variety | structured |
| claims | valid_start | 2023-03-14 (release date; compute is fixed post-training) |
| claims | temporal_granularity | day |
| claim_structured | property_id | training-compute |
| claim_structured | value_numeric | 2e24 |
| claim_structured | value_squiggle | "lognormal({p5: 5e23, p95: 1e25})" |
| claim_structured | value_unit | FLOP |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT
P ✓: A lognormal distribution IS precision here — "1e24–1e25" is approximately right; a point estimate of 2e24 would imply false exactness
R ✓: Epoch AI publishes methodology and shows their work; verifiable via the dataset
I ✓: Compute is the most important input proxy for AI capability; cited in policy documents, scaling law papers, and safety arguments
Squiggle note: Compute estimates span an order of magnitude. Lognormal is the right distribution shape (multiplicative uncertainty, always positive). The 90% CI represents genuine epistemic uncertainty about a private number inferred from model characteristics.
Example 6 — Community Forecast as Structured Claim
claim_text: "Metaculus community forecast for P(transformative AI by 2030): 18% as of January 2025"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | agi-arrival (concept entity) |
| claims | claim_variety | structured |
| claims | valid_start | 2025-01-15 |
| claims | temporal_granularity | day |
| claim_structured | property_id | p-agi-by-year |
| claim_structured | qualifier_key | 2030 |
| claim_structured | value_numeric | 0.18 |
| claim_structured | value_squiggle | "0.12 to 0.26" (Metaculus 80% CI on the community estimate) |
| claim_structured | value_unit | probability |
| resources | url | … |
| claim_sources | source_quote | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT
P ✓: The value IS a probability; the Squiggle CI represents the community's own stated uncertainty, not our imprecision
qualifier_key = "2030" distinguishes "P(AGI by 2030)" from "P(AGI by 2035)" — both valid distinct claims
I ✓: P(AGI by year) is the most-cited summary statistic in AI safety discussions; appears in policy documents, forecasting reviews, and public communications
Example 7 — Attributed Statement (Opinion/Prediction)
claim_text: "Dario Amodei stated in October 2024 that AI systems could make Nobel Prize-level scientific discoveries within 1–2 years"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | dario-amodei |
| claims | claim_variety | attributed |
| claims | attributed_to | dario-amodei |
| claims | claim_text | "Dario Amodei stated in October 2024 that AI systems could make Nobel Prize-level scientific discoveries within 1–2 years" |
| claims | valid_start | 2024-10-01 |
| claims | temporal_granularity | month |
| resources | url | … |
| claim_sources | source_quote | … |
| claim_sources | is_primary | … |

Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT

No claim_structured row — attributed statements don't have structured values.

Example 8 — Leadership Role (with temporal boundary on prior)

Two claims together: the current fact and the historical record.

claim_text A (active): "Sam Altman is CEO of OpenAI (reinstated November 22, 2023)"

| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | sam-altman |
| claims | claim_variety | structured |
| claims | valid_start | 2023-11-22 |
| claims | valid_end | null |
| claim_structured | property_id | leadership-role |
| claim_structured | qualifier_key | openai-ceo |
| claim_structured | value_entity_id | openai (the organization they lead) |

claim_text B (archived): "Emmett Shear was interim CEO of OpenAI from November 18–22, 2023"

| Field | Value |
| --- | --- |
| valid_start | 2023-11-18 |
| valid_end | 2023-11-22 |
| status | archived |
| archive_reason | superseded |
| qualifier_key | openai-ceo |

Quality (for A): P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT
qualifier_key = "openai-ceo" is what makes A and B uniquely identified — without it, the claims identity (sam-altman, leadership-role, 2023-11-22) would collide with other leadership roles at other organizations
The archived B claim is historically accurate — its archive_reason = superseded documents that it ended, not that it was wrong
Example 9 — Research Finding (Paper Result, Asserted)
claim_text: "InstructGPT (1.3B parameters) was preferred over GPT-3 (175B parameters) by human raters on 85% of prompts, per Ouyang et al. 2022"
| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | rlhf (concept entity — claim is about RLHF as a technique) |
| claims | claim_variety | structured |
| claims | attributed_to | null (asserted: wiki vouches for correctly reading the paper) |
| claims | valid_start | … |
| claim_structured | property_id | research-finding-pct |
| claim_structured | qualifier_key | instructgpt-vs-gpt3-human-preference |
| claim_structured | value_numeric | 85 |
| claim_structured | value_unit | percent |
| resources | url | … |
| claim_sources | source_quote | "Labelers significantly prefer InstructGPT outputs over outputs from GPT-3 with an average win rate of 85 ± 2%" |
Quality: P=✓ R=✓ U=✓ I=✓ C=✓ Cl=✓ → ACCEPT
attributed_to = null means we assert this is an accurate reading of the paper. If instead we stored "InstructGPT demonstrates that RLHF dramatically improves capability" — a gloss, not the number — that would require attributed_to = ouyang-et-al because we wouldn't be vouching for it ourselves.
I ✓: This finding is foundational to the entire RLHF literature and cited in hundreds of subsequent papers
Example 10 — Time Series as Inline Series
Instead of 5 separate claims, Anthropic's full funding history fits as one series claim:
claim_text: "Anthropic's funding rounds from 2021 to 2024"

| Table | Field | Value |
| --- | --- | --- |
| claims | subject_entity_id | anthropic |
| claims | claim_variety | structured |
| claim_structured | property_id | funding-round-amount |
| claim_structured | value_series | (JSON array below) |

value_series:
  [ {"qualifier_key": "Seed",     "date": "2021-05-01", "value_numeric": 124000000,  "value_squiggle": null},
    {"qualifier_key": "Series-B", "date": "2022-04-01", "value_numeric": 580000000,  "value_squiggle": null},
    {"qualifier_key": "Series-C", "date": "2023-05-01", "value_numeric": 7500000000, "value_squiggle": null},
    {"qualifier_key": "Series-D", "date": "2023-08-01", "value_numeric": 1250000000, "value_squiggle": null},
    {"qualifier_key": "Series-E", "date": "2024-03-01", "value_numeric": 7300000000, "value_squiggle": null} ]
When to use series vs. separate claims: Use separate claims when rounds will be individually cited, updated, or verified at different times. Use value_series when the whole history is a single artifact from a single source (e.g., Crunchbase profile). For Anthropic, separate claims are better — different rounds have different primary sources and staleness dates.
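One reason the two storage shapes can coexist: each value_series row can be projected onto the same identity tuple a stand-alone claim would occupy, so the uniqueness check covers both. A sketch under that assumption (the helper is illustrative; the data is a two-row excerpt of the funding history):

```python
import json

series_json = """[
  {"qualifier_key": "Series-D", "date": "2023-08-01", "value_numeric": 1250000000},
  {"qualifier_key": "Series-E", "date": "2024-03-01", "value_numeric": 7300000000}
]"""

def identities(entity_id: str, property_id: str, series: str):
    # Each series row claims the same (entity, property, qualifier)
    # slot that a separate claim for that round would claim.
    return [(entity_id, property_id, row["qualifier_key"])
            for row in json.loads(series)]

ids = identities("anthropic", "funding-round-amount", series_json)
```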
Part 2: Transformations — Bad Input → Good Claim
These show what the ingest pipeline does when proposed claims fail quality checks.
Transform 1 — Vague Quantity
Input: "OpenAI will make around $40M in early 2025"
| Dimension | Score | Problem |
| --- | --- | --- |
| Precision | ✗ | "around $40M" and "early 2025" are both vague |
| Clarity | ⚠ | "early 2025" is ambiguous — Q1? H1? |
Pipeline action:
Identify source: Bloomberg analyst estimate from January 2025
The source gives "$3.4B annualized" for H1 2025 → implies roughly $1.7B for H1 → ~$850M for Q1
If the source has a specific figure: transform to structured claim: property=revenue-quarterly, value_numeric=850000000, value_squiggle="650000000 to 1100000000", qualifier_key=Q1-2025
If source is genuinely vague: store as attributed statement: "Bloomberg reported in January 2025 that OpenAI expected several hundred million dollars in Q1 2025 revenue"
Key principle: don't store vague quantities in value_numeric. Either decompose to a specific (possibly uncertain) value or route to attributed statement.
Transform 2 — Comparative Claim
Input: "OpenAI has a much higher valuation than Anthropic"
| Dimension | Score | Problem |
| --- | --- | --- |
| Precision | ✗ | No value, no date, no quantification |
| Categorization | ✗ | No property maps to "relative standing" |
Pipeline action: Reject as a structured claim. Create or verify two independent structured claims:
"OpenAI's post-money valuation: $157B, October 2024" → market-valuation, value_numeric=157000000000
"Anthropic's post-money valuation: $61.5B, March 2025" → market-valuation, value_numeric=61500000000
The wiki page prose then says: "As of early 2025, OpenAI's valuation ($157B) is roughly 2.5× Anthropic's ($61.5B)." The comparison is derived at display time from two precise facts, not stored.
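A sketch of the display-time derivation; the dict stands in for a query against claim_structured:

```python
# Two stored structured claims; the comparison itself is never stored.
claims = {
    ("openai", "market-valuation"): 157_000_000_000,
    ("anthropic", "market-valuation"): 61_500_000_000,
}

def ratio(a: str, b: str, prop: str = "market-valuation") -> float:
    # Computed at render time from the two precise facts.
    return claims[(a, prop)] / claims[(b, prop)]

r = ratio("openai", "anthropic")  # about 2.55, "roughly 2.5x" in prose
```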
Transform 3 — Attributed Prediction Masquerading as Fact
Input: "AI will reach human-level performance on most cognitive tasks by 2030"
| Dimension | Score | Problem |
| --- | --- | --- |
| Precision | ✗ | "human-level" and "most cognitive tasks" are undefined |
| Resolvability | ✗ | No source — and even with a source, the claim is an assertion about the future |
Pipeline action:
If from a specific person/org: store as attributed statement ("Researcher X stated that AI will reach human-level performance...")
If meant as the wiki's own assertion: reject — we don't make forward-looking assertions without quantification
If meant as a community aggregate: find the Metaculus or Polymarket question and store as: property=p-hlai-by-year, qualifier_key=2030, value_squiggle="0.20 to 0.55" (community CI)
Transform 4 — Overly Specific Metadata
Input: "Anthropic's San Francisco office is at 760 Market Street"
| Dimension | Score | Problem |
| --- | --- | --- |
| Importance | ✗ | Office address is not meaningful for understanding Anthropic |
| Resolvability | ⚠ | Easily verifiable, but only because the fact itself is trivial |
Pipeline action: Reject. Store headquarters-location = "San Francisco, CA" at city-level precision instead, if not already stored. The street address is below the importance threshold for the properties vocabulary.
Part 3: Rejections — What Should Not Be Stored
Rejection 1 — Normative Claim
Input: "Anthropic is the most safety-focused major AI lab"
Why rejected: Opinion/evaluation. No property maps to "relative safety-focus." Cannot be verified from sources — any source that says this is expressing a position, not a fact. If Anthropic or a credible analyst says this: store as attributed statement. If wiki editors believe it: express in prose with a footnote to evidence, not as a structured claim.
Rejection 2 — Inference Chain
Input: "Anthropic's $7.3B Series E makes it likely to remain a leading AI lab through 2026"
Why rejected: Causal inference from a fact to a prediction. Even if true, this involves a chain of reasoning: funding → operational runway → capacity to compete → "leading." Each link is contestable. The underlying fact ("raised $7.3B") should be stored (and is — Example 1). The inference stays in wiki prose.
Rejection 3 — Aggregate Characterization
Input: "The AI industry is experiencing rapid growth"
Why rejected: Too broad, no entity, no property, no value. If there's a specific measure: e.g., "Global AI investment was $91.9B in 2022 (Stanford HAI)" → that IS storable. The characterization is derivable from specific facts; the characterization itself isn't.
Rejection 4 — Duplicate
Input: "Anthropic's employee count is around 1,500 in early 2025" (when Example 4 already exists)
Why rejected: Near-duplicate. Identity (anthropic, employee-count, Q1-2025) already exists. The ingest pipeline's uniqueness check should catch this before attempting insertion. The only action is to verify whether the new source (if different) should be added as an additional claim_sources entry to the existing claim.
Cross-Cutting Observations
1. Squiggle is appropriate for ~20% of numeric structured claims
Of the numeric examples above: funding rounds, context windows, research findings, and founding dates have no meaningful uncertainty → value_squiggle = null. Employee counts, compute estimates, and probability forecasts have genuine uncertainty → value_squiggle populated. The 80/20 rule applies: most structured facts are well-sourced point estimates; the distribution is for the genuinely uncertain minority.
2. qualifier_key solves identity for repeated properties
Without qualifier_key, identity for (anthropic, funding-round-amount) is ambiguous — there are five distinct funding rounds. qualifier_key = "Series-E" disambiguates. This is a pattern for any property that can occur multiple times: leadership roles at different orgs, model releases, funding rounds, policy positions.
3. The attributed/asserted distinction drives verification routing
Examples 7 vs. 9 demonstrate the fork: Example 9 uses attributed_to = null (wiki asserts 85% is the correct reading of the paper). Example 7 uses attributed_to = dario-amodei (wiki only asserts Dario said it). A future verification pipeline uses this field to route: asserted → fact-check; attributed → quote-check. This is the most important single field for verification correctness.
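The fork can be sketched directly; attributed_to is the schema field, while the claim dict shape and route labels are illustrative:

```python
def verification_route(claim: dict) -> str:
    if claim.get("attributed_to") is None:
        return "fact-check"   # asserted: wiki vouches for the content itself
    return "quote-check"      # attributed: wiki vouches only that it was said

assert verification_route({"attributed_to": None}) == "fact-check"             # Example 9
assert verification_route({"attributed_to": "dario-amodei"}) == "quote-check"  # Example 7
```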
4. High-importance claims are the only ones worth automating
The quality dimension "Importance" is the most subjective and the hardest for LLMs to assess. The properties vocabulary implicitly encodes importance: if a property exists in the vocabulary (funding-round-amount, training-compute, employee-count), it's already been judged important enough to track. Claims using undefined properties are evidence that either the property should be added to the vocabulary, or the claim isn't important enough to store.
5. Rejection is the default; acceptance is the exception
The transformation and rejection examples show that most candidate claims from extraction need work. The ingest pipeline's job is not to pass claims through — it's to either make them precise enough to store, route them to attributed statements, or reject them. A well-running pipeline might accept 30% of raw candidate claims as structured facts on first pass.