SEO 3.0: The Reasoning Web

Structured Linked Data for Agentic RAG — An Empirical Study

This repository contains the code, experiment infrastructure, and paper source for our study of how structured linked data (Schema.org markup and knowledge graph entity pages served by a Linked Data Platform) affects retrieval accuracy in RAG systems built on Vertex AI Vector Search 2.0 and the Google Agent Development Kit (ADK).

Key Findings

Hypothesis	Result	Effect
H1: JSON-LD alone improves RAG accuracy	❌ Not significant (p=1.0)	Δ = +0.07
H2: Agentic RAG outperforms standard RAG	✅ Significant (p=0.001)	+14.4%, d=0.27
H3: Enhanced entity pages improve accuracy	✅ Significant (p<1e-11)	+29.5–30.8%, d=0.55–0.60
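
For readers unfamiliar with the Effect column, the sketch below shows one common way such numbers are obtained. It assumes that d denotes Cohen's d with a pooled standard deviation and that the percentages are relative gains in mean accuracy; the README does not state the exact formulas, so treat both as assumptions. The function names and the sample scores are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Cohen's d with a pooled standard deviation (assumed definition of 'd')."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled


def relative_gain(treatment, control):
    """Relative change in mean score, e.g. 0.144 would read as +14.4%."""
    return (mean(treatment) - mean(control)) / mean(control)


# Made-up per-query scores, for illustration only.
agentic_scores = [4, 5, 6]
standard_scores = [2, 3, 4]
print(cohens_d(agentic_scores, standard_scores))
print(relative_gain(agentic_scores, standard_scores))
```

By Cohen's conventional thresholds, d around 0.2 is a small effect and d around 0.5 is a medium one, which is how the H2 and H3 rows can be read.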

Why H1 fails: Our pipeline ingests each page as flat text truncated at 20,000 characters. 82% of documents exceed this limit, and the JSON-LD block sits right at the truncation boundary (median position: character 18,510), so the markup is frequently cut off before it reaches the index. Production search engines like Google extract JSON-LD separately, which is a fundamentally different architecture. See the paper for details.
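
The difference between the two architectures can be sketched in a few lines. This is a minimal illustration, not the repository's ingestion code: the regular expression, the function names, and the sample page are assumptions made for the example, and only the 20,000-character limit comes from the text above.

```python
import json
import re

TRUNCATION_LIMIT = 20_000  # character limit described above

# Assumed pattern for <script type="application/ld+json"> blocks.
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def jsonld_survives_truncation(html, limit=TRUNCATION_LIMIT):
    """True if the first JSON-LD block ends within the first `limit` characters."""
    match = JSONLD_RE.search(html)
    return match is not None and match.end() <= limit


def extract_jsonld(html):
    """Parse every JSON-LD block on its own, independent of the page text."""
    blocks = []
    for match in JSONLD_RE.finditer(html):
        try:
            blocks.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed or partially cut blocks
    return blocks


# Hypothetical page whose markup starts just past the limit.
page = (
    "<html><body>" + "x" * 19_990
    + '<script type="application/ld+json">{"@type": "Place", "name": "Example"}</script>'
    + "</body></html>"
)

print(jsonld_survives_truncation(page))         # flat-text ingestion loses the block
print(extract_jsonld(page[:TRUNCATION_LIMIT]))  # nothing left to parse after truncation
print(extract_jsonld(page))                     # separate extraction still recovers it
```

Extracting the markup before truncation, as in the last call, is the kind of separate handling the paragraph attributes to production search engines.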

The SEO 3.0 Framework

We propose three eras of search optimization:

SEO 1.0 — Document Ranking (1998–2011): Keywords and links
SEO 2.0 — Structured Data (2011–2023): Schema.org, knowledge panels
SEO 3.0 — The Reasoning Web (2023–present): AI systems that reason and act

And three tiers of AI visibility:

Citations — Is your content retrieved and attributed?
Reasoning — Can the AI reason correctly over your content?
Actions — Can the AI agent act on your content?

Domains Under Study

Domain	Vertical	Entity Types
BlackBriar	Advisors	Services, team members, insights
SalzburgerLand	Travel / Tourism	Places, attractions, cards
Express Legal Funding	Legal / Finance	Services, processes, state guides
WordLift Blog	Editorial	Articles, concepts (Knowledge Graph, SEO, NER)

Quick Start

# Install dependencies
pip install -e ".[dev]"

# Configure GCP credentials
gcloud auth application-default login

# Collect entities from the Linked Data Platform
python -m src.dataset.collector --config config/experiment_config.yaml

# Generate document variants (plain HTML, HTML+JSON-LD, enhanced)
python -m src.dataset.transformer --config config/experiment_config.yaml

# Generate test queries with ground truth
python -m src.dataset.query_generator --config config/experiment_config.yaml

# Set up Vertex AI Vector Search 2.0 collections
python -m src.indexing.vectorsearch --setup --ingest all

# Run experiments (all 6 conditions)
python -m src.evaluation.runner --config config/experiment_config.yaml

# Generate analysis, figures, and LaTeX tables
python -m src.evaluation.analysis --results-dir results/raw/ --output-dir results/
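
The "6 conditions" mentioned above are not enumerated in this README. One plausible reading, sketched here in Python, is the cross product of the three document variants (plain HTML, HTML+JSON-LD, enhanced) and the two pipelines (standard and agentic RAG). The labels and the function name are assumptions for illustration; the names used in config/experiment_config.yaml may differ.

```python
from itertools import product

# Assumed labels; the repository's actual condition names may differ.
DOCUMENT_VARIANTS = ["plain_html", "html_jsonld", "enhanced"]
RAG_MODES = ["standard", "agentic"]


def experiment_conditions():
    """Return every (document variant, RAG mode) pair as a small dict."""
    return [
        {"variant": variant, "rag_mode": mode}
        for variant, mode in product(DOCUMENT_VARIANTS, RAG_MODES)
    ]


for condition in experiment_conditions():
    print(condition["variant"], condition["rag_mode"])
```

Under this reading, H1 and H3 compare document variants within a pipeline, while H2 compares the two pipelines.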

Project Structure

├── config/                 # Configuration files
├── data/                   # Dataset (raw, processed, queries) — see DATA_AVAILABILITY.md
├── src/                    # Source code
│   ├── dataset/            # Data collection & curation
│   ├── indexing/           # Vertex AI Vector Search 2.0
│   ├── retrieval/          # Standard & Agentic RAG pipelines
│   └── evaluation/         # Metrics, runner, analysis
├── templates/              # Enhanced entity page & llms.txt templates
├── paper/                  # LaTeX paper source (LNCS format)
├── scripts/                # Utility scripts
└── results/                # Experiment outputs (gitignored)

Requirements

Python 3.11+
Google Cloud project with Vertex AI APIs enabled
gcloud CLI authenticated

Data Availability

The experimental data (raw HTML, processed documents, evaluation results) is excluded from this repository to protect client confidentiality. See DATA_AVAILABILITY.md for how to request data for replication or how to reproduce the experiment with your own websites.

License

Code: MIT License
Paper & Figures: CC BY 4.0

Citation

If you use this work, please cite:

@inproceedings{volpini2026seo3,
  title     = {Structured Linked Data in Agentic {RAG}: From {SEO} to the Reasoning Web},
  author    = {Volpini, Andrea},
  booktitle = {Proceedings of the International Semantic Web Conference (ISWC)},
  year      = {2026}
}
