
kynoptic/wikipedia-reliable-sources


Wikipedia Goggles

This repository hosts two Brave Search Goggles that rerank search results to promote sources the Wikipedia community considers reliable:

  1. wikipedia-reliable-sources-only.goggle – Boosts reliable sources and downranks contentious ones, showing no other results. Search using this Goggle.
  2. wikipedia-reliable-sources.goggle – Similar to the first, but it also allows results from other sources, discarding only those deemed unreliable. Search using this Goggle.

Reliability ratings

This project uses reliability ratings curated by the Wikipedia community to assess the trustworthiness of sources. The ratings are drawn from Wikipedia's list of perennial sources and from reliability tables maintained by several WikiProjects (both described below).

Additionally, sources frequently used in featured articles (FA) and good articles (GA) are included.

Project structure

  • core/ – Python modules with data processing logic
  • scripts/ – Standalone command-line utilities
  • tests/ – Pytest suite
  • docs/ – Documentation and roadmap
  • data/ – Raw and processed datasets
  • outputs/ – Generated analysis results

How reliability affects rankings

The reliability ratings are adjusted using the following parameters:

  • $boost=2 – Applied to sources considered "Generally reliable" or "Reliable"
  • $downrank=2 – Used for sources labeled with "No consensus"
  • $discard – Assigned to sources determined as "Unreliable," "Blacklisted," or "Deprecated"
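
In Goggle syntax, these parameters are attached to site rules. A minimal illustration follows; the domains here are placeholders, not entries from the actual Goggle files:

```
! name: Wikipedia Reliable Sources (illustration only)
$boost=2,site=generally-reliable.example
$downrank=2,site=no-consensus.example
$discard,site=deprecated.example
```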

Extracting perennial sources

Run scripts/fetch_perennial_sources.py to download and parse the perennial sources list from Wikipedia. The script cleans and validates the data, then writes perennial_sources.json and perennial_sources.csv containing structured records.

python scripts/fetch_perennial_sources.py

JSON output

Running the script prints the number of parsed entries (for example, Fetched 485 sources) and writes perennial_sources.json. The file is a JSON array where each object has the following fields:

Field                Description
source_name          Name of the publication or website.
reliability_status   Two-letter code from the WP:RSPSTATUS legend (e.g. gr = generally reliable, gu = generally unreliable, nc = no consensus, d = deprecated, m = marginal).
notes                Summary of discussions about the source.

Example entry:

[
  {
    "source_name": "ABC News (US)",
    "reliability_status": "gr",
    "notes": "There is consensus that ABC News, the news division of the American Broadcasting Company, is generally reliable. It is not to be confused with other publications of the same name."
  }
]

CSV output

The script also writes a perennial_sources.csv file with the same fields for easy spreadsheet analysis.
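
Either output is easy to filter by status code. The snippet below is a quick sketch against an inline sample that mirrors the JSON schema; in practice you would json.load the perennial_sources.json file instead:

```python
import json

# Inline sample mirroring the perennial_sources.json schema.
sources = json.loads("""
[
  {"source_name": "ABC News (US)", "reliability_status": "gr", "notes": "..."},
  {"source_name": "Example Blog", "reliability_status": "gu", "notes": "..."}
]
""")

# Keep only entries marked generally reliable ("gr").
reliable = [s["source_name"] for s in sources if s["reliability_status"] == "gr"]
print(reliable)  # ['ABC News (US)']
```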

Checking for updates

Run scripts/update_checker.py periodically. It compares the current revision IDs of the perennial sources subpages against revision_ids.json. If any page has changed since the last run, the script re-fetches the tables and updates perennial_sources.json and perennial_sources.csv.
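
The comparison itself is straightforward. A hedged sketch of the logic, with the API fetch stubbed out (the real script queries the MediaWiki API for current revision IDs):

```python
# stored would be loaded from revision_ids.json; current would be fetched
# from the MediaWiki API. Values here are illustrative.
stored = {"Wikipedia:Reliable_sources/Perennial_sources": 1200}
current = {"Wikipedia:Reliable_sources/Perennial_sources": 1234}

# Any page whose current revision ID differs from the stored one is stale.
changed = [page for page, rev in current.items() if stored.get(page) != rev]
if changed:
    print(f"Stale pages: {changed}")  # the real script re-fetches these
```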

python scripts/update_checker.py

Extracting WikiProject sources

Run scripts/fetch_wikiproject_sources.py to download the reliability tables maintained by several WikiProjects. The command outputs wikiproject_sources.json and wikiproject_sources.csv at the repository root.

python scripts/fetch_wikiproject_sources.py

Citation normalization pipeline

The project now includes a modular workflow for gathering citation data and normalizing source URLs.

  1. Fetch article lists

    python -m core.fetch_articles

    This writes good_articles.json and featured_articles.json under data/raw/.

  2. Download wikitext for each article

    python -m core.fetch_wikitext

    Wikitext files are stored in data/raw/wikitext/.

  3. Extract citation URLs

    python -m core.extract_refs

    The extracted references are written to data/processed/refs_extracted.json.

  4. Normalize and rank sources

    python -m core.clean_sources

The script applies domain aliases from data/alias_map.json, writes canonical counts to data/processed/sources_canonical.csv, and outputs the top sources to outputs/top_sources.csv.
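
Step 3 of the pipeline amounts to pulling URLs out of citation templates and external links. A rough regex-based sketch (the real core.extract_refs module may be more careful about edge cases):

```python
import re

# Toy wikitext containing a {{cite web}} template and a bare external link.
wikitext = (
    "{{cite web |url=https://example.com/story |title=Story}} "
    "and a bare link [https://news.example.org/a Article]."
)

# Match http(s) URLs, stopping at whitespace or wikitext delimiters.
urls = re.findall(r"https?://[^\s|\]}]+", wikitext)
print(urls)  # ['https://example.com/story', 'https://news.example.org/a']
```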

Updating domain aliases

Add new mappings in data/alias_map.json to normalize additional domains. Each entry maps a short hostname to its canonical form. These aliases are loaded by core/clean_sources, so updates affect how sources are deduplicated.
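
Conceptually, the alias lookup works like this (the mappings below are illustrative, not the contents of the real alias_map.json):

```python
from urllib.parse import urlparse

# Hypothetical alias entries: short hostname -> canonical form.
alias_map = {"nyti.ms": "nytimes.com", "youtu.be": "youtube.com"}

def canonical_domain(url: str) -> str:
    # Lowercase the host, drop a leading "www.", then apply the alias map.
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return alias_map.get(host, host)

print(canonical_domain("https://nyti.ms/abc"))        # nytimes.com
print(canonical_domain("https://www.nytimes.com/x"))  # nytimes.com
```

With this normalization in place, shortened and full-length URLs for the same outlet collapse into one row when counts are aggregated.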

Running tests

Install the dependencies listed in requirements.txt and execute the test suite with pytest:

pip install -r requirements.txt
pytest

Contributors should run the tests before committing changes to ensure nothing breaks.

Programmatic API usage

Import functions from the core package to integrate the pipeline into your own scripts. See docs/api_usage.md for complete examples.
