Use the ChatNoir REST API in PyTerrier for retrieval and re-ranking against large corpora such as ClueWeb09, ClueWeb12, ClueWeb22, or MS MARCO.
Powered by the chatnoir-api package.
Install the package from PyPI:
pip install chatnoir-pyterrier
You can use the ChatNoirRetrieve PyTerrier module in any PyTerrier pipeline, just as you would use BatchRetrieve.
from chatnoir_pyterrier import ChatNoirRetrieve
chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1")
chatnoir.search("python library")
ChatNoir provides an extensive set of extra features, such as the full text or the page rank and spam rank (for some indices). These can easily be included in the response data frame for use in subsequent PyTerrier re-ranking stages, like so:
from chatnoir_pyterrier import ChatNoirRetrieve, Feature
chatnoir_msmarco_snippet = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
chatnoir_msmarco_snippet.search("python library")
chatnoir_cw09_page_spam_rank = ChatNoirRetrieve(index="clueweb09", features=Feature.PAGE_RANK | Feature.SPAM_RANK)
chatnoir_cw09_page_spam_rank.search("python library")
We recommend wrapping ChatNoirRetrieve in a RetrieverCache, using the pyterrier-caching library:
from chatnoir_pyterrier import ChatNoirRetrieve
from pyterrier_caching import RetrieverCache
chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1")
cached_chatnoir = RetrieverCache("path/to/cache", chatnoir)
This way, the ChatNoir API is called only once per query, and subsequent experiments can reuse the cached results. Refer to the pyterrier-caching documentation for more details on how the caching works.
Please check out our sample notebook or open it in Google Colab.
We also provide a hands-on guide for the Touché 2023 shared tasks here.
If you use this package, please cite the following papers from the ChatNoir authors. You can use the following BibTeX information for citation:
@InProceedings{bevendorff:2018,
address = {Berlin Heidelberg New York},
author = {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
booktitle = {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
editor = {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
month = mar,
publisher = {Springer},
series = {Lecture Notes in Computer Science},
site = {Grenoble, France},
title = {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
year = 2018
}
@InProceedings{merker:2025a,
address = {Cham, Switzerland},
author = {Jan Heinrich Merker and Janek Bevendorff and Maik Fr{\"o}be and Tim Hagen and Harrisen Scells and Matti Wiegmann and Benno Stein and Matthias Hagen and Martin Potthast},
booktitle = {Advances in Information Retrieval. 47th European Conference on IR Research (ECIR 2025)},
doi = {10.1007/978-3-031-88720-8_17},
editor = {Claudia Hauff and Craig Macdonald and Dietmar Jannach and Gabriella Kazai and Franco Maria Nardini and Fabio Pinelli and Fabrizio Silvestri and Nicola Tonellotto},
month = apr,
pages = {96--104},
publisher = {Springer Nature},
series = {Lecture Notes in Computer Science},
site = {Lucca, Italy},
title = {{Web-scale Retrieval Experimentation with chatnoir-pyterrier}},
volume = 15576,
year = 2025
}
With chatnoir-pyterrier, it is easy to run benchmarks on shared tasks that use large document collections. We demonstrate this by running ChatNoir retrieval on all supported TREC, CLEF, and NTCIR shared tasks available in ir_datasets.
First install the experiment dependencies:
pip install -e .[experiment]
To run the experiments, first create the runs with:
ray job submit --runtime-env examples/ray-runtime-env.yml --no-wait -- python examples/experiment.py
This creates the runs for all shared tasks in parallel and saves them to a cache.
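The command above submits the experiment as a Ray job. As a rough single-machine sketch of the same "one run per shared task, in parallel, cached on disk" pattern, the following uses `concurrent.futures` instead of Ray. The task names, `make_run`, and `run_cached` are hypothetical placeholders, not taken from examples/experiment.py; each run is written as a file in the common TREC run format.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical placeholder task identifiers.
TASKS = ["task-a", "task-b", "task-c"]


def make_run(task: str) -> str:
    """Hypothetical placeholder for retrieving results for all topics of a task."""
    # One TREC run line: qid Q0 docno rank score tag
    return f"1 Q0 {task}-doc-1 1 10.0 chatnoir\n"


def run_cached(task: str, cache_dir: Path) -> bool:
    """Create the run file for one task unless cached; return True if it was computed."""
    path = cache_dir / f"{task}.run"
    if path.exists():
        return False
    path.write_text(make_run(task))
    return True


def run_all(cache_dir: Path) -> list[bool]:
    """Process all tasks in parallel; pool.map keeps results in task order."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda task: run_cached(task, cache_dir), TASKS))


cache_dir = Path(tempfile.mkdtemp()) / "runs"
first_pass = run_all(cache_dir)   # computes every run
second_pass = run_all(cache_dir)  # every run is already cached
```

Each task writes to its own file, so the parallel workers never contend for the same cache entry.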
After creating the runs, the experiment.ipynb notebook can be used to analyze the results.
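One common analysis step is scoring runs against relevance judgments (qrels). The following self-contained sketch computes nDCG@10 for a single query using linear gains, i.e. the sum of rel_i / log2(i + 1) over the top ranks. The document IDs and judgments are made up, and this is an illustration only; the notebook may use other measures or an evaluation library.

```python
import math


def dcg(gains: list[int]) -> float:
    """Discounted cumulative gain with linear gains: sum of rel_i / log2(i + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(gains, start=1))


def ndcg_at_k(ranking: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query; unjudged documents and negative grades count as 0."""
    gains = [max(qrels.get(docno, 0), 0) for docno in ranking[:k]]
    ideal = sorted((max(grade, 0) for grade in qrels.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up example: a ranked list of document IDs and graded judgments.
ranking = ["d3", "d1", "d7", "d2"]
qrels = {"d1": 2, "d2": 1, "d5": 1}
score = ndcg_at_k(ranking, qrels)  # roughly 0.54
```

Averaging this per-query score over all topics of a shared task gives the usual mean nDCG@10 for a run.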
Head over to the ChatNoir ir_datasets indexer to learn more about how new ir_datasets-compatible datasets are indexed into ChatNoir.
To build this package and contribute to its development, you need the build, setuptools, and wheel packages:
pip install build setuptools wheel
(On most systems, these packages are already installed.)
Install the package and its test dependencies:
pip install -e .[test]
Configure the API key for testing:
export CHATNOIR_API_KEY="<API_KEY>"
Verify your changes against the test suite:
ruff check . # Code format and LINT
mypy . # Static typing
bandit -c pyproject.toml -r . # Security
pytest . # Unit tests
Please also add tests for your newly developed code.
Wheels for this package can be built with:
python -m build
If you run into any problems using this package, please file an issue. We're happy to help!
This repository is released under the MIT license.