From the Greek μοτίβα, meaning patterns, or the recognization of similar features between objects.
This is a scoped-down reimplementation of Yente and nomenklatura, used to match entities against sanctions lists.
Most of the algorithms are taken directly from those repositories, and simply reimplemented, and the credit should go to the Open Sanctions's team.
Note that this piece of software requires Yente to run beside it, including Elasticsearch and a valid, licensed, collection of dataset obtained from Open Sanctions.
Work in progress
Not all of Yente is going to be implemented here. Notably, none of the index updates feature are going to their way into this repository. We will focus on the request part (search and matching).
Even through we will strive to produce matching scores in the vicinity of those of Yente, exact scores are not a goal. In particular, the Rust implementations of some algorithms will produce slightly different results, resulting in different overall scores. This is, for example, the case of the algorithm transliterating scripts into latin, which do not use libicu by default, and might therefore produce slightly different results [1].
All implemented algorithms will feature an integration test comparing Motiva's score with Yente's and check they are within a reasonable epsilon of each other.
If at all possible, this project will try to use only Rust-native dependencies, and stay clear of integrating with C libraries through FFI [2].
Some liberty was taken to adapt some logic and algorithms from Yente, so do not expect fully-compliant API or behavior.
[1]: Motiva can be compiled with the icu feature to use the same transliteration library as yente. This will require libicu development headers and shared libraries.
[2]: With the default features configuration.
- POST /match/{dataset}
- GET /entities/{id}
- GET /algorithms
- GET /catalog
- name-based
- name-qualified
- logic-v1 [1]
- logic-v2
[1]: Features that are disabled by default were omited for now.
Before v0.5.0, motiva is only compatible with data indexer with Yente v4.x. Starting with v0.5.0, it will try to determine, at startup, which version of Yente was used to index the data (v4.x or v5.x), and adapt its queries to support it.
Motiva is configured via environment variables. The following variables are supported:
| Variable | Description | Default / Example |
|---|---|---|
ENV |
Environment (dev or production) |
dev |
LISTEN_ADDR |
Address to bind the API server | 0.0.0.0:8000 |
API_KEY |
Bearer token used to authenticate requests | (none) |
INDEX_URL |
Elasticsearch URL | http://localhost:9200 |
INDEX_AUTH_METHOD |
Elasticsearch authentication (none, basic, bearer, api_key, encoded_api_key) |
none |
INDEX_CLIENT_ID |
Elasticsearch client ID (required for basic or api_key) |
(none) |
INDEX_CLIENT_SECRET |
Elasticsearch client secret (required for basic, api_key or encoded_api_key) |
(none) |
INDEX_TLS_CA_CERT |
Path to a PEM-encoded certificate chain to use for TLS validation | (none) |
INDEX_TLS_SKIP_VERIFY |
If 1, do not validate the TLS certificate served by the Elasticsearch cluster |
0 |
INDEX_NAME |
Index prefix under which data was indexed (suffixed by -entities) |
yente |
MANIFEST_URL |
Optional URL to a custom manifest JSON file | (none) |
CATALOG_REFRESH_INTERVAL |
Interval at which to pull the manifest and catalogs | 1h |
MATCH_CANDIDATES |
Number of candidates to consider for matching | 10 |
ENABLE_PROMETHEUS |
Enable Prometheus metrics collection and /metrics endpoint | 0 |
ENABLE_TRACING |
Set to 1 to enable tracing |
(none) |
TRACING_EXPORTER |
Tracing exporter kind (otlp, or gcp if compiled with the gcp feature) |
otlp |
REQUEST_TIMEOUT |
Maximum duration for a match request | 10s |
SCOPED_INDEX_QUERY |
Query used to scope down the index used for match queries | see here |
Setting MANIFEST_FILE is required if you use a customized dataset list and would like your own manifest to be used for catalog generation. If omitted, the default manifest provided by Yente will be used. It requires either an HTTP URL or a local file path ending in .json, .yml or .yaml.
Some unbounded-in-size query parameters can be passed in the request body instead of through the URL query. This prevents, for some of them taking in unbounded lists, to overflow the maximum length of URLs. Namely, you can now pass the following parameters in the body:
include_datasetexclude_datasetexclude_entity_ids
The match endpoint body now takes a params object at its root:
{
"queries": [...],
"params": {
"include_datasets": [...],
"exclude_datasets": [...],
"exclude_entity_ids": [...]
}
}Motiva supports generating and using a trimmed down index for match queries, while keeping the full index for entity relation queries. This could allow improving performance of match queries if you are only interested in a subset of it, while keeping the full datasets for queries that are less time-sensitive.
For example, you could have a search index that only contains Person's that have sanction in their topics, while keeping the full index to retrieve details of an entity, enriched with all its relations. Depending on the query you use for the scoped index, you could see a great reduction in latency and resource consumption.
Motiva can be run with the create-scoped-index subcommand, which will take care of creating the scoped index and its aliases. Once it is done, restarting motiva will make it effective.
$ motiva create-scoped-index
2026-03-05T16:56:14.439865Z INFO libmotiva::index::elastic::scoped: found previous scoped index index="motiva-w4xgo6jh"
2026-03-05T16:56:14.546981Z INFO libmotiva::index::elastic::scoped: created new index, starting reindexing data index="motiva-9xtyeclx"
2026-03-05T16:56:24.030717Z INFO libmotiva::index::elastic::scoped: reindexed data index="motiva-9xtyeclx"
2026-03-05T16:56:24.041981Z INFO libmotiva::index::elastic::scoped: atomically swapped index from="motiva-w4xgo6jh" to="motiva-9xtyeclx"
2026-03-05T16:56:24.071765Z INFO libmotiva::index::elastic::scoped: deleted old index index="motiva-w4xgo6jh"The default scoped query is listed below, but can be customized through SCOPED_INDEX_QUERY.
{
"bool": {
"must": [
{ "terms": { "schema": [ "Person", "LegalEntity", "Organization", "Company", "Airplane", "Vessel" ] } },
{ "term": { "topics": "sanction" } }
]
}
}The scoped index is not kept automatically in sync with the full index, you would need to run motiva create-scoped-index again when you need to update it. We suggest running it after your regular indexing operations.
Once your scoped index is created, you can perform a /match request with the Motiva-specific ?index_type=scoped parameters for the new index to be used.
$ cargo run --release
$ echo '{"queries":{"test":{"schema":"Person","properties":{"name":["Vladimir Putin"]}}}}' | curl -XPOST 127.0.0.1:8080/match/sanctions -H content-type:application/json -d @-$ git clone --recurse-submodules git@github.com:apognu/motiva.git
$ cd motiva# Standard build
$ cargo build
# Build with libicu support (requires libicu-dev)
$ cargo build --release --features icu
# Build with GCP tracing support
$ cargo build --release --features gcpPre-built images are available in this repositor's packages section, at ghcr.io/apognu/motiva, for each combination of features. Alternatively, you can build the image thus:
# Build without libicu
$ docker build -t motiva .
# Build without standalone features
$ docker build --build-arg CARGO_ARGS="--features gcp" -t motiva:gcp .
# Build with libicu support
$ docker build --build-arg BASE=icu --build-arg CARGO_ARGS="--features icu" -t motiva:icu .To run the tests, a Python 3.13+ environment must be set up with the required dependencies (this include libicu). You can install it in a virtualenv by using the uv file at the root of this repository:
$ uv sync
$ cargo testOne quite lengthy test is ignored by default (scoring the cartesian product of 50x50 entities against each other) and compare it against nomenklatura. You can still run this test by running cargo test -- --include-ignored.
Motiva is a work in progress.
Contributions and feedback are welcome! Please familiarize yourself with the CONTRIBUTING.md guidelines beforehand.