Zeek Matchy Plugin

A Zeek plugin for high-performance threat intelligence matching using Matchy databases. Includes MatchyIntel, a drop-in alternative to Zeek's Intel Framework that fixes its two biggest pain points: memory consumption on clusters and updating data at runtime.


Why Replace the Intel Framework?

If you've run Zeek's Intel Framework at scale, you've hit these problems:

Memory

The Intel Framework loads every indicator into each worker's heap. On a host running 32 workers, that's 32 copies of your indicator set in memory. A million indicators can easily consume tens of gigabytes across workers.

Matchy databases are memory-mapped. The OS maps the .mxy file once per host, and all workers on that host share the same physical pages via the page cache, so the database adds no per-worker heap allocation. On that same 32-worker host, you go from 32 copies to 1.

Updating Data at Runtime

Replacing the loaded indicator set in the Intel Framework at runtime has been a long-standing pain point. You either restart Zeek (causing a gap in monitoring) or deal with the complexity of incremental insert/remove operations and Broker synchronization.

With Matchy, you just replace the .mxy file on disk. Auto-reload detects the change and swaps in the new database atomically and lock-free, with ~1-2ns overhead per query. No restart, no gap, no coordination between workers. Build your database offline, copy it to your sensor, and move it into place.

Performance

Operation | Throughput or cost
--- | ---
IP queries | 7M+/sec
Pattern queries (globs) | 3M+/sec
Database load time | <1ms
Auto-reload overhead | ~1-2ns/query

Performance is deterministic—no GC pauses, no hash table resizing during operation.

Operational Simplicity

  • No Broker: Database files are self-contained. Copy them with scp, distribute with Ansible, serve from S3.
  • No zeekctl deploy: Just replace the file on disk. Auto-reload handles the rest.
  • Debug offline: matchy query threats.mxy 1.2.3.4 works from any command line—no need to inspect Zeek's internal state.
  • Build anywhere: Generate .mxy files from CSV, JSON, or MISP feeds in CI/CD. The same binary file works on Linux, macOS, and FreeBSD.

Installation

Requirements

  • Zeek 5.0+ (with development headers if not installed from source)
  • Rust/Cargo (install from rustup.rs)
  • CMake 3.15+
  • C++17 compiler

Via Zeek Package Manager (zkg)

zkg install https://github.com/matchylabs/zeek-matchy-plugin

This requires Rust/Cargo to be installed on the build machine. The package manager handles the rest.

From Source

git clone https://github.com/matchylabs/zeek-matchy-plugin.git
cd zeek-matchy-plugin
mkdir build && cd build
cmake ..
make

This automatically clones and builds Matchy from source. If you already have a local Matchy checkout, point CMake at it to skip the clone:

cmake -DMATCHY_SOURCE_DIR=/path/to/matchy ..

Or if Matchy is already installed system-wide:

cmake -DBUILD_MATCHY=OFF ..
# Or specify the install prefix:
cmake -DBUILD_MATCHY=OFF -DMATCHY_ROOT=/usr/local ..

Install (optional)

sudo make install

Verify

# If using ZEEK_PLUGIN_PATH (development)
export ZEEK_PLUGIN_PATH=/path/to/zeek-matchy-plugin/build
zeek -N Matchy::DB

Expected:

Matchy::DB - Fast IP and pattern matching using Matchy databases (dynamic, version 0.3.0)

Quick Start

  1. Install the Matchy CLI (if you don't have it already):

    cargo install matchy
  2. Create a threat database from CSV:

    cat > threats.csv << 'EOF'
    entry,threat_level,category,description
    1.2.3.4,high,malware,Known C2 server
    10.0.0.0/8,low,internal,RFC1918 private network
    *.evil.com,critical,phishing,Phishing domain pattern
    malware.example.com,high,malware,Malware distribution site
    EOF
    
    matchy build threats.csv -o threats.mxy --format csv
  3. Use it in Zeek (add to your local.zeek or a site-specific script):

    @load Matchy/DB/intel
    
    redef MatchyIntel::db_path = "/opt/threat-intel/threats.mxy";
    
    event MatchyIntel::match(s: MatchyIntel::Seen, metadata: string) {
        print fmt("THREAT: %s (%s) -> %s", s$indicator, s$where, metadata);
    }

That's it. MatchyIntel automatically checks connection IPs, DNS queries, HTTP hosts/URLs, and SSL/TLS SNI against your database.

Deployment

Adding to Your Zeek Configuration

Add these lines to your local.zeek (or a site-specific script):

@load Matchy/DB/intel

redef MatchyIntel::db_path = "/opt/threat-intel/threats.mxy";

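Then deploy as usual with zeekctl deploy. The deploy is only needed for this script change; later database updates are picked up by auto-reload without one.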

Cluster Deployment

Matchy databases are memory-mapped, which means all Zeek workers on the same host share the same physical memory pages. You don't need to worry about per-worker memory — the OS handles sharing via the page cache.

Each host in your cluster needs a copy of the .mxy file at the same path. Options:

  • Shared filesystem (NFS, CIFS): Put the .mxy on a shared mount. All hosts read from the same file. Simplest option.
  • Local copies: Distribute with rsync, Ansible, Salt, etc. Better I/O performance since reads don't cross the network.
  • CI/CD pipeline: Build the database in CI, push to an artifact store or S3, pull from each sensor on a cron job.

Updating Threat Intel

With auto-reload enabled (the default), updating is a file replacement. Always write to a temporary file first, then mv it into place. This ensures workers never see a partially-written file — mv on the same filesystem is atomic.

# Build new database (on your build host or in CI)
matchy build updated-threats.csv -o /opt/threat-intel/threats.mxy.tmp --format csv

# Atomically replace the live file
mv /opt/threat-intel/threats.mxy.tmp /opt/threat-intel/threats.mxy

If distributing to remote sensors, copy to a temp path first:

scp threats.mxy sensor01:/opt/threat-intel/threats.mxy.tmp
ssh sensor01 'mv /opt/threat-intel/threats.mxy.tmp /opt/threat-intel/threats.mxy'

All workers detect the file change and reload automatically. No Zeek restart, no zeekctl deploy, no monitoring gap.
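
The stage-then-rename steps above can be wrapped in a small helper. This is a minimal sketch, not something shipped with the plugin: `publish_db` is a hypothetical function name, and it assumes `mktemp` and `cmp` are available on the sensor and that the Zeek user needs world-readable permissions on the file. It stages the copy inside the destination directory, so the final `mv` stays on one filesystem, and it skips the swap when the new file is byte-identical to the live one.

```shell
# publish_db SRC DEST  (hypothetical helper, not part of the plugin)
# Copies SRC next to DEST, then renames it over DEST in a single step.
publish_db() {
    src=$1
    dest=$2
    dest_dir=$(dirname "$dest")

    # Nothing to do if the live file already has identical contents.
    if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
        echo "unchanged"
        return 0
    fi

    # Stage inside the destination directory so the rename below never
    # crosses a filesystem boundary (a cross-filesystem mv is copy + delete).
    tmp=$(mktemp "$dest_dir/.mxy-staging.XXXXXX") || return 1

    # mktemp creates the file as 0600; widen it so another user can read it.
    # (Assumption: adjust the mode to match your own deployment.)
    if cp "$src" "$tmp" && chmod 0644 "$tmp" && mv "$tmp" "$dest"; then
        echo "replaced"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A typical call would be `publish_db /tmp/threats.mxy /opt/threat-intel/threats.mxy`, for example at the end of a cron job that has just downloaded a fresh build.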

MatchyIntel Framework

MatchyIntel is designed to feel familiar if you've used the Intel Framework, but with a fundamentally different architecture.

What It Observes Automatically

When you @load Matchy/DB/intel, it immediately starts observing:

Protocol | What | Where Enum
--- | --- | ---
Connections | Originator and responder IPs | Conn::IN_ORIG, Conn::IN_RESP
DNS | Query strings | DNS::IN_REQUEST
HTTP | Host header, full URL | HTTP::IN_HOST_HEADER, HTTP::IN_URL
SSL/TLS | SNI, certificate CN | SSL::IN_SERVER_NAME, X509::IN_CERT

Auto-Reload

By default, MatchyIntel watches the database file and reloads when it changes. This is the recommended mode for production.

# Enabled by default
redef MatchyIntel::auto_reload = T;

# To disable (for manual control):
redef MatchyIntel::auto_reload = F;

To update your threat intel, simply replace the .mxy file on disk. All workers pick up the change automatically.

Runtime Database Switching

You can also change the database path at runtime via Zeek's Config framework:

# Switch to a different database
Config::set_value("MatchyIntel::db_path", "/opt/threat-intel/updated.mxy");

# Unload the database (stop matching)
Config::set_value("MatchyIntel::db_path", "");

If the new path is invalid, the change is rejected and the current database stays loaded.

Manual Observation

Check arbitrary indicators programmatically:

# Check an IP
MatchyIntel::seen(MatchyIntel::Seen($host=1.2.3.4,
                                    $where=MatchyIntel::IN_ANYWHERE));

# Check a domain
MatchyIntel::seen(MatchyIntel::Seen($indicator="evil.example.com",
                                    $indicator_type=MatchyIntel::DOMAIN,
                                    $where=MatchyIntel::IN_ANYWHERE));

Hooks

# Filter matches before they fire
hook MatchyIntel::seen_policy(s: MatchyIntel::Seen, found: bool) {
    # Suppress matches for local IPs
    if (s?$host && Site::is_local_addr(s$host))
        break;
}

# Customize logging
hook MatchyIntel::extend_match(info: MatchyIntel::Info, s: MatchyIntel::Seen, metadata: string) {
    # Add custom fields, modify info record, etc.
}

Log Output

Matches are logged to matchy_intel.log:

Field | Description
--- | ---
ts | Timestamp
uid | Connection UID (if applicable)
id | Connection 4-tuple (if applicable)
seen.indicator | What was matched
seen.indicator_type | ADDR, DOMAIN, URL, etc.
seen.where | Where it was observed
metadata | JSON blob from your database (all your custom fields)
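
To pull a single field out of the log from the command line, a small awk wrapper is enough. The sketch below assumes the log is written in Zeek's default tab-separated format (with a `#fields` header line) rather than JSON; `log_column` is a hypothetical helper name. Where Zeek's own `zeek-cut` utility is installed, it does the same job.

```shell
# log_column FIELD [FILE...]  (hypothetical helper)
# Prints one column of a Zeek TSV log, looked up by name in the "#fields" header.
log_column() {
    field=$1
    shift
    awk -F '\t' -v want="$field" '
        $1 == "#fields" {
            # Header columns are offset by one because $1 is the "#fields" tag.
            for (i = 2; i <= NF; i++) if ($i == want) col = i - 1
            next
        }
        /^#/ { next }          # skip the remaining metadata lines
        col  { print $col }    # data rows: print the requested column
    ' "$@"
}
```

For example, `log_column seen.indicator matchy_intel.log | sort | uniq -c | sort -rn` lists the most frequently matched indicators first.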

Low-Level API

For more control, use the BiF functions directly:

global threats_db: opaque of MatchyDB;

event zeek_init() {
    threats_db = Matchy::load_database("/path/to/threats.mxy");

    if (!Matchy::is_valid(threats_db)) {
        print "Failed to load database!";
        return;
    }
}

event new_connection(c: connection) {
    local result = Matchy::query_ip(threats_db, c$id$orig_h);

    if (result != "") {
        print fmt("Threat detected from %s: %s", c$id$orig_h, result);
    }
}

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count) {
    local result = Matchy::query_string(threats_db, query);

    if (result != "") {
        print fmt("Malicious domain queried: %s - %s", query, result);
    }
}

Parsing Match Results

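Query results are JSON strings. Use Zeek's from_json() to parse them into typed records. Note that from_json() was added in Zeek 6.0, so this example needs a newer Zeek than the plugin's 5.0 minimum: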

@load base/frameworks/notice

module ThreatIntel;

export {
    redef enum Notice::Type += {
        Threat_Detected
    };

    type ThreatData: record {
        category: string &optional;
        threat_level: string &optional;
        description: string &optional;
    };

    global threats_db: opaque of MatchyDB;
}

event zeek_init() {
    threats_db = Matchy::load_database("/opt/threat-intel/threats.mxy");
}

event new_connection(c: connection) {
    local result = Matchy::query_ip(threats_db, c$id$orig_h);

    if (result != "") {
        local parsed = from_json(result, ThreatData);

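        if (parsed$valid) {
            local threat: ThreatData = parsed$v;
            # The fields are &optional; reading an unset one is a runtime error,
            # so fall back to "unknown" when the metadata lacks them.
            local category = (threat?$category) ? threat$category : "unknown";
            local level = (threat?$threat_level) ? threat$threat_level : "unknown";
            NOTICE([$note=Threat_Detected,
                    $conn=c,
                    $msg=fmt("Threat: %s (%s)", category, level),
                    $sub=fmt("IP: %s", c$id$orig_h)]);
        }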
    }
}

API Reference

Matchy::load_database(filename: string): opaque of MatchyDB

Load a database and return an opaque handle. The database is memory-mapped (not copied into memory). Automatically closed when the handle goes out of scope.

Matchy::load_database_with_options(filename: string, auto_reload: bool): opaque of MatchyDB

Load a database with auto-reload support. When auto_reload is T, the database watches its source file and transparently reloads when changes are detected (~1-2ns overhead per query, lock-free).

Matchy::is_valid(db: opaque of MatchyDB): bool

Check if a database handle is valid and open.

Matchy::query_ip(db: opaque of MatchyDB, ip: addr): string

Query by IP address. Returns a JSON string with match metadata, or "" if no match. Supports both exact IPs and CIDR matching (longest prefix wins).
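
To make "longest prefix wins" concrete, here is a small illustration of the rule itself. It is a linear scan over IPv4 entries written for readability, not a description of how Matchy stores or searches its data; `ip_to_int` and `longest_prefix` are hypothetical names, and the sketch assumes a shell with 64-bit arithmetic.

```shell
# Illustration only: IPv4, linear scan, no input validation.

# ip_to_int A.B.C.D -> the address as an integer on stdout
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# longest_prefix IP ENTRY... -> the most specific entry containing IP, if any
longest_prefix() {
    ip=$(ip_to_int "$1")
    shift
    best=""
    best_len=-1
    for cidr in "$@"; do
        case $cidr in
            */*) net_part=${cidr%/*}; len=${cidr#*/} ;;
            *)   net_part=$cidr; len=32 ;;   # a bare address acts as a /32
        esac
        net=$(ip_to_int "$net_part")
        if [ "$len" -eq 0 ]; then
            mask=0
        else
            mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
        fi
        # Keep the entry if it contains the address and is more specific
        # than anything seen so far.
        if [ $(( ip & mask )) -eq $(( net & mask )) ] && [ "$len" -gt "$best_len" ]; then
            best=$cidr
            best_len=$len
        fi
    done
    if [ -n "$best" ]; then echo "$best"; fi
}
```

With these definitions, `longest_prefix 10.1.2.3 10.0.0.0/8 10.1.0.0/16` prints `10.1.0.0/16`: both ranges contain the address, and the /16 is the more specific of the two.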

Matchy::query_string(db: opaque of MatchyDB, query: string): string

Query by string. Returns a JSON string with match metadata, or "" if no match. Supports exact string matching and glob patterns (*.evil.com).
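
As a rough analogy for what a pattern like `*.evil.com` covers, the shell's own globbing can be used to experiment. This is only an analogy: `glob_match` is a hypothetical helper built on the shell's `case` statement, and Matchy's exact glob rules (for example case sensitivity) are not specified here, so confirm real behavior with `matchy query`.

```shell
# glob_match PATTERN STRING  (analogy only, using the shell's glob rules)
glob_match() {
    # Leaving $1 unquoted makes `case` treat it as a pattern, not a literal.
    case $2 in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}
```

Under shell rules, `glob_match '*.evil.com' foo.evil.com` succeeds, while `glob_match '*.evil.com' evil.com` fails because the pattern requires the dot before `evil.com`. The bare domain would need its own literal entry.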

Building Matchy Databases

Install the CLI:

cargo install matchy

From CSV

# First column must be named "entry" — it's the match key.
# All other columns become metadata fields in query results.
cat > threats.csv << 'EOF'
entry,threat_level,category,description
1.2.3.4,high,malware,Known C2 server
10.0.0.0/8,low,internal,RFC1918 private network
*.evil.com,critical,phishing,Phishing domain pattern
malware.example.com,high,malware,Malware distribution site
EOF

matchy build threats.csv -o threats.mxy --format csv

Matchy auto-detects entry types: IP addresses, CIDR ranges, glob patterns, and literal strings. You can include as many entries as you need — databases with hundreds of thousands of indicators build in about a second.
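
Because the first column must be named "entry", it can be worth checking a feed's header before building, especially in CI. The following is a minimal sketch with a hypothetical helper name, `check_entry_header`; it assumes an unquoted header line and does not parse quoted CSV fields.

```shell
# check_entry_header FILE  (hypothetical pre-build check)
# Succeeds only if the first CSV column is named exactly "entry".
check_entry_header() {
    # Take the header line, drop a trailing CR from Windows-edited files,
    # and keep only the first comma-separated column.
    first_col=$(head -n 1 "$1" | tr -d '\r' | cut -d, -f1)
    if [ "$first_col" = "entry" ]; then
        return 0
    fi
    echo "$1: first column is '$first_col', expected 'entry'" >&2
    return 1
}
```

It can gate the build step directly: `check_entry_header threats.csv && matchy build threats.csv -o threats.mxy --format csv`.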

From JSON

matchy build threats.json -o threats.mxy

From MISP Threat Feeds

Matchy can import directly from MISP JSON exports, preserving all metadata (tags, threat levels, categories):

matchy build misp-feed/ -o threats.mxy

This handles MISP's directory structure automatically, including manifest.json and per-event files. All indicator types are supported: IPs, domains, URLs, hashes, email addresses, etc.

Combining Multiple Sources

You can pass multiple files of the same format to a single build:

matchy build feed1.csv feed2.csv -o combined.mxy --format csv

Inspect and Query

# Show database metadata and statistics
matchy inspect threats.mxy

# Query from the command line (useful for debugging)
matchy query threats.mxy 1.2.3.4
matchy query threats.mxy "foo.evil.com"

Testing

The plugin includes a comprehensive btest suite:

cd testing
btest

Tests cover:

  • Plugin loading
  • IP and string queries (exact, CIDR, glob)
  • load_database_with_options() with auto-reload on/off
  • MatchyIntel seen() function
  • MatchyIntel auto-reload mode
  • Runtime database switching via Config::set_value()

Troubleshooting

Plugin not found at runtime:

export ZEEK_PLUGIN_PATH=/path/to/zeek-matchy-plugin/build
zeek -N Matchy::DB

Database fails to load with "Unsupported version" error: Your .mxy file was built with matchy 1.x. Rebuild it with matchy 2.x:

cargo install matchy  # updates to 2.x
matchy build threats.csv -o threats.mxy --format csv

Build options:

# Use a local Matchy source checkout
cmake -DMATCHY_SOURCE_DIR=/path/to/matchy ..

# Use an existing Matchy installation
cmake -DBUILD_MATCHY=OFF -DMATCHY_ROOT=/path/to/matchy ..

# Specify Zeek location manually
cmake -DCMAKE_MODULE_PATH=/path/to/zeek/cmake ..

License

Apache-2.0 License. See LICENSE.

See Also