Configuration File Reference

mmdbconvert uses a TOML configuration file to define how MMDB databases should be merged and exported. This document describes all available configuration options.

Quick Start

Here's a minimal example that reads from one database and outputs to CSV:

[output]
format = "csv"
file = "output.csv"

[[databases]]
name = "geo"
path = "/path/to/GeoIP2-City.mmdb"

[[columns]]
name = "country_code"
database = "geo"
path = ["country", "iso_code"]

Configuration Sections

General Settings

Top-level configuration options that affect overall behavior:

disable_cache = false  # Disable MMDB unmarshaler caching (default: false)

Performance Options:

disable_cache - Controls whether MMDB unmarshaler caching is disabled. When false (the default), cached unmarshalers are used for better performance. When true, the unmarshaler cache is disabled. This reduces memory usage at the cost of speed, and processing can take several times longer. Disabling the cache is mainly worthwhile for large databases with many columns, where the memory savings can be significant. This setting can be overridden at runtime with the --disable-cache command-line flag.

Output Settings

The [output] section defines where and how data should be written.

[output]
format = "csv"    # Output format: "csv", "parquet", or "mmdb"
file = "output.csv"  # Output file path (use this for a combined file)
# ipv4_file = "output_ipv4.csv"  # Optional IPv4-only file (set both ipv4_file and ipv6_file, omit file)
# ipv6_file = "output_ipv6.csv"  # Optional IPv6-only file (set both ipv4_file and ipv6_file, omit file)
include_empty_rows = false  # Include rows with no MMDB data (default: false)

Data Filtering:

include_empty_rows - Controls whether rows with no MMDB data are written to the output. When false (default), rows where all data columns are empty/null are skipped. When true, all network ranges are included even if they have no associated data. Network columns (CIDR, start_ip, etc.) are always present and don't affect this filtering.
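The filtering rule can be sketched in a few lines of Python. This is an illustration only, not mmdbconvert's implementation. The helper names and the row layout (network values kept separate from data-column values) are hypothetical. Treating an empty string the same as a missing value is an assumption based on the "empty/null" wording above.

```python
from typing import Any, Mapping


def is_empty(value: Any) -> bool:
    # Assumption: None (null) and "" both count as "no data".
    return value is None or value == ""


def should_write_row(data_values: Mapping[str, Any], include_empty_rows: bool) -> bool:
    """Return True if a row should be written to the output.

    Only data-column values are inspected. Network columns (cidr, start_ip, ...)
    are always filled in and never affect the decision.
    """
    if include_empty_rows:
        return True
    return not all(is_empty(v) for v in data_values.values())


rows = [
    ("203.0.113.0/24", {"country_code": "GB", "city_name": None}),
    ("198.51.100.0/24", {"country_code": None, "city_name": None}),
]
for network, data in rows:
    if should_write_row(data, include_empty_rows=False):
        print(network)  # prints 203.0.113.0/24 only
```

Note that falsy values such as 0 or false are real data and keep the row.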

CSV Options

When format = "csv", you can specify CSV-specific options:

[output.csv]
delimiter = ","           # Field delimiter (default: ",")
include_header = true     # Include column headers (default: true)
ipv4_bucket_size = 16     # Bucket prefix length for IPv4 (default: 16)
ipv6_bucket_size = 16     # Bucket prefix length for IPv6 (default: 16)
ipv6_bucket_type = "string"  # IPv6 bucket value type: "string" or "int" (default: "string")

Option	Description	Default
`delimiter`	Field delimiter character	","
`include_header`	Include column headers in output	true
`ipv4_bucket_size`	Prefix length for IPv4 buckets (1-32, when `network_bucket` column used)	16
`ipv6_bucket_size`	Prefix length for IPv6 buckets (1-60, when `network_bucket` column used)	16
`ipv6_bucket_type`	IPv6 bucket value type: "string" (hex) or "int" (first 60 bits as integer)	"string"
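As a rough illustration of what an IPv4 bucket holds, the sketch below assumes that the bucket value is the integer network address of the /ipv4_bucket_size prefix containing an address. That definition is an assumption made for illustration, and the function name ipv4_bucket is hypothetical. Check the tool's actual output before relying on it.

```python
import ipaddress


def ipv4_bucket(ip: str, bucket_size: int = 16) -> int:
    """Integer network address of the /bucket_size prefix containing ip.

    Assumed definition of the IPv4 network_bucket value (illustrative only).
    """
    if not 1 <= bucket_size <= 32:
        raise ValueError("ipv4_bucket_size must be between 1 and 32")
    shift = 32 - bucket_size
    # Clear the host bits below the bucket prefix.
    return (int(ipaddress.IPv4Address(ip)) >> shift) << shift


# Every address in 203.0.0.0/16 maps to the same bucket at the default size.
print(ipaddress.IPv4Address(ipv4_bucket("203.0.113.77")))      # 203.0.0.0
print(ipaddress.IPv4Address(ipv4_bucket("203.0.113.77", 24)))  # 203.0.113.0
```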

Parquet Options

When format = "parquet", you can specify Parquet-specific options:

[output.parquet]
compression = "snappy"    # Compression: "none", "snappy", "gzip", "lz4", "zstd" (default: "snappy")
row_group_size = 500000   # Rows per row group (default: 500000)
ipv4_bucket_size = 16     # Bucket prefix length for IPv4 (default: 16)
ipv6_bucket_size = 16     # Bucket prefix length for IPv6 (default: 16)
ipv6_bucket_type = "string"  # IPv6 bucket value type: "string" or "int" (default: "string")

Option	Description	Default
`compression`	Compression codec: "none", "snappy", "gzip", "lz4", "zstd"	"snappy"
`row_group_size`	Number of rows per row group	500000
`ipv4_bucket_size`	Prefix length for IPv4 buckets (1-32, when `network_bucket` column used)	16
`ipv6_bucket_size`	Prefix length for IPv6 buckets (1-60, when `network_bucket` column used)	16
`ipv6_bucket_type`	IPv6 bucket value type: "string" (hex) or "int" (first 60 bits as integer)	"string"

MMDB Options

When format = "mmdb", you can specify MMDB-specific options:

[output.mmdb]
database_type = "GeoIP2-City"  # Database type (required)
description = { en = "Custom Database", de = "Benutzerdefinierte Datenbank" }  # Descriptions by language
languages = ["en", "de"]  # List of languages (auto-populated from description if omitted)
record_size = 28  # Record size: 24, 28, or 32 (default: 28)
include_reserved_networks = false  # Include reserved networks (default: false)

Notes:

database_type is required for MMDB output
languages is auto-populated from description keys if not specified
Split IPv4/IPv6 files are not supported for MMDB output (must use single file)
Network columns are not used for MMDB output (data is written by prefix)
Type hints are not allowed for MMDB output (types are preserved from source databases)
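The record_size choice follows from the MaxMind DB file format specification. Each search-tree node stores two records, so a node occupies 6, 7, or 8 bytes, and every record value must fit in record_size bits. A record that points into the data section is stored as node_count + 16 + offset, so larger databases may need a larger record size. The Python below only does that arithmetic. The helper fits_record_size is a hypothetical, conservative check, not how mmdbconvert chooses or validates the size.

```python
VALID_RECORD_SIZES = (24, 28, 32)


def node_size_bytes(record_size: int) -> int:
    """Bytes per search-tree node: two records of record_size bits each."""
    if record_size not in VALID_RECORD_SIZES:
        raise ValueError(f"record_size must be one of {VALID_RECORD_SIZES}")
    return record_size * 2 // 8


def max_record_value(record_size: int) -> int:
    """Largest value a single record can hold."""
    return (1 << record_size) - 1


def fits_record_size(node_count: int, data_section_size: int, record_size: int) -> bool:
    """Conservative check that every data pointer can be encoded in record_size bits.

    Data pointers are stored as node_count + 16 + offset, and the largest
    offset is below data_section_size.
    """
    largest_pointer = node_count + 16 + max(data_section_size - 1, 0)
    return largest_pointer <= max_record_value(record_size)


for size in VALID_RECORD_SIZES:
    print(size, node_size_bytes(size), max_record_value(size))
# 24 6 16777215
# 28 7 268435455
# 32 8 4294967295
```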

Splitting IPv4 and IPv6 Output

Set output.ipv4_file and output.ipv6_file to write IPv4 and IPv6 rows to separate files. When these fields are present, omit output.file. This works for both CSV and Parquet outputs:

[output]
format = "parquet"
ipv4_file = "merged_ipv4.parquet"
ipv6_file = "merged_ipv6.parquet"

When splitting output, both ipv4_file and ipv6_file must be configured.
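The file-selection rules can be summarized in a short validation sketch. The function names are hypothetical, and the error messages were written for this illustration. The exact checks and wording in mmdbconvert may differ. The MMDB check mirrors the MMDB Options note that split files are not supported for that format.

```python
import ipaddress
from typing import Dict, Mapping


def resolve_output_files(output: Mapping[str, str]) -> Dict[str, str]:
    """Map each IP family to its output file, enforcing the [output] rules."""
    combined = output.get("file")
    v4 = output.get("ipv4_file")
    v6 = output.get("ipv6_file")
    if v4 or v6:
        if combined:
            raise ValueError("omit output.file when using ipv4_file/ipv6_file")
        if not (v4 and v6):
            raise ValueError("set both output.ipv4_file and output.ipv6_file")
        if output.get("format") == "mmdb":
            raise ValueError("split IPv4/IPv6 files are not supported for mmdb output")
        return {"ipv4": v4, "ipv6": v6}
    if not combined:
        raise ValueError("output.file is required when not splitting output")
    return {"ipv4": combined, "ipv6": combined}


def target_file(network: str, files: Mapping[str, str]) -> str:
    """Pick the output file for a network based on its IP family."""
    family = "ipv4" if ipaddress.ip_network(network).version == 4 else "ipv6"
    return files[family]


files = resolve_output_files(
    {"format": "parquet", "ipv4_file": "merged_ipv4.parquet", "ipv6_file": "merged_ipv6.parquet"}
)
print(target_file("203.0.113.0/24", files))  # merged_ipv4.parquet
print(target_file("2001:db8::/32", files))   # merged_ipv6.parquet
```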

IPv6 Bucket Type Options

IPv6 buckets can be stored as either hex strings (default) or int64 values:

String type (default):

Format: 32-character hex string (e.g., "20010db8000000000000000000000000")
Storage: 32 bytes per value

Int type (ipv6_bucket_type = "int"):

Format: First 60 bits of the bucket address as int64
Storage: 8 bytes per value (4x smaller than string)

We use 60 bits (not 64) because 60-bit values always fit in a positive int64, which simplifies queries by avoiding two's complement handling.
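Here is a sketch of both representations. It assumes that the bucket value is the network address of the /ipv6_bucket_size prefix containing an address, which is an assumption made for illustration, and the function names are hypothetical. The integer form keeps the top 60 of the 128 bits by shifting right by 68. Its maximum is therefore 2^60 - 1, well below the int64 maximum of 2^63 - 1. A 64-bit prefix whose first bit is 1 would instead be negative when stored as an int64.

```python
import ipaddress

INT64_MAX = (1 << 63) - 1


def ipv6_bucket(ip: str, bucket_size: int = 16) -> int:
    """128-bit network address of the /bucket_size prefix containing ip (assumed definition)."""
    if not 1 <= bucket_size <= 60:
        raise ValueError("ipv6_bucket_size must be between 1 and 60")
    shift = 128 - bucket_size
    return (int(ipaddress.IPv6Address(ip)) >> shift) << shift


def bucket_as_string(bucket: int) -> str:
    """ipv6_bucket_type = "string": 32 lowercase hex digits, zero-padded."""
    return f"{bucket:032x}"


def bucket_as_int(bucket: int) -> int:
    """ipv6_bucket_type = "int": the first 60 bits as a non-negative integer."""
    return bucket >> 68


b = ipv6_bucket("2001:db8:1234::1", bucket_size=32)
print(bucket_as_string(b))    # 20010db8000000000000000000000000
print(hex(bucket_as_int(b)))  # 0x20010db80000000
```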

When to use each type:

Use string (default) for databases where hex string representations are simpler to work with.
Use int for reduced storage cost at the price of more complicated queries.

We do not provide a bytes type for the IPv6 bucket, primarily because there has not yet been a need for one. For example, BigQuery cannot cluster on bytes columns, so a bytes bucket would not help there.

Network Columns

Network columns define how IP network information is output. These columns always appear first in the output, in the order defined.

[[network.columns]]
name = "network"    # Column name
type = "cidr"       # Output type

Available types:

Type	Description
`cidr`	CIDR notation (e.g., "203.0.113.0/24")
`start_ip`	Starting IP address (e.g., "203.0.113.0")
`end_ip`	Ending IP address (e.g., "203.0.113.255")
`start_int`	Starting IP as integer
`end_int`	Ending IP as integer
`network_bucket`	Bucket for efficient lookups. IPv4: integer. IPv6: hex string (default) or integer (with `ipv6_bucket_type = "int"`). Requires split files (CSV and Parquet only).
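For reference, Python's standard ipaddress module yields the values the table describes for a given network. This is only a sketch of the column semantics: network_bucket is left out, and the function name is hypothetical. The last line shows that an IPv6 end address does not fit in a signed 64-bit integer.

```python
import ipaddress
from typing import Dict, Union


def network_column_values(cidr: str) -> Dict[str, Union[str, int]]:
    """Values for each network column type, keyed by type name."""
    net = ipaddress.ip_network(cidr)
    return {
        "cidr": str(net),
        "start_ip": str(net.network_address),
        "end_ip": str(net.broadcast_address),  # last address in the network
        "start_int": int(net.network_address),
        "end_int": int(net.broadcast_address),
    }


print(network_column_values("203.0.113.0/24"))
# {'cidr': '203.0.113.0/24', 'start_ip': '203.0.113.0', 'end_ip': '203.0.113.255',
#  'start_int': 3405803776, 'end_int': 3405804031}

# IPv6 integers need up to 128 bits.
print(network_column_values("2001:db8::/32")["end_int"] > 2**63 - 1)  # True
```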

Default behavior: If no [[network.columns]] sections are defined:

CSV output: A single CIDR column named network is generated
Parquet output: Two integer columns start_int and end_int are generated for query-optimized IP lookups using predicate pushdown
MMDB output: No network columns (data is written by prefix)

You can override these defaults by explicitly defining your own [[network.columns]] sections.

Note: Integer network columns (start_int, end_int) only work with IPv4 when writing to a single Parquet file. To use these columns with IPv6 data, configure output.ipv4_file and output.ipv6_file so the rows are split by IP family, or switch to the string-based columns (start_ip, end_ip, cidr).
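The start_int/end_int layout supports range lookups. If rows are non-overlapping and sorted by start_int, the row containing an address is the last one whose start is at or below it, provided its end is not below the address. Query engines reading Parquet can skip row groups using min/max column statistics for a predicate such as start_int <= x AND end_int >= x. The in-memory sketch below shows the same range idea with binary search. It handles IPv4 only, in line with the note above, and the row tuples and function names are hypothetical.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# (start_int, end_int, country_code); must be sorted by start_int and non-overlapping.
Row = Tuple[int, int, str]


def find_row(rows: List[Row], starts: List[int], ip: str) -> Optional[Row]:
    """Return the row whose [start_int, end_int] range contains ip, if any."""
    x = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(starts, x) - 1  # last row starting at or before x
    if i >= 0 and rows[i][1] >= x:
        return rows[i]
    return None


def ip_range(cidr: str) -> Tuple[int, int]:
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.broadcast_address)


rows = sorted([(*ip_range("203.0.113.0/24"), "GB"), (*ip_range("198.51.100.0/24"), "US")])
starts = [r[0] for r in rows]
print(find_row(rows, starts, "203.0.113.9")[2])  # GB
print(find_row(rows, starts, "192.0.2.1"))       # None (no covering row)
```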

Example with multiple network columns:

[[network.columns]]
name = "network"
type = "cidr"

[[network.columns]]
name = "start_ip"
type = "start_ip"

[[network.columns]]
name = "end_ip"
type = "end_ip"

Databases

The [[databases]] section defines MMDB databases to read from. You can specify multiple databases.

[[databases]]
name = "enterprise"                              # Identifier used in column definitions
path = "/var/lib/GeoIP/GeoIP2-Enterprise.mmdb"  # Path to MMDB file

[[databases]]
name = "anonymous"
path = "/var/lib/GeoIP/GeoIP2-Anonymous-IP.mmdb"

The name field is used to reference the database in column definitions.

Data Columns

Data columns map fields from MMDB databases to output columns. These appear after network columns, in the order defined.

[[columns]]
name = "country_code"        # Output column name
database = "enterprise"      # Database to read from (must match a database name)
path = ["country", "iso_code"]   # Path segments to the field in source database
output_path = ["country", "iso_code"]  # Optional: path for MMDB output (defaults to [name])

Field descriptions:

name - Column name for CSV/Parquet output
database - Database to read from (must match a database name)
path - Path to field in source MMDB database
output_path - (Optional) Path for nested structure in MMDB output. If not specified, defaults to a flat structure using [name] as the path. Only relevant for MMDB output format.

Path Syntax

Paths are defined as TOML arrays. Each element represents one traversal step:

Strings access map keys (e.g., "country", "names")
Integers access array indices; negative indices count from the end of the array (-1 is the last element)
Strings are used verbatim, so keys may include / without escaping
Empty array (path = []) means "copy entire record" - extracts all data from the MMDB record as a map

Examples:

# Simple field
path = ["country", "iso_code"]

# Nested object
path = ["country", "names", "en"]

# Array access
path = ["subdivisions", 0, "names", "en"]

# Deep nesting
path = ["location", "latitude"]

# Copy entire record
path = []
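Here is a minimal sketch of these traversal rules, applied to a record decoded into Python dicts and lists. It illustrates the documented semantics and is not mmdbconvert's code. The name resolve_path is hypothetical, and treating negative indices as counting from the end is an assumption. Returning None when a step does not match follows the "Invalid paths" entry under Error Handling.

```python
from typing import Any, Sequence, Union


def resolve_path(record: Any, path: Sequence[Union[str, int]]) -> Any:
    """Follow a column path through a decoded record.

    Strings index maps (used verbatim, so "a/b" is a single key), integers index
    arrays (negative values count from the end), and an empty path returns the
    whole record. A step that does not match the data yields None.
    """
    current = record
    for step in path:
        if isinstance(step, str):
            if not isinstance(current, dict) or step not in current:
                return None
        elif isinstance(step, int):
            if not isinstance(current, list) or not (-len(current) <= step < len(current)):
                return None
        else:
            raise TypeError(f"path elements must be strings or integers, got {step!r}")
        current = current[step]
    return current


# Sample record for illustration.
record = {
    "country": {"iso_code": "GB", "names": {"en": "United Kingdom"}},
    "subdivisions": [
        {"iso_code": "ENG", "names": {"en": "England"}},
        {"iso_code": "LND", "names": {"en": "London"}},
    ],
}
print(resolve_path(record, ["subdivisions", -1, "names", "en"]))  # London
print(resolve_path(record, ["subdivisions", 5, "iso_code"]))      # None
```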

Copying Entire Records

Use path = [] to copy all data from an MMDB record. This is useful when merging entire databases:

[[columns]]
name = "all_enterprise_data"
database = "enterprise"
path = []  # Copy entire record from Enterprise database

For MMDB output, control where the data is placed using output_path:

output_path = [] - Merge all fields into root of output MMDB
output_path = ["some", "path"] - Place all fields nested at specified path
If output_path is not specified, defaults to [name] (single-level nesting)

Map merging behavior:

When multiple columns target the same path with maps, they are merged recursively:

Non-conflicting keys are combined
Nested maps are merged recursively
Conflicting keys (same key, different non-map values) cause an error
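The merge rules can be expressed as a short recursive function. This is an illustrative sketch, not the tool's code. The names merge_maps and MergeConflict are hypothetical. Allowing identical scalar values without an error is an assumption drawn from the "different non-map values" wording.

```python
import copy
from typing import Any, Dict, Tuple


class MergeConflict(ValueError):
    """Raised when two columns write different non-map values to the same key."""


def merge_maps(dst: Dict[str, Any], src: Dict[str, Any], path: Tuple[str, ...] = ()) -> Dict[str, Any]:
    """Recursively merge src into dst in place and return dst."""
    for key, value in src.items():
        where = path + (key,)
        if key not in dst:
            dst[key] = copy.deepcopy(value)  # non-conflicting key: just add it
        elif isinstance(dst[key], dict) and isinstance(value, dict):
            merge_maps(dst[key], value, where)  # both maps: merge recursively
        elif dst[key] == value:
            continue  # assumption: identical values are not a conflict
        else:
            raise MergeConflict("conflicting values at " + ".".join(where))
    return dst


merged: Dict[str, Any] = {}
merge_maps(merged, {"country": {"iso_code": "US"}, "location": {"latitude": 37.751}})
merge_maps(merged, {"traits": {"is_anonymous": True}})
merge_maps(merged, {"country": {"names": {"en": "United States"}}})
print(merged["country"])  # {'iso_code': 'US', 'names': {'en': 'United States'}}
```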

# Example: Merge Enterprise data at root + Anonymous IP data under traits
[[columns]]
name = "enterprise_all"
database = "enterprise"
path = []
output_path = []  # Merge into root

[[columns]]
name = "anonymous_all"
database = "anonymous"
path = []
output_path = ["traits"]  # Nest under traits

For CSV/Parquet output, the entire map is JSON-encoded as a string, just like other complex values.

Data Types

Scalar values are output based on type:
- Strings and numbers are output as-is
- Booleans are output as 1 (true) or 0 (false) in CSV format
Complex values (objects, arrays) are automatically JSON-encoded
Missing data results in an empty value (empty string for CSV, null for Parquet)
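These rules can be sketched as a small formatting function for CSV output. It is an approximation for illustration. The function name is hypothetical, and details such as key order and whitespace in the JSON, or how floats are printed, may differ from what mmdbconvert produces.

```python
import json
from typing import Any


def format_csv_value(value: Any) -> str:
    """Render one column value as CSV text following the rules above."""
    if value is None:
        return ""  # missing data -> empty field
    if isinstance(value, bool):  # test bool before numbers: bool is a subclass of int
        return "1" if value else "0"
    if isinstance(value, (dict, list)):
        # Complex values are JSON-encoded, here with compact separators.
        return json.dumps(value, separators=(",", ":"), ensure_ascii=False)
    return str(value)  # strings and numbers as-is
```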

Example with complex type:

[[columns]]
name = "all_city_names"
database = "geo"
path = ["city", "names"]  # Outputs: {"en":"London","de":"London","es":"Londres"}

Complete Examples

Example 1: Client Use Case (GeoIP Enterprise + Anonymous IP)

[output]
format = "csv"
file = "enterprise-anonymous-merged.csv"

[[network.columns]]
name = "network"
type = "cidr"

[[databases]]
name = "enterprise"
path = "/var/lib/GeoIP/GeoIP2-Enterprise.mmdb"

[[databases]]
name = "anonymous"
path = "/var/lib/GeoIP/GeoIP2-Anonymous-IP.mmdb"

# GeoIP Enterprise fields
[[columns]]
name = "country_iso"
database = "enterprise"
path = ["country", "iso_code"]

[[columns]]
name = "country_name"
database = "enterprise"
path = ["country", "names", "en"]

[[columns]]
name = "subdivision_iso"
database = "enterprise"
path = ["subdivisions", 0, "iso_code"]

[[columns]]
name = "subdivision_name"
database = "enterprise"
path = ["subdivisions", 0, "names", "en"]

[[columns]]
name = "city_name"
database = "enterprise"
path = ["city", "names", "en"]

[[columns]]
name = "latitude"
database = "enterprise"
path = ["location", "latitude"]

[[columns]]
name = "longitude"
database = "enterprise"
path = ["location", "longitude"]

[[columns]]
name = "accuracy_radius"
database = "enterprise"
path = ["location", "accuracy_radius"]

# GeoIP Anonymous IP fields
[[columns]]
name = "is_anonymous"
database = "anonymous"
path = ["is_anonymous"]

[[columns]]
name = "is_anonymous_vpn"
database = "anonymous"
path = ["is_anonymous_vpn"]

[[columns]]
name = "is_hosting_provider"
database = "anonymous"
path = ["is_hosting_provider"]

[[columns]]
name = "is_public_proxy"
database = "anonymous"
path = ["is_public_proxy"]

[[columns]]
name = "is_tor_exit_node"
database = "anonymous"
path = ["is_tor_exit_node"]

[[columns]]
name = "is_residential_proxy"
database = "anonymous"
path = ["is_residential_proxy"]

Example 2: Parquet with IP Ranges

[output]
format = "parquet"
file = "geo-data.parquet"

[output.parquet]
compression = "zstd"

[[network.columns]]
name = "network_cidr"
type = "cidr"

[[network.columns]]
name = "start_ip"
type = "start_ip"

[[network.columns]]
name = "end_ip"
type = "end_ip"

[[network.columns]]
name = "start_int"
type = "start_int"

[[network.columns]]
name = "end_int"
type = "end_int"

[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"

[[columns]]
name = "country"
database = "city"
path = ["country", "iso_code"]

[[columns]]
name = "city"
database = "city"
path = ["city", "names", "en"]

Example 3: Single Database with Complex Fields

[output]
format = "csv"
file = "geo-full.csv"

[[databases]]
name = "enterprise"
path = "GeoIP2-Enterprise.mmdb"

[[columns]]
name = "country_code"
database = "enterprise"
path = ["country", "iso_code"]

# This will output all localized names as JSON
[[columns]]
name = "country_names_json"
database = "enterprise"
path = ["country", "names"]

# Extract specific locales
[[columns]]
name = "country_name_en"
database = "enterprise"
path = ["country", "names", "en"]

[[columns]]
name = "country_name_de"
database = "enterprise"
path = ["country", "names", "de"]

Example 4: MMDB Output with Flat Structure

[output]
format = "mmdb"
file = "merged.mmdb"

[output.mmdb]
database_type = "GeoIP2-City"
description = { en = "Merged GeoIP Database" }

[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"

# Flat structure: each column becomes a top-level field
[[columns]]
name = "country_code"
database = "city"
path = ["country", "iso_code"]

[[columns]]
name = "city_name"
database = "city"
path = ["city", "names", "en"]

[[columns]]
name = "latitude"
database = "city"
path = ["location", "latitude"]

[[columns]]
name = "longitude"
database = "city"
path = ["location", "longitude"]

Example 5: MMDB Output with Nested Structure

[output]
format = "mmdb"
file = "nested.mmdb"

[output.mmdb]
database_type = "GeoIP2-City"
description = { en = "Nested Structure Example", de = "Beispiel für verschachtelte Strukturen" }
record_size = 28
include_reserved_networks = false

[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"

# Nested structure: use output_path to create hierarchical data
[[columns]]
name = "country_code"
database = "city"
path = ["country", "iso_code"]
output_path = ["country", "iso_code"]  # Creates nested {"country": {"iso_code": "US"}}

[[columns]]
name = "city_name"
database = "city"
path = ["city", "names", "en"]
output_path = ["city", "names", "en"]  # Creates nested {"city": {"names": {"en": "New York"}}}

[[columns]]
name = "latitude"
database = "city"
path = ["location", "latitude"]
output_path = ["location", "latitude"]  # Creates nested {"location": {"latitude": 40.7128}}

[[columns]]
name = "longitude"
database = "city"
path = ["location", "longitude"]
output_path = ["location", "longitude"]  # Creates nested {"location": {"longitude": -74.0060}}

Notes on MMDB output:

Types are preserved from source databases (uint16, float32, etc.)
output_path determines the structure in the output MMDB
Without output_path, fields use a flat structure with name as the key
Multiple columns can share parent paths to build nested structures
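Here is a sketch of how output_path could shape each output record, assuming values have already been looked up for each column. The function name is hypothetical, and skipping columns that have no value is an assumption. The output_path = [] case (merging a whole map into the root) is out of scope here because it follows the map-merging rules described under Copying Entire Records.

```python
from typing import Any, Dict, List, Mapping


def build_record(columns: List[Mapping[str, Any]], values: Mapping[str, Any]) -> Dict[str, Any]:
    """Place each column's value at its output_path (default: [name])."""
    record: Dict[str, Any] = {}
    for col in columns:
        path = list(col.get("output_path", [col["name"]]))
        if not path:
            raise ValueError("output_path = [] is handled by map merging, not this sketch")
        value = values.get(col["name"])
        if value is None:
            continue  # assumption: columns without data are omitted
        node = record
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create intermediate maps as needed
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} already holds a non-map value")
        if path[-1] in node:
            raise ValueError("two columns write to " + ".".join(path))
        node[path[-1]] = value
    return record


nested_columns = [
    {"name": "country_code", "output_path": ["country", "iso_code"]},
    {"name": "latitude", "output_path": ["location", "latitude"]},
    {"name": "longitude", "output_path": ["location", "longitude"]},
]
flat_columns = [{"name": "country_code"}, {"name": "latitude"}]
values = {"country_code": "US", "latitude": 40.7128, "longitude": -74.006}

print(build_record(nested_columns, values))
# {'country': {'iso_code': 'US'}, 'location': {'latitude': 40.7128, 'longitude': -74.006}}
print(build_record(flat_columns, values))
# {'country_code': 'US', 'latitude': 40.7128}
```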

Example 6: Copying Entire Databases with path = []

[output]
format = "mmdb"
file = "enterprise-with-anonymous.mmdb"

[output.mmdb]
database_type = "GeoIP2-Enterprise"
description = { en = "Enterprise + Anonymous IP Merged" }
record_size = 28

[[databases]]
name = "enterprise"
path = "GeoIP2-Enterprise.mmdb"

[[databases]]
name = "anonymous"
path = "GeoIP2-Anonymous-IP.mmdb"

# Copy all Enterprise fields to root of output MMDB
[[columns]]
name = "enterprise_all"
database = "enterprise"
path = []           # Copy entire record
output_path = []    # Merge into root

# Copy all Anonymous IP fields nested under traits
[[columns]]
name = "anonymous_all"
database = "anonymous"
path = []           # Copy entire record
output_path = ["traits"]  # Place under traits map

This configuration creates a merged MMDB where:

All Enterprise database fields appear at the root level (country, city, location, etc.)
All Anonymous IP fields are nested under traits (e.g., traits.is_anonymous, traits.is_anonymous_vpn)
If field names conflict at the same level, the tool exits with a clear error message
Nested maps are merged recursively, so multiple columns can contribute to the same parent map

Resulting structure:

{
  "country": {"iso_code": "US", "names": {...}},
  "city": {"names": {...}},
  "location": {"latitude": 37.751, "longitude": -97.822},
  "traits": {
    "is_anonymous": true,
    "is_anonymous_vpn": false,
    "is_hosting_provider": false
  }
}

Network Merging Behavior

When multiple databases contain overlapping networks, mmdbconvert creates the smallest possible non-overlapping network blocks. For each output network:

The tool determines which input databases have data for that network
For each column, data is retrieved from the specified database only
If the specified database has no data for that network, the column value is empty/null

This means each column independently specifies its data source, giving you complete control over the output.
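The splitting step can be illustrated with a small IPv4-only sketch. It does not reflect how mmdbconvert is implemented internally. The databases are hypothetical in-memory dicts that map CIDR strings to decoded records, and each column reads a single top-level key instead of a full path, for brevity. The address space is cut at every network start and just past every network end. Each covered interval is then turned into CIDR blocks, and each column is filled only from its own database.

```python
import ipaddress
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for an MMDB database: CIDR -> decoded record.
Database = Dict[str, Dict[str, Any]]


def split_networks(dbs: Dict[str, Database]) -> List[ipaddress.IPv4Network]:
    """Split overlapping IPv4 networks from all databases into disjoint CIDR blocks."""
    nets = [ipaddress.IPv4Network(cidr) for db in dbs.values() for cidr in db]
    # Every network start and every address just past a network end is a cut point.
    cuts = sorted(
        {int(n.network_address) for n in nets} | {int(n.broadcast_address) + 1 for n in nets}
    )
    blocks: List[ipaddress.IPv4Network] = []
    for lo, hi in zip(cuts, cuts[1:]):
        # Between two cut points, each input network covers all of [lo, hi) or none of it.
        if any(int(n.network_address) <= lo <= int(n.broadcast_address) for n in nets):
            first, last = ipaddress.IPv4Address(lo), ipaddress.IPv4Address(hi - 1)
            blocks.extend(ipaddress.summarize_address_range(first, last))
    return blocks


def lookup(db: Database, block: ipaddress.IPv4Network) -> Optional[Dict[str, Any]]:
    """Return the record of the network in db that contains block, or None."""
    for cidr, record in db.items():
        if block.subnet_of(ipaddress.IPv4Network(cidr)):
            return record
    return None


dbs = {
    "enterprise": {"203.0.113.0/24": {"country": "GB"}},
    "anonymous": {"203.0.113.128/25": {"is_anonymous": True}},
}
# Each column reads only from its own database: (column name, database, key).
columns = [("country", "enterprise", "country"), ("is_anonymous", "anonymous", "is_anonymous")]
for block in split_networks(dbs):
    row = {}
    for name, db_name, key in columns:
        record = lookup(dbs[db_name], block)
        row[name] = record.get(key) if record else None
    print(block, row)
# 203.0.113.0/25 {'country': 'GB', 'is_anonymous': None}
# 203.0.113.128/25 {'country': 'GB', 'is_anonymous': True}
```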

Error Handling

Missing database files: Tool exits with an error
Invalid paths: Empty/null value in output
Invalid TOML syntax: Tool exits with parse error
Duplicate column names: Tool exits with an error
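Here is a sketch of the checks behind these errors. It runs against a configuration that has already been parsed into a dict, for example with tomllib.loads on Python 3.11+, which raises tomllib.TOMLDecodeError on invalid syntax. The function name and messages are hypothetical. Treating network and data column names as one namespace is an assumption. Unlike the tool, which exits on an error, this sketch collects every problem so that they can be reported together.

```python
import os
from typing import Any, Dict, List


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Collect configuration errors instead of stopping at the first one."""
    errors: List[str] = []
    db_names = set()
    for db in cfg.get("databases", []):
        db_names.add(db["name"])
        if not os.path.isfile(db["path"]):
            errors.append(f"database file not found: {db['path']}")

    seen = set()
    # Assumption: network columns and data columns share one set of names.
    all_columns = cfg.get("network", {}).get("columns", []) + cfg.get("columns", [])
    for col in all_columns:
        if col["name"] in seen:
            errors.append(f"duplicate column name: {col['name']}")
        seen.add(col["name"])
        # Network columns have no database key; data columns must name a known database.
        if "database" in col and col["database"] not in db_names:
            errors.append(f"column {col['name']} references unknown database {col['database']}")
    return errors


cfg = {
    "databases": [{"name": "geo", "path": "/nonexistent/GeoIP2-City.mmdb"}],
    "columns": [
        {"name": "country_code", "database": "geo", "path": ["country", "iso_code"]},
        {"name": "country_code", "database": "city", "path": ["country", "iso_code"]},
    ],
}
for message in validate_config(cfg):
    print(message)
```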
