Skip to content

Add test databases for empty map/array last in metadata#222

Merged
horgh merged 2 commits intomainfrom
greg/eng-4322
Mar 5, 2026
Merged

Add test databases for empty map/array last in metadata#222
horgh merged 2 commits intomainfrom
greg/eng-4322

Conversation

@oschwald
Copy link
Member

@oschwald oschwald commented Mar 3, 2026

Summary

  • Adds two new test databases for libmaxminddb: libmaxminddb-empty-map-last-in-metadata.mmdb and libmaxminddb-empty-array-last-in-metadata.mmdb
  • These reproduce the off-by-one bug in get_entry_data_list() where a 0-length map or array at the end of the metadata section is incorrectly rejected

Context

ENG-4322. Reported via vimt/MaxMind-DB-Writer-python#16.

🤖 Generated with Claude Code

@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces specialized test databases to validate the handling of empty data structures within the metadata section of MaxMind DB files. The primary goal is to ensure that the libmaxminddb library correctly processes databases where an empty map or array is the last element in the metadata, thereby preventing a known off-by-one parsing error.

Highlights

  • New Test Databases Added: Two new test databases, libmaxminddb-empty-map-last-in-metadata.mmdb and libmaxminddb-empty-array-last-in-metadata.mmdb, have been introduced to libmaxminddb.
  • Off-by-One Bug Reproduction: These databases are specifically designed to reproduce an off-by-one bug in get_entry_data_list() where a zero-length map or array at the end of the metadata section was incorrectly rejected.
Changelog
  • pkg/writer/baddata.go
    • Added libmaxminddb-empty-map-last-in-metadata.mmdb to the list of bad databases.
    • Added libmaxminddb-empty-array-last-in-metadata.mmdb to the list of bad databases.
  • pkg/writer/rawmmdb.go
    • Implemented writeMetadataBlockEmptyMapLast to create metadata with an empty map as the last field.
    • Implemented writeMetadataBlockEmptyArrayLast to create metadata with an empty array as the last field.
    • Implemented buildEmptyMapLastInMetadataDB to construct a full MMDB with an empty map last in metadata.
    • Implemented buildEmptyArrayLastInMetadataDB to construct a full MMDB with an empty array last in metadata.
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds two new test databases to reproduce a bug related to empty maps or arrays at the end of the metadata section. The changes are correct and achieve the goal. I've provided a couple of suggestions in pkg/writer/rawmmdb.go to refactor the newly added functions to reduce significant code duplication and improve maintainability. Otherwise, the changes look good.

oschwald and others added 2 commits March 3, 2026 10:40
These databases reproduce the off-by-one bug in libmaxminddb's
get_entry_data_list() validation. When a 0-length map or array is
the last field in metadata, offset_to_next equals data_section_size
exactly, which the >= check incorrectly rejects.

ENG-4322

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@oschwald
Copy link
Member Author

oschwald commented Mar 3, 2026

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds two new test databases to address an off-by-one bug in get_entry_data_list() related to 0-length maps or arrays at the end of the metadata section. The changes include modifications to mise.toml, pkg/writer/baddata.go, and pkg/writer/rawmmdb.go, along with the addition of mise.lock. The review focuses on the correctness and maintainability of the added code, particularly the new database generation logic and metadata handling.

Comment on lines +161 to +163
if !ok {
panic("unknown metadata key: " + key)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Instead of panicking, return an error to allow the caller to handle the situation more gracefully.

valueWriter, ok := valueWriters[key]
		if !ok {
			return 0, fmt.Errorf("unknown metadata key: %s", key)
		}
		pos += writeMetaKey(buf[pos:], key)
		bytesWritten := valueWriter(buf[pos:])
		pos += bytesWritten
		return pos, nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the scope of this project and that it is a programmer error, panicking seems fine.

Also, the other suggestions don't really seem appropriate for in-memory writers for test databases.

Comment on lines +32 to +33
{"libmaxminddb-empty-map-last-in-metadata.mmdb", buildEmptyMapLastInMetadataDB()},
{"libmaxminddb-empty-array-last-in-metadata.mmdb", buildEmptyArrayLastInMetadataDB()},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These new test databases are great for regression testing. Consider adding a brief comment above each entry explaining the specific bug or edge case it's designed to catch. This will improve maintainability and make it easier to understand the purpose of each test case.


pos += writeMetaKey(buf[pos:], "database_type")
pos += writeString(buf[pos:], "Test")
func buildSimpleDB(metadataWriter func([]byte, uint32, uint64) int) []byte {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider making the buffer size configurable or dynamically sized based on the expected data size to avoid potential buffer overflows or wasted memory. A fixed size of 1024 bytes might not be sufficient for all cases.


pos += writeMetaKey(buf[pos:], "database_type")
pos += writeString(buf[pos:], "Test")
func buildSimpleDB(metadataWriter func([]byte, uint32, uint64) int) []byte {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider adding error handling to the metadataWriter function. If the function fails to write the metadata, it should return an error to the caller. This will allow the caller to handle the error appropriately and prevent the program from crashing.

Comment on lines +177 to +197
func buildSimpleDB(metadataWriter func([]byte, uint32, uint64) int) []byte {
const nodeCount = 1
const recordValue = nodeCount + 16
const buildEpoch = 1_000_000_000

pos += writeMetaKey(buf[pos:], "description")
pos += writeMap(buf[pos:], 0)
buf := make([]byte, 1024)
pos := 0

pos += writeMetaKey(buf[pos:], "ip_version")
pos += writeUint16(buf[pos:], 4)
pos += writeSearchTree(buf[pos:], recordValue)

pos += writeMetaKey(buf[pos:], "languages")
pos += writeEmptyArray(buf[pos:])
// 16-byte null separator
pos += dataSeparatorSize

pos += writeMetaKey(buf[pos:], "node_count")
pos += writeUint32(buf[pos:], nodeCount)
// Data: a simple map with one string entry
pos += writeMap(buf[pos:], 1)
pos += writeString(buf[pos:], "ip")
pos += writeString(buf[pos:], "test")

pos += writeMetaKey(buf[pos:], "record_size")
pos += writeUint16(buf[pos:], 24)
pos += metadataWriter(buf[pos:], nodeCount, buildEpoch)

return pos
return buf[:pos]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider adding error handling to the buildSimpleDB function. If any of the write operations fail, the function should return an error to the caller. This will allow the caller to handle the error appropriately and prevent the program from crashing.


pos += writeMetaKey(buf[pos:], "database_type")
pos += writeString(buf[pos:], "Test")
func buildSimpleDB(metadataWriter func([]byte, uint32, uint64) int) []byte {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider adding input validation to the metadataWriter function. This will help to prevent errors and ensure that the function is called with valid arguments.

@horgh
Copy link
Contributor

horgh commented Mar 4, 2026

LGTM. Tests are failing though. Claude said this about the failure:

● There it is:

  tar: libmaxminddb-1.13.2/t/maxmind-db/bad-data/libmaxminddb/libmaxminddb-empty-array-last-in-metadata.mmdb: file name is too long (max 99); not dumped

  The file path is too long for the old POSIX tar format (max 99 characters). The full path is 101 characters:

  libmaxminddb-1.13.2/t/maxmind-db/bad-data/libmaxminddb/libmaxminddb-empty-array-last-in-metadata.mmdb

  The map variant just barely fits at 97 characters:

  libmaxminddb-1.13.2/t/maxmind-db/bad-data/libmaxminddb/libmaxminddb-empty-map-last-in-metadata.mmdb

  array (5 chars) vs map (3 chars) — those 2 extra characters push it over the 99-character limit.

  Fix: Configure automake to use a tar format that supports longer paths. In configure.ac, you can add AM_INIT_AUTOMAKE([tar-ustar]) (or tar-pax) to use a format that supports
  longer filenames. Alternatively, you could shorten the test database filenames.

● Searched for 1 pattern, read 1 file (ctrl+o to expand)

● To fix, add tar-ustar to the AM_INIT_AUTOMAKE options:

  AM_INIT_AUTOMAKE(foreign tar-ustar m4_esyscmd([...]))

  This switches make dist to use the ustar tar format which supports paths up to 256 characters. Alternatively, the test database filenames could be shortened (e.g.,
  libmaxminddb-empty-array-last-meta.mmdb), but fixing the tar format is the more robust solution since other long paths could hit this in the future.

@horgh horgh merged commit 819f226 into main Mar 5, 2026
12 checks passed
@horgh horgh deleted the greg/eng-4322 branch March 5, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants