
MCP: EmbeddedResource attachments, export_attachment tool, DuckDB type fixes, MIME date fix#55

Merged
wesm merged 14 commits into main from wesm/mcp-decode-base64
Feb 4, 2026

Conversation

@wesm
Owner

@wesm wesm commented Feb 4, 2026

Summary

MCP get_attachment improvement:

  • Return attachment content as a native MCP EmbeddedResource with BlobResourceContents instead of embedding base64 in a custom JSON text payload. MCP clients (e.g. Claude Desktop) handle embedded resources natively without manual JSON parsing and base64 decoding.
  • Use json.Marshal for metadata JSON instead of fmt.Sprintf with %q to ensure valid JSON for all filename byte sequences.
  • URL-encode filenames in embedded resource URIs to handle spaces and non-ASCII characters.

New export_attachment MCP tool:

  • Claude Desktop can only display image attachments (JPEG, PNG, GIF, WebP) inline — PDFs and documents are retrieved but can't be presented. The new export_attachment tool saves an attachment to the local filesystem (defaults to ~/Downloads) and returns the file path.
  • Filename sanitization strips dangerous characters; collision avoidance appends _1, _2, etc.
  • Atomic file creation with O_CREATE|O_EXCL to prevent TOCTOU races.
  • Handles EISDIR (directory occupying target name) by falling through to suffixed names.

DuckDB type mismatch fixes:

  • Cast all Parquet table columns to their expected types at the CTE level using DuckDB's SELECT * REPLACE (CAST(...)) syntax. This fixes two classes of type mismatch errors that occur when Parquet schema inference from SQLite stores numeric columns as VARCHAR:
    1. Cannot mix values of VARCHAR and INTEGER_LITERAL in COALESCE operator — in SearchFast, ListMessages, Aggregate, and GetTotalStats queries.
    2. Cannot compare values of type BIGINT and VARCHAR in IN/ANY/ALL clause — in ListMessages when sender/recipient filters trigger the filtered_msgs CTE with JOIN conditions across differently-typed columns.
  • Use TRY_CAST for has_attachments and attachment size columns for resilient type conversion.
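
To make the CTE-level casting concrete, here is a small sketch of a Go helper that renders such a CTE. It is not the `parquetCTEs()` function from this PR: the `columnCast` type, the `castCTE` helper, the CTE name `msg`, and the Parquet path are hypothetical, and the helper assumes trusted identifiers and paths (no quoting or escaping).

```go
package main

import (
	"fmt"
	"strings"
)

// columnCast describes one column to coerce. Try selects TRY_CAST, which
// yields NULL instead of an error when a value cannot be converted.
type columnCast struct {
	Column string
	Type   string
	Try    bool
}

// castCTE renders one CTE that keeps every column (SELECT *) but replaces the
// listed ones with a cast to the expected type.
func castCTE(name, glob string, casts []columnCast) string {
	parts := make([]string, 0, len(casts))
	for _, c := range casts {
		fn := "CAST"
		if c.Try {
			fn = "TRY_CAST"
		}
		parts = append(parts, fmt.Sprintf("%s(%s AS %s) AS %s", fn, c.Column, c.Type, c.Column))
	}
	return fmt.Sprintf("%s AS (SELECT * REPLACE (%s) FROM read_parquet('%s'))",
		name, strings.Join(parts, ", "), glob)
}

func main() {
	cte := castCTE("msg", "messages/*.parquet", []columnCast{
		{Column: "conversation_id", Type: "BIGINT"},
		{Column: "size_estimate", Type: "BIGINT"},
		{Column: "has_attachments", Type: "BOOLEAN", Try: true},
	})
	fmt.Println("WITH " + cte + " SELECT COALESCE(SUM(size_estimate), 0) FROM msg")
}
```

Casting once where the table is read means downstream expressions such as COALESCE(size_estimate, 0) or an IN comparison see consistent types without per-query casts.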

MIME date parsing fix:

  • Fix platform-dependent named timezone parsing. Go's time.Parse resolves named timezone abbreviations (EST, MST, PST) against the local system timezone, producing different results on different machines.
  • Use time.ParseInLocation(..., time.UTC), which forces named timezones to offset 0, making parsing deterministic. This builds on the hasNumericOffset/toUTC helpers from PR #50 ("Bug mime date parse"), which correctly separate numeric-offset and named-timezone conversion logic.
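
The behavior described above can be checked with a short program. This is only a sketch of the parsing step: `parseMailDate` is a hypothetical name, it tries just two layouts, and it does not reproduce the hasNumericOffset/toUTC helpers from PR #50.

```go
package main

import (
	"fmt"
	"time"
)

// parseMailDate parses an RFC 1123-style date with time.UTC as the reference
// location. Named abbreviations such as EST are unknown to the UTC location,
// so they are recorded as a fabricated zone with offset 0 on every host.
// Numeric offsets such as -0500 are absolute and are applied as written.
func parseMailDate(value string) (time.Time, error) {
	for _, layout := range []string{time.RFC1123Z, time.RFC1123} {
		if t, err := time.ParseInLocation(layout, value, time.UTC); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized date %q", value)
}

func main() {
	for _, in := range []string{
		"Tue, 04 Feb 2025 15:04:05 EST",
		"Tue, 04 Feb 2025 15:04:05 -0500",
	} {
		t, err := parseMailDate(in)
		if err != nil {
			panic(err)
		}
		name, offset := t.Zone()
		fmt.Printf("%-33s zone=%q offset=%d utcHour=%d\n", in, name, offset, t.UTC().Hour())
	}
}
```

With the named abbreviation the wall-clock hour is preserved (15) and the offset is 0; with the numeric offset the UTC hour becomes 20. Neither result depends on the machine's TZ setting.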

Housekeeping:

  • Track the post-commit hook in .githooks/ so it works across worktrees.
  • Use $HOME instead of hardcoded path for portability.

Test plan

  • MCP attachment tests pass (12 cases including empty MIME, unicode filenames, path traversal, oversized)
  • Export attachment tests pass (custom destination, file collision, directory collision, default ~/Downloads, edge filenames, error cases)
  • DuckDB regression test: VARCHAR integer columns with ListMessages (unfiltered + sender/recipient filters), SearchFast, SearchFastCount, Aggregate, GetTotalStats
  • MIME date parsing tests pass on all system timezones (contributor's TestHasNumericOffset, TestToUTC, and expanded TestParseDate cases all passing)
  • Full test suite passes (go test ./...)
  • Manual: verify export_attachment in Claude Desktop saves PDF to ~/Downloads

🤖 Generated with Claude Code

wesm and others added 7 commits February 4, 2026 06:46
…se64

Return attachment content as a native MCP EmbeddedResource with
BlobResourceContents instead of embedding base64 in a custom JSON
payload. MCP clients handle EmbeddedResource blobs natively, removing
the need for clients to parse custom JSON and manually decode base64.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The post-commit hook was gitignored and missing from the tracked
.githooks directory, so it wasn't firing in worktrees.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address code review findings:
- Use json.Marshal for metadata instead of fmt.Sprintf with %q, which
  can produce invalid JSON for non-UTF8 or control bytes in filenames.
- URL-encode the filename in the embedded resource URI to handle spaces
  and special characters.
- Add tests for empty MIME type defaulting to application/octet-stream.
- Add tests for filenames with spaces and unicode characters.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add explicit CASTs in COALESCE expressions to prevent "Cannot mix
values of VARCHAR and INTEGER_LITERAL" binder errors. Parquet column
types inferred from SQLite may not match the literal fallback values
(e.g., size_estimate stored as VARCHAR instead of BIGINT). Explicit
CASTs ensure type consistency regardless of Parquet schema inference.

Fixes SearchFast, ListMessages, Aggregate, and GetTotalStats queries.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…olumns

Creates Parquet files where conversation_id and size_estimate are stored
as VARCHAR (not BIGINT), reproducing the DuckDB binder error "Cannot mix
values of VARCHAR and INTEGER_LITERAL in COALESCE operator". Tests
ListMessages, SearchFast, SearchFastCount, Aggregate, and GetTotalStats.

Verified the test fails without the CAST fix and passes with it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wesm wesm changed the title from "Use MCP EmbeddedResource for get_attachment" to "MCP: use EmbeddedResource for attachments, fix DuckDB COALESCE type mismatch" Feb 4, 2026
Cast all Parquet table columns to their expected types in parquetCTEs()
using DuckDB's REPLACE syntax, fixing type mismatches that caused
"Cannot compare values of type BIGINT and VARCHAR in IN/ANY/ALL clause"
in ListMessages with sender/recipient filters. This also makes the
per-column CASTs in COALESCE expressions redundant, so they are removed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wesm wesm changed the title from "MCP: use EmbeddedResource for attachments, fix DuckDB COALESCE type mismatch" to "MCP: use EmbeddedResource for attachments, fix DuckDB type mismatches" Feb 4, 2026
time.Parse with named timezone abbreviations (MST, EST, PST) resolves
them against the local system timezone, producing different results on
different machines. On a CST system, parsing "15:04:05 EST" would
resolve EST to -5h, convert to local time (14:04:05 CST), and then
t.Hour() would return 14 instead of the input's 15.

Fix by using time.ParseInLocation(..., time.UTC) which forces unknown
named timezones to offset 0, preserving the input time values as UTC.
Numeric offsets remain absolute and unaffected. This eliminates the
need for the hasNumericOffset/toUTC helper functions entirely.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wesm wesm changed the title from "MCP: use EmbeddedResource for attachments, fix DuckDB type mismatches" to "MCP: use EmbeddedResource for attachments, fix DuckDB type mismatches, fix MIME date parsing" Feb 4, 2026
Claude Desktop can only display image attachments inline. For PDFs and
other file types, add an export_attachment tool that writes the file to
the local filesystem (defaults to ~/Downloads) and returns the path.

Handles filename sanitization and collision avoidance (_1, _2, etc).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wesm wesm changed the title from "MCP: use EmbeddedResource for attachments, fix DuckDB type mismatches, fix MIME date parsing" to "MCP: EmbeddedResource attachments, export_attachment tool, DuckDB type fixes, MIME date fix" Feb 4, 2026
@hughdbrown
Contributor

hughdbrown commented Feb 4, 2026

@wesm So you merged my fixes for date parsing in #50, but you are returning to it in b23936c. If the test plan item is:

MIME date parsing tests pass on all system timezones

then were there test cases that #50 failed on?

wesm and others added 2 commits February 4, 2026 07:53
- Use TRY_CAST for has_attachments (BOOLEAN) and att.size (BIGINT) in
  parquetCTEs to handle non-castable VARCHAR values gracefully instead
  of failing at runtime. Test fixture updated to store both as VARCHAR.

- Fix TOCTOU race in export_attachment: replace stat-then-write with
  O_CREATE|O_EXCL via createExclusive() to atomically create files
  without symlink race conditions.

- Add tests for filename sanitization edge cases (empty, dot, path
  traversal, special characters) and parseDate named timezone coverage
  (EST, CST explicitly tested for platform independence).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
O_EXCL returns EISDIR (not EEXIST) when a directory exists at the path.
The new pathConflict helper catches both cases so createExclusive falls
through to suffixed names instead of returning an error.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@hughdbrown
Contributor

Hmmm, I see. Go already has time.ParseInLocation, which solves this problem; the code was just calling the wrong API function.

google

Key Differences from time.Parse

The main distinction from the standard time.Parse is how a lack of explicit timezone information in the input string is handled.

  • time.Parse interprets a time string without timezone information as UTC.
  • time.ParseInLocation interprets a time string without timezone information as being in the specific loc provided as the third argument.

When the input string does contain timezone information (like a numeric offset or abbreviation), ParseInLocation uses the provided loc to resolve and match that information, whereas time.Parse tries to match it against the system's Local location.

@wesm
Owner Author

wesm commented Feb 4, 2026

I'm still actively working on this branch — there was a test failing after the previous PR was merged so bear with me

@wesm
Owner Author

wesm commented Feb 4, 2026

I'm reverting b23936c and will get this cleaned up

wesm and others added 2 commits February 4, 2026 08:14
time.Parse resolves named timezone abbreviations (EST, PST, etc.)
against the local system timezone, producing different results on
different machines. Fix by using time.ParseInLocation(..., time.UTC)
which forces named timezones to offset 0, making parsing deterministic.

This builds on the hasNumericOffset/toUTC helpers from PR #50, which
correctly separate numeric-offset and named-timezone conversion logic.
ParseInLocation ensures toUTC receives consistent input regardless of
the host system's timezone.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wesm wesm merged commit 1f41f9e into main Feb 4, 2026
1 check passed