Skip to content

Conversation

@divideby0
Copy link

Fix for #423

Changes

  • Replace periods with hyphens before calling generate_permalink() to prevent os.path.splitext() confusion
  • Convert Entity.file_path from @property to @computed_field so it's included in model_dump()
  • Add comprehensive test suite with 19 tests covering all kebab-case transformations

Testing

New test file: tests/mcp/test_tool_write_note_kebab_filenames.py

Coverage:

  • Basic transformations (spaces, underscores, CamelCase, periods)
  • Special characters (colons, parentheses, apostrophes)
  • Edge cases (consecutive chars, leading/trailing hyphens)
  • Backward compatibility (kebab_filenames=false)
  • Permalink consistency

Results:
✅ All 1241 tests pass (19 new, no regressions)
✅ Semgrep security scan: 0 findings

@CLAassistant
Copy link

CLAassistant commented Nov 12, 2025

CLA assistant check
All committers have signed the CLA.

@phernandez
Copy link
Member

Thanks @divideby0! I'll take a look at this

@phernandez phernandez requested a review from bdmayes November 12, 2025 03:43
@bdmayes
Copy link
Collaborator

bdmayes commented Nov 13, 2025

@divideby0 -- thanks for the contribution! Note the original changes made in this area were intended to handle slash characters (for example someone might name a note "blue/green deploy.md" or "the go/no go decision.md" -- in which case we didn't want that to create a nested folder path.

I'm curious why you don't actually want to preserve periods though? If I created a note named Version 1.3.0 then personally I would expect the filename of the title to be Version 1.3.0.md.

I suppose it seems to fine to replace them when kebab_filenames is true but I'm simply trying to understand further. Was there an actual bug here, or is this simply a change due to your preference in behavior?

@divideby0
Copy link
Author

thanks for the follow-up. this issue is specifically addressing kebab cased filenames. if i have a title like Fix for 2.0.0 Regression, the kebab-cased filename would become "fix-for-2.0.0 Regression.md". it adds a space between 2.0.0 and Regression because it was interpreting .0.0 Regression.md as the file extension. with this fix, the filename would become fix-for-2-0-0-regression.md, treating the . like any other punctuation in its kebab case logic. does that clarify the issue?

@bdmayes
Copy link
Collaborator

bdmayes commented Nov 14, 2025

@divideby0

I think the real problem might be that the content type of the file isn't being taken into account. If you write a note with the title "Fix for 2.0.0 Regression" then content_type will default to markdown (unless you specify otherwise). In that case, it should cause the extension .md to be added. But I see that happens in the parent function, and not inside of safe_title.

Calling generate_permalink was sort of a hack to get a sanitized title there in case folks put slashes in the name. My concern with your proposed change is that's destructive to period characters. Personally I think they should be preserved as much as possible as they are perfectly valid for a filename. In the case of a semantic versioning like that, they even have a...well...semantic meaning.

I am thinking that perhaps a better thing to do would be to write a function like this:

import mimetypes
from pathlib import Path

...

def has_valid_file_extension(filename: str) -> bool:
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type is not None

Then inside of safe_title we could do something like this:

if use_kebab_case:
    has_extension = has_valid_file_extension(fixed_title)
    fixed_title = generate_permalink(file_path=fixed_title, split_extension=has_extension)

I think that would solve your issue (though I haven't tested it), while preserving periods in the filename. Thoughts?

Fixes basicmachines-co#423

Implements maintainer feedback to preserve periods in version numbers
(e.g., "2.0.0") instead of converting them to hyphens. Uses
mimetypes.guess_type() to detect real file extensions and avoid
misinterpreting periods as extensions.

Changes:
- Add has_valid_file_extension() helper using mimetypes.guess_type()
- Modify generate_permalink() to conditionally call os.path.splitext()
  only when real file extensions are detected
- Update regex patterns to preserve periods in both CJK and non-CJK paths
- Update 19 test cases to expect period preservation

Examples:
- "Test 3.0 Version" → test-3.0-version.md (was: test-3-0-version.md)
- "Version 1.2.3 Release" → version-1.2.3-release.md

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Cedric Hurst <[email protected]>
@divideby0 divideby0 force-pushed the fix/423-kebab-periods branch from ac90c6e to e781f9c Compare November 16, 2025 00:51
@divideby0
Copy link
Author

Thanks for the feedback @bdmayes! I've updated the implementation to preserve periods in version numbers using your suggested approach with mimetypes.guess_type().

Changes Made

Core Implementation

  • Added has_valid_file_extension() helper function that uses mimetypes.guess_type() to detect real file extensions
  • Modified generate_permalink() to conditionally call os.path.splitext() only when a real extension is detected
  • Updated regex patterns in generate_permalink() to preserve periods (.) in both CJK and non-CJK paths

The Issue

The root problem was that os.path.splitext('Test 3.0 Version') returns:

  • base = 'Test 3'
  • extension = '.0 Version'

So os.path.splitext() was misinterpreting .0 Version as a file extension. The solution was to avoid calling os.path.splitext() on bare titles that don't have real file extensions.

Results

Now periods in version numbers are preserved:

  • "Test 3.0 Version"test-3.0-version.md ✅ (was: test-3-0-version.md)
  • "Version 1.2.3 Release"version-1.2.3-release.md ✅ (was: version-1-2-3-release.md)
  • "Feature (v2.0) Update"feature-v2.0-update.md ✅ (was: feature-v2-0-update.md)

Tests

All 19 tests passing. Updated test expectations to verify period preservation.

This approach maintains semantic meaning in version numbers while still converting other characters to kebab-case format.


Implementation assisted by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants