Commit fb1350b

fix: enhance character conflict detection and error handling for sync operations (#201)
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Co-authored-by: Paul Hernandez <[email protected]>
1 parent 7585a29 commit fb1350b

5 files changed

+734
-4
lines changed

docs/character-handling.md

Lines changed: 241 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,241 @@
1+
# Character Handling and Conflict Resolution
2+
3+
Basic Memory handles various character encoding scenarios and file naming conventions to provide consistent permalink generation and conflict resolution. This document explains how the system works and how to resolve common character-related issues.
4+
5+
## Overview
6+
7+
Basic Memory generates permalinks from file paths and keeps them consistent across operating systems and character encodings. It normalizes each file path and, when two paths normalize to the same permalink, makes the result unique to prevent conflicts.
8+
9+
## Character Normalization Rules
10+
11+
### 1. Permalink Generation
12+
13+
When Basic Memory processes a file path, it applies these normalization rules:
14+
15+
```
16+
Original: "Finance/My Investment Strategy.md"
17+
Permalink: "finance/my-investment-strategy"
18+
```
19+
20+
**Transformation process:**
21+
1. Remove file extension (`.md`)
22+
2. Convert to lowercase (permalinks are case-insensitive)
23+
3. Replace spaces with hyphens
24+
4. Replace underscores with hyphens
25+
5. Handle international characters (transliteration for Latin, preservation for non-Latin)
26+
6. Convert camelCase to kebab-case
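
The steps above can be approximated in a few lines. This is only an illustrative sketch: `sketch_permalink` is a hypothetical name, not Basic Memory's `generate_permalink`, and it skips step 5 (international characters). The camelCase split runs before lowercasing, since the case boundary is lost afterwards.

```python
import re
from pathlib import PurePosixPath


def sketch_permalink(file_path: str) -> str:
    """Hypothetical approximation of steps 1-4 and 6; not the real implementation."""
    # 1. Remove the file extension.
    stem = PurePosixPath(file_path).with_suffix("").as_posix()
    # 6. Split camelCase at lower/digit -> upper boundaries (must precede lowercasing).
    stem = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", stem)
    # 2. Convert to lowercase.
    stem = stem.lower()
    # 3 and 4. Replace runs of spaces and underscores with a single hyphen.
    return re.sub(r"[ _]+", "-", stem)


print(sketch_permalink("Finance/My Investment Strategy.md"))
```

Under these assumptions, `basic memory bug.md` and `basic-memory-bug.md` produce the same string, which is the collision described under "Hyphen vs Space Conflicts".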
27+
28+
### 2. International Character Support
29+
30+
**Latin characters with diacritics** are transliterated:
31+
- `ø` → `o` (Søren → soren)
32+
- `ü` → `u` (Müller → muller)
33+
- `é` → `e` (Café → cafe)
34+
- `ñ` → `n` (Niño → nino)
35+
36+
**Non-Latin characters** are preserved:
37+
- Chinese: `中文/测试文档.md` → `中文/测试文档`
38+
- Japanese: `日本語/文書.md` → `日本語/文書`
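
One way to get this behaviour with the standard library is to decompose characters and drop combining marks. The sketch below is an assumption about how such transliteration could work, not a description of Basic Memory's code; `strip_diacritics` and `EXTRA_MAP` are hypothetical names. Letters such as `ø` have no Unicode decomposition, so they need an explicit mapping.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit table (assumed, not exhaustive).
EXTRA_MAP = {"ø": "o", "Ø": "O"}


def strip_diacritics(text: str) -> str:
    """Drop combining marks from decomposed text, then apply the explicit table."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(EXTRA_MAP.get(ch, ch) for ch in without_marks)


for name in ("Søren", "Müller", "Café", "Niño", "中文/测试文档"):
    print(name, "->", strip_diacritics(name))
```

Note that this naive approach would also strip voicing marks from decomposed Japanese kana, so a real implementation would need to restrict mark removal to Latin text.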
39+
40+
## Common Conflict Scenarios
41+
42+
### 1. Hyphen vs Space Conflicts
43+
44+
**Problem:** A file name that already contains hyphens can normalize to the same permalink as a file name that uses spaces.
45+
46+
**Example:**
47+
```
48+
File 1: "basic memory bug.md" → permalink: "basic-memory-bug"
49+
File 2: "basic-memory-bug.md" → permalink: "basic-memory-bug" (CONFLICT!)
50+
```
51+
52+
**Resolution:** The system automatically resolves this by adding suffixes:
53+
```
54+
File 1: "basic memory bug.md" → permalink: "basic-memory-bug"
55+
File 2: "basic-memory-bug.md" → permalink: "basic-memory-bug-1"
56+
```
57+
58+
**Best Practice:** Choose consistent naming conventions within your project.
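
The suffixing behaviour can be sketched as a simple loop. This version is synchronous and checks a plain set; the name `make_unique` and the `taken` set are assumptions standing in for the repository lookup used by `resolve_permalink`.

```python
def make_unique(desired: str, taken: set[str]) -> str:
    """Append -1, -2, ... to `desired` until the result is not in `taken`."""
    permalink = desired
    suffix = 1
    while permalink in taken:
        permalink = f"{desired}-{suffix}"
        suffix += 1
    return permalink


taken = {"basic-memory-bug"}
print(make_unique("basic-memory-bug", taken))
```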
59+
60+
### 2. Case Sensitivity Conflicts
61+
62+
**Problem:** Different case variations that normalize to the same permalink.
63+
64+
**Example on macOS:**
65+
```
66+
Directory: Finance/investment.md
67+
Directory: finance/investment.md (different on filesystem, same permalink)
68+
```
69+
70+
**Resolution:** Basic Memory detects case conflicts and prevents them during sync operations with helpful error messages.
71+
72+
**Best Practice:** Use consistent casing for directory and file names.
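
To check a project for paths that differ only in case, grouping by a case-folded key is usually enough. The helper below is a hypothetical sketch that works on a list of path strings; it is not part of Basic Memory.

```python
from collections import defaultdict


def find_case_conflicts(paths: list[str]) -> list[list[str]]:
    """Return groups of paths that are identical once case is ignored."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return [group for group in groups.values() if len(group) > 1]


print(find_case_conflicts(["Finance/investment.md", "finance/investment.md", "docs/a.md"]))
```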
73+
74+
### 3. Character Encoding Conflicts
75+
76+
**Problem:** Different Unicode normalizations of the same logical character.
77+
78+
**Example:**
79+
```
80+
File 1: "café.md" (é as single character)
81+
File 2: "café.md" (e + combining accent)
82+
```
83+
84+
**Resolution:** Basic Memory normalizes Unicode characters using NFD normalization to detect these conflicts.
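
The two file names in the example look identical but are different strings until they are normalized to the same form. A minimal sketch with Python's `unicodedata` (the helper name `same_after_nfd` is illustrative only):

```python
import unicodedata

composed = "caf\u00e9.md"      # é as a single code point (U+00E9)
decomposed = "cafe\u0301.md"   # e followed by a combining acute accent (U+0301)


def same_after_nfd(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(composed == decomposed)                # raw comparison
print(same_after_nfd(composed, decomposed))  # comparison after normalization
```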
85+
86+
### 4. Forward Slash Conflicts
87+
88+
**Problem:** Forward slashes in frontmatter or file names are interpreted as path separators.
89+
90+
**Example:**
91+
```yaml
92+
---
93+
permalink: finance/investment/strategy
94+
---
95+
```
96+
97+
**Resolution:** Basic Memory validates frontmatter permalinks and warns about path separator conflicts.
98+
99+
## Error Messages and Troubleshooting
100+
101+
### "UNIQUE constraint failed: entity.file_path, entity.project_id"
102+
103+
**Cause:** Two entities are trying to use the same file path within a project.
104+
105+
**Common scenarios:**
106+
1. File move operation where destination is already occupied
107+
2. Case sensitivity differences on macOS
108+
3. Character encoding conflicts
109+
4. Concurrent file operations
110+
111+
**Resolution steps:**
112+
1. Check for duplicate file names with different cases
113+
2. Look for files with similar names but different character encodings
114+
3. Rename conflicting files to have unique names
115+
4. Run sync again after resolving conflicts
116+
117+
### "File path conflict detected during move"
118+
119+
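**Cause:** The conflict check that runs before a move found that the destination path is already occupied by another entity, and stopped the operation to prevent a database integrity violation.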
120+
121+
**What this means:** The system detected that moving a file would create a conflict before attempting the database operation.
122+
123+
**Resolution:** Follow the specific guidance in the error message, which will indicate the type of conflict detected.
124+
125+
## Best Practices
126+
127+
### 1. File Naming Conventions
128+
129+
**Recommended patterns:**
130+
- Use consistent casing (prefer lowercase)
131+
- Use hyphens instead of spaces for multi-word files
132+
- Avoid special characters that could conflict with path separators
133+
- Be consistent with directory structure casing
134+
135+
**Examples:**
136+
```
137+
✅ Good:
138+
- finance/investment-strategy.md
139+
- projects/basic-memory-features.md
140+
- docs/api-reference.md
141+
142+
❌ Problematic:
143+
- Finance/Investment Strategy.md (mixed case, spaces)
144+
- finance/Investment Strategy.md (inconsistent case)
145+
- docs/API/Reference.md (mixed case directories)
146+
```
147+
148+
### 2. Permalink Management
149+
150+
**Custom permalinks in frontmatter:**
151+
```yaml
152+
---
153+
type: knowledge
154+
permalink: custom-permalink-name
155+
---
156+
```
157+
158+
**Guidelines:**
159+
- Use lowercase permalinks
160+
- Use hyphens for word separation
161+
- Avoid path separators unless creating sub-paths
162+
- Ensure uniqueness within your project
163+
164+
### 3. Directory Structure
165+
166+
**Consistent casing:**
167+
```
168+
✅ Good:
169+
finance/
170+
investment-strategies.md
171+
portfolio-management.md
172+
173+
❌ Problematic:
174+
Finance/ (capital F)
175+
investment-strategies.md
176+
finance/ (lowercase f)
177+
portfolio-management.md
178+
```
179+
180+
## Migration and Cleanup
181+
182+
### Identifying Conflicts
183+
184+
Use Basic Memory's built-in conflict detection:
185+
186+
```bash
187+
# Sync will report conflicts
188+
basic-memory sync
189+
190+
# Check sync status for warnings
191+
basic-memory status
192+
```
193+
194+
### Resolving Existing Conflicts
195+
196+
1. **Identify conflicting files** from sync error messages
197+
2. **Choose consistent naming convention** for your project
198+
3. **Rename files** to follow the convention
199+
4. **Re-run sync** to verify resolution
200+
201+
### Bulk Renaming Strategy
202+
203+
For projects with many conflicts:
204+
205+
1. **Backup your project** before making changes
206+
2. **Standardize on lowercase** file and directory names
207+
3. **Replace spaces with hyphens** in file names
208+
4. **Use consistent character encoding** (UTF-8)
209+
5. **Test sync after each batch** of changes
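
Before renaming anything, it can help to compute the rename plan as a dry run and look for targets that would collide. The function below is a hypothetical helper that only manipulates strings; it assumes the convention from steps 2 and 3 (lowercase, hyphens for spaces) and touches no files.

```python
def plan_renames(paths: list[str]) -> tuple[dict[str, str], list[tuple[str, str, str]]]:
    """Return (renames, collisions) for a lowercase, hyphenated naming convention."""
    renames: dict[str, str] = {}
    claimed: dict[str, str] = {}  # target path -> source path that claimed it
    collisions: list[tuple[str, str, str]] = []
    for path in paths:
        target = path.lower().replace(" ", "-")
        if target in claimed and claimed[target] != path:
            # Two sources want the same target: report instead of renaming.
            collisions.append((claimed[target], path, target))
            continue
        claimed[target] = path
        if target != path:
            renames[path] = target
    return renames, collisions


renames, collisions = plan_renames(
    ["Finance/My Plan.md", "finance/my-plan.md", "docs/api.md"]
)
print(renames)
print(collisions)
```

Resolve every reported collision by hand before applying the renames in batches.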
210+
211+
## System Enhancements
212+
213+
### Recent Improvements (v0.13+)
214+
215+
1. **Enhanced conflict detection** before database operations
216+
2. **Improved error messages** with specific resolution guidance
217+
3. **Character normalization utilities** for consistent handling
218+
4. **File swap detection** for complex move scenarios
219+
5. **Proactive conflict warnings** during permalink resolution
220+
221+
### Monitoring and Logging
222+
223+
The system now provides detailed logging for conflict resolution:
224+
225+
```
226+
WARNING: Detected potential file path conflicts for 'Finance/Investment.md': ['finance/investment.md']
227+
WARNING: File path conflict detected during move: entity_id=123 trying to move from 'old.md' to 'new.md'
228+
```
229+
230+
These logs help identify and resolve conflicts before they cause sync failures.
231+
232+
## Support and Resources
233+
234+
If you encounter character-related conflicts not covered in this guide:
235+
236+
1. **Check the logs** for specific conflict details
237+
2. **Review error messages** for resolution guidance
238+
3. **Report issues** with examples of the conflicting files
239+
4. **Consider the file naming best practices** outlined above
240+
241+
The Basic Memory system is designed to handle most character conflicts automatically while providing clear guidance for manual resolution when needed.

src/basic_memory/services/entity_service.py

Lines changed: 49 additions & 3 deletions
@@ -15,6 +15,7 @@
1515
from basic_memory.markdown.utils import entity_model_from_markdown, schema_to_markdown
1616
from basic_memory.models import Entity as EntityModel
1717
from basic_memory.models import Observation, Relation
18+
from basic_memory.models.knowledge import Entity
1819
from basic_memory.repository import ObservationRepository, RelationRepository
1920
from basic_memory.repository.entity_repository import EntityRepository
2021
from basic_memory.schemas import Entity as EntitySchema
@@ -44,6 +45,39 @@ def __init__(
4445
self.file_service = file_service
4546
self.link_resolver = link_resolver
4647

48+
async def detect_file_path_conflicts(self, file_path: str) -> List[Entity]:
49+
"""Detect potential file path conflicts for a given file path.
50+
51+
This checks for entities with similar file paths that might cause conflicts:
52+
- Case sensitivity differences (Finance/file.md vs finance/file.md)
53+
- Character encoding differences
54+
- Hyphen vs space differences
55+
- Unicode normalization differences
56+
57+
Args:
58+
file_path: The file path to check for conflicts
59+
60+
Returns:
61+
List of entities that might conflict with the given file path
62+
"""
63+
from basic_memory.utils import detect_potential_file_conflicts
64+
65+
conflicts = []
66+
67+
# Get all existing file paths
68+
all_entities = await self.repository.find_all()
69+
existing_paths = [entity.file_path for entity in all_entities]
70+
71+
# Use the enhanced conflict detection utility
72+
conflicting_paths = detect_potential_file_conflicts(file_path, existing_paths)
73+
74+
# Find the entities corresponding to conflicting paths
75+
for entity in all_entities:
76+
if entity.file_path in conflicting_paths:
77+
conflicts.append(entity)
78+
79+
return conflicts
80+
4781
async def resolve_permalink(
4882
self, file_path: Permalink | Path, markdown: Optional[EntityMarkdown] = None
4983
) -> str:
@@ -54,18 +88,30 @@ async def resolve_permalink(
5488
2. If markdown has permalink but it's used by another file -> make unique
5589
3. For existing files, keep current permalink from db
5690
4. Generate new unique permalink from file path
91+
92+
Enhanced to detect and handle character-related conflicts.
5793
"""
94+
file_path_str = str(file_path)
95+
96+
# Check for potential file path conflicts before resolving permalink
97+
conflicts = await self.detect_file_path_conflicts(file_path_str)
98+
if conflicts:
99+
logger.warning(
100+
f"Detected potential file path conflicts for '{file_path_str}': "
101+
f"{[entity.file_path for entity in conflicts]}"
102+
)
103+
58104
# If markdown has explicit permalink, try to validate it
59105
if markdown and markdown.frontmatter.permalink:
60106
desired_permalink = markdown.frontmatter.permalink
61107
existing = await self.repository.get_by_permalink(desired_permalink)
62108

63109
# If no conflict or it's our own file, use as is
64-
if not existing or existing.file_path == str(file_path):
110+
if not existing or existing.file_path == file_path_str:
65111
return desired_permalink
66112

67113
# For existing files, try to find current permalink
68-
existing = await self.repository.get_by_file_path(str(file_path))
114+
existing = await self.repository.get_by_file_path(file_path_str)
69115
if existing:
70116
return existing.permalink
71117

@@ -75,7 +121,7 @@ async def resolve_permalink(
75121
else:
76122
desired_permalink = generate_permalink(file_path)
77123

78-
# Make unique if needed
124+
# Make unique if needed - enhanced to handle character conflicts
79125
permalink = desired_permalink
80126
suffix = 1
81127
while await self.repository.get_by_permalink(permalink):
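
The `detect_potential_file_conflicts` utility imported above is not shown in this excerpt. As a rough idea of what such a check might do, the sketch below compares paths by a normalized key. The function name, the key rules, and the return shape are all assumptions; the real utility in `basic_memory.utils` may behave differently.

```python
import re
import unicodedata


def conflict_key(path: str) -> str:
    """Assumed comparison key: NFC-normalized, case-folded, separators unified."""
    key = unicodedata.normalize("NFC", path).casefold()
    return re.sub(r"[ _]+", "-", key)


def sketch_detect_conflicts(file_path: str, existing_paths: list[str]) -> list[str]:
    """Return existing paths that differ from file_path but share its key."""
    target = conflict_key(file_path)
    return [
        existing
        for existing in existing_paths
        if existing != file_path and conflict_key(existing) == target
    ]


print(sketch_detect_conflicts("Finance/Investment.md", ["finance/investment.md", "docs/a.md"]))
```

An exact match is excluded on purpose, so a file never conflicts with its own existing record.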

src/basic_memory/sync/sync_service.py

Lines changed: 50 additions & 1 deletion
@@ -453,6 +453,36 @@ async def handle_move(self, old_path, new_path):
453453

454454
entity = await self.entity_repository.get_by_file_path(old_path)
455455
if entity:
456+
# Check if destination path is already occupied by another entity
457+
existing_at_destination = await self.entity_repository.get_by_file_path(new_path)
458+
if existing_at_destination and existing_at_destination.id != entity.id:
459+
# Handle the conflict - this could be a file swap or replacement scenario
460+
logger.warning(
461+
f"File path conflict detected during move: "
462+
f"entity_id={entity.id} trying to move from '{old_path}' to '{new_path}', "
463+
f"but entity_id={existing_at_destination.id} already occupies '{new_path}'"
464+
)
465+
466+
# Check if this is a file swap (the destination entity is being moved to our old path)
467+
# This would indicate a simultaneous move operation
468+
old_path_after_swap = await self.entity_repository.get_by_file_path(old_path)
469+
if old_path_after_swap and old_path_after_swap.id == existing_at_destination.id:
470+
logger.info(f"Detected file swap between '{old_path}' and '{new_path}'")
471+
# This is a swap scenario - both moves should succeed
472+
# We'll allow this to proceed since the other file has moved out
473+
else:
474+
# This is a conflict where the destination is occupied
475+
raise ValueError(
476+
f"Cannot move entity from '{old_path}' to '{new_path}': "
477+
f"destination path is already occupied by another file. "
478+
f"This may be caused by: "
479+
f"1. Conflicting file names with different character encodings, "
480+
f"2. Case sensitivity differences (e.g., 'Finance/' vs 'finance/'), "
481+
f"3. Character conflicts between hyphens in filenames and generated permalinks, "
482+
f"4. Files with similar names containing special characters. "
483+
f"Try renaming one of the conflicting files to resolve this issue."
484+
)
485+
456486
# Update file_path in all cases
457487
updates = {"file_path": new_path}
458488

@@ -477,7 +507,26 @@ async def handle_move(self, old_path, new_path):
477507
f"new_checksum={new_checksum}"
478508
)
479509

480-
updated = await self.entity_repository.update(entity.id, updates)
510+
try:
511+
updated = await self.entity_repository.update(entity.id, updates)
512+
except Exception as e:
513+
# Catch any database integrity errors and provide helpful context
514+
if "UNIQUE constraint failed" in str(e):
515+
logger.error(
516+
f"Database constraint violation during move: "
517+
f"entity_id={entity.id}, old_path='{old_path}', new_path='{new_path}'"
518+
)
519+
raise ValueError(
520+
f"Cannot complete move from '{old_path}' to '{new_path}': "
521+
f"a database constraint was violated. This usually indicates "
522+
f"a file path or permalink conflict. Please check for: "
523+
f"1. Duplicate file names, "
524+
f"2. Case sensitivity issues (e.g., 'File.md' vs 'file.md'), "
525+
f"3. Character encoding conflicts in file names."
526+
) from e
527+
else:
528+
# Re-raise other exceptions as-is
529+
raise
481530

482531
if updated is None: # pragma: no cover
483532
logger.error(
