Optimize TigrisFS sync performance using find command #398

## Problem

Full sync for ~1.4k files on TigrisFS-mounted cloud storage takes **52 minutes**, making the system unusable for cloud deployments.

### Root Cause

The current `scan_directory()` method (line 1133 in `sync_service.py`) walks the tree by recursively calling `aiofiles.os.scandir()`, which makes thousands of network round trips:
- For each directory: network call to list entries
- For each file: `entry.stat()` to get metadata
- With nested directories and 1.4k files, this adds up to thousands of network operations
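The cost pattern can be illustrated with a simplified, hypothetical walker. It is not the actual `sync_service.py` code, and it uses the synchronous `os.scandir` for brevity. It counts one listing per directory plus one `stat()` per file; on a FUSE mount such as TigrisFS, each of these can become a network round trip.

```python
import os
from pathlib import Path
from typing import Tuple


def count_scan_operations(root: Path) -> Tuple[int, int]:
    """Walk `root` (explicit stack), counting directory listings and per-file stat() calls."""
    listings = 0
    stats = 0
    pending = [root]
    while pending:
        directory = pending.pop()
        listings += 1  # one listing per directory
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(Path(entry.path))
                elif entry.is_file(follow_symlinks=False):
                    entry.stat(follow_symlinks=False)  # one metadata lookup per file
                    stats += 1
    return listings, stats
```

For a tree of 1.4k files spread over, say, 100 directories, this counts 100 listings and 1,400 `stat()` calls before any sync work starts.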

**Evidence from Logfire traces:**
- Tenant `0a20eb58-970f-ab05-ff49-25a9cdb2179c` with ~1.4k files
- Full scan took **31.4 seconds for just 379 files** (claude-projects)
- Extrapolated to 1.4k files = 52+ minutes
- Meanwhile, **incremental scans complete in 200-600ms** using `find -newermt`

## Solution

Rewrite `scan_directory()` to run the `find` command as a subprocess, using `-printf`, for all scans (both full and incremental).

### Unified Implementation

**Single code path using `find`:**

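```python
import asyncio
import os
import stat
from datetime import datetime
from pathlib import Path
from typing import AsyncIterator, Optional, Tuple


async def scan_directory(
    self,
    directory: Path,
    since_timestamp: Optional[float] = None
) -> AsyncIterator[Tuple[str, os.stat_result]]:
    """Scan directory using find command (optimized for network filesystems).

    Args:
        directory: Directory to scan
        since_timestamp: Optional - only return files modified after this timestamp

    Yields:
        Tuples of (absolute_file_path, stat_info)
    """
    cmd = ["find", str(directory), "-type", "f"]  # argv list: no shell quoting issues
    if since_timestamp is not None:
        since_date = datetime.fromtimestamp(since_timestamp).strftime("%Y-%m-%d %H:%M:%S")
        # -newermt must precede -printf: find evaluates tests and actions left to right
        cmd += ["-newermt", since_date]
    # path + mtime + size in one pass (GNU find expands \t and \n itself)
    cmd += ["-printf", "%p\\t%T@\\t%s\\n"]
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"find failed: {stderr.decode(errors='replace').strip()}")

    for line in filter(None, stdout.decode().split("\n")):
        path, mtime, size = line.rsplit("\t", 2)  # rsplit: tabs in names are safe
        # TODO: skip paths matching .bmignore patterns here
        m = float(mtime)  # %T@: epoch seconds with a fractional part
        # Size and mtime come from find; mode is assumed (-type f), mtime reused for atime/ctime
        fields = (stat.S_IFREG | 0o644, 0, 0, 1, 0, 0, int(size), int(m), int(m), int(m))
        yield path, os.stat_result(fields + (m, m, m))  # float st_atime/st_mtime/st_ctime
```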
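Argument order matters for the incremental case. `find` evaluates its expression left to right with an implicit `-and`, and `-printf` is an action that always succeeds, so a test such as `-newermt` only filters the output when it comes before `-printf`; placed after it, every file is printed. The sketch below demonstrates this with GNU `find` (assumed to be on `PATH`) in a temporary directory; `run_find` and `demo_newermt_order` are hypothetical helper names.

```python
import os
import subprocess
import tempfile
import time
from datetime import datetime
from pathlib import Path
from typing import List, Tuple

FMT = "%p\\t%T@\\t%s\\n"  # find expands \t and \n itself


def run_find(args: List[str]) -> List[str]:
    """Run find with an argument list (no shell) and return sorted file names."""
    out = subprocess.run(["find", *args], capture_output=True, text=True, check=True)
    paths = [line.split("\t")[0] for line in out.stdout.split("\n") if line]
    return sorted(os.path.basename(p) for p in paths)


def demo_newermt_order() -> Tuple[List[str], List[str]]:
    """Compare -newermt placed before vs after the -printf action."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "old.md").write_text("old")
        Path(tmp, "new.md").write_text("new")
        an_hour_ago = time.time() - 3600
        os.utime(Path(tmp, "old.md"), (an_hour_ago, an_hour_ago))  # backdate old.md

        since = datetime.fromtimestamp(time.time() - 60).strftime("%Y-%m-%d %H:%M:%S")
        test_first = run_find([tmp, "-type", "f", "-newermt", since, "-printf", FMT])
        action_first = run_find([tmp, "-type", "f", "-printf", FMT, "-newermt", since])
        return test_first, action_first


if __name__ == "__main__":
    before, after = demo_newermt_order()
    print(before)  # ['new.md']
    print(after)   # ['new.md', 'old.md']
```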

**Key optimization:** `find -printf "%p\t%T@\t%s\n"` returns path, mtime, and size for every file from a single subprocess call, eliminating the per-file `stat()` calls made from Python.
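A minimal sketch of the parsing step, assuming the tab-separated `%p\t%T@\t%s\n` format above. Only the size and mtime come from `find`; `st_mode` is assumed to be a regular file (the scan uses `-type f`), and the remaining `os.stat_result` fields are zero-filled. `parse_find_line` is a hypothetical name.

```python
import os
import stat
from typing import Tuple


def parse_find_line(line: str) -> Tuple[str, os.stat_result]:
    """Turn one `find -printf "%p\\t%T@\\t%s\\n"` record into (path, stat_result)."""
    # rsplit from the right so a tab inside the file name does not break parsing
    path, mtime_str, size_str = line.rstrip("\n").rsplit("\t", 2)
    mtime = float(mtime_str)  # %T@ is seconds since the epoch, with a fractional part
    size = int(size_str)
    mode = stat.S_IFREG | 0o644  # assumed: -type f only yields regular files
    # os.stat_result takes the 10 sequence fields
    # (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime) as ints,
    # optionally followed by float st_atime, st_mtime, st_ctime.
    st = os.stat_result(
        (mode, 0, 0, 1, 0, 0, size, int(mtime), int(mtime), int(mtime), mtime, mtime, mtime)
    )
    return path, st
```

Keeping the float timestamps preserves sub-second precision, which matters if mtimes are compared against values previously recorded with `os.stat()`.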

### Code Consolidation

**Remove these methods (no longer needed):**
- `_scan_directory_full()` (line 1116)
- `_scan_directory_modified_since()` (line 1058)  
- `_quick_count_files()` (line 1022)

**Update callers:**
- `scan()` method: Use `scan_directory(directory)` for full scans
- `scan()` method: Use `scan_directory(directory, since_timestamp=watermark)` for incremental
- File counting: Use direct `find "{directory}" -type f | wc -l` subprocess
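A hedged sketch of the count without a shell pipeline, assuming GNU `find` (for `-printf`). Asking `find` to print one newline per file and counting the newlines in Python matches `find ... | wc -l` for ordinary names, avoids shell quoting of `{directory}`, and is not inflated by file names that contain newlines. `count_files` is a hypothetical name.

```python
import asyncio
from pathlib import Path


async def count_files(directory: Path) -> int:
    """Count regular files under `directory` with a single find subprocess."""
    proc = await asyncio.create_subprocess_exec(
        "find", str(directory), "-type", "f", "-printf", "\\n",  # find expands \n itself
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"find failed: {stderr.decode(errors='replace').strip()}")
    return stdout.count(b"\n")  # exactly one newline was printed per file
```

Usage from synchronous code: `asyncio.run(count_files(Path("/path/to/project")))` (placeholder path).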

## Expected Performance

- **Full sync:** 52 minutes → **~2-3 minutes** (using the same `find`-based path as the current incremental scans)
- **Incremental sync:** No change (already fast at 200-600ms)
- **Single code path:** Easier to maintain, test, and debug

## Why `find` Over Alternatives (e.g., jwalk, fd-find)

On network filesystems like TigrisFS, **network latency is the bottleneck**, not traversal speed:

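- **`find` with `-printf`:** one subprocess walks the whole tree and emits path, mtime, and size for every file, so Python makes no per-directory or per-file calls of its own
- **Rust tools (jwalk):** a faster traversal engine does not reduce latency; each of the 1.4k individual `stat()` calls can still cost a network round trip (1.4k × network_latency)
- **`find` is widely available:** GNU `find` (findutils) ships with most Linux distributions, so no additional dependencies are needed (note that `-printf` is a GNU extension and is not supported by BSD/macOS `find`)

Running the entire walk inside one native process, rather than driving each directory listing and `stat()` call from Python, is what makes `find` a good fit for this use case.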
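One way to check the latency argument on a real TigrisFS mount before committing to it is a small timing harness. This sketch (hypothetical helper names, GNU `find` assumed) times a pure-Python `os.scandir` walk that calls `stat()` per file against a single `find -printf` subprocess, and returns both file counts so the two walks can be checked for agreement. Timings depend entirely on the mount and its cache state, so no particular numbers are implied.

```python
import os
import subprocess
import time
from pathlib import Path
from typing import Dict, Tuple


def python_walk(root: Path) -> Dict[str, Tuple[float, int]]:
    """Recursive scandir walk with one stat() per file: path -> (mtime, size)."""
    result: Dict[str, Tuple[float, int]] = {}
    pending = [str(root)]
    while pending:
        with os.scandir(pending.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    result[entry.path] = (st.st_mtime, st.st_size)
    return result


def find_walk(root: Path) -> Dict[str, Tuple[float, int]]:
    """Single find subprocess emitting path, mtime, size per file."""
    out = subprocess.run(
        ["find", str(root), "-type", "f", "-printf", "%p\\t%T@\\t%s\\n"],
        capture_output=True, text=True, check=True,
    ).stdout
    result: Dict[str, Tuple[float, int]] = {}
    for line in out.split("\n"):
        if line:
            path, mtime, size = line.rsplit("\t", 2)
            result[path] = (float(mtime), int(size))
    return result


def compare(root: Path) -> Dict[str, float]:
    """Time both walks and report how many files each found."""
    t0 = time.perf_counter()
    py = python_walk(root)
    t1 = time.perf_counter()
    fd = find_walk(root)
    t2 = time.perf_counter()
    return {
        "python_seconds": t1 - t0,
        "find_seconds": t2 - t1,
        "python_files": len(py),
        "find_files": len(fd),
    }
```

The first walk can warm the filesystem's metadata cache for the second, so swapping the order or repeating runs gives a fairer comparison.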

## Implementation Checklist

- [ ] Rewrite `scan_directory()` to use `find` with optional `-newermt` filter
- [ ] Parse `find -printf` output to create `os.stat_result` objects
- [ ] Apply `.bmignore` pattern filtering to results
- [ ] Delete obsolete helper methods (`_scan_directory_full`, `_scan_directory_modified_since`, `_quick_count_files`)
- [ ] Update `scan()` method to use unified `scan_directory()`
- [ ] Update file count logic in `scan()` to use `find | wc -l`
- [ ] Add tests for new implementation
- [ ] Validate with tenant 0a20eb58's projects (~1.4k files)
- [ ] Verify `.bmignore` patterns work correctly
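The `.bmignore` syntax is not described in this issue. The sketch below assumes one gitignore-style glob per line and implements only a simplified subset with `fnmatch` (blank lines and `#` comments skipped, trailing `/` dropped, no negation or anchoring, and `*` may match across `/`). `load_ignore_patterns` and `is_ignored` are hypothetical names.

```python
import fnmatch
from pathlib import Path
from typing import Iterable, List


def load_ignore_patterns(lines: Iterable[str]) -> List[str]:
    """Parse .bmignore-style lines: skip blanks and # comments, drop trailing '/'."""
    patterns = []
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(path: str, root: Path, patterns: List[str]) -> bool:
    """True if the path (relative to root) or any of its components matches a pattern."""
    rel = Path(path).relative_to(root)
    candidates = [rel.as_posix(), *rel.parts]
    return any(fnmatch.fnmatchcase(c, p) for c in candidates for p in patterns)
```

Filtering the parsed `find` output this way still lets `find` walk ignored directories; translating directory patterns into `-prune` expressions would skip that traversal, at the cost of more complex command construction.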

## Files Modified

- `src/basic_memory/sync/sync_service.py`

## References

- Current incremental scan already uses `find -newermt` successfully (line 1058)
- Observed performance: 200-600ms for `find`-based incremental scans vs 31+ seconds for the Python `scandir` full scan
- Logfire traces: tenant `0a20eb58-970f-ab05-ff49-25a9cdb2179c`
