Skip to content

Codex Relay: zyra-project/zyra PR #105 — feat(ftp): add overwrite mode and metadata-aware sync for acquire ftp#220

Merged
Hackshaven merged 21 commits intostagingfrom
relay/hh-pr-105
Feb 17, 2026
Merged

Codex Relay: zyra-project/zyra PR #105 — feat(ftp): add overwrite mode and metadata-aware sync for acquire ftp#220
Hackshaven merged 21 commits intostagingfrom
relay/hh-pr-105

Conversation

@Hackshaven
Copy link
Collaborator

Mirrored from zyra-project/zyra PR #105

Source: zyra-project/zyra#105

This PR is maintained by an automated relay. Changes should be made in the original downstream PR.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances Zyra’s FTP acquisition “sync directory” flow by adding configurable overwrite/replacement behavior and optional metadata-aware syncing, enabling better handling of backfilled/updated upstream FTP content.

Changes:

  • Add SyncOptions + decision logic (should_download) and wire it into ftp_backend.sync_directory() (including .done marker preservation and optional frames-meta support).
  • Expose new FTP sync flags on the CLI (zyra acquire ftp ... --sync-dir ...) and pass them through the connector client/API schema.
  • Add extensive unit + CLI-integration tests and update wizard capabilities + OpenAPI snapshot hash.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/zyra/connectors/backends/ftp.py Core implementation of sync options, metadata helpers, and updated sync loop using a shared FTP connection
src/zyra/connectors/ingest/__init__.py Adds CLI flags and constructs/passes SyncOptions into the FTP backend
src/zyra/connectors/clients.py Extends FTPConnector.sync_directory() to accept/pass sync_options
src/zyra/api/schemas/domain_args.py Expands AcquireFtpArgs to include new sync-related fields
tests/connectors/test_ftp_backend.py Adds tests for parsing, download decision logic, MDTM helper, sync behavior, and CLI option plumbing
src/zyra/wizard/zyra_capabilities.json Updates wizard capability metadata to include new FTP flags
src/zyra/wizard/zyra_capabilities/acquire.json Updates acquire-domain wizard capability metadata for new FTP flags
tests/snapshots/openapi_sha256.txt Updates OpenAPI snapshot hash due to schema/CLI surface changes
docs/plans/issue-11-ftp-sync-overwrite.md Adds an implementation plan document for the feature

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Add detailed implementation plan for adding overwrite mode and
metadata-aware sync to the acquire ftp command, including:
- Architecture decisions (SyncOptions dataclass, should_download logic)
- Implementation phases
- New CLI flags specification
- Test strategy
- Risk mitigations

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
Implements issue #11: Add overwrite mode and metadata-aware sync for
filled-frame use cases in the acquire ftp --sync-dir command.

New CLI flags:
- --overwrite-existing: unconditional replacement
- --recheck-existing: compare file sizes when timestamps unavailable
- --min-remote-size: replace if remote larger (bytes or percentage)
- --prefer-remote: always prioritize remote versions
- --prefer-remote-if-meta-newer: use frames-meta.json timestamps
- --skip-if-local-done: respect .done marker files
- --recheck-missing-meta: re-download files lacking metadata
- --frames-meta: path to frames-meta.json

Implementation:
- Add SyncOptions dataclass to encapsulate sync behavior configuration
- Add get_remote_mtime() function using FTP MDTM command
- Add should_download() decision logic with precedence-based evaluation
- Add helper functions for size parsing and metadata handling
- Update sync_directory() to use new SyncOptions
- Update CLI argument registration and handler
- Update API schema (AcquireFtpArgs) for programmatic access
- Update FTPConnector.sync_directory() signature

Tests:
- Add comprehensive unit tests for all new functionality
- Test _parse_min_size with bytes and percentages
- Test should_download with all option combinations
- Test get_remote_mtime with MDTM parsing
- Test sync_directory with SyncOptions

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
- Format code with ruff
- Fix UP035: import Iterable from collections.abc
- Fix SIM102: combine nested if statements
- Remove unused import

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
The new CLI arguments for FTP sync changed the OpenAPI schema,
requiring an updated snapshot hash.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
Pre-commit hook regenerated the manifest to include the new FTP sync
options (--overwrite-existing, --recheck-existing, etc.)

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
- Fix docstring: clarify default SyncOptions behavior includes mtime-based
  replacement, not just missing/zero-byte files
- Add logging to _load_frames_meta() instead of silently swallowing exceptions
- Short-circuit remote metadata fetching when local-only conditions can decide
  (missing file, zero-byte, or .done marker present) to reduce FTP connections
- Add NOTE comment about potential future connection pooling optimization

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
Address remaining Copilot review feedback:
- Add logging.debug() to OSError handler in prefer_remote_if_meta_newer check
- Add logging.debug() to OSError handler in MDTM comparison check

These handlers catch expected failures (file inaccessible) and now log
at debug level instead of silently passing.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
The pre-commit manifest regeneration removed narrate.json and other
commands because the development environment lacks optional dependencies
(like OpenAI). This broke tests in the upstream repo.

This commit:
- Restores all manifest files from before the automated regeneration
- Manually adds the new FTP sync options to acquire.json
- Skips the manifest regeneration hook to avoid the same issue

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
Address Copilot review feedback about connection overhead. Previously,
each get_size/get_remote_mtime call opened a new FTP connection, and
downloads used yet another connection - up to 3 connections per file.

Now sync_directory uses a single FTP connection for:
- SIZE queries (file size checks)
- MDTM queries (modification time checks)
- RETR commands (file downloads)

The connection is established lazily on first use and properly closed
in a finally block. This significantly reduces latency and server load
for large directory syncs.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
- Add debug logging to FTP SIZE/MDTM silent exception handlers
- Fix precedence mismatch: skip_if_local_done now checked before
  missing/zero-byte check in sync_directory, matching should_download()
- Add CLI integration tests for SyncOptions argument parsing

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
- Fix test comment wording: "10% larger" -> "10% increase"
- Fix grammar in --min-remote-size help text: add missing "is"
- Use round() instead of int() in _parse_min_size() for accuracy
- Rename _has_companion_meta() to _is_missing_companion_meta() for
  clearer semantics (inverted return value)
- Simplify get_mtime_via_conn() by removing duplicate return None

Note: Skipped the needs_remote_mtime optimization suggestion as it
would change behavior when prefer_remote_if_meta_newer doesn't trigger
a download but MDTM comparison would.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
The "remove locals not on server" cleanup step was deleting .done
marker files because they don't exist on the remote. This broke the
skip_if_local_done behavior across multiple sync runs - after the
first sync, markers would be deleted and subsequent syncs would
re-download files that should stay skipped.

Now files ending in .done are excluded from cleanup, preserving
the marker-based skip behavior.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
- Add missing sync_dir to FTP acquire args_schema.optional list
- Update docs to reflect renamed function: _has_companion_meta() ->
  _is_missing_companion_meta()

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
Properly regenerated using `zyra generate-manifest` with all
dependencies installed. This syncs the manifest with the actual
CLI argument definitions.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
The argparse '%%' escape was leaking into generated manifests as
literal '10%%' instead of '10%'. Reworded help text to avoid the
percent character entirely:

"Threshold for replacement (absolute bytes or relative percentage)"

This keeps the documentation accurate without relying on argparse
escape sequences that don't translate to manifest generation.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
Address timezone correctness issues flagged in code review:

- Parse FTP MDTM timestamps as UTC-aware (RFC 3659 defines MDTM as UTC)
  in both get_mtime_via_conn() and get_remote_mtime()
- Use datetime.fromtimestamp(st_mtime, tz=timezone.utc) for local file
  mtime so both sides of comparisons are UTC-aware
- Normalize _get_meta_timestamp() to always return UTC-aware datetimes,
  handling both aware and naive ISO timestamps from frames-meta.json
- Gate recheck_existing size comparison on remote_mtime is None so it
  only fires as a fallback when MDTM is unavailable, preventing
  downloads of older files that happen to differ in size

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
- Catch ValueError/TypeError in MDTM timestamp parsing (get_mtime_via_conn
  and get_remote_mtime) to handle malformed server responses gracefully
- Validate _load_frames_meta() returns a dict before using it
- Fix FTP connection leak in list_files() by adding try/finally with
  ftp.quit()

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
The static analysis flagged that `filename` (derived from
parse_ftp_path return tuple) could theoretically carry sensitive
data. While `filename` is the path component (not credentials),
the log message only needs the raw timestamp string to diagnose
parse failures. Simplified the debug log accordingly.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
When a user provides --frames-meta with a path that doesn't exist,
log a warning instead of silently falling back to non-metadata sync.
This helps catch typos and misconfiguration.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
When prefer_remote_if_meta_newer is set, should_download() can fall
through from the metadata check (step 6) to the default MDTM-based
comparison (step 10). Include this option in the needs_remote_mtime
calculation so the relationship is explicit and defensive against
future refactoring.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
The previous commit added `or options.prefer_remote_if_meta_newer` to
needs_remote_mtime, but this was redundant: when prefer_remote_if_meta_newer
is set without overwrite_existing/prefer_remote, the expression already
evaluates to True. When overwrite_existing or prefer_remote IS set,
should_download() returns at steps 4-5 before step 10 (MDTM comparison)
is reached, so fetching mtime is unnecessary.

Revert to the original simple expression and add a comment explaining
why this correctly handles all modes including metadata-aware sync.

https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw
Signed-off-by: Claude <noreply@anthropic.com>
@Hackshaven Hackshaven merged commit 29bbcb3 into staging Feb 17, 2026
7 checks passed
@Hackshaven Hackshaven deleted the relay/hh-pr-105 branch February 17, 2026 22:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants