Skip to content

feat: add serialization helpers (to_dict/to_yaml) to BaseSearchIndex#542

Open
thakoreh wants to merge 3 commits intoredis:mainfrom
thakoreh:fix/serialization-helpers-v2
Open

feat: add serialization helpers (to_dict/to_yaml) to BaseSearchIndex#542
thakoreh wants to merge 3 commits intoredis:mainfrom
thakoreh:fix/serialization-helpers-v2

Conversation

@thakoreh
Copy link

@thakoreh thakoreh commented Mar 25, 2026

Summary

Replaces #525. Adds to_dict, to_yaml, and _sanitize_redis_url to BaseSearchIndex — available on both SearchIndex and AsyncSearchIndex without code duplication.

Changes

  • DRY: All serialization lives in BaseSearchIndex (not duplicated on both sync/async)
  • Password sanitization: Handles both redis://:pass@host and redis://user:pass@host
  • None-safe: _redis_url=None returns None instead of crashing
  • Round-trip: from_dict and from_yaml now restore _redis_url and _connection_kwargs
  • overwrite param: to_yaml(overwrite=False) raises FileExistsError
  • Secure: Sensitive kwargs (passwords, etc.) are filtered; only ssl_cert_reqs passes through
  • Top-level import: yaml imported at module level (not inline)

Testing

  • 16 unit tests covering to_dict, to_yaml, and round-trip (sync + async)
  • All pass locally

Review Feedback Addressed (from #525)

All bot and maintainer comments have been addressed.


Note

Medium Risk
Adds new serialization/round-trip paths for index configuration (including optional connection metadata), which could affect how indices are instantiated from YAML/dicts and risks accidental connection misconfiguration despite masking/filtering safeguards.

Overview
Adds BaseSearchIndex.to_dict()/to_yaml() to serialize index schema, optionally including sanitized connection metadata (masked Redis URL + allowlisted connection_kwargs).

Updates BaseSearchIndex.from_dict()/from_yaml() to restore _redis_url and _connection_kwargs when present, while intentionally skipping masked URLs (containing ****) to avoid silent auth failures. Includes new unit tests covering serialization, overwrite behavior, and sync/async round-trips.

Written by Cursor Bugbot for commit 6175835. This will update automatically on new commits. Configure here.

Addresses all review feedback on redis#525:

- Move to_dict/to_yaml/_sanitize_redis_url to BaseSearchIndex (DRY)
- Fix password-only URL sanitization (redis://:pass@host pattern)
- Handle _redis_url=None gracefully
- Fix from_dict to restore _redis_url and _connection_kwargs (round-trip)
- Fix from_yaml to also extract connection metadata via from_dict
- Add overwrite param to to_yaml (raises FileExistsError when False)
- Import yaml at module level (not inline)
- Filter sensitive connection_kwargs (only ssl_cert_reqs passes through)
- Add comprehensive test suite (16 tests, no Docker required)
  - to_dict: schema-only, connection, password sanitization, None URL,
    sensitive kwargs, async
  - to_yaml: basic write, overwrite guard, async
  - round-trip: dict, dict+connection, async+connection, yaml+connection,
    connection_kwargs
Copilot AI review requested due to automatic review settings March 25, 2026 17:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds shared serialization helpers to BaseSearchIndex so both SearchIndex and AsyncSearchIndex can export/import index configuration via dict/YAML, with optional (sanitized/filtered) connection metadata.

Changes:

  • Adds to_dict(), to_yaml(), and _sanitize_redis_url() to BaseSearchIndex.
  • Updates from_yaml() / from_dict() to round-trip _redis_url and _connection_kwargs (when present).
  • Adds unit tests covering dict/YAML serialization and round-trip behavior for sync + async indexes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
redisvl/index/index.py Implements serialization helpers on BaseSearchIndex and extends import paths to support round-tripping connection metadata.
tests/unit/test_index_serialization.py Adds unit tests for to_dict, to_yaml, overwrite behavior, and round-tripping (sync/async).
Comments suppressed due to low confidence (1)

redisvl/index/index.py:349

  • This docstring is on BaseSearchIndex.from_yaml, so the return type and examples should describe the generic cls (e.g., SearchIndex or AsyncSearchIndex) rather than always stating it returns SearchIndex. As written, it’s misleading for subclasses that call from_yaml.
        """Create a SearchIndex from a YAML schema file.

        If the YAML file contains ``_redis_url`` or ``_connection_kwargs``
        keys (produced by ``to_yaml(include_connection=True)``), they are
        automatically passed to the constructor.

        Args:
            schema_path (str): Path to the YAML schema file.

        Returns:
            SearchIndex: A RedisVL SearchIndex object.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

return self.schema.index.storage_type

@classmethod
@classmethod
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two consecutive @classmethod decorators here. Applying @classmethod twice can lead to incorrect descriptor behavior (and may break attribute access). Remove the duplicate decorator so from_yaml is decorated exactly once.

Suggested change
@classmethod

Copilot uses AI. Check for mistakes.
Comment on lines +357 to +358
with open(schema_path) as f:
data = yaml.safe_load(f) or {}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_yaml now uses open(schema_path) directly, which drops the existing path validation/error normalization pattern used elsewhere (e.g., resolving with Path(...).resolve(), raising ValueError for invalid paths, and a consistent FileNotFoundError message). Consider resolving/validating the path first (and opening the resolved path) to keep public API behavior consistent.

Suggested change
with open(schema_path) as f:
data = yaml.safe_load(f) or {}
path_obj = Path(schema_path)
try:
resolved_path = path_obj.resolve()
except Exception as exc:
raise ValueError(f"Invalid schema path: {schema_path}") from exc
try:
with resolved_path.open() as f:
data = yaml.safe_load(f) or {}
except FileNotFoundError as exc:
raise FileNotFoundError(f"Schema file not found: {resolved_path}") from exc

Copilot uses AI. Check for mistakes.
Comment on lines +414 to +418
if parsed.password or parsed.username:
host_info = parsed.hostname or ""
if parsed.port:
host_info += f":{parsed.port}"
netloc = f"{parsed.username}:****@{host_info}" if parsed.username else f":****@{host_info}"
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_sanitize_redis_url masks the netloc when either parsed.password or parsed.username is present. This will incorrectly rewrite username-only URLs like redis://user@host:6379 into redis://user:****@host:6379, and to_dict(include_connection=True) will then round-trip an invalid/altered connection URL. Only mask when a password component is present (including empty-string passwords), while leaving username-only URLs unchanged.

Suggested change
if parsed.password or parsed.username:
host_info = parsed.hostname or ""
if parsed.port:
host_info += f":{parsed.port}"
netloc = f"{parsed.username}:****@{host_info}" if parsed.username else f":****@{host_info}"
# Only mask when a password component is present (including empty-string passwords),
# so that username-only URLs like ``redis://user@host:6379`` are left unchanged.
if parsed.password is not None:
host_info = parsed.hostname or ""
if parsed.port:
host_info += f":{parsed.port}"
user_part = f"{parsed.username}:" if parsed.username is not None else ":"
netloc = f"{user_part}****@{host_info}"

Copilot uses AI. Check for mistakes.
if fp.exists() and not overwrite:
raise FileExistsError(f"File {file_path} already exists.")
with open(fp, "w") as f:
yaml.dump(self.to_dict(include_connection=include_connection), f)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_yaml uses yaml.dump(...) without sort_keys=False, while other YAML writers in this repo explicitly set sort_keys=False for stable output ordering. Consider adding sort_keys=False here as well to keep serialized configs deterministic and consistent with IndexSchema.to_yaml/SemanticRouter.to_yaml.

Suggested change
yaml.dump(self.to_dict(include_connection=include_connection), f)
yaml.dump(self.to_dict(include_connection=include_connection), f, sort_keys=False)

Copilot uses AI. Check for mistakes.
- Remove duplicate @classmethod on from_yaml
- Add path validation/resolution in from_yaml
- Only mask URLs with password (not username-only URLs)
- Add sort_keys=False to yaml.dump for stable output
- Add test for username-only URL sanitization
Sanitized URLs (containing ****) are for inspection only — restoring
them as real redis_url would cause silent auth failures.

- from_dict/from_yaml now skip _redis_url when it contains ****
- Updated to_dict/to_yaml docstrings to clarify inspection-only behavior
- Added test_roundtrip_dict_sanitized_url_skipped (18/18 passing)
Copilot AI review requested due to automatic review settings March 25, 2026 17:39
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

redisvl/index/index.py:349

  • This from_yaml docstring says it creates a SearchIndex, but the method is defined on BaseSearchIndex and is inherited/used by AsyncSearchIndex as well. Updating the wording (and return type wording) to be class-generic (e.g., “Create an index from a YAML schema file” / “Returns: BaseSearchIndex”) will avoid misleading docs for async consumers.
        """Create a SearchIndex from a YAML schema file.

        If the YAML file contains ``_redis_url`` or ``_connection_kwargs``
        keys (produced by ``to_yaml(include_connection=True)``), they are
        automatically passed to the constructor.

        Args:
            schema_path (str): Path to the YAML schema file.

        Returns:
            SearchIndex: A RedisVL SearchIndex object.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +426 to +436
# empty-string passwords). Username-only URLs like
# ``redis://user@host:6379`` are left unchanged.
if parsed.password is not None:
host_info = parsed.hostname or ""
if parsed.port:
host_info += f":{parsed.port}"
user_part = f"{parsed.username}:" if parsed.username is not None else ":"
netloc = f"{user_part}****@{host_info}"
return parsed._replace(netloc=netloc).geturl()
return self._redis_url

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_sanitize_redis_url() reconstructs the URL netloc using parsed.hostname/parsed.port, which can produce invalid or lossy URLs for valid Redis URLs that don’t fit the single-host pattern (e.g., IPv6 hosts that require brackets, or redis+sentinel:// URLs with comma-separated hosts). Consider sanitizing by operating on parsed.netloc (split on @, then mask only the password portion of the userinfo) so the host segment is preserved verbatim while still redacting credentials.

Suggested change
# empty-string passwords). Username-only URLs like
# ``redis://user@host:6379`` are left unchanged.
if parsed.password is not None:
host_info = parsed.hostname or ""
if parsed.port:
host_info += f":{parsed.port}"
user_part = f"{parsed.username}:" if parsed.username is not None else ":"
netloc = f"{user_part}****@{host_info}"
return parsed._replace(netloc=netloc).geturl()
return self._redis_url
# empty-string passwords). Username-only URLs like
# ``redis://user@host:6379`` are left unchanged.
if parsed.password is None:
return self._redis_url
netloc = parsed.netloc
# If there is no userinfo/host separator, leave unchanged.
if "@" not in netloc:
return self._redis_url
userinfo, host_part = netloc.rsplit("@", 1)
# userinfo may be "user:password", ":password", or just "password".
if ":" in userinfo:
username, _ = userinfo.split(":", 1)
if username:
masked_userinfo = f"{username}:****"
else:
# Password-only form ":password"
masked_userinfo = ":****"
else:
# No ":" present; treat entire userinfo as password.
masked_userinfo = "****"
masked_netloc = f"{masked_userinfo}@{host_part}"
return parsed._replace(netloc=masked_netloc).geturl()

Copilot uses AI. Check for mistakes.
)
d = idx.to_dict(include_connection=True)
# password should NOT leak
assert "s3cret" not in d["_connection_kwargs"]
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion checks dict key membership, not whether the secret value leaked (i.e., it would still pass if {"password": "s3cret"} were present). Since this test is specifically about preventing credential leaks, consider asserting on the values (or on yaml.dump(d) / str(d)) so the intent is validated directly.

Suggested change
assert "s3cret" not in d["_connection_kwargs"]
assert "s3cret" not in yaml.dump(d["_connection_kwargs"])

Copilot uses AI. Check for mistakes.
path_obj = Path(schema_path)
try:
resolved_path = path_obj.resolve()
except Exception as exc:
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_yaml catches a broad Exception around Path.resolve(), while other from_yaml helpers in this repo catch OSError specifically (e.g., IndexSchema.from_yaml and SemanticRouter.from_yaml). Narrowing this to except OSError would avoid masking unexpected errors and keep error-handling consistent.

Suggested change
except Exception as exc:
except OSError as exc:

Copilot uses AI. Check for mistakes.
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

if parsed.port:
host_info += f":{parsed.port}"
user_part = f"{parsed.username}:" if parsed.username is not None else ":"
netloc = f"{user_part}****@{host_info}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPv6 addresses produce malformed URLs after sanitization

Low Severity

_sanitize_redis_url reconstructs the netloc using parsed.hostname, which strips square brackets from IPv6 addresses. For a URL like redis://:pass@[::1]:6379, parsed.hostname returns ::1 instead of [::1], so the reconstructed host_info becomes ::1:6379 — an ambiguous string where the port is indistinguishable from part of the IPv6 address. The resulting URL redis://:****@::1:6379 is malformed; it needs to be redis://:****@[::1]:6379.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants