feat: add serialization helpers (to_dict/to_yaml) to BaseSearchIndex#542
feat: add serialization helpers (to_dict/to_yaml) to BaseSearchIndex#542thakoreh wants to merge 3 commits intoredis:mainfrom
Conversation
Addresses all review feedback on redis#525: - Move to_dict/to_yaml/_sanitize_redis_url to BaseSearchIndex (DRY) - Fix password-only URL sanitization (redis://:pass@host pattern) - Handle _redis_url=None gracefully - Fix from_dict to restore _redis_url and _connection_kwargs (round-trip) - Fix from_yaml to also extract connection metadata via from_dict - Add overwrite param to to_yaml (raises FileExistsError when False) - Import yaml at module level (not inline) - Filter sensitive connection_kwargs (only ssl_cert_reqs passes through) - Add comprehensive test suite (16 tests, no Docker required) - to_dict: schema-only, connection, password sanitization, None URL, sensitive kwargs, async - to_yaml: basic write, overwrite guard, async - round-trip: dict, dict+connection, async+connection, yaml+connection, connection_kwargs
There was a problem hiding this comment.
Pull request overview
Adds shared serialization helpers to BaseSearchIndex so both SearchIndex and AsyncSearchIndex can export/import index configuration via dict/YAML, with optional (sanitized/filtered) connection metadata.
Changes:
- Adds
to_dict(),to_yaml(), and_sanitize_redis_url()toBaseSearchIndex. - Updates
from_yaml()/from_dict()to round-trip_redis_urland_connection_kwargs(when present). - Adds unit tests covering dict/YAML serialization and round-trip behavior for sync + async indexes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
redisvl/index/index.py |
Implements serialization helpers on BaseSearchIndex and extends import paths to support round-tripping connection metadata. |
tests/unit/test_index_serialization.py |
Adds unit tests for to_dict, to_yaml, overwrite behavior, and round-tripping (sync/async). |
Comments suppressed due to low confidence (1)
redisvl/index/index.py:349
- This docstring is on
BaseSearchIndex.from_yaml, so the return type and examples should describe the genericcls(e.g.,SearchIndexorAsyncSearchIndex) rather than always stating it returnsSearchIndex. As written, it’s misleading for subclasses that callfrom_yaml.
"""Create a SearchIndex from a YAML schema file.
If the YAML file contains ``_redis_url`` or ``_connection_kwargs``
keys (produced by ``to_yaml(include_connection=True)``), they are
automatically passed to the constructor.
Args:
schema_path (str): Path to the YAML schema file.
Returns:
SearchIndex: A RedisVL SearchIndex object.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return self.schema.index.storage_type | ||
|
|
||
| @classmethod | ||
| @classmethod |
There was a problem hiding this comment.
There are two consecutive @classmethod decorators here. Applying @classmethod twice can lead to incorrect descriptor behavior (and may break attribute access). Remove the duplicate decorator so from_yaml is decorated exactly once.
| @classmethod |
redisvl/index/index.py
Outdated
| with open(schema_path) as f: | ||
| data = yaml.safe_load(f) or {} |
There was a problem hiding this comment.
from_yaml now uses open(schema_path) directly, which drops the existing path validation/error normalization pattern used elsewhere (e.g., resolving with Path(...).resolve(), raising ValueError for invalid paths, and a consistent FileNotFoundError message). Consider resolving/validating the path first (and opening the resolved path) to keep public API behavior consistent.
| with open(schema_path) as f: | |
| data = yaml.safe_load(f) or {} | |
| path_obj = Path(schema_path) | |
| try: | |
| resolved_path = path_obj.resolve() | |
| except Exception as exc: | |
| raise ValueError(f"Invalid schema path: {schema_path}") from exc | |
| try: | |
| with resolved_path.open() as f: | |
| data = yaml.safe_load(f) or {} | |
| except FileNotFoundError as exc: | |
| raise FileNotFoundError(f"Schema file not found: {resolved_path}") from exc |
redisvl/index/index.py
Outdated
| if parsed.password or parsed.username: | ||
| host_info = parsed.hostname or "" | ||
| if parsed.port: | ||
| host_info += f":{parsed.port}" | ||
| netloc = f"{parsed.username}:****@{host_info}" if parsed.username else f":****@{host_info}" |
There was a problem hiding this comment.
_sanitize_redis_url masks the netloc when either parsed.password or parsed.username is present. This will incorrectly rewrite username-only URLs like redis://user@host:6379 into redis://user:****@host:6379, and to_dict(include_connection=True) will then round-trip an invalid/altered connection URL. Only mask when a password component is present (including empty-string passwords), while leaving username-only URLs unchanged.
| if parsed.password or parsed.username: | |
| host_info = parsed.hostname or "" | |
| if parsed.port: | |
| host_info += f":{parsed.port}" | |
| netloc = f"{parsed.username}:****@{host_info}" if parsed.username else f":****@{host_info}" | |
| # Only mask when a password component is present (including empty-string passwords), | |
| # so that username-only URLs like ``redis://user@host:6379`` are left unchanged. | |
| if parsed.password is not None: | |
| host_info = parsed.hostname or "" | |
| if parsed.port: | |
| host_info += f":{parsed.port}" | |
| user_part = f"{parsed.username}:" if parsed.username is not None else ":" | |
| netloc = f"{user_part}****@{host_info}" |
redisvl/index/index.py
Outdated
| if fp.exists() and not overwrite: | ||
| raise FileExistsError(f"File {file_path} already exists.") | ||
| with open(fp, "w") as f: | ||
| yaml.dump(self.to_dict(include_connection=include_connection), f) |
There was a problem hiding this comment.
to_yaml uses yaml.dump(...) without sort_keys=False, while other YAML writers in this repo explicitly set sort_keys=False for stable output ordering. Consider adding sort_keys=False here as well to keep serialized configs deterministic and consistent with IndexSchema.to_yaml/SemanticRouter.to_yaml.
| yaml.dump(self.to_dict(include_connection=include_connection), f) | |
| yaml.dump(self.to_dict(include_connection=include_connection), f, sort_keys=False) |
- Remove duplicate @classmethod on from_yaml - Add path validation/resolution in from_yaml - Only mask URLs with password (not username-only URLs) - Add sort_keys=False to yaml.dump for stable output - Add test for username-only URL sanitization
Sanitized URLs (containing ****) are for inspection only — restoring them as real redis_url would cause silent auth failures. - from_dict/from_yaml now skip _redis_url when it contains **** - Updated to_dict/to_yaml docstrings to clarify inspection-only behavior - Added test_roundtrip_dict_sanitized_url_skipped (18/18 passing)
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
redisvl/index/index.py:349
- This
from_yamldocstring says it creates aSearchIndex, but the method is defined onBaseSearchIndexand is inherited/used byAsyncSearchIndexas well. Updating the wording (and return type wording) to be class-generic (e.g., “Create an index from a YAML schema file” / “Returns: BaseSearchIndex”) will avoid misleading docs for async consumers.
"""Create a SearchIndex from a YAML schema file.
If the YAML file contains ``_redis_url`` or ``_connection_kwargs``
keys (produced by ``to_yaml(include_connection=True)``), they are
automatically passed to the constructor.
Args:
schema_path (str): Path to the YAML schema file.
Returns:
SearchIndex: A RedisVL SearchIndex object.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # empty-string passwords). Username-only URLs like | ||
| # ``redis://user@host:6379`` are left unchanged. | ||
| if parsed.password is not None: | ||
| host_info = parsed.hostname or "" | ||
| if parsed.port: | ||
| host_info += f":{parsed.port}" | ||
| user_part = f"{parsed.username}:" if parsed.username is not None else ":" | ||
| netloc = f"{user_part}****@{host_info}" | ||
| return parsed._replace(netloc=netloc).geturl() | ||
| return self._redis_url | ||
|
|
There was a problem hiding this comment.
_sanitize_redis_url() reconstructs the URL netloc using parsed.hostname/parsed.port, which can produce invalid or lossy URLs for valid Redis URLs that don’t fit the single-host pattern (e.g., IPv6 hosts that require brackets, or redis+sentinel:// URLs with comma-separated hosts). Consider sanitizing by operating on parsed.netloc (split on @, then mask only the password portion of the userinfo) so the host segment is preserved verbatim while still redacting credentials.
| # empty-string passwords). Username-only URLs like | |
| # ``redis://user@host:6379`` are left unchanged. | |
| if parsed.password is not None: | |
| host_info = parsed.hostname or "" | |
| if parsed.port: | |
| host_info += f":{parsed.port}" | |
| user_part = f"{parsed.username}:" if parsed.username is not None else ":" | |
| netloc = f"{user_part}****@{host_info}" | |
| return parsed._replace(netloc=netloc).geturl() | |
| return self._redis_url | |
| # empty-string passwords). Username-only URLs like | |
| # ``redis://user@host:6379`` are left unchanged. | |
| if parsed.password is None: | |
| return self._redis_url | |
| netloc = parsed.netloc | |
| # If there is no userinfo/host separator, leave unchanged. | |
| if "@" not in netloc: | |
| return self._redis_url | |
| userinfo, host_part = netloc.rsplit("@", 1) | |
| # userinfo may be "user:password", ":password", or just "password". | |
| if ":" in userinfo: | |
| username, _ = userinfo.split(":", 1) | |
| if username: | |
| masked_userinfo = f"{username}:****" | |
| else: | |
| # Password-only form ":password" | |
| masked_userinfo = ":****" | |
| else: | |
| # No ":" present; treat entire userinfo as password. | |
| masked_userinfo = "****" | |
| masked_netloc = f"{masked_userinfo}@{host_part}" | |
| return parsed._replace(netloc=masked_netloc).geturl() |
| ) | ||
| d = idx.to_dict(include_connection=True) | ||
| # password should NOT leak | ||
| assert "s3cret" not in d["_connection_kwargs"] |
There was a problem hiding this comment.
This assertion checks dict key membership, not whether the secret value leaked (i.e., it would still pass if {"password": "s3cret"} were present). Since this test is specifically about preventing credential leaks, consider asserting on the values (or on yaml.dump(d) / str(d)) so the intent is validated directly.
| assert "s3cret" not in d["_connection_kwargs"] | |
| assert "s3cret" not in yaml.dump(d["_connection_kwargs"]) |
| path_obj = Path(schema_path) | ||
| try: | ||
| resolved_path = path_obj.resolve() | ||
| except Exception as exc: |
There was a problem hiding this comment.
from_yaml catches a broad Exception around Path.resolve(), while other from_yaml helpers in this repo catch OSError specifically (e.g., IndexSchema.from_yaml and SemanticRouter.from_yaml). Narrowing this to except OSError would avoid masking unexpected errors and keep error-handling consistent.
| except Exception as exc: | |
| except OSError as exc: |
| if parsed.port: | ||
| host_info += f":{parsed.port}" | ||
| user_part = f"{parsed.username}:" if parsed.username is not None else ":" | ||
| netloc = f"{user_part}****@{host_info}" |
There was a problem hiding this comment.
IPv6 addresses produce malformed URLs after sanitization
Low Severity
_sanitize_redis_url reconstructs the netloc using parsed.hostname, which strips square brackets from IPv6 addresses. For a URL like redis://:pass@[::1]:6379, parsed.hostname returns ::1 instead of [::1], so the reconstructed host_info becomes ::1:6379 — an ambiguous string where the port is indistinguishable from part of the IPv6 address. The resulting URL redis://:****@::1:6379 is malformed; it needs to be redis://:****@[::1]:6379.


Summary
Replaces #525. Adds
to_dict,to_yaml, and_sanitize_redis_urltoBaseSearchIndex— available on bothSearchIndexandAsyncSearchIndexwithout code duplication.Changes
BaseSearchIndex(not duplicated on both sync/async)redis://:pass@hostandredis://user:pass@host_redis_url=NonereturnsNoneinstead of crashingfrom_dictandfrom_yamlnow restore_redis_urland_connection_kwargsto_yaml(overwrite=False)raisesFileExistsErrorssl_cert_reqspasses throughyamlimported at module level (not inline)Testing
Review Feedback Addressed (from #525)
All bot and maintainer comments have been addressed.
Note
Medium Risk
Adds new serialization/round-trip paths for index configuration (including optional connection metadata), which could affect how indices are instantiated from YAML/dicts and risks accidental connection misconfiguration despite masking/filtering safeguards.
Overview
Adds
BaseSearchIndex.to_dict()/to_yaml()to serialize index schema, optionally including sanitized connection metadata (masked Redis URL + allowlistedconnection_kwargs).Updates
BaseSearchIndex.from_dict()/from_yaml()to restore_redis_urland_connection_kwargswhen present, while intentionally skipping masked URLs (containing****) to avoid silent auth failures. Includes new unit tests covering serialization, overwrite behavior, and sync/async round-trips.Written by Cursor Bugbot for commit 6175835. This will update automatically on new commits. Configure here.