
fix: resolve critical relative path issues in packaged build #185

Merged

bashandbone merged 5 commits into main from claude/fix-relative-paths-01982YFFb4V8714SNxCMGiqq on Dec 2, 2025

Conversation


@bashandbone bashandbone commented Dec 2, 2025

Problem: Relative paths like `Path(__file__).parent.parent.parent.parent / "data"` broke when the package was flattened during build.

Solution:

✅ Moved data/node_types/ into package → src/codeweaver/data/node_types/
✅ Updated code to use importlib.resources.files("codeweaver.data")
✅ Added pickle cache for pre-processed node-types (performance boost)
✅ Created comprehensive build infrastructure:
scripts/build/prepare-build.py - Master orchestrator
scripts/build/preprocess-node-types.py - Generate cache
scripts/build/generate-schema.py - Generate schema
✅ Added mise prepare-version task for automated releases
Impact:

🐛 Fixed: Package works correctly when installed via pip/uv
⚡ Performance: Runtime loads from pickle cache (much faster)
🛠️ Maintainable: Automated build process
🔮 Future-proof: Kept dynamic loading for user-defined grammars
Files Changed: 42 files (+298, -32)

Branch: claude/fix-relative-paths-01982YFFb4V8714SNxCMGiqq

Ready to merge after review! 🚀
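
For readers unfamiliar with the pattern, here is a minimal, stdlib-only sketch of the importlib.resources approach described above. The package name codeweaver.data and the *-node-types.json naming come from this PR; the helper function itself and the use of json.loads (in place of the project's from_json) are illustrative, not the project's actual API.

```python
from importlib.resources import files
import json
from typing import Any


def load_node_types(language: str) -> list[dict[str, Any]] | None:
    """Illustrative only: read a node-types JSON file shipped inside the package.

    Because the file is resolved through package resources, this works the same
    whether the code runs from a source checkout or an installed wheel, unlike a
    Path(__file__).parent.parent... chain that breaks when the layout changes.
    """
    resource = files("codeweaver.data") / "node_types" / f"{language}-node-types.json"
    if resource.is_file():
        return json.loads(resource.read_bytes())
    return None
```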

Summary by Sourcery

Fix packaged builds that relied on repo-relative paths by moving node type data into the package, adding a precomputed cache, and introducing reproducible build-time generation of schemas and artifacts.

New Features:

  • Load node type definitions and a precomputed node types cache from the packaged codeweaver.data resources instead of repository-relative paths.
  • Provide a version preparation task that orchestrates build-time artifact generation, changelog updates, tagging, and optional pushing.
  • Add dedicated build scripts to prepare node types cache and generate settings JSON schema artifacts as part of the release process.

Bug Fixes:

  • Resolve failures in installed packages caused by hard-coded repository-relative paths to node_types and schema locations.
  • Make runtime version detection more robust by falling back gracefully when git metadata is unavailable in packaged environments.

Enhancements:

  • Speed up startup by loading preprocessed node type grammar data from a pickle cache when available.
  • Refine the settings schema API to focus on runtime schema bytes while delegating file generation to build scripts.

Build:

  • Adjust Hatch build configuration to package node type data via the codeweaver.data package and include schema and build scripts in source distributions.
  • Introduce a master prepare-build script that runs language, provider, node-types, cache preprocessing, and schema generation steps in a fixed order.

- Move node_types data into package (data/ -> src/codeweaver/data/)
- Update node_type_parser.py to use importlib.resources
- Add pickle cache support for pre-processed node types
- Create build scripts:
  - scripts/build/generate-schema.py (schema generation)
  - scripts/build/preprocess-node-types.py (cache generation)
  - scripts/build/prepare-build.py (orchestrator)
- Remove problematic save_schema() from settings.py
- Fix git version detection in __init__.py (remove invalid cwd)
- Update pyproject.toml to include src/codeweaver/data/

This fixes the critical bug where relative paths like
Path(__file__).parent.parent.parent.parent broke when the package
was flattened during build (src/codeweaver/ becomes /).

The node_types are now:
1. Located inside the package at src/codeweaver/data/node_types/
2. Pre-processed during build and cached as pickle
3. Loaded from cache at runtime via importlib.resources
4. Still support dynamic loading for future user-defined grammars
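
As a rough illustration of step 2 above, the sketch below shows the general shape of a build-time preprocessing script: parse every packaged *-node-types.json file and pickle a single cache next to them. The real scripts/build/preprocess-node-types.py drives NodeTypeParser rather than raw JSON, so the paths and dictionary layout here are assumptions.

```python
#!/usr/bin/env python3
"""Sketch of a build-time node-types cache generator (illustrative, not the real script)."""
import json
import pickle
from pathlib import Path

DATA_DIR = Path("src/codeweaver/data")          # assumed package-data location
NODE_TYPES_DIR = DATA_DIR / "node_types"
CACHE_FILE = DATA_DIR / "node_types_cache.pkl"  # cache file name used in this PR


def build_cache() -> None:
    registration_cache: dict[str, list[dict]] = {}
    for json_file in sorted(NODE_TYPES_DIR.glob("*-node-types.json")):
        language = json_file.stem.removesuffix("-node-types")
        registration_cache[language] = json.loads(json_file.read_bytes())
    # Persist only the registration cache (matching the later review feedback).
    CACHE_FILE.write_bytes(pickle.dumps({"registration_cache": registration_cache}))
    print(f"Wrote {CACHE_FILE} covering {len(registration_cache)} languages")


if __name__ == "__main__":
    build_cache()
```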
Creates a new mise task that automates the version release workflow:
- Runs prepare-build.py to generate all build artifacts
- Updates CHANGELOG.md with git-cliff
- Commits build artifacts and changelog
- Creates git tag
- Optionally pushes to remote

Usage:
  mise run prepare-version           # Interactive - prompts for version
  mise run prepare-version 1.2.0     # Specify version
  mise run prepare-version 1.2.0 --push  # Auto-push after tagging
  mise run prepare-version --no-commit    # Just generate artifacts
  mise run prepare-version --no-tag       # Commit but don't tag

This consolidates the release preparation process into a single
command, ensuring all build artifacts are generated correctly
before each release.
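
A rough Python sketch of what such an orchestrator can look like, assuming the scripts/build/ layout from this PR; the real prepare-build.py also regenerates language and provider lists and node-types from grammars, which are omitted here.

```python
#!/usr/bin/env python3
"""Sketch of a build-preparation orchestrator (illustrative, not the real prepare-build.py)."""
import subprocess
import sys
from pathlib import Path

BUILD_DIR = Path("scripts/build")

# Order matters: the cache and schema must be generated from freshly updated inputs.
STEPS = [
    "preprocess-node-types.py",  # writes node_types_cache.pkl into the package data
    "generate-schema.py",        # writes the versioned settings JSON schema
]


def main() -> int:
    for step in STEPS:
        script = BUILD_DIR / step
        print(f"==> running {script}")
        result = subprocess.run([sys.executable, str(script)], check=False)
        if result.returncode != 0:
            print(f"Step failed: {script}", file=sys.stderr)
            return result.returncode
    print("All build artifacts generated.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```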
Copilot AI review requested due to automatic review settings December 2, 2025 19:45
@bashandbone added the bug (Something isn't working) and high-priority labels on Dec 2, 2025

sourcery-ai bot commented Dec 2, 2025

Reviewer's Guide

Refactors how node type data and schemas are packaged and loaded so builds no longer depend on fragile relative paths, adds a precomputed node-types cache for faster startup, and introduces scripted build + release preparation (including schema generation) wired into a new mise prepare-version task.

Sequence diagram for node type parser initialization and loading with cache

```mermaid
sequenceDiagram
    actor Runtime
    participant NodeTypeParser
    participant NodeTypeFileLoader
    participant PackageData as PackageResources_codeweaver_data

    Runtime->>NodeTypeParser: __init__(languages, use_cache)
    NodeTypeParser->>NodeTypeParser: set _languages
    NodeTypeParser->>NodeTypeFileLoader: create NodeTypeFileLoader(directory=None)
    NodeTypeParser->>NodeTypeParser: set _use_cache
    alt use_cache is true and _cache_loaded is false
        NodeTypeParser->>NodeTypeParser: _load_from_cache()
        NodeTypeParser->>PackageData: files("codeweaver.data")/"node_types_cache.pkl"
        alt cache file exists and is valid
            PackageData-->>NodeTypeParser: cache bytes
            NodeTypeParser->>NodeTypeParser: pickle.loads(bytes)
            NodeTypeParser->>NodeTypeParser: set _registration_cache
            NodeTypeParser->>NodeTypeParser: set _cache_loaded True
        else cache missing or invalid
            PackageData-->>NodeTypeParser: error or missing
            NodeTypeParser->>NodeTypeParser: log warning and fall back
        end
    else use_cache is false or _cache_loaded already true
        NodeTypeParser-->>Runtime: skip cache loading
    end

    Note over Runtime,NodeTypeParser: Later, when parsing without cache file
    Runtime->>NodeTypeParser: parse_all_nodes()
    NodeTypeParser->>NodeTypeFileLoader: load_all_files()
    NodeTypeFileLoader->>PackageData: files("codeweaver.data")/"node_types"
    PackageData-->>NodeTypeFileLoader: directory listing of *"node-types.json"
    NodeTypeFileLoader-->>NodeTypeParser: list of Paths
    NodeTypeParser->>NodeTypeParser: populate _registration_cache from JSON
```

Class diagram for updated node type loading and caching

```mermaid
classDiagram
    class SemanticSearchLanguage {
    }

    class NodeTypeFileLoader {
        +DirectoryPath~None directory
        +list~Path~ files
        +list~NodeArray~ _nodes
        +NodeTypeFileLoader(directory DirectoryPath~None, files list~Path~None)
        +_load_file(language SemanticSearchLanguage) list~dict~str, Any~~None
    }

    class NodeTypeParser {
        +frozenset~SemanticSearchLanguage~ _languages
        +NodeTypeFileLoader _loader
        +bool _use_cache
        +dict _registration_cache
        +bool _cache_loaded
        +NodeTypeParser(languages Sequence~SemanticSearchLanguage~None, use_cache bool)
        +_load_from_cache() bool
        +parse_all_nodes() list
        +nodes list~NodeArray~
    }

    NodeTypeParser *-- NodeTypeFileLoader : uses
    NodeTypeParser ..> SemanticSearchLanguage : registers
    NodeTypeFileLoader ..> SemanticSearchLanguage : loads files for
```

File-Level Changes

Change: Switch node type loading from filesystem-relative paths to importlib package resources, with optional directory overrides.
  • Update node types discovery to default to files("codeweaver.data")/"node_types" when no directory override is provided.
  • Change NodeTypeFileLoader to allow a None directory meaning package resources, and adjust file list types to plain pathlib.Path.
  • Add special handling in NodeTypeFileLoader._load_file to read node type JSON directly from package resources when directory is None.
  Files: src/codeweaver/semantic/node_type_parser.py

Change: Add a pickle-based cache for preprocessed node types and load it at runtime when available to speed startup.
  • Introduce a NodeTypeParser._cache_loaded class flag and a _load_from_cache helper that reads node_types_cache.pkl from the codeweaver.data package using importlib.resources.files.
  • Wire cache loading into NodeTypeParser.__init__ via a use_cache flag, defaulting to True, and fall back to JSON parsing when the cache is missing or invalid.
  • Create a build-time script to parse all node-type JSON files, collect Things and the registration cache, and write node_types_cache.pkl into src/codeweaver/data.
  Files: src/codeweaver/semantic/node_type_parser.py, scripts/build/preprocess-node-types.py, src/codeweaver/data/__init__.py

Change: Introduce a centralized build preparation workflow and schema-generation process, then expose it via a mise prepare-version task.
  • Add scripts/build/prepare-build.py to orchestrate generating supported languages, provider lists, updated node-types from grammars, the node-types cache, and the settings schema.
  • Add scripts/build/generate-schema.py, which writes a versioned JSON schema under schema/v/codeweaver.schema.json, guarded to skip if the file already exists (see the sketch after this list).
  • Add a mise.toml [tasks.prepare-version] definition that runs the build preparation, regenerates the CHANGELOG via git-cliff, optionally commits artifacts, tags the release, and can push to remote.
  Files: scripts/build/prepare-build.py, scripts/build/generate-schema.py, mise.toml

Change: Adjust packaging configuration so data and build artifacts are included from within the package rather than via top-level node_types directories.
  • Stop including top-level node_types/** in wheel and sdist configuration; rely on src/codeweaver/data instead.
  • Ensure the sdist includes schema/** and scripts/** so build-time tools and generated schemas are shipped with releases.
  • Create the codeweaver.data package module to house node types and cache data for importlib.resources consumption.
  Files: pyproject.toml, src/codeweaver/data/__init__.py

Change: Simplify runtime version resolution and decouple it from repo-relative paths; clarify schema-generation responsibilities in settings.
  • Change get_version() to run git describe via the resolved git executable without forcing a repo-root cwd and to tolerate failures by returning a default "0.0.0".
  • Remove the save_schema() method from CodeWeaverSettings and update the json_schema() docstring to point callers to the new scripts/build/generate-schema.py for build-time schema generation.
  Files: src/codeweaver/__init__.py, src/codeweaver/config/settings.py
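
The guarded schema-generation step mentioned above can be pictured roughly as follows. The versioned output path and the source of the schema bytes (the Sourcery summary points to a runtime json_schema() on CodeWeaverSettings) are assumptions, so treat this as a sketch rather than the script's real contents.

```python
#!/usr/bin/env python3
"""Sketch of a guarded schema-generation step (illustrative, not the real generate-schema.py)."""
import sys
from pathlib import Path


def write_schema(version: str, schema_bytes: bytes) -> Path | None:
    """Write the schema to a versioned location, skipping if it already exists."""
    target = Path("schema") / version / "codeweaver.schema.json"  # assumed layout
    if target.exists():
        print(f"Schema already exists, skipping: {target}")
        return None
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(schema_bytes)
    print(f"Wrote schema: {target}")
    return target


if __name__ == "__main__":
    # Placeholder payload; the real script would obtain the bytes from the settings model.
    write_schema(sys.argv[1] if len(sys.argv) > 1 else "dev", b"{}")
```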



github-actions bot commented Dec 2, 2025

✅ CLA Check Passed

All contributors to this PR are exempt from the CLA requirement:

  • ✅ claude (bot)

Knitli organization members and recognized bots do not need to sign the CLA.

Thanks for your contribution to codeweaver! 🧵


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

Blocking issues:

  • Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)

General comments:

  • In preprocess-node-types.py you're reaching into the private NodeTypeParser._registration_cache; consider exposing a public method or property to retrieve the registration cache so the build script isn't coupled to an internal attribute that may change.
  • The pickle written by preprocess-node-types.py stores both things and registration_cache, but _load_from_cache only consumes registration_cache; either load and reuse the precomputed things as well or drop them from the cache payload to avoid unnecessary file size and confusion about what is actually used.
  • In get_version() the code now silently falls back to "0.0.0" when git describe fails; consider logging a debug or warning message in that branch so unexpected version resolution failures are visible during debugging.
## Individual Comments

### Comment 1
<location> `scripts/build/preprocess-node-types.py:36-42` </location>
<code_context>
+    print(f"  Parsed {len(all_things)} Things/Categories across all languages")
+
+    # Get the cache from the parser's internal registry
+    cache_data = {
+        "things": all_things,
+        "registration_cache": parser._registration_cache,
+    }
+
</code_context>

<issue_to_address>
**suggestion (performance):** Consider whether storing `things` in the cache is necessary given the current load path.

Right now the cache includes both `"things"` and `"registration_cache"`, but `_load_from_cache` only reads `registration_cache`. If `things` isn’t used elsewhere, you can remove it from the cache to shrink the pickle and avoid serializing large data unnecessarily.

If you expect to load `things` from the cache later, consider adding a comment or assertion in `_load_from_cache` to document that contract and keep the cache shape intentional.

```suggestion
    print(f"  Parsed {len(all_things)} Things/Categories across all languages")

    # Only persist the parser's registration cache. The Thing/Category objects
    # are reconstructed from source at runtime and are not needed in the cache.
    cache_data = {
        "registration_cache": parser._registration_cache,
    }
```
</issue_to_address>

### Comment 2
<location> `src/codeweaver/__init__.py:54-59` </location>
<code_context>
                    git_describe = subprocess.run(
                        [git, "describe", "--tags", "--always", "--dirty"],
                        capture_output=True,
                        text=True,
                        check=False,
                    )
</code_context>

<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

### Comment 3
<location> `src/codeweaver/semantic/node_type_parser.py:466-469` </location>
<code_context>
    def _load_file(self, language: SemanticSearchLanguage) -> list[dict[str, Any]] | None:
        """Load a single node types file for a specific language."""
        if language == SemanticSearchLanguage.JSX:
            language = SemanticSearchLanguage.JAVASCRIPT

        # If using package resources, load directly
        if self.directory is None:
            data_dir = files("codeweaver.data") / "node_types"
            filename = f"{language.value}-node-types.json"
            resource = data_dir / filename
            if resource.is_file():
                return from_json(resource.read_bytes())
            return None

        # Otherwise use file path
        file_path = next(
            (
                file
                for file in self.files
                if SemanticSearchLanguage.from_string(file.stem.replace("-node-types", ""))
                == language
            ),
            None,
        )
        if file_path and file_path.exists():
            return from_json(file_path.read_bytes())
        return None

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
            return from_json(resource.read_bytes()) if resource.is_file() else None
```
</issue_to_address>


Comment on lines 54 to 59

```diff
 git_describe = subprocess.run(
-    ["describe", "--tags", "--always", "--dirty"],  # noqa: S607
-    executable=git,
+    [git, "describe", "--tags", "--always", "--dirty"],
     capture_output=True,
     text=True,
-    check=True,
-    cwd=str(Path(__file__).parent.parent.parent),
+    check=False,
 )
```

security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

Source: opengrep


Copilot AI left a comment


Pull request overview

This pull request fixes critical relative path issues that broke the package when installed via pip/uv by moving data files into the package structure and using importlib.resources for resource loading. Key improvements include adding a pickle cache for performance and implementing proper resource access patterns.

Key Changes

  • Replaced hardcoded relative paths with importlib.resources.files() for robust package resource access
  • Added pickle cache support for pre-processed node types (performance optimization)
  • Made directory parameter optional in NodeTypeFileLoader (defaults to package resources)

Reviewed changes

Copilot reviewed 9 out of 43 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/codeweaver/semantic/node_type_parser.py | Updated to use importlib.resources for loading node types, added pickle cache support, made directory parameter optional |
| src/codeweaver/data/node_types/*.json | Added node type definition files for yaml, typescript, solidity, and json to package data |
| src/codeweaver/data/node_types/analysis/*.license | Added license files for analysis data |
| src/codeweaver/data/__init__.py | Added package init file for data directory |


Comment on lines 461 to 468

```python
        # If using package resources, load directly
        if self.directory is None:
            data_dir = files("codeweaver.data") / "node_types"
            filename = f"{language.value}-node-types.json"
            resource = data_dir / filename
            if resource.is_file():
                return from_json(resource.read_bytes())
            return None
```

Copilot AI Dec 2, 2025


The _load_file method has inconsistent return behavior. When using package resources (line 467), it returns from_json(resource.read_bytes()) or None, but when using file paths (line 471), it returns the result of from_json() which may not be None for missing files. The None return should be explicitly documented in the docstring, and the return type annotation should be updated to reflect that this method can return None.

Comment on lines +547 to 555

```python
    def __init__(
        self, languages: Sequence[SemanticSearchLanguage] | None = None, use_cache: bool = True
    ) -> None:
        """Initialize NodeTypeParser with an optional NodeTypeFileLoader.

        Args:
            languages: Optional pre-loaded list of languages to parse; if None, will load all available languages.
            use_cache: Whether to try loading from the pre-built cache. Default True.
        """
```

Copilot AI Dec 2, 2025


The use_cache parameter should be documented in the class docstring, not just in the __init__ method. Additionally, consider documenting what happens when the cache fails to load (falls back to parsing from JSON files).


@github-actions github-actions bot left a comment


Code Review Summary

This PR successfully addresses the critical packaging bug and introduces excellent build infrastructure. However, there are two important issues that should be fixed before merging.

🚨 Key Issues

1. Remove Unused Cache Data (scripts/build/preprocess-node-types.py:39-42)

Sourcery correctly identified that "things" is stored in the cache but never loaded at runtime (see node_type_parser.py:584). This wastes disk space and creates confusion.

Recommendation: Remove it from the cache:

```python
# Only persist the parser's registration cache
cache_data = {
    "registration_cache": parser._registration_cache,
}
```

2. Decouple Build Script from Private Attributes

The build script accesses parser._registration_cache (private attribute). Consider adding a public property to NodeTypeParser:

```python
@property
def registration_cache(self) -> dict[SemanticSearchLanguage, _ThingCacheDict]:
    """Get the registration cache for serialization."""
    return self._registration_cache
```

🛡️ Security Review

The subprocess call flagged by Sourcery in src/codeweaver/__init__.py:54 is a false positive. The git path from shutil.which() is OS-controlled, not user input, so there's no command injection risk. You can safely ignore this warning or add a clarifying comment.

✅ What's Good

  • Excellent problem identification and solution approach
  • Clean build infrastructure with proper orchestration
  • Performance improvements via pickle caching
  • Follows Python packaging best practices
  • Good error handling and user feedback

📝 Minor Suggestions

  1. Add debug logging when git version detection fails (__init__.py:60-65)
  2. Consider Sourcery's style suggestion for node_type_parser.py:466-469 (minor readability improvement)

🧪 Testing

Wait for the test suite (Python 3.12 & 3.13) to complete before merging.

Recommendation

Request Changes for the two issues above. Once fixed, this will be ready to merge! Great work on solving a complex packaging problem. 🚀
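
Putting the security note and minor suggestion 1 together, a version lookup along these lines would keep the fallback while making failures visible; this is a hedged sketch, not the code actually merged in src/codeweaver/__init__.py.

```python
import logging
import shutil
import subprocess

logger = logging.getLogger(__name__)


def get_version() -> str:
    """Sketch: resolve the version via git describe, falling back to "0.0.0".

    The git path comes from shutil.which(), i.e. the OS PATH rather than user
    input, which is why the opengrep subprocess finding is a false positive here.
    """
    git = shutil.which("git")
    if git is None:
        logger.debug("git not found on PATH; using fallback version")
        return "0.0.0"
    result = subprocess.run(
        [git, "describe", "--tags", "--always", "--dirty"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        # Suggested above: log the failure instead of silently falling back.
        logger.debug("git describe failed: %s", result.stderr.strip())
        return "0.0.0"
    return result.stdout.strip()
```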

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 2, 2025 19:55

github-actions bot commented Dec 2, 2025

👋 Hey @bashandbone,

Thanks for your contribution to codeweaver! 🧵

You need to agree to the CLA first... 🖊️

Before we can accept your contribution, you need to agree to our Contributor License Agreement (CLA).

To agree to the CLA, please comment:

I read the contributors license agreement and I agree to it.

Those exact words are important¹, so please don't change them. 😉

You can read the full CLA here: Contributor License Agreement


@bashandbone has signed the CLA.


0 out of 2 committers have signed the CLA.
@claude
@bashandbone
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Footnotes

  1. Our bot needs those exact words to recognize that you agree to the CLA.


Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 43 changed files in this pull request and generated 2 comments.



```python
logger.debug("Node types cache not found, will parse from JSON files")
return False

cache_data = pickle.loads(cache_resource.read_bytes())
```

Copilot AI Dec 2, 2025


The pickle.loads() call on line 581 loads untrusted data from package resources without any validation. While this is from package resources and presumably safe during development, a compromised package or build process could inject malicious pickle data. Consider:

  1. Adding integrity checks (e.g., HMAC signature verification)
  2. Using a safer serialization format like JSON
  3. At minimum, wrapping in a try-except with specific exception handling

This is especially important since the cache is loaded by default (use_cache=True).

Comment on lines 587 to 589

```python
except Exception as e:
    logger.warning("Failed to load node types cache, will parse from JSON: %s", e)
    return False
```

Copilot AI Dec 2, 2025


The exception handling on line 587 is too broad. Catching all exceptions with except Exception as e may hide important errors like import errors, attribute errors, or corruption issues. Consider:

  1. Catching specific exceptions (e.g., pickle.UnpicklingError, FileNotFoundError)
  2. Re-raising critical errors that shouldn't be silently ignored
  3. Using different log levels for different error types (e.g., logger.error for unexpected failures vs logger.debug for missing cache)

- Remove unused 'things' from cache data (only registration_cache needed)
- Add public registration_cache property to NodeTypeParser
- Avoid accessing private attributes from build scripts

This reduces cache file size and follows better encapsulation practices.
- Add validation checks for cache structure and data types
- Replace broad exception handler with specific exception types:
  - pickle.UnpicklingError, AttributeError, KeyError (data corruption)
  - FileNotFoundError, OSError (filesystem errors)
- Add security documentation explaining pickle trust model
- Provide more specific error messages for debugging

This makes cache loading more robust and prevents silent failures
from masking important errors like import or corruption issues.
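
A sketch of what cache loading looks like with the validation and narrower exception handling described above; the resource name matches the PR, but the function shape and checks are assumptions rather than the merged implementation.

```python
import logging
import pickle
from importlib.resources import files

logger = logging.getLogger(__name__)


def load_registration_cache() -> dict | None:
    """Sketch: load the packaged node-types cache, or return None to trigger JSON parsing."""
    try:
        resource = files("codeweaver.data") / "node_types_cache.pkl"
        if not resource.is_file():
            logger.debug("Node types cache not found, will parse from JSON files")
            return None
        # The cache is build output shipped inside the package (trusted), but
        # structural checks still guard against a stale or truncated file.
        cache_data = pickle.loads(resource.read_bytes())
        registration_cache = cache_data.get("registration_cache")
        if not isinstance(registration_cache, dict):
            logger.warning("Node types cache has an unexpected structure; ignoring it")
            return None
        return registration_cache
    except (pickle.UnpicklingError, AttributeError, KeyError) as e:
        logger.warning("Node types cache is corrupted, will parse from JSON: %s", e)
        return None
    except OSError as e:  # covers FileNotFoundError and other filesystem errors
        logger.warning("Could not read node types cache: %s", e)
        return None
```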
Copilot AI review requested due to automatic review settings December 2, 2025 20:13

Copilot AI left a comment


Copilot reviewed 9 out of 43 changed files in this pull request and generated no new comments.



@bashandbone bashandbone merged commit 8a5c536 into main Dec 2, 2025
19 of 24 checks passed
@bashandbone bashandbone deleted the claude/fix-relative-paths-01982YFFb4V8714SNxCMGiqq branch December 2, 2025 20:23
@github-actions github-actions bot locked and limited conversation to collaborators Dec 2, 2025

Labels

bug (Something isn't working), high-priority


2 participants