Conversation
…phism This refactor introduces a clean inheritance architecture using proper Python polymorphism: Architecture: - Add Source base class in dedicated source.py file (common metadata/extra fields) - Refactor FileSystemNode to inherit from Source with full backward compatibility - Create specialized classes: FileSystemFile, FileSystemDirectory, FileSystemSymlink, GitRepository - render_tree() method belongs to FileSystemNode level (tree-specific, not all sources need it) Pure Python Polymorphism: - Each subclass implements its own get_sort_priority() and get_content() methods - NO type property or enum needed - use isinstance() directly - FileSystemFile.get_sort_priority() returns 0 (files first) - FileSystemDirectory.get_content() raises ValueError (directories can't have content) - FileSystemSymlink.get_content() returns target path (what symlink points to) - Clean, extensible design following Python best practices Removed Legacy Type System: - Completely removed FileSystemNodeType enum - No more type property - use isinstance() everywhere - Constructors now use specific classes: FileSystemFile(), FileSystemDirectory(), etc. - Pure polymorphism without any type checking properties Code Changes: - src/gitingest/schemas/source.py: New base Source class - src/gitingest/schemas/filesystem.py: Refactored with polymorphic methods, Path import in TYPE_CHECKING - src/gitingest/ingestion.py: Use specific constructors, populate symlink targets - src/gitingest/output_formatter.py: Use isinstance() instead of enum comparisons - Remove all FileSystemNodeType imports and usage - All pre-commit hooks pass (ruff-check, ruff-format, etc.) Benefits: - True Python polymorphism where each class knows its own behavior - No explicit type checking needed - Python dispatches automatically - More extensible - adding new source types just requires implementing methods - Cleaner code without enum/string type comparisons - Full backward compatibility maintained
Replace all isinstance() type checking with proper polymorphic methods: Pure Polymorphism Methods: - get_summary_info(): Each class returns its own summary format - is_single_file(): Boolean check without isinstance() - gather_contents(): Recursive content gathering via method dispatch - get_display_name(): Tree display formatting (/, -> target, etc.) - has_children(): Check for child nodes without type checking Benefits: - No isinstance() 'clochard' style code anywhere - True duck typing - just call methods and let Python dispatch - Cleaner, more maintainable code - Each class encapsulates its own behavior - Easy to extend with new node types Code Changes: - FileSystemNode: Base implementations for all methods - FileSystemFile: is_single_file()=True, summary with line count - FileSystemDirectory: get_display_name() adds '/', has_children() checks list - FileSystemSymlink: get_display_name() shows '-> target' - output_formatter.py: Use polymorphic methods instead of isinstance() This is proper OOP - objects know their own behavior!
217afe2 to
87cbb5a
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a Source base class and refactors the filesystem schema to use proper inheritance and polymorphism. The refactoring eliminates the enum-based type system in favor of specialized classes for different filesystem node types.
Key changes:
- Introduces a new
Sourcebase class with common fields (metadataandextra) - Replaces
FileSystemNodeTypeenum with specialized classes (FileSystemFile,FileSystemDirectory,FileSystemSymlink,GitRepository) - Moves type-specific behavior into polymorphic methods on the respective classes
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/gitingest/schemas/source.py |
New base class with common metadata fields |
src/gitingest/schemas/filesystem.py |
Major refactor replacing enum-based typing with inheritance hierarchy |
src/gitingest/schemas/__init__.py |
Updated exports to include new classes |
src/gitingest/output_formatter.py |
Updated to use polymorphic methods instead of type checking |
src/gitingest/ingestion.py |
Updated to instantiate specific node types instead of generic nodes |
.vscode/launch.json |
Added reload flag to debug configuration |
src/gitingest/schemas/filesystem.py
Outdated
| """Return sort priority. Override in subclasses.""" | ||
| return 1 # Default: not a file | ||
|
|
There was a problem hiding this comment.
The comment '# Default: not a file' is unclear. Consider documenting the actual priority values and what they represent (e.g., '# Default priority for non-file nodes (directories, symlinks)').
| """Return sort priority. Override in subclasses.""" | |
| return 1 # Default: not a file | |
| """Return sort priority. Override in subclasses. | |
| Returns | |
| ------- | |
| int | |
| Sort priority value. By convention: | |
| 0 - file nodes (highest priority, e.g., README) | |
| 1 - default for non-file nodes (directories, symlinks, etc.) | |
| Subclasses should override as appropriate. | |
| """ | |
| return 1 # Default priority for non-file nodes (directories, symlinks, etc.) |
| from gitingest.utils.file_utils import _decodes, _get_preferred_encodings, _read_chunk | ||
| from gitingest.utils.notebook import process_notebook | ||
|
|
There was a problem hiding this comment.
Imports inside methods can impact performance and readability. Consider moving these imports to the top of the file or module level.
| from gitingest.utils.file_utils import _decodes, _get_preferred_encodings, _read_chunk | |
| from gitingest.utils.notebook import process_notebook | |
| from gitingest.utils.notebook import process_notebook |
| from gitingest.utils.notebook import process_notebook | ||
|
|
There was a problem hiding this comment.
Imports inside methods can impact performance and readability. Consider moving these imports to the top of the file or module level.
| from gitingest.utils.notebook import process_notebook |
src/gitingest/schemas/filesystem.py
Outdated
| return f"File: {self.name}\nLines: {len(self.content.splitlines()):,}\n" | ||
|
|
||
| def is_single_file(self) -> bool: | ||
| """Files are single files.""" | ||
| return True | ||
|
|
||
| def render_tree(self, prefix: str = "", *, is_last: bool = True) -> list[str]: | ||
| """Render the tree representation of this file.""" | ||
| current_prefix = "└── " if is_last else "├── " | ||
| return [f"{prefix}{current_prefix}{self.name}"] | ||
|
|
||
|
|
There was a problem hiding this comment.
Calling self.content in summary info will trigger file reading, which could be expensive. Consider caching content or only reading when actually needed for display.
| return f"File: {self.name}\nLines: {len(self.content.splitlines()):,}\n" | |
| def is_single_file(self) -> bool: | |
| """Files are single files.""" | |
| return True | |
| def render_tree(self, prefix: str = "", *, is_last: bool = True) -> list[str]: | |
| """Render the tree representation of this file.""" | |
| current_prefix = "└── " if is_last else "├── " | |
| return [f"{prefix}{current_prefix}{self.name}"] | |
| if self._cached_content is not None: | |
| line_count = len(self._cached_content.splitlines()) | |
| return f"File: {self.name}\nLines: {line_count:,}\n" | |
| else: | |
| return f"File: {self.name}\nLines: (not loaded)\n" | |
| def is_single_file(self) -> bool: | |
| """Files are single files.""" | |
| return True | |
| @property | |
| def content(self) -> str: | |
| """Return file content, caching after first read.""" | |
| if self._cached_content is None: | |
| self._cached_content = self.get_content() | |
| return self._cached_content |
src/gitingest/schemas/filesystem.py
Outdated
| current_prefix = "└── " if is_last else "├── " | ||
| display_name = self.name + "/" | ||
| lines.append(f"{prefix}{current_prefix}{display_name}") | ||
| if hasattr(self, "children") and self.children: |
There was a problem hiding this comment.
Using hasattr check for children is unnecessary since children is defined as a field with default factory in the base class. This check can be simplified to just if self.children:.
| if hasattr(self, "children") and self.children: | |
| if self.children: |
src/gitingest/schemas/filesystem.py
Outdated
| # Mark as git repo in the tree | ||
| display_name = f"{self.name}/ (git repository)" | ||
| lines.append(f"{prefix}{current_prefix}{display_name}") | ||
| if hasattr(self, "children") and self.children: |
There was a problem hiding this comment.
Using hasattr check for children is unnecessary since children is defined as a field with default factory in the base class. This check can be simplified to just if self.children:.
| if hasattr(self, "children") and self.children: | |
| if self.children: |
46c7f5d to
26ccecd
Compare
…thod Even simpler approach: - Replace node.has_children() with direct if node.children: - Remove unnecessary has_children() methods from all classes - Pythonic and direct - empty lists are falsy, non-empty are truthy - Less code, same functionality This is the most straightforward way to check for children in Python.
26ccecd to
280d3f0
Compare
|
Hi there! We haven’t seen activity on this pull request for 45 days, so I’m marking it as stale. |
|
Hi there! We haven’t heard anything for 10 days, so I’m closing this pull request. Feel free to reopen if you’d like to continue working on it. Thanks! |
No description provided.