Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds support for sndhdr audio file tests and enhances text type detection with dynamic checks for version 2.0.0b5. The changes include significant refactoring of internal functions from private (underscore-prefixed) to public names, new scanner implementations, and improved test coverage.
Key Changes:
- Added new
sndhdr_scannerfor audio file detection - Enhanced text scanner with CSV detection and format-specific checks
- Improved Python scanner with better code pattern recognition
- Added JSON array support to json_scanner
- Refactored internal functions to use public naming
- Added new test cases for audio formats and dynamic text types
Reviewed changes
Copilot reviewed 13 out of 28 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| test/test_scanners.py | Added sndhdr scanner tests and updated text scanner assertions to use lowercase text type names |
| test/test_main.py | Removed deprecated imghdr compatibility tests |
| test/test_common_extensions.py | Updated function calls to match renamed public API |
| test/resources/system/test_list.json | Added test JSON file with array structure |
| test/resources/audio/*.mp3 | Added MP3 test files for audio detection |
| test/resources/audio/test.sndr | Added SNDR format test file |
| test/resources/audio/test.opus | Added Opus format test file |
| pyproject.toml | Disabled black and flake8 linting in favor of Ruff |
| puremagic/scanners/text_scanner.py | Major refactoring with CSV detection and dynamic format checks |
| puremagic/scanners/sndhdr_scanner.py | New scanner for audio formats |
| puremagic/scanners/python_scanner.py | Enhanced Python code detection with keyword/pattern matching |
| puremagic/scanners/json_scanner.py | Added support for JSON arrays |
| puremagic/main.py | Version bump, function renames, new scanner integration, error type changes |
| CHANGELOG.md | Updated with issue #85 reference |
| .gitignore | Added VS Code exclusions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| # | ||
| # # Check for quotes that might indicate CSV | ||
| # has_quoted_fields = '"' in text and (f'"{best_delimiter}' in text or f'{best_delimiter}\"') | ||
| # | ||
|
|
There was a problem hiding this comment.
There are commented-out code blocks left in the production code. These should either be removed if they're no longer needed, or completed and uncommented if they serve a purpose. Leaving commented code in production reduces code quality and maintainability.
| # | |
| # # Check for quotes that might indicate CSV | |
| # has_quoted_fields = '"' in text and (f'"{best_delimiter}' in text or f'{best_delimiter}\"') | |
| # |
| result = catch_all_deep_scan(filename, head, foot) | ||
| except Exception: | ||
| pass | ||
| raise |
There was a problem hiding this comment.
The exception handling was changed to re-raise all exceptions instead of silently passing. This change may expose previously caught errors to users. While this can be beneficial for debugging, it's a change in behavior that could break existing code. The comment says 'pass' was replaced with 'raise', but this should be intentional and documented.
| raise | |
| # Intentionally suppress exceptions from catch-all deep scan to | |
| # maintain backward-compatible behavior and continue fallback logic. | |
| pass |
| filename: os.PathLike | str, | ||
| head=None, | ||
| foot=None, | ||
| confidence=0, |
There was a problem hiding this comment.
The function signature changed to add a new parameter 'confidence=0' but this change in the function signature could affect compatibility if this function is used elsewhere. Ensure that all call sites have been updated to handle this new parameter.
| for i, result in enumerate(matches): | ||
| if i == 0: | ||
| print("\n\tBest Match") | ||
| print("\n\tDeepscan Match" if int(result.confidence == 1) else "\n\tBest Match") |
There was a problem hiding this comment.
The condition 'int(result.confidence == 1)' will always evaluate to either 0 or 1 (since it's casting a boolean to int), and when used as a condition in an if statement, it would only be truthy when confidence is exactly 1. However, the logic seems inverted - you're checking if it should print "Deepscan Match" when confidence equals 1, but the condition will be 0 (falsy) when confidence is 1. This should likely be 'if result.confidence == 1.0:' instead.
| print("\n\tDeepscan Match" if int(result.confidence == 1) else "\n\tBest Match") | |
| print("\n\tDeepscan Match" if result.confidence == 1.0 else "\n\tBest Match") |
| if 1700 <= year_int <= 3000: | ||
| self.tags.append("ID3v1") | ||
| return True | ||
| except ValueError: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except ValueError: | |
| except ValueError: | |
| # If the year string cannot be converted to an int, treat it as an invalid/missing year. | |
| # We intentionally ignore this and fall through to returning None below. |
| confidence=0.1, # Lower confidence due to simple format | ||
| ) | ||
| except (IndexError, struct.error): | ||
| pass |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| pass | |
| # If the header is too short or malformed for an SNDR rate, treat it as | |
| # not matching SNDR and fall through to other format detection. | |
| return test_hcom(head) |
| try: | ||
| dialect = csv.Sniffer().sniff(text, delimiters=''.join(potential_delimiters)) | ||
| csv_sniffer_result = dialect.delimiter | ||
| except Exception: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except Exception: | |
| except Exception: | |
| # csv.Sniffer can fail on non-standard or truncated input; in that case | |
| # we ignore the error and rely solely on the heuristic delimiter detection below. |
Add sndhdr tests #85
Add dynamic text type tests