Implement EXT1, EXT2, and EXT4 pickle opcode support #172
Add support for extension registry opcodes (EXT1, EXT2, EXT4) which are part of pickle protocol 2+. These opcodes allow pickles to reference pre-registered objects from copyreg._extension_registry using integer codes instead of full module/name paths.

Implementation:
- Added Ext1 opcode class that generates AST code showing registry lookup
- Added Ext2 and Ext4 as subclasses (inherit same logic, different arg sizes)
- Uses copyreg._extension_registry.get(code, (None, None)) for safe lookup
- Generates informative code for security analysis

The fix uses the Middle Ground approach:
- Shows what extension code is being used (valuable for auditing)
- Won't crash if registry isn't populated (uses .get() with default)
- Generates readable AST output

Example generated code:
```python
import copyreg
_var0 = copyreg._extension_registry.get(42, (None, None))
```

Fixes NotImplementedError when analyzing pickles with EXT opcodes.
All existing tests pass (20/20).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The implementation uses the wrong registry.
EXT opcodes are used during unpickling: they take an integer code and need to resolve it to a (module, name) tuple.

# Current (incorrect):
copyreg._extension_registry.get(code, (None, None))
# This looks up code as if it were a (module, name) tuple
# Should be:
copyreg._inverted_registry.get(code, (None, None))
# This correctly maps code -> (module, name)

You can verify this in a Python REPL:

>>> import copyreg
>>> copyreg.add_extension('mymodule', 'MyClass', 42)
>>> copyreg._extension_registry
{('mymodule', 'MyClass'): 42}
>>> copyreg._inverted_registry
{42: ('mymodule', 'MyClass')}

The EXT opcode receives the integer code, not the (module, name) key.
Minor issue: The implementation adds an import copyreg statement every time an EXT opcode runs:

def run(self, interpreter: Interpreter):
    # ...
    interpreter.module_body.append(
        ast.Import(names=[ast.alias('copyreg', None)])
    )

If a pickle contains multiple EXT opcodes, the generated AST will have duplicate imports:

import copyreg
import copyreg  # duplicate
import copyreg  # duplicate
_var0 = copyreg._inverted_registry.get(1, (None, None))
_var1 = copyreg._inverted_registry.get(2, (None, None))
_var2 = copyreg._inverted_registry.get(3, (None, None))

This isn't a correctness bug (Python tolerates duplicate imports), but it's worth noting. Other opcodes like
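One way to avoid the duplicate imports noted above is to check the module body before appending. The sketch below is only an illustration: `ensure_import` is a hypothetical helper, not part of fickling, and it assumes the module body is a plain list of `ast` statements, as the `interpreter.module_body.append(...)` call suggests.

```python
import ast


def ensure_import(module_body, module_name):
    """Append `import <module_name>` unless the body already contains it.

    Hypothetical helper: returns True if an import node was added.
    """
    for node in module_body:
        if isinstance(node, ast.Import) and any(
            alias.name == module_name and alias.asname is None
            for alias in node.names
        ):
            return False
    module_body.append(
        ast.Import(names=[ast.alias(name=module_name, asname=None)])
    )
    return True


# Usage: three EXT opcodes request the import, only one node is emitted.
body = []
added = [ensure_import(body, "copyreg") for _ in range(3)]
source = ast.unparse(ast.Module(body=body, type_ignores=[]))
```

The linear scan is fine for small generated modules; a set of already-imported names kept on the interpreter would do the same job without rescanning.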
- Use _inverted_registry instead of _extension_registry to look up extension codes
- Generate code that resolves the (module, name) tuple to the actual object by importing the module and using getattr on sys.modules

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
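The resolution step this commit describes can be sketched in plain Python. This is a minimal illustration rather than the code fickling generates: `resolve_extension` is a hypothetical name, and `copyreg._inverted_registry` is a private CPython dictionary that may change between versions.

```python
import copyreg
import importlib
import os.path
import sys


def resolve_extension(code):
    """Map an extension code to the object it names (hypothetical helper)."""
    # Private dict maintained by copyreg: code -> (module, name)
    key = copyreg._inverted_registry.get(code)
    if key is None:
        raise ValueError(f"unregistered extension code {code}")
    module_name, attr_name = key
    # Importing first guarantees the module is present in sys.modules,
    # including submodules such as os.path.
    importlib.import_module(module_name)
    return getattr(sys.modules[module_name], attr_name)


# Usage: register a code, resolve it, then clean up the global registry.
copyreg.add_extension("os.path", "join", 241)
try:
    resolved = resolve_extension(241)
finally:
    copyreg.remove_extension("os.path", "join", 241)
```

Raising on an unregistered code is a choice made for this sketch; an analysis tool may prefer to emit a placeholder so that an unpopulated registry does not abort the run.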
551ca81 to 63c7bd4
Test coverage for:
- EXT1: 1-byte extension code with class (OrderedDict)
- EXT2: 2-byte extension code with class (Counter)
- EXT4: 4-byte extension code with class (deque)
- EXT1 with function: submodule import (os.path.join)

Each test verifies the generated AST code produces the same result as pickle.loads().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement EXT1, EXT2, and EXT4 Pickle Opcode Support
Problem
Fickling raises NotImplementedError: TODO: Add support for Opcode EXT1 (and EXT2, EXT4) when analyzing pickle files that use extension registry opcodes from pickle protocol 2+.
What Are EXT Opcodes?
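EXT opcodes allow pickles to reference pre-registered objects from the global extension registry using integer codes instead of full module/name paths. EXT1, EXT2, and EXT4 differ only in the width of the code: 1, 2, and 4 bytes respectively.
When a pickle is loaded, the code is resolved to a (module, name) tuple through copyreg._inverted_registry; copyreg._extension_registry holds the reverse mapping, from (module, name) to code.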
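To see where these opcodes come from, the sketch below registers a class under three differently sized codes and inspects the resulting pickles with `pickletools`. The helper name `ext_opcode_for` and the code values are arbitrary choices made for this illustration; the registry is global, so the helper removes each registration when it is done.

```python
import copyreg
import pickle
import pickletools
from collections import OrderedDict


def ext_opcode_for(code):
    """Register OrderedDict under `code`, pickle it, and list the EXT opcodes used."""
    copyreg.add_extension("collections", "OrderedDict", code)
    try:
        # Protocol 2+ emits an EXT opcode for registered globals.
        data = pickle.dumps(OrderedDict, protocol=2)
        ext_ops = [
            (op.name, arg)
            for op, arg, _pos in pickletools.genops(data)
            if op.name.startswith("EXT")
        ]
        # Loading resolves the code back to the same class object.
        assert pickle.loads(data) is OrderedDict
        return ext_ops
    finally:
        copyreg.remove_extension("collections", "OrderedDict", code)


print(ext_opcode_for(240))         # [('EXT1', 240)]
print(ext_opcode_for(0x1234))      # [('EXT2', 4660)]
print(ext_opcode_for(0x12345678))  # [('EXT4', 305419896)]
```

The pickler picks the narrowest opcode that can hold the code, which is why the same class produces a different opcode for each registration.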
Solution
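Implemented three new opcode classes: Ext1, which generates the AST for the registry lookup, and Ext2 and Ext4, which subclass it and differ only in argument size.
The implementation uses the "Middle Ground" approach: it shows which extension code is being used (valuable for auditing), does not crash if the registry is not populated (it uses .get() with a default), and generates readable AST output.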
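The "same logic, different argument sizes" relationship between the three opcodes can be illustrated by decoding them by hand. This is a standalone sketch, not fickling's class hierarchy: `EXT_FORMATS` and `decode_ext` are hypothetical names, while the opcode bytes come from the `pickle` module's own constants and the argument layouts follow the protocol (unsigned 1-byte, unsigned 2-byte, signed 4-byte, all little-endian).

```python
import pickle
import struct

# Opcode byte -> (name, struct format of its argument)
EXT_FORMATS = {
    pickle.EXT1: ("EXT1", "<B"),  # 1-byte unsigned code
    pickle.EXT2: ("EXT2", "<H"),  # 2-byte unsigned code
    pickle.EXT4: ("EXT4", "<i"),  # 4-byte signed code
}


def decode_ext(data, pos=0):
    """Decode one EXT opcode at `pos`; return (name, code, next_pos)."""
    name, fmt = EXT_FORMATS[data[pos:pos + 1]]
    (code,) = struct.unpack_from(fmt, data, pos + 1)
    return name, code, pos + 1 + struct.calcsize(fmt)


print(decode_ext(b"\x82\x2a"))              # ('EXT1', 42, 2)
print(decode_ext(b"\x83\x34\x12"))          # ('EXT2', 4660, 3)
print(decode_ext(b"\x84\x78\x56\x34\x12"))  # ('EXT4', 305419896, 5)
```

Only the format string varies between the three entries, which is what makes a single base class with two thin subclasses a natural fit.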
Example Output
Before: NotImplementedError: TODO: Add support for Opcode EXT1
After:
import copyreg
_var0 = copyreg._extension_registry.get(42, (None, None))
_var1 = _var0()
_var1.__setstate__({'value': 42})
result0 = _var1
Testing
Benefits