
Commit 818a0ec

dguido, claude, and thomas-chauchefoin-tob authored
Add graceful degradation and scan API for robust scanning (#218)
* Add graceful degradation and scan API for robust scanning

  - Add ScanResult class with is_safe, severity, results, and errors
  - Add scan_file() for graceful single-file scanning
  - Add scan_archive() for scanning ZIP archives
  - Add RelaxedZipFile class that ignores CRC validation errors
  - Add _scan_bytes() helper for in-memory scanning
  - Export new functions from package

  This API provides picklescan-like graceful degradation, continuing to scan even when individual files fail to parse.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix scan API: broken RelaxedZipFile, unsafe ScanResult, missing tests

  - Replace RelaxedZipFile CRC=0 hack (caused BadZipFile on valid files) with _expected_crc=None on the returned file handle
  - Make ScanResult.is_safe a computed property instead of a stored bool; __bool__ now returns False when errors exist (incomplete scan != safe)
  - Escalate severity in _scan_bytes when check_safety() or outer parse exceptions occur in graceful mode
  - Deduplicate scan_file by delegating to _scan_bytes
  - Change graceful default to False for scan_file and scan_archive
  - Remove dead code: data-is-None branch, is_safe kwarg, read() override
  - Add 12 tests covering scan_file, scan_archive, ScanResult, RelaxedZipFile

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address review: move RelaxedZipFile, fix error handling, add tests

  - Move RelaxedZipFile from polyglot.py to loader.py to avoid requiring numpy/torch for scan_archive (RelaxedZipFile only needs stdlib zipfile)
  - Add RuntimeWarning when the _expected_crc attribute is missing on future Python versions, so CRC bypass degradation is not silent
  - Widen the scan_archive outer exception handler to catch OSError (covers FileNotFoundError, PermissionError) in graceful mode, matching scan_file
  - Separate the archive.read() try/except from the _scan_bytes() call so the inner catch only handles I/O errors, not analysis-layer exceptions
  - Escalate the _scan_bytes outer except Exception to LIKELY_UNSAFE (was SUSPICIOUS; for a security scanner, unanalyzable files should be treated as dangerous)
  - Include the exception type in all error messages for debuggability
  - Add 8 new tests: archive nonexistent graceful/non-graceful, bad ZIP non-graceful raises, mixed good/bad archive members, all pickle extensions, is_safe boundary at POSSIBLY_UNSAFE, analysis error severity escalation, RelaxedZipFile CRC bypass regression

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix Ruff linting failure

* Minor improvements to scan API

  - Narrow the file-open exception catch from Exception to OSError
  - Rename scan_archive to scan_zip_archive to clarify the ZIP-only scope
  - Use PurePosixPath.suffix for extension parsing
  - Remove .pt/.pth from the extension filter (these are ZIP containers, not raw pickles)
  - Update the docstring to reference fickling.polyglot for other archive formats

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix test_scans_all_pickle_extensions

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Thomas Chauchefoin <thomas.chauchefoin@trailofbits.com>
1 parent e5e34bb commit 818a0ec

File tree

3 files changed (+478, -3 lines)


fickling/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # fmt: off
-from .loader import load, loads #noqa
+from .loader import load, loads, scan_file, scan_zip_archive, ScanResult # noqa
 from .context import check_safety #noqa
 from .hook import always_check_safety, activate_safe_ml_environment #noqa
 from .analysis import is_likely_safe # noqa

fickling/loader.py

Lines changed: 198 additions & 2 deletions
@@ -1,9 +1,38 @@
 import pickle
+import warnings
+import zipfile
 from io import BytesIO
+from pathlib import PurePosixPath
 
-from fickling.analysis import Severity, check_safety
+from fickling.analysis import AnalysisResults, Severity, check_safety
 from fickling.exception import UnsafeFileError
-from fickling.fickle import Pickled
+from fickling.fickle import Pickled, PickleDecodeError, StackedPickle
+
+
+class RelaxedZipFile(zipfile.ZipFile):
+    """A ZipFile subclass that ignores CRC validation errors.
+
+    Matches PyTorch's lenient ZIP parsing behavior. Uses CPython's
+    internal _expected_crc attribute on ZipExtFile — guarded by hasattr
+    so it degrades to standard CRC-checked behavior if the attribute
+    is renamed in a future Python version.
+    """
+
+    def open(self, name, mode="r", pwd=None, *, force_zip64=False):
+        """Open a member with CRC validation disabled."""
+        f = super().open(name, mode, pwd, force_zip64=force_zip64)
+        if hasattr(f, "_expected_crc"):
+            f._expected_crc = None
+        else:
+            warnings.warn(
+                "RelaxedZipFile: _expected_crc not found on ZipExtFile. "
+                "CRC validation is still enabled. This may cause failures "
+                "scanning PyTorch model files with CRC mismatches.",
+                RuntimeWarning,
+                stacklevel=2,
+            )
+        return f
+
 
 # Save the original pickle.loads before any hooks can replace it.
 # loader.load() must use the real pickle.loads for final deserialization,
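The CRC bypass above can be exercised end to end with only the stdlib. The sketch below reproduces RelaxedZipFile from this diff, builds an in-memory ZIP, and flips one payload byte so the member's stored CRC-32 no longer matches its data: a standard ZipFile raises BadZipFile on read, while RelaxedZipFile returns the (corrupted) bytes. The in-memory setup and member name `model.pkl` are illustrative only.

```python
import io
import warnings
import zipfile


class RelaxedZipFile(zipfile.ZipFile):
    """ZipFile subclass from this diff: disables CRC checks on member reads."""

    def open(self, name, mode="r", pwd=None, *, force_zip64=False):
        f = super().open(name, mode, pwd, force_zip64=force_zip64)
        if hasattr(f, "_expected_crc"):
            f._expected_crc = None  # ZipExtFile skips CRC checking when this is None
        else:
            warnings.warn(
                "_expected_crc not found on ZipExtFile; CRC validation still enabled",
                RuntimeWarning,
                stacklevel=2,
            )
        return f


# Build an uncompressed ZIP in memory, then flip one payload byte so the
# member's stored CRC-32 no longer matches its data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as z:
    z.writestr("model.pkl", b"hello world")
raw = bytearray(buf.getvalue())
idx = raw.find(b"hello world")
raw[idx] ^= 0xFF
corrupted = bytes(raw)

# Standard ZipFile: read() raises BadZipFile on the CRC mismatch.
try:
    zipfile.ZipFile(io.BytesIO(corrupted)).read("model.pkl")
    standard_failed = False
except zipfile.BadZipFile:
    standard_failed = True

# RelaxedZipFile: the same read succeeds and returns the corrupted bytes.
relaxed_data = RelaxedZipFile(io.BytesIO(corrupted)).read("model.pkl")
print(standard_failed, relaxed_data)
```

Note that `ZipFile.read()` delegates to `self.open()`, which is why overriding `open` is enough to cover both access paths.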
@@ -75,3 +104,170 @@ def loads(
         json_output_path=json_output_path,
         **kwargs,
     )
+
+
+class ScanResult:
+    """Result of scanning a file or archive for malicious pickle content."""
+
+    def __init__(
+        self,
+        filepath: str,
+        severity: Severity,
+        results: list[AnalysisResults],
+        errors: list[str],
+    ):
+        self.filepath = filepath
+        self.severity = severity
+        self.results = results
+        self.errors = errors
+
+    @property
+    def is_safe(self) -> bool:
+        return self.severity <= Severity.LIKELY_SAFE
+
+    def __bool__(self) -> bool:
+        return self.is_safe and not self.errors
+
+    def __repr__(self) -> str:
+        return (
+            f"ScanResult(filepath={self.filepath!r}, "
+            f"severity={self.severity.name}, "
+            f"results={len(self.results)}, errors={len(self.errors)})"
+        )
+
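The safety contract in ScanResult is worth spelling out: `is_safe` looks only at severity, while truthiness additionally requires a clean error list, so an incomplete scan is never reported as safe (the "incomplete scan != safe" point from the commit message). A minimal stand-alone sketch; the IntEnum here is a stand-in for fickling's Severity ordering and its exact values are an assumption for illustration:

```python
from enum import IntEnum


class Severity(IntEnum):
    # Stand-in for fickling.analysis.Severity; ordering assumed for illustration.
    LIKELY_SAFE = 1
    POSSIBLY_UNSAFE = 2
    SUSPICIOUS = 3
    LIKELY_UNSAFE = 4


class ScanResult:
    def __init__(self, filepath, severity, results, errors):
        self.filepath = filepath
        self.severity = severity
        self.results = results
        self.errors = errors

    @property
    def is_safe(self):
        # Severity-only view: safe means at or below LIKELY_SAFE.
        return self.severity <= Severity.LIKELY_SAFE

    def __bool__(self):
        # Truthiness also demands a complete scan: any error makes it falsy.
        return self.is_safe and not self.errors


clean = ScanResult("a.pkl", Severity.LIKELY_SAFE, [], [])
errored = ScanResult("b.pkl", Severity.LIKELY_SAFE, [],
                     ["Parse error (PickleDecodeError): truncated stream"])
flagged = ScanResult("c.pkl", Severity.SUSPICIOUS, [], [])
print(bool(clean), bool(errored), flagged.is_safe)  # True False False
```

So `if result:` is the strict check (safe and fully scanned), while `result.is_safe` answers only the severity question.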
+
+def scan_file(
+    filepath: str,
+    graceful: bool = False,
+    json_output_path: str | None = None,
+) -> ScanResult:
+    """Scan a file for malicious pickle content.
+
+    Args:
+        filepath: Path to the file to scan
+        graceful: If True, continue on parse errors and report them.
+            If False, raise exceptions on parse errors.
+        json_output_path: Optional path to write JSON analysis results
+
+    Returns:
+        ScanResult with severity, results list, and errors list
+    """
+    try:
+        with open(filepath, "rb") as f:
+            data = f.read()
+    except OSError as e:
+        if graceful:
+            return ScanResult(
+                filepath=filepath,
+                severity=Severity.SUSPICIOUS,
+                results=[],
+                errors=[f"File error ({type(e).__name__}): {e!s}"],
+            )
+        raise
+    return _scan_bytes(filepath, data, graceful, json_output_path)
+
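scan_file's error contract splits in two stages: I/O failures are caught at open/read time (reported in graceful mode, re-raised otherwise), and everything else is delegated to _scan_bytes. A stripped-down, stdlib-only sketch of that contract; `scan_file_sketch` is a hypothetical stand-in, and the pickletools disassembly stands in for fickling's actual analysis:

```python
import io
import pickle
import pickletools
import tempfile


def scan_file_sketch(filepath: str, graceful: bool = False) -> dict:
    """Mirrors scan_file's two-stage contract with a stand-in analysis step."""
    try:
        with open(filepath, "rb") as f:
            data = f.read()
    except OSError as e:
        if graceful:
            # Stage 1: I/O failures are reported, not raised, in graceful mode.
            return {"errors": [f"File error ({type(e).__name__}): {e}"]}
        raise
    try:
        # Stage 2: stand-in for _scan_bytes; disassembly validates the stream.
        pickletools.dis(data, out=io.StringIO())
    except Exception as e:
        if graceful:
            return {"errors": [f"Parse error ({type(e).__name__}): {e}"]}
        raise
    return {"errors": []}


# Graceful mode reports the missing file as an error entry...
missing = scan_file_sketch("/nonexistent/model.pkl", graceful=True)

# ...while the strict default (graceful=False) re-raises it.
try:
    scan_file_sketch("/nonexistent/model.pkl")
    strict_raised = False
except FileNotFoundError:
    strict_raised = True

# A well-formed pickle scans cleanly.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    tmp.write(pickle.dumps([1, 2, 3]))
print(missing, strict_raised, scan_file_sketch(tmp.name))
```

The two-stage split is what lets the real scan_file keep its open/read handler narrow (OSError only) while analysis-layer failures get their own severity escalation inside _scan_bytes.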
+
+def scan_zip_archive(
+    filepath: str,
+    graceful: bool = False,
+    json_output_path: str | None = None,
+) -> dict[str, ScanResult]:
+    """Scan a ZIP archive for malicious raw pickle content.
+
+    Scans each file within the archive that has a raw pickle-related extension
+    (.pkl, .pickle, .bin). Only ZIP archives are supported;
+    for TAR or 7z archives, see fickling.polyglot.
+
+    Args:
+        filepath: Path to the ZIP archive to scan
+        graceful: If True, continue on parse errors and report them
+        json_output_path: Optional path to write JSON analysis results
+
+    Returns:
+        Dict mapping archive member names to their ScanResults
+    """
+    results: dict[str, ScanResult] = {}
+    pickle_extensions = {".pkl", ".pickle", ".bin"}
+
+    try:
+        with RelaxedZipFile(filepath, "r") as archive:
+            for info in archive.infolist():
+                if info.is_dir():
+                    continue
+                ext = PurePosixPath(info.filename).suffix.lower()
+                if ext not in pickle_extensions:
+                    continue
+
+                try:
+                    data = archive.read(info)
+                except (zipfile.BadZipFile, OSError) as e:
+                    if graceful:
+                        results[info.filename] = ScanResult(
+                            filepath=info.filename,
+                            severity=Severity.SUSPICIOUS,
+                            results=[],
+                            errors=[f"Read error ({type(e).__name__}): {e!s}"],
+                        )
+                        continue
+                    raise
+
+                results[info.filename] = _scan_bytes(
+                    info.filename, data, graceful, json_output_path
+                )
+    except (zipfile.BadZipFile, OSError) as e:
+        if not graceful:
+            raise
+        results["<archive>"] = ScanResult(
+            filepath=filepath,
+            severity=Severity.SUSPICIOUS,
+            results=[],
+            errors=[f"Archive error ({type(e).__name__}): {e!s}"],
+        )
+
+    return results
+
+
+def _scan_bytes(
+    name: str,
+    data: bytes,
+    graceful: bool,
+    json_output_path: str | None,
+) -> ScanResult:
+    """Scan bytes data for malicious pickle content."""
+    results: list[AnalysisResults] = []
+    errors: list[str] = []
+    overall_severity = Severity.LIKELY_SAFE
+
+    try:
+        stacked = StackedPickle.load(BytesIO(data), fail_on_decode_error=not graceful)
+        for pickled in stacked:
+            try:
+                result = check_safety(pickled, json_output_path=json_output_path)
+                results.append(result)
+                if result.severity > overall_severity:
+                    overall_severity = result.severity
+            except Exception as e:
+                if graceful:
+                    errors.append(f"Analysis error ({type(e).__name__}): {e!s}")
+                    overall_severity = max(overall_severity, Severity.LIKELY_UNSAFE)
+                else:
+                    raise
+    except PickleDecodeError as e:
+        if graceful:
+            errors.append(f"Parse error ({type(e).__name__}): {e!s}")
+            overall_severity = max(overall_severity, Severity.SUSPICIOUS)
+        else:
+            raise
+    except Exception as e:
+        if graceful:
+            errors.append(f"Unexpected error ({type(e).__name__}): {e!s}")
+            overall_severity = max(overall_severity, Severity.LIKELY_UNSAFE)
+        else:
+            raise
+
+    return ScanResult(
+        filepath=name,
+        severity=overall_severity,
+        results=results,
+        errors=errors,
+    )