Commit 320d644

feat(heuristics): add whitespace check to detect excessive spacing and invisible characters for malware check (#1086)
This PR adds a new heuristic that analyzes code to detect suspicious use of excessive spaces and invisible characters. It checks whether the amount of spacing and invisible Unicode characters exceeds a defined threshold.

Signed-off-by: Amine <[email protected]>
Signed-off-by: Carl Flottmann <[email protected]>
Co-authored-by: Carl Flottmann <[email protected]>
1 parent 9940f19 commit 320d644

4 files changed, 52 additions and 1 deletion

src/macaron/resources/pypi_malware_rules/obfuscation.yaml

Lines changed: 11 additions & 0 deletions
@@ -311,3 +311,14 @@ rules:
       - pattern: os.writev(...)
       - pattern: os.pwrite(...)
       - pattern: os.pwritev(...)
+
+- id: obfuscation_excessive-spacing
+  metadata:
+    description: Detects the use of excessive spacing in code, which may indicate obfuscation or hidden code.
+  message: Hidden code after excessive spacing
+  languages:
+    - python
+  severity: ERROR
+  patterns:
+    - pattern-regex: '[\s]{50,}(\S)+' # The 50 here is the threshold for excessive spacing , more than that is considered obfuscation
+    - pattern-not-regex: '"""[\s\S]*"""'
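
The two regexes in the rule above can be exercised outside Semgrep. The following is a minimal sketch that uses Python's `re` module as a stand-in for Semgrep's regex engine, which may differ in details. The helper `find_excessive_spacing` and the containment test used to model `pattern-not-regex` are assumptions made for illustration, not Macaron's or Semgrep's implementation.

```python
import re

# Regexes copied from the rule; Python's `re` is only an approximation
# of the engine Semgrep uses.
SPACING = re.compile(r'[\s]{50,}(\S)+')
DOCSTRING = re.compile(r'"""[\s\S]*"""')


def find_excessive_spacing(source: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of spacing matches outside docstring spans.

    Assumption: a spacing match is dropped when it lies entirely inside a
    span matched by the docstring regex.
    """
    excluded = [m.span() for m in DOCSTRING.finditer(source)]
    hits = []
    for m in SPACING.finditer(source):
        inside = any(lo <= m.start() and m.end() <= hi for lo, hi in excluded)
        if not inside:
            hits.append(m.span())
    return hits


# 60 spaces hide a second statement far to the right of the first one.
padded = 'print("Hello world!")' + " " * 60 + ";__import__('os')\n"
# 10 spaces stay below the threshold of 50.
short = 'print("Hello world!")' + " " * 10 + ";__import__('os')\n"
# Wide spacing inside a triple-quoted string is excluded.
doc = '"""\nheading' + " " * 60 + 'aligned text\n"""\n'

print(len(find_excessive_spacing(padded)))
print(find_excessive_spacing(short))
print(find_excessive_spacing(doc))
```

In this sketch `padded` yields one span, while `short` and `doc` yield none. Because `[\s\S]*` is greedy, the excluded span here runs from the first `"""` to the last one in the text.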

tests/malware_analyzer/pypi/resources/sourcecode_samples/obfuscation/excessive_spacing.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Copyright (c) 2025 - 2025, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
+
+"""
+Running this code will not produce any malicious behavior, but code isolation measures are
+in place for safety.
+"""
+
+import sys
+
+# ensure no symbols are exported so this code cannot accidentally be used
+__all__ = []
+sys.exit()
+
+def test_function():
+    """
+    All code to be tested will be defined inside this function, so it is all local to it. This is
+    to isolate the code to be tested, as it exists to replicate the patterns present in malware
+    samples.
+    """
+    sys.exit()
+
+    # excessive spacing obfuscation
+    def excessive_spacing_flow():
+        print("Hello world!")

tests/malware_analyzer/pypi/resources/sourcecode_samples/obfuscation/expected_results.json

Lines changed: 15 additions & 0 deletions
@@ -229,6 +229,21 @@
                     "end": 68
                 }
             ]
+        },
+        "src.macaron.resources.pypi_malware_rules.obfuscation_excessive-spacing": {
+            "message": "Hidden code after excessive spacing",
+            "detections": [
+                {
+                    "file": "obfuscation/excessive_spacing.py",
+                    "start": 24,
+                    "end": 25
+                },
+                {
+                    "file": "obfuscation/inline_imports.py",
+                    "start": 27,
+                    "end": 27
+                }
+            ]
         }
     },
     "disabled_sourcecode_rule_findings": {}
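
The `start` and `end` values above read as line numbers. The following is a minimal sketch, assuming they are 1-based and inclusive, of how a regex match could be mapped to such a record. `line_span`, `detections`, and the `demo.py` label are hypothetical names; the real values come from the analyzer's Semgrep run rather than from this code. The sketch also shows one way a match can cover two lines, as in the 24 to 25 entry: `\s` matches newlines, so trailing padding on one line can run into the next.

```python
import re

# Same spacing regex as the rule; Python's `re` approximates Semgrep's engine.
SPACING = re.compile(r'[\s]{50,}(\S)+')


def line_span(source: str, start: int, end: int) -> tuple[int, int]:
    """Convert a half-open character range into 1-based (first_line, last_line)."""
    first = source.count("\n", 0, start) + 1
    # `end` is exclusive, so the last matched character sits at end - 1.
    last = source.count("\n", 0, max(start, end - 1)) + 1
    return first, last


def detections(source: str, file_label: str) -> list[dict]:
    """Build records shaped like the entries in expected_results.json."""
    records = []
    for m in SPACING.finditer(source):
        first, last = line_span(source, m.start(), m.end())
        records.append({"file": file_label, "start": first, "end": last})
    return records


# Trailing padding on line 2 runs through the newline into line 3.
two_lines = "import sys\n" + "x = 1" + " " * 60 + "\n" + "y = 2\n"
# Padding and the hidden statement both sit on line 1.
one_line = "a = 1" + " " * 60 + ";b = 2\n"

print(detections(two_lines, "demo.py"))
print(detections(one_line, "demo.py"))
```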

tests/malware_analyzer/pypi/resources/sourcecode_samples/obfuscation/inline_imports.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ def test_function():
     __import__('builtins')
     __import__('subprocess')
     __import__('sys')
-    __import__('os')
+    print("Hello world!") ;__import__('os')
     __import__('zlib')
     __import__('marshal')
     # these both just import builtins
