Merged
26 changes: 17 additions & 9 deletions src/macaron/malware_analyzer/README.md
Original file line number Diff line number Diff line change
@@ -65,27 +65,35 @@ When a heuristic fails, with `HeuristicResult.FAIL`, then that is an indicator b
> ```
> The script will download the top 5000 PyPI packages and update the resource file automatically.

11. **Fake Email**
11. **Similar Projects**
- **Description**: Checks whether the maintainer(s) of the package have released other packages with close structural similarity.
- **Rule**: Return `HeuristicResult.FAIL` upon finding the first similar package. Return `HeuristicResult.PASS` if no similar packages are found.
- **Dependency**: None

12. **Fake Email**
- **Description**: Checks if the package maintainer or author has a suspicious or invalid email.
- **Rule**: Return `HeuristicResult.FAIL` if the email is invalid; otherwise, return `HeuristicResult.PASS`.
- **Dependency**: None.


12. **Minimal Content**
- **Description**: Checks if the package has a small number of files.
- **Rule**: Return `HeuristicResult.FAIL` if the number of files is strictly less than FILES_THRESHOLD; otherwise, return `HeuristicResult.PASS`.
13. **Type Stub File**
- **Description**: Checks if the package has a small number of `.pyi` stub files.
- **Rule**: Return `HeuristicResult.FAIL` if the number of `.pyi` files is strictly less than FILES_THRESHOLD; otherwise, return `HeuristicResult.PASS`.
- **Dependency**: None.

13. **Unsecure Description**
- **Description**: Checks if the package description is unsecure, such as not having a descriptive keywords that indicates its a stub package .
- **Rule**: Return `HeuristicResult.FAIL` if no descriptive word is found in the package description or summary ; otherwise, return `HeuristicResult.PASS`.
14. **Package Description Intent**
- **Description**: Checks if the package description contains keywords indicating it is a stub package or dependency confusion prevention placeholder.
- **Rule**: Return `HeuristicResult.FAIL` if no keyword is found in the package description or summary; otherwise, return `HeuristicResult.PASS`.
- **Dependency**: None.

15. **Stub Name**
- **Description**: Checks if the package name contains the `"stub"` keyword, indicating that it is likely intended to be a stub package and not downloaded.
- **Rule**: Return `HeuristicResult.PASS` if the keyword `"stub"` is found in the package name; otherwise, return `HeuristicResult.FAIL`.

### Source Code Analysis with Semgrep
**PyPI Source Code Analyzer**
- **Description**: Uses Semgrep, with default rules written in `src/macaron/resources/pypi_malware_rules` and custom rules available by supplying a path to `custom_semgrep_rules` in `defaults.ini`, to scan the package `.tar` source code.
- **Rule**: If any Semgrep rule is triggered, the heuristic fails with `HeuristicResult.FAIL` and subsequently fails the package with `CheckResultType.FAILED`. If no rule is triggered, the heuristic passes with `HeuristicResult.PASS` and the `CheckResultType` result from the combination of all other heuristics is maintained.
- **Dependency**: Will be run if the Source Code Repo fails. This dependency can be bypassed by suppying `--force-analyze-source` in the CLI.
- **Dependency**: Will be run if the Source Code Repo fails. This dependency can be bypassed by supplying `--force-analyze-source` in the CLI.

This feature is currently a work in progress, and supports detection of code obfuscation techniques and remote exfiltration behaviors. It uses Semgrep OSS for detection. `defaults.ini` may be used to provide custom rules and exclude them:
- `disabled_default_rulesets`: supply to this a comma separated list of the names of default Semgrep rule files (excluding the `.yaml` extension) to disable all rule IDs in that file.
8 changes: 4 additions & 4 deletions src/macaron/malware_analyzer/pypi_heuristics/heuristics.py
@@ -49,11 +49,11 @@ class Heuristics(str, Enum):
#: Indicates that the package has a similar structure to other packages maintained by the same user.
SIMILAR_PROJECTS = "similar_projects"

#: Indicates that the package has minimal content.
MINIMAL_CONTENT = "minimal_content"
#: Indicates that the package has minimal .pyi type stub files.
TYPE_STUB_FILE = "type_stub_file"

#: Indicates that the package's description is unsecure, such as not having a descriptive keywords.
UNSECURE_DESCRIPTION = "unsecure_description"
#: Indicates from the package's description it is intended to be used as a stub or placeholder package.
PACKAGE_DESCRIPTION_INTENT = "package_description_intent"

#: Indicates that the package contains stub files.
STUB_NAME = "stub_name"
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Copyright (c) 2024 - 2025, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""This analyzer checks if a PyPI package has unsecure description."""
"""This analyzer checks if a PyPI package is a stub or placeholder package, using its description and summary."""

import logging
import re
@@ -15,17 +15,17 @@
logger: logging.Logger = logging.getLogger(__name__)


class UnsecureDescriptionAnalyzer(BaseHeuristicAnalyzer):
"""Check whether the package's description is unsecure."""
class PackageDescriptionIntentAnalyzer(BaseHeuristicAnalyzer):
"""Package description contains keywords indicating it is a stub package or dependency confusion prevention placeholder."""

SECURE_DESCRIPTION_REGEX = re.compile(
r"\b(?:internal|private|stub|placeholder|dependency confusion|security|namespace protection|reserved|harmless|prevent)\b",
r"\b(?:stub|placeholder|dependency confusion|security|namespace protection|reserved|prevent)\b",
re.IGNORECASE,
)

def __init__(self) -> None:
super().__init__(
name="unsecure_description_analyzer", heuristic=Heuristics.UNSECURE_DESCRIPTION, depends_on=None
name="package_description_intent", heuristic=Heuristics.PACKAGE_DESCRIPTION_INTENT, depends_on=None
)

def analyze(self, pypi_package_json: PyPIPackageJsonAsset) -> tuple[HeuristicResult, dict[str, JsonType]]:
@@ -52,5 +52,5 @@ def analyze(self, pypi_package_json: PyPIPackageJsonAsset) -> tuple[HeuristicRes
summary = json_extract(package_json, ["info", "summary"], str)
data = f"{description} {summary}"
if self.SECURE_DESCRIPTION_REGEX.search(data):
return HeuristicResult.PASS, {"message": "Package description is secure"}
return HeuristicResult.FAIL, {"message": "Package description is unsecure"}
return HeuristicResult.PASS, {"message": "Package description indicates a stub or placeholder package."}
return HeuristicResult.FAIL, {"message": "Package description does not indicate a stub or placeholder package."}
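The effect of the narrowed keyword list can be checked in isolation. The following is a minimal sketch, not the analyzer itself: the pattern is copied from the new regex in this diff, while `mentions_stub_intent` is a hypothetical helper name and its handling of missing fields (treated as empty strings) is an assumption made for illustration.

```python
import re
from typing import Optional

# Keyword pattern copied from the updated analyzer in this diff.
STUB_INTENT_REGEX = re.compile(
    r"\b(?:stub|placeholder|dependency confusion|security|namespace protection|reserved|prevent)\b",
    re.IGNORECASE,
)


def mentions_stub_intent(description: Optional[str], summary: Optional[str]) -> bool:
    """Return True if the description or summary contains a whole-word keyword."""
    # Assumption for this sketch: a missing field is treated as an empty string.
    data = f"{description or ''} {summary or ''}"
    return STUB_INTENT_REGEX.search(data) is not None


if __name__ == "__main__":
    print(mentions_stub_intent("A harmless package to prevent typosquatting attacks", None))
    print(mentions_stub_intent("A regular public package", "does regular things"))
```

Because the pattern is wrapped in `\b` word boundaries, inflected forms such as "stubs" or "prevents" do not match, and descriptions that only say "internal", "private", or "harmless" no longer match after this change.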
Original file line number Diff line number Diff line change
@@ -24,6 +24,8 @@ def __init__(self) -> None:
super().__init__(
name="similar_project_analyzer",
heuristic=Heuristics.SIMILAR_PROJECTS,
# TODO: these dependencies are used as this heuristic currently downloads many package sourcecode
# tarballs. Refactoring this heuristic to run more efficiently means this should have depends_on=None.
depends_on=[
(Heuristics.EMPTY_PROJECT_LINK, HeuristicResult.FAIL),
(Heuristics.ONE_RELEASE, HeuristicResult.FAIL),
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Copyright (c) 2024 - 2025, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""This analyzer checks if a PyPI package has minimal content."""
"""This analyzer checks if a PyPI package has minimal .pyi stub content."""

import logging
import os
@@ -15,15 +15,15 @@
logger: logging.Logger = logging.getLogger(__name__)


class MinimalContentAnalyzer(BaseHeuristicAnalyzer):
"""Check whether the package has minimal content."""
class TypeStubFileAnalyzer(BaseHeuristicAnalyzer):
"""Check whether the package has minimal .pyi stub content."""

FILES_THRESHOLD = 10

def __init__(self) -> None:
super().__init__(
name="minimal_content_analyzer",
heuristic=Heuristics.MINIMAL_CONTENT,
name="type_stub_file",
heuristic=Heuristics.TYPE_STUB_FILE,
depends_on=None,
)

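The body of `analyze` is collapsed in this view, so the counting logic is not visible here. As a rough sketch of the rule described in the README hunk (fail when the number of `.pyi` files is strictly less than `FILES_THRESHOLD`), assuming the downloaded source tree is walked recursively; `count_pyi_files` and `has_enough_stub_files` are hypothetical names, not functions from Macaron:

```python
import os

FILES_THRESHOLD = 10  # value shown in the class body in this diff


def count_pyi_files(source_root: str) -> int:
    """Count files ending in .pyi anywhere under source_root."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(source_root):
        count += sum(1 for name in filenames if name.endswith(".pyi"))
    return count


def has_enough_stub_files(source_root: str, threshold: int = FILES_THRESHOLD) -> bool:
    """Mirror the README rule: the heuristic passes only at or above the threshold."""
    return count_pyi_files(source_root) >= threshold
```

A count exactly equal to the threshold passes, which matches the `test_analyze_exactly_threshold_files_pass` case in the renamed test file.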
Original file line number Diff line number Diff line change
Expand Up @@ -22,13 +22,15 @@
from macaron.malware_analyzer.pypi_heuristics.metadata.empty_project_link import EmptyProjectLinkAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.fake_email import FakeEmailAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.high_release_frequency import HighReleaseFrequencyAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.minimal_content import MinimalContentAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.one_release import OneReleaseAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.package_description_intent import (
PackageDescriptionIntentAnalyzer,
)
from macaron.malware_analyzer.pypi_heuristics.metadata.similar_projects import SimilarProjectAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.source_code_repo import SourceCodeRepoAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.type_stub_file import TypeStubFileAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.typosquatting_presence import TyposquattingPresenceAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.unchanged_release import UnchangedReleaseAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.unsecure_description import UnsecureDescriptionAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.wheel_absence import WheelAbsenceAnalyzer
from macaron.malware_analyzer.pypi_heuristics.sourcecode.pypi_sourcecode_analyzer import PyPISourcecodeAnalyzer
from macaron.malware_analyzer.pypi_heuristics.sourcecode.suspicious_setup import SuspiciousSetupAnalyzer
@@ -368,8 +370,8 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
TyposquattingPresenceAnalyzer,
FakeEmailAnalyzer,
SimilarProjectAnalyzer,
UnsecureDescriptionAnalyzer,
MinimalContentAnalyzer,
PackageDescriptionIntentAnalyzer,
TypeStubFileAnalyzer,
]

# name used to query the result of all problog rules, so it can be accessed outside the model.
@@ -419,20 +421,17 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
failed({Heuristics.CLOSER_RELEASE_JOIN_DATE.value}),
forceSetup.

% Package released with a name similar to a popular package.
% Package released recently with little detail, forcing setup.py to run, and suspected of typosquatting.
{Confidence.HIGH.value}::trigger(malware_high_confidence_4) :-
quickUndetailed,
forceSetup,
failed({Heuristics.TYPOSQUATTING_PRESENCE.value}),
failed({Heuristics.STUB_NAME.value}).
failed({Heuristics.TYPOSQUATTING_PRESENCE.value}).

% Package released with dependency confusion .
% Package forces setup.py to run, has a high version number and is not intended to be a stub package.
{Confidence.HIGH.value}::trigger(malware_high_confidence_5) :-
forceSetup,
failed({Heuristics.MINIMAL_CONTENT.value}),
failed({Heuristics.STUB_NAME.value}),
failed({Heuristics.ANOMALOUS_VERSION.value}),
failed({Heuristics.UNSECURE_DESCRIPTION.value}).
failed({Heuristics.ANOMALOUS_VERSION.value}).

% Package released recently with little detail, with multiple releases as a trust marker, but frequent and with
% the same code.
@@ -442,12 +441,14 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
failed({Heuristics.UNCHANGED_RELEASE.value}),
passed({Heuristics.SUSPICIOUS_SETUP.value}).

% Package released recently with little detail and an anomalous version number for a single-release package.
% Package released recently with little detail and an anomalous version number for a single-release package. The
% package is not intended to be a stub package.
{Confidence.MEDIUM.value}::trigger(malware_medium_confidence_2) :-
quickUndetailed,
failed({Heuristics.ONE_RELEASE.value}),
failed({Heuristics.ANOMALOUS_VERSION.value}),
failed({Heuristics.UNSECURE_DESCRIPTION.value}).
failed({Heuristics.TYPE_STUB_FILE.value}),
failed({Heuristics.PACKAGE_DESCRIPTION_INTENT.value}).

% Package has no links, one release or multiple quick releases, and a suspicious maintainer who recently
% joined, has a fake email address, and other similarly-structured projects.
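To make the updated `malware_medium_confidence_2` rule easier to read, here is a plain-Python paraphrase of its body. It is a sketch only: the real rule is evaluated by ProbLog with a confidence weight, `quickUndetailed` is defined outside this hunk and is taken here as a boolean input, and the string values for `ONE_RELEASE` and `ANOMALOUS_VERSION` are assumed (only `type_stub_file` and `package_description_intent` appear in this diff).

```python
from __future__ import annotations

# Heuristics that must all have failed for the rule body to hold.
# "one_release" and "anomalous_version" are assumed enum values.
REQUIRED_FAILURES = (
    "one_release",
    "anomalous_version",
    "type_stub_file",
    "package_description_intent",
)


def triggers_medium_confidence_2(quick_undetailed: bool, failed: set[str]) -> bool:
    """Return True when every condition in the rule body is satisfied."""
    return quick_undetailed and all(name in failed for name in REQUIRED_FAILURES)
```

Under this reading, a package whose description marks it as a placeholder passes `package_description_intent`, so the rule does not fire even if the other three heuristics fail.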
59 changes: 59 additions & 0 deletions tests/malware_analyzer/pypi/test_package_description_intent.py
@@ -0,0 +1,59 @@
# Copyright (c) 2024 - 2025, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""Tests for the PackageDescriptionIntentAnalyzer heuristic."""

from unittest.mock import MagicMock

import pytest

from macaron.errors import HeuristicAnalyzerValueError
from macaron.malware_analyzer.pypi_heuristics.heuristics import HeuristicResult
from macaron.malware_analyzer.pypi_heuristics.metadata.package_description_intent import (
PackageDescriptionIntentAnalyzer,
)


@pytest.fixture(name="analyzer")
def analyzer_() -> PackageDescriptionIntentAnalyzer:
"""Pytest fixture to create a PackageDescriptionIntentAnalyzer instance."""
return PackageDescriptionIntentAnalyzer()


def test_no_info(analyzer: PackageDescriptionIntentAnalyzer, pypi_package_json: MagicMock) -> None:
"""Test the analyzer raises an error when no package info is found."""
pypi_package_json.package_json = {}
with pytest.raises(HeuristicAnalyzerValueError):
analyzer.analyze(pypi_package_json)


@pytest.mark.parametrize(
("metadata", "expected_result"),
[
pytest.param(
{"description": "A harmless package to prevent typosquatting attacks"},
HeuristicResult.PASS,
id="test_harmless_package_description",
),
pytest.param(
{"summary": "placeholder package to prevent dependency confusion attacks"},
HeuristicResult.PASS,
id="test_harmless_package_summary",
),
pytest.param(
{"description": "A regular public package", "summary": "does regular things"},
HeuristicResult.FAIL,
id="test_no_intention",
),
],
)
def test_analyze_scenarios(
analyzer: PackageDescriptionIntentAnalyzer,
pypi_package_json: MagicMock,
metadata: dict,
expected_result: HeuristicResult,
) -> None:
"""Test the analyzer with various metadata scenarios."""
pypi_package_json.package_json = {"info": metadata}
result, _ = analyzer.analyze(pypi_package_json)
assert result == expected_result
Original file line number Diff line number Diff line change
@@ -1,24 +1,24 @@
# Copyright (c) 2024 - 2025, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""Tests for the MinimalContentAnalyzer heuristic."""
"""Tests for the TypeStubFileAnalyzer heuristic."""

from unittest.mock import MagicMock, patch

import pytest

from macaron.errors import SourceCodeError
from macaron.malware_analyzer.pypi_heuristics.heuristics import HeuristicResult
from macaron.malware_analyzer.pypi_heuristics.metadata.minimal_content import MinimalContentAnalyzer
from macaron.malware_analyzer.pypi_heuristics.metadata.type_stub_file import TypeStubFileAnalyzer


@pytest.fixture(name="analyzer")
def analyzer_() -> MinimalContentAnalyzer:
"""Pytest fixture to create a MinimalContentAnalyzer instance."""
return MinimalContentAnalyzer()
def analyzer_() -> TypeStubFileAnalyzer:
"""Pytest fixture to create a TypeStubFileAnalyzer instance."""
return TypeStubFileAnalyzer()


def test_analyze_sufficient_files_pass(analyzer: MinimalContentAnalyzer, pypi_package_json: MagicMock) -> None:
def test_analyze_sufficient_files_pass(analyzer: TypeStubFileAnalyzer, pypi_package_json: MagicMock) -> None:
"""Test the analyzer passes when the package has sufficient files."""
pypi_package_json.download_sourcecode.return_value = True
pypi_package_json.package_sourcecode_path = "/fake/path"
@@ -30,7 +30,7 @@ def test_analyze_sufficient_files_pass(analyzer: MinimalContentAnalyzer, pypi_pa
pypi_package_json.download_sourcecode.assert_called_once()


def test_analyze_exactly_threshold_files_pass(analyzer: MinimalContentAnalyzer, pypi_package_json: MagicMock) -> None:
def test_analyze_exactly_threshold_files_pass(analyzer: TypeStubFileAnalyzer, pypi_package_json: MagicMock) -> None:
"""Test the analyzer passes when the package has exactly the threshold number of files."""
pypi_package_json.download_sourcecode.return_value = True
pypi_package_json.package_sourcecode_path = "/fake/path"
@@ -41,7 +41,7 @@ def test_analyze_exactly_threshold_files_pass(analyzer: MinimalContentAnalyzer,
assert result == HeuristicResult.PASS


def test_analyze_insufficient_files_fail(analyzer: MinimalContentAnalyzer, pypi_package_json: MagicMock) -> None:
def test_analyze_insufficient_files_fail(analyzer: TypeStubFileAnalyzer, pypi_package_json: MagicMock) -> None:
"""Test the analyzer fails when the package has insufficient files."""
pypi_package_json.download_sourcecode.return_value = True
pypi_package_json.package_sourcecode_path = "/fake/path"
@@ -52,7 +52,7 @@ def test_analyze_insufficient_files_fail(analyzer: MinimalContentAnalyzer, pypi_
assert result == HeuristicResult.FAIL


def test_analyze_no_files_fail(analyzer: MinimalContentAnalyzer, pypi_package_json: MagicMock) -> None:
def test_analyze_no_files_fail(analyzer: TypeStubFileAnalyzer, pypi_package_json: MagicMock) -> None:
"""Test the analyzer fails when the package has no files."""
pypi_package_json.download_sourcecode.return_value = True
pypi_package_json.package_sourcecode_path = "/fake/path"
@@ -63,7 +63,7 @@ def test_analyze_no_files_fail(analyzer: MinimalContentAnalyzer, pypi_package_js
assert result == HeuristicResult.FAIL


def test_analyze_download_failed_raises_error(analyzer: MinimalContentAnalyzer, pypi_package_json: MagicMock) -> None:
def test_analyze_download_failed_raises_error(analyzer: TypeStubFileAnalyzer, pypi_package_json: MagicMock) -> None:
"""Test the analyzer raises SourceCodeError when source code download fails."""
pypi_package_json.download_sourcecode.return_value = False

@@ -84,8 +84,8 @@ def test_analyze_download_failed_raises_error(analyzer: MinimalContentAnalyzer,
(15, HeuristicResult.PASS),
],
)
def test_analyze_various_file_counts(
analyzer: MinimalContentAnalyzer,
def test_analyze_file_counts(
analyzer: TypeStubFileAnalyzer,
pypi_package_json: MagicMock,
file_count: int,
expected_result: HeuristicResult,