Commit a9a1ecf

chore: improve max download restrictions for malicious metadata tutorial (#1188)
Maximum file download size is now stored in the [downloads] section. UNKNOWN is no longer returned when the source code is larger than the file limit.

Signed-off-by: Carl Flottmann <[email protected]>
1 parent 27f3cdd commit a9a1ecf

13 files changed: +86 -16 lines changed

docs/source/pages/tutorials/detect_malicious_package.rst

Lines changed: 2 additions & 0 deletions
@@ -136,6 +136,8 @@ By default, the source code analyzer is run in conjunction with the other metada

   ./run_macaron.sh analyze -purl pkg:pypi/[email protected] --python-venv "/tmp/.django_venv" --force-analyze-source

+.. note:: Some packages source code, like ``[email protected]``, will be larger than the default download limit of 10 megabytes. This is controlled using the ``max_download_size`` configuration under ``downloads`` in ``defaults.ini``, and can be increased by either modifying that value in ``defaults.ini`` or by passing in a configuration file using ``-dp`` with this value increased.
+
 If any suspicious patterns are triggered, this will be identified in the ``mcn_detect_malicious_metadata_1`` result for the heuristic named ``suspicious_patterns``. The output database ``output/macaron.db`` can be used to get the specific results of the analysis by querying the :class:`detect_malicious_metadata_check.result field <macaron.database>`. This will provide detailed JSON information about all data collected by the ``mcn_detect_malicious_metadata_1`` check, including, for source code analysis, any malicious code patterns detected, what Semgrep rule detected it, the file in which it was detected, and the line number for the detection.

 +++++++++++++++++++++++++++++++++++++++
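
The note added in this hunk mentions passing a configuration file with the value increased. A minimal sketch of generating such a file with Python's standard ``configparser`` follows. It assumes the supplied file only needs the ``[downloads]`` section and that Macaron merges it over its defaults; the helper name ``build_override_ini`` and the 50 MB value are illustrative and not part of the commit.

```python
import configparser
import io


def build_override_ini(max_download_size_bytes: int) -> str:
    """Render an INI fragment that overrides only the download size limit."""
    config = configparser.ConfigParser()
    # Section and option names are taken from the defaults.ini hunk in this commit.
    config["downloads"] = {"max_download_size": str(max_download_size_bytes)}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Illustrative value: 50 MB expressed in bytes, the unit used by defaults.ini.
override_text = build_override_ini(50_000_000)
print(override_text)
```

The printed text can be saved to a file and supplied through the ``-dp`` option mentioned in the note.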

docs/source/pages/tutorials/provenance.rst

Lines changed: 1 addition & 1 deletion
@@ -204,7 +204,7 @@ Build Types
 File Download Limit
 *******************

-To prevent analyses from taking too long, Macaron imposes a configurable size limit for downloads. This includes files being downloaded for provenance verification. In cases where the limit is being reached and you wish to continue analysis regardless, you can specify a new download size in the default configuration file. This value can be found under the ``slsa.verifier`` section, listed as ``max_download_size`` with a default limit of 10 megabytes. See :ref:`How to change the default configuration <change-config>` for more details on configuring values like these.
+To prevent analyses from taking too long, Macaron imposes a configurable size limit for downloads. This includes files being downloaded for provenance verification. In cases where the limit is being reached and you wish to continue analysis regardless, you can specify a new download size in the default configuration file. This value can be found under the ``downloads`` section, listed as ``max_download_size`` with a default limit of 10 megabytes. See :ref:`How to change the default configuration <change-config>` for more details on configuring values like these.

 **************************************
 Run ``verify-policy`` command (semver)

src/macaron/config/defaults.ini

Lines changed: 2 additions & 2 deletions
@@ -10,6 +10,8 @@ error_retries = 5
 [downloads]
 # The default timeout in seconds for downloading assets.
 timeout = 120
+# This is the acceptable maximum size (in bytes) to download an asset.
+max_download_size = 10000000

 # This is the database to store Macaron's results.
 [database]
@@ -486,8 +488,6 @@ provenance_extensions =
     intoto.jsonl.gz
     intoto.jsonl.url
     intoto.jsonl.gz.url
-# This is the acceptable maximum size (in bytes) to download an asset.
-max_download_size = 10000000
 # This is the timeout (in seconds) to run the SLSA verifier.
 timeout = 120
 # The allowed hostnames for URL file links for provenance download
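
Moving the option between sections means every lookup has to name the new section. With the standard library ``configparser``, a ``getint`` call against the old section does not fail; it quietly returns its ``fallback``. The sketch below illustrates this with a stand-in parser. It assumes Macaron's ``defaults`` object behaves like ``configparser.ConfigParser`` for ``getint``, and the 50000000 override is a made-up user setting.

```python
import configparser

# Stand-in for the configuration after this commit, with a user-raised limit.
SAMPLE_CONFIG = """
[downloads]
timeout = 120
max_download_size = 50000000

[slsa.verifier]
timeout = 120
"""

defaults = configparser.ConfigParser()
defaults.read_string(SAMPLE_CONFIG)

# Lookup against the new section picks up the configured value.
new_lookup = defaults.getint("downloads", "max_download_size", fallback=10000000)

# A stale lookup against the old section finds no such option there,
# so it returns the fallback and ignores the user's override.
stale_lookup = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)

print(new_lookup, stale_lookup)
```

Under these assumptions, a call site left on ``slsa.verifier`` would keep enforcing its hard-coded fallback regardless of what the user configured.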

src/macaron/malware_analyzer/pypi_heuristics/sourcecode/suspicious_setup.py

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ def _get_setup_source_code(self, pypi_package_json: PyPIPackageJsonAsset) -> str
         with tempfile.TemporaryDirectory() as temp_dir:
             source_file = os.path.join(temp_dir, file_name)
             timeout = defaults.getint("downloads", "timeout", fallback=120)
-            size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+            size_limit = defaults.getint("downloads", "max_download_size", fallback=10000000)
             if not download_file_with_size_limit(sourcecode_url, {}, source_file, timeout, size_limit):
                 return None

src/macaron/provenance/provenance_finder.py

Lines changed: 2 additions & 2 deletions
@@ -255,7 +255,7 @@ def find_gav_provenance(purl: PackageURL, registry: JFrogMavenRegistry) -> list[
         return []

     max_valid_provenance_size = defaults.getint(
-        "slsa.verifier",
+        "downloads",
         "max_download_size",
         fallback=1000000,
     )
@@ -458,7 +458,7 @@ def download_provenances_from_ci_service(ci_info: CIInfo, download_path: str) ->
         for prov_asset in prov_assets:
             # Check the size before downloading.
             if prov_asset.size_in_bytes > defaults.getint(
-                "slsa.verifier",
+                "downloads",
                 "max_download_size",
                 fallback=1000000,
             ):
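
Both hunks in this file compare an asset's reported size against the configured limit before any bytes are fetched. A minimal sketch of that filter follows; ``Asset`` and ``select_downloadable`` are hypothetical stand-ins, not Macaron types. Note that these call sites keep a ``fallback`` of 1000000 while other files in this commit use 10000000; the fallback only matters when the option is absent from the loaded configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for a provenance asset with a known size."""

    name: str
    size_in_bytes: int


def select_downloadable(assets: list[Asset], max_size: int) -> list[Asset]:
    """Keep assets whose reported size does not exceed max_size bytes."""
    # The diff rejects assets with size > limit, so size == limit is still accepted.
    return [asset for asset in assets if asset.size_in_bytes <= max_size]


assets = [Asset("a.intoto.jsonl", 1_000_000), Asset("b.intoto.jsonl", 1_000_001)]
selected = select_downloadable(assets, max_size=1_000_000)
print([asset.name for asset in selected])
```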

src/macaron/provenance/provenance_verifier.py

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ def verify_ci_provenance(analyze_ctx: AnalyzeContext, ci_info: CIInfo, download_
                 return False
             if not Path(download_path, sub_asset["name"]).is_file():
                 if "size" in sub_asset and sub_asset["size"] > defaults.getint(
-                    "slsa.verifier", "max_download_size", fallback=1000000
+                    "downloads", "max_download_size", fallback=1000000
                 ):
                     logger.debug("Sub asset too large to verify: %s", sub_asset["name"])
                     return False

src/macaron/slsa_analyzer/checks/detect_malicious_metadata_check.py

Lines changed: 4 additions & 0 deletions
@@ -148,6 +148,10 @@ def analyze_source(
         if not force and analyzer.depends_on and self._should_skip(results, analyzer.depends_on):
             return {analyzer.heuristic: HeuristicResult.SKIP}, {}

+        if not pypi_package_json.can_download_sourcecode():
+            logger.debug("Source code will exceed download limits. Please increase the download size limit to analyze.")
+            return {analyzer.heuristic: HeuristicResult.SKIP}, {}
+
         try:
             with pypi_package_json.sourcecode():
                 result, detail_info = analyzer.analyze(pypi_package_json)
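
The added guard checks whether the download can succeed before the analyzer runs, and reports ``SKIP`` instead of attempting the download. A minimal sketch of that control flow is shown below. The ``HeuristicResult`` enum here is a simplified stand-in with assumed members, and ``analyze_with_precheck`` is a hypothetical helper, not a function from the commit.

```python
from enum import Enum
from typing import Callable


class HeuristicResult(str, Enum):
    """Simplified stand-in; the member set is an assumption of this sketch."""

    PASS = "PASS"
    FAIL = "FAIL"
    SKIP = "SKIP"


def analyze_with_precheck(
    can_download: Callable[[], bool],
    analyze: Callable[[], HeuristicResult],
) -> HeuristicResult:
    """Run analyze() only when the pre-check says the source can be fetched."""
    if not can_download():
        # Too large for the configured limit: skip without downloading anything.
        return HeuristicResult.SKIP
    return analyze()


calls: list[str] = []


def fake_analyze() -> HeuristicResult:
    calls.append("analyze")
    return HeuristicResult.PASS


skipped = analyze_with_precheck(lambda: False, fake_analyze)
ran = analyze_with_precheck(lambda: True, fake_analyze)
print(skipped, ran, calls)
```

Returning early keeps the expensive download and analysis out of the path entirely when the size limit would be exceeded.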

src/macaron/slsa_analyzer/git_service/api_client.py

Lines changed: 1 addition & 1 deletion
@@ -643,7 +643,7 @@ def download_asset(self, url: str, download_path: str) -> bool:
         logger.debug("Download assets from %s at %s.", url, download_path)

         timeout = defaults.getint("downloads", "timeout", fallback=120)
-        size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+        size_limit = defaults.getint("downloads", "max_download_size", fallback=10000000)
         headers = {"Accept": "application/octet-stream", "Authorization": self.headers["Authorization"]}

         return download_file_with_size_limit(url, headers, download_path, timeout, size_limit)

src/macaron/slsa_analyzer/package_registry/maven_central_registry.py

Lines changed: 1 addition & 1 deletion
@@ -286,7 +286,7 @@ def get_artifact_hash(self, purl: PackageURL) -> str | None:

         hash_algorithm = hashlib.sha256()
         timeout = defaults.getint("downloads", "timeout", fallback=120)
-        size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+        size_limit = defaults.getint("downloads", "max_download_size", fallback=10000000)
         if not stream_file_with_size_limit(artifact_url, {}, hash_algorithm.update, timeout, size_limit):
             return None

src/macaron/slsa_analyzer/package_registry/pypi_registry.py

Lines changed: 38 additions & 3 deletions
@@ -26,7 +26,12 @@
 from macaron.json_tools import json_extract
 from macaron.malware_analyzer.datetime_parser import parse_datetime
 from macaron.slsa_analyzer.package_registry.package_registry import PackageRegistry
-from macaron.util import download_file_with_size_limit, send_get_http_raw, stream_file_with_size_limit
+from macaron.util import (
+    can_download_file,
+    download_file_with_size_limit,
+    send_get_http_raw,
+    stream_file_with_size_limit,
+)

 if TYPE_CHECKING:
     from macaron.slsa_analyzer.specs.package_registry_spec import PackageRegistryInfo
@@ -209,6 +214,23 @@ def cleanup_sourcecode_directory(
             raise InvalidHTTPResponseError(error_message) from error
         raise InvalidHTTPResponseError(error_message)

+    def can_download_package_sourcecode(self, url: str) -> bool:
+        """Check if the package source code can be downloaded within the default file limits.
+
+        Parameters
+        ----------
+        url: str
+            The package source code url.
+
+        Returns
+        -------
+        bool
+            True if it can be downloaded within the size limits, otherwise False.
+        """
+        size_limit = defaults.getint("downloads", "max_download_size", fallback=10000000)
+        timeout = defaults.getint("downloads", "timeout", fallback=120)
+        return can_download_file(url, size_limit, timeout=timeout)
+
     def download_package_sourcecode(self, url: str) -> str:
         """Download the package source code from pypi registry.
@@ -235,7 +257,7 @@ def download_package_sourcecode(self, url: str) -> str:
         temp_dir = tempfile.mkdtemp(prefix=f"{package_name}_")
         source_file = os.path.join(temp_dir, file_name)
         timeout = defaults.getint("downloads", "timeout", fallback=120)
-        size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+        size_limit = defaults.getint("downloads", "max_download_size", fallback=10000000)
         if not download_file_with_size_limit(url, {}, source_file, timeout, size_limit):
             self.cleanup_sourcecode_directory(temp_dir, "Could not download the file.")

@@ -273,7 +295,7 @@ def get_artifact_hash(self, artifact_url: str) -> str | None:
         """
         hash_algorithm = hashlib.sha256()
         timeout = defaults.getint("downloads", "timeout", fallback=120)
-        size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+        size_limit = defaults.getint("downloads", "max_download_size", fallback=10000000)
         if not stream_file_with_size_limit(artifact_url, {}, hash_algorithm.update, timeout, size_limit):
             return None

@@ -624,6 +646,19 @@ def download_sourcecode(self) -> bool:
             logger.debug(error)
             return False

+    def can_download_sourcecode(self) -> bool:
+        """Return whether the package source code can be downloaded within the download file size limits.
+
+        Returns
+        -------
+        bool
+            ``True`` if the source code can be downloaded; ``False`` if not.
+        """
+        url = self.get_sourcecode_url()
+        if url:
+            return self.pypi_registry.can_download_package_sourcecode(url)
+        return False
+
     def get_sourcecode_file_contents(self, path: str) -> bytes:
         """
         Get the contents of a single source code file specified by the path.
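
The body of ``can_download_file`` in ``macaron.util`` is not shown in this diff. One common way to answer "will this download fit?" without fetching the body is to inspect the ``Content-Length`` response header, for example from a HEAD request. The sketch below covers only that decision step and is an assumption about the approach, not the commit's implementation; ``content_length_within_limit`` is a hypothetical name.

```python
from typing import Mapping


def content_length_within_limit(headers: Mapping[str, str], size_limit: int) -> bool:
    """Return True only if the headers declare a size no larger than size_limit bytes."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    normalised = {key.lower(): value for key, value in headers.items()}
    declared = normalised.get("content-length")
    if declared is None:
        # Choice made for this sketch only: an undeclared size cannot be confirmed.
        return False
    try:
        size = int(declared)
    except ValueError:
        # A malformed header is treated the same as a missing one.
        return False
    return 0 <= size <= size_limit


print(content_length_within_limit({"Content-Length": "9999999"}, 10_000_000))
```

Treating a missing header as "cannot download" is the conservative option; a real implementation might instead allow the download and rely on a streaming size check.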
