Commit f781e22

Prioritize hashes and download URL for PurlDB mapping
To map a DejaCode package accurately to PurlDB entries, the patched query prioritizes hashes. This is needed when the same PURL (without query parameters) has several different download URLs, as is the case with Python packages and with binaries built for different hardware architectures or interpreter versions.

Lookups for SHA-256 and MD5 are added because SHA-1 may not always be populated. Hashes from SBOM imports, generated by tools such as cdxgen, commonly no longer use SHA-1, since it is a mostly deprecated hashing algorithm due to the risk of hash collisions. SHA-512 could not be added yet because PurlDB does not support a lookup for it.

The lookups are prioritized as follows. Hashes come first because they identify the content of the package most accurately. The download URL comes next because it at least points to the download location, which still differentiates between target architectures. The PURL itself comes last, as a fallback when no fully accurate match is found otherwise.

The results are then filtered by checking that the PURLs match. The query parameters are now also stripped from the PurlDB PURL, since it may contain them too, which previously caused matches to be missed.

For reference see the following issues: #307 #383

Signed-off-by: Robert Guetzkow <[email protected]>
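The PURL comparison described in the commit message can be sketched as follows. This is a minimal illustration using plain string handling; the helper names `strip_purl_qualifiers` and `purls_match` are hypothetical and do not appear in the commit, and keeping the `#subpath` part is an assumption rather than something the commit message states.

```python
def strip_purl_qualifiers(purl: str) -> str:
    """Return `purl` without its `?key=value` qualifiers.

    Hypothetical helper: a PURL has the shape
    pkg:type/namespace/name@version?qualifiers#subpath, so the subpath
    (if any) is split off first and re-attached afterwards.
    """
    base, _, subpath = purl.partition("#")
    base = base.split("?", 1)[0]
    return f"{base}#{subpath}" if subpath else base


def purls_match(package_purl: str, purldb_purl: str) -> bool:
    """Compare two PURLs while ignoring qualifiers on both sides."""
    return strip_purl_qualifiers(package_purl) == strip_purl_qualifiers(purldb_purl)


# Illustrative values only; the package name and qualifier are made up.
local_purl = "pkg:pypi/example@1.0.0"
remote_purl = "pkg:pypi/example@1.0.0?file_name=example-1.0.0-py3-none-any.whl"
print(purls_match(local_purl, remote_purl))
```

Stripping the qualifiers on both sides makes the comparison symmetric, so it does not matter which of the two PURLs carries them.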
1 parent a3435be commit f781e22

1 file changed, +29 -4 lines changed

component_catalog/models.py

Lines changed: 29 additions & 4 deletions
@@ -2506,13 +2506,34 @@ def create_from_url(cls, url, user):
         if download_url and not purldb_data:
             package_data = collect_package_data(download_url)
 
+        if sha512 := package_data.get("sha512"):
+            if sha512_match := scoped_packages_qs.filter(sha512=sha512):
+                package_link = sha512_match[0].get_absolute_link()
+                raise PackageAlreadyExistsWarning(
+                    f"{url} already exists in your Dataspace as {package_link}"
+                )
+
+        if sha256 := package_data.get("sha256"):
+            if sha256_match := scoped_packages_qs.filter(sha256=sha256):
+                package_link = sha256_match[0].get_absolute_link()
+                raise PackageAlreadyExistsWarning(
+                    f"{url} already exists in your Dataspace as {package_link}"
+                )
+
         if sha1 := package_data.get("sha1"):
             if sha1_match := scoped_packages_qs.filter(sha1=sha1):
                 package_link = sha1_match[0].get_absolute_link()
                 raise PackageAlreadyExistsWarning(
                     f"{url} already exists in your Dataspace as {package_link}"
                 )
 
+        if md5 := package_data.get("md5"):
+            if md5_match := scoped_packages_qs.filter(md5=md5):
+                package_link = md5_match[0].get_absolute_link()
+                raise PackageAlreadyExistsWarning(
+                    f"{url} already exists in your Dataspace as {package_link}"
+                )
+
         # Duplicate the declared_license_expression into the license_expression field.
         if declared_license_expression := package_data.get("declared_license_expression"):
             package_data["license_expression"] = declared_license_expression
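The four checks in this hunk share one shape: look up the collected hash, and stop at the first existing package that has the same value. That pattern can be sketched as a single loop over the hash field names, in the same order the hunk uses. This is only an illustration: a plain list of dicts stands in for `scoped_packages_qs`, and `HASH_FIELDS` and `find_existing_by_hash` are hypothetical names that are not part of the commit.

```python
# Hash fields in the order the hunk checks them.
HASH_FIELDS = ("sha512", "sha256", "sha1", "md5")


def find_existing_by_hash(package_data, existing_packages):
    """Return (field, package) for the first hash match, or None.

    `existing_packages` is a list of dicts standing in for the Django
    queryset used in the real code (an assumption for this sketch).
    """
    for field in HASH_FIELDS:
        value = package_data.get(field)
        if not value:
            # Hash not collected for this download; try the next field.
            continue
        for package in existing_packages:
            if package.get(field) == value:
                return field, package
    return None


# Illustrative data only.
existing = [
    {"name": "a", "sha1": "abc", "md5": "x"},
    {"name": "b", "sha256": "def"},
]
match = find_existing_by_hash({"sha256": "def", "sha1": "abc"}, existing)
print(match)
```

With a Django queryset, the inner loop would presumably become a keyword-argument filter such as `scoped_packages_qs.filter(**{field: value})`; the commit itself keeps the four blocks explicit.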
@@ -2528,9 +2549,9 @@ def get_purldb_entries(self, user, max_request_call=0, timeout=10):
         Return the PurlDB entries that correspond to this Package instance.
 
         Matching on the following fields order:
-        - Package URL
-        - SHA1
+        - Hash
         - Download URL
+        - Package URL
 
         A `max_request_call` integer can be provided to limit the number of
         HTTP requests made to the PackageURL server.
@@ -2542,12 +2563,16 @@ def get_purldb_entries(self, user, max_request_call=0, timeout=10):
         purldb_entries = []
 
         package_url = self.package_url
-        if package_url:
-            payloads.append({"purl": package_url})
+        if self.sha256:
+            payloads.append({"sha256": self.sha256})
         if self.sha1:
             payloads.append({"sha1": self.sha1})
+        if self.md5:
+            payloads.append({"md5": self.md5})
         if self.download_url:
             payloads.append({"download_url": self.download_url})
+        if package_url:
+            payloads.append({"purl": package_url})
 
         purldb = PurlDB(user.dataspace)
         for index, payload in enumerate(payloads):
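The reordered payload list in this hunk can be reproduced in isolation to see which lookups are attempted, and in what order, for a given package. The sketch below is an assumption-laden stand-in: `PackageStub`, `LOOKUP_ORDER` and `build_lookup_payloads` are hypothetical names, and no request to PurlDB is made.

```python
from dataclasses import dataclass


@dataclass
class PackageStub:
    """Hypothetical stand-in for the Package model fields used in the hunk."""
    package_url: str = ""
    sha256: str = ""
    sha1: str = ""
    md5: str = ""
    download_url: str = ""


# (payload key, attribute name), most specific lookup first, as in the hunk.
LOOKUP_ORDER = (
    ("sha256", "sha256"),
    ("sha1", "sha1"),
    ("md5", "md5"),
    ("download_url", "download_url"),
    ("purl", "package_url"),
)


def build_lookup_payloads(package):
    """Return one single-key payload per populated field, in priority order."""
    payloads = []
    for key, attr in LOOKUP_ORDER:
        value = getattr(package, attr)
        if value:
            payloads.append({key: value})
    return payloads


# Illustrative values only.
stub = PackageStub(
    package_url="pkg:pypi/example@1.0.0",
    sha1="abc",
    download_url="https://example.com/example-1.0.0.tar.gz",
)
payloads = build_lookup_payloads(stub)
print(payloads)
```

Fields that are empty are skipped, so a package with no hashes falls through to the download URL and then to the PURL.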
