
Wayback upgrade #2909

Draft
liquidsec wants to merge 26 commits into 3.0 from wayback-upgrade

Conversation

@liquidsec
Collaborator

TBA

@liquidsec liquidsec marked this pull request as draft February 19, 2026 03:01
assert "archive_url" in finding.data, (
f"Hunt FINDING should have archive_url for provenance, got: {finding.data}"
)
assert "web.archive.org" in finding.data["archive_url"], (

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High test

The string web.archive.org may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI 10 days ago

In general, the way to fix incomplete URL substring sanitization is to parse the URL using a standard library, extract the hostname, and then compare that hostname (or a suffix of it) to the expected allowed host, instead of checking for a substring in the raw URL string.

In this specific case, we should change the assertion that currently does assert "web.archive.org" in finding.data["archive_url"] so that it parses archive_url with urllib.parse.urlparse, extracts .hostname, and asserts that the hostname is exactly web.archive.org. This preserves the intended functionality (“archive_url should be archive.org URL”) while avoiding arbitrary substring matches. Concretely, within TestWaybackParameters.check, around lines 309–315, we will introduce a local variable such as archive_url_host = urlparse(finding.data["archive_url"]).hostname and assert archive_url_host == "web.archive.org". To do this, we must import urlparse from urllib.parse at the top of the test file, alongside the existing unquote import. No other behavior in the tests needs to change.
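To illustrate why the alert fires, here is a minimal sketch comparing the substring check with a parsed-hostname check. The helper names (`substring_check`, `hostname_check`) and the sample URLs are hypothetical and exist only for this illustration; they are not part of the test file.

```python
from urllib.parse import urlparse

EXPECTED_HOST = "web.archive.org"


def substring_check(url: str) -> bool:
    """Weak check: passes if the host string appears anywhere in the URL."""
    return EXPECTED_HOST in url


def hostname_check(url: str) -> bool:
    """Stricter check: parse the URL and compare only the hostname."""
    return urlparse(url).hostname == EXPECTED_HOST


# Hypothetical sample URLs, chosen to show where the two checks disagree.
genuine = "https://web.archive.org/web/2020/http://example.com/page?id=1"
lookalike_host = "https://web.archive.org.evil.example/web/2020/"
in_query = "https://evil.example/?next=web.archive.org"

for url in (genuine, lookalike_host, in_query):
    print(url, substring_check(url), hostname_check(url))
```

All three URLs pass the substring check, but only the first passes the hostname check. In a test assertion the practical risk is low, which may be why the alert is tagged "test", but the parsed comparison also makes the failure message more precise.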

Suggested changeset 1
bbot/test/test_step_2/module_tests/test_module_wayback.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/bbot/test/test_step_2/module_tests/test_module_wayback.py b/bbot/test/test_step_2/module_tests/test_module_wayback.py
--- a/bbot/test/test_step_2/module_tests/test_module_wayback.py
+++ b/bbot/test/test_step_2/module_tests/test_module_wayback.py
@@ -1,5 +1,5 @@
 import re
-from urllib.parse import unquote
+from urllib.parse import unquote, urlparse
 
 from werkzeug.wrappers import Response
 
@@ -310,8 +310,10 @@
             assert "archive_url" in finding.data, (
                 f"Hunt FINDING should have archive_url for provenance, got: {finding.data}"
             )
-            assert "web.archive.org" in finding.data["archive_url"], (
-                f"Hunt FINDING archive_url should be archive.org URL, got: {finding.data['archive_url']}"
+            archive_url_host = urlparse(finding.data["archive_url"]).hostname
+            assert archive_url_host == "web.archive.org", (
+                f"Hunt FINDING archive_url should be archive.org URL, got host: {archive_url_host}, "
+                f"full URL: {finding.data['archive_url']}"
             )
 
         # WEB_PARAMETERs from archived content should also have archive_url
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Collaborator Author

bro, it's a draft, step off

@github-actions
Contributor

github-actions bot commented Feb 19, 2026

📊 Performance Benchmark Report

Comparing 3.0 (baseline) vs wayback-upgrade (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
|---|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | 3.79ms | 3.81ms | +0.7% | |
| Bloom Filter Large Scale Dns Brute Force | 16.80ms | 16.97ms | +1.0% | |
| Large Closest Match Lookup | 318.71ms | 309.47ms | -2.9% | |
| Realistic Closest Match Workload | 171.60ms | 170.95ms | -0.4% | |
| Event Validation Full Scan Startup Small Batch | 461.24ms | 458.09ms | -0.7% | |
| Event Validation Full Scan Startup Large Batch | 821.50ms | 811.37ms | -1.2% | |
| Make Event Autodetection Small | 26.35ms | 26.03ms | -1.2% | |
| Make Event Autodetection Large | 265.94ms | 268.10ms | +0.8% | |
| Make Event Explicit Types | 11.44ms | 11.54ms | +0.9% | |
| Excavate Single Thread Small | 3.462s | 3.460s | -0.1% | |
| Excavate Single Thread Large | 9.438s | 9.566s | +1.4% | |
| Excavate Parallel Tasks Small | 3.634s | 3.657s | +0.6% | |
| Excavate Parallel Tasks Large | 7.022s | 7.115s | +1.3% | |
| Is Ip Performance | 2.93ms | 2.97ms | +1.3% | |
| Make Ip Type Performance | 10.78ms | 10.87ms | +0.8% | |
| Mixed Ip Operations | 4.21ms | 4.25ms | +0.8% | |
| Typical Queue Shuffle | 54.61µs | 54.80µs | +0.3% | |
| Priority Queue Shuffle | 613.46µs | 593.13µs | -3.3% | |

🎯 Performance Summary

No significant performance changes detected (all changes <10%)
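For readers checking the numbers, the Change column appears to be the relative difference between current and base timings. The sketch below assumes that formula and treats the 10% figure from the summary as the significance threshold; the function names are hypothetical and this is not the benchmark workflow's actual code.

```python
def percent_change(base: float, current: float) -> float:
    """Relative change of current versus base, in percent."""
    return (current - base) / base * 100.0


def is_significant(base: float, current: float, threshold: float = 10.0) -> bool:
    """Assumed rule: a change counts as significant at or above the threshold."""
    return abs(percent_change(base, current)) >= threshold


# Large Closest Match Lookup: 318.71ms (base) vs 309.47ms (current)
print(round(percent_change(318.71, 309.47), 1))  # -2.9
print(is_significant(318.71, 309.47))            # False
```

Some rows will not reproduce exactly from the displayed values (for example, 3.79ms to 3.81ms gives about +0.5%, not the reported +0.7%), presumably because the report computes percentages from unrounded timings.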


🐍 Python Version 3.11.14

@liquidsec liquidsec changed the base branch from dev to 3.0 February 28, 2026 18:31