Skip to content

Add wgetlua plugin for Archive Team wget-lua (wget-at)#19

Open
claude[bot] wants to merge 1 commit intomainfrom
add-wgetlua-plugin
Open

Add wgetlua plugin for Archive Team wget-lua (wget-at)#19
claude[bot] wants to merge 1 commit intomainfrom
add-wgetlua-plugin

Conversation

@claude
Copy link
Copy Markdown

@claude claude bot commented Mar 26, 2026

Summary

  • Adds new wgetlua plugin that archives pages using wget-at (Archive Team wget-lua) for better WARC compliance and archive.org compatibility
  • Uses binprovider overrides in config.json to install wget-at via brew or build from source via custom provider
  • Includes 10 live integration tests that hit real https://example.com and verify HTML content + WARC output correctness (no mocking)

Test plan

  • uv run ruff check passes
  • uv run pyright passes (0 errors)
  • All 10 pytest tests pass including live integration tests
  • abx-dl --plugins=wgetlua --output=/tmp/test 'https://example.com' produces correct HTML and WARC output
  • abx-dl install wgetlua correctly resolves wget-at binary via env/brew/custom providers

Closes #17

🤖 Generated with Claude Code


Summary by cubic

Adds a new wgetlua plugin that archives pages with wget-at (Archive Team wget-lua) and writes WARCs compatible with archive.org. Includes provider-backed installation and live tests that verify HTML content and WARC output.

  • New Features
    • Archive URLs using wget-at; outputs files under wgetlua/ and WARCs under wgetlua/warc/.
    • Resolves the wget-at binary via env, brew (brew install wget-at), or a custom source build override in config.json.
    • Config options supported: WGETLUA_ENABLED, WGETLUA_WARC_ENABLED, WGETLUA_BINARY, WGETLUA_TIMEOUT, WGETLUA_USER_AGENT, WGETLUA_COOKIES_FILE, WGETLUA_CHECK_SSL_VALIDITY, WGETLUA_ARGS, WGETLUA_ARGS_EXTRA.
    • Emits ArchiveResult records and skips when staticfile already handled the URL; includes simple card and icon templates.
    • Adds 10 live integration tests against https://example.com to confirm HTML capture and WARC validity.

Written for commit 6758b9e. Summary will update on new commits.

New plugin that archives pages using wget-at for better WARC compliance
and archive.org compatibility. Uses binprovider overrides in config.json
to install wget-at via brew or build from source. Includes live
integration tests against https://example.com that verify HTML content
and WARC output correctness.

Closes #17

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 7 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="abx_plugins/plugins/wgetlua/tests/test_wgetlua.py">

<violation number="1" location="abx_plugins/plugins/wgetlua/tests/test_wgetlua.py:428">
P1: Custom agent: **Test quality checker**

Remove `pytest.skip(...)` paths for missing `wget-at`; this violates the rule clause forbidding skipped/bail-early tests.</violation>

<violation number="2" location="abx_plugins/plugins/wgetlua/tests/test_wgetlua.py:451">
P1: Custom agent: **Test quality checker**

`test_config_timeout_honored` is a fake assertion: it accepts both success and failure and does not verify that the timeout config is actually enforced.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

)

# Verify it completed (success or fail, but didn't hang)
assert result.returncode in (0, 1), "Should complete (success or fail)"
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Custom agent: Test quality checker

test_config_timeout_honored is a fake assertion: it accepts both success and failure and does not verify that the timeout config is actually enforced.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At abx_plugins/plugins/wgetlua/tests/test_wgetlua.py, line 451:

<comment>`test_config_timeout_honored` is a fake assertion: it accepts both success and failure and does not verify that the timeout config is actually enforced.</comment>

<file context>
@@ -0,0 +1,490 @@
+        )
+
+        # Verify it completed (success or fail, but didn't hang)
+        assert result.returncode in (0, 1), "Should complete (success or fail)"
+
+
</file context>
Fix with Cubic


wget_at_path = _ensure_wget_at_installed()
if not wget_at_path:
pytest.skip("wget-at not available")
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Custom agent: Test quality checker

Remove pytest.skip(...) paths for missing wget-at; this violates the rule clause forbidding skipped/bail-early tests.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At abx_plugins/plugins/wgetlua/tests/test_wgetlua.py, line 428:

<comment>Remove `pytest.skip(...)` paths for missing `wget-at`; this violates the rule clause forbidding skipped/bail-early tests.</comment>

<file context>
@@ -0,0 +1,490 @@
+
+    wget_at_path = _ensure_wget_at_installed()
+    if not wget_at_path:
+        pytest.skip("wget-at not available")
+
+    with tempfile.TemporaryDirectory() as tmpdir:
</file context>
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Switch wget for wget-at for better WARC compliance

0 participants