fix: send browser user-agent for HTTP URI conversions #1849

Open
jaythehardcoder wants to merge 1 commit into microsoft:main from jaythehardcoder:fix/http-user-agent-header

@jaythehardcoder

Summary

  • send a browser-like User-Agent header for HTTP/HTTPS URI conversions
  • keep the existing streaming download flow unchanged
  • add a regression test that verifies the request header is set

Problem

Issue #1467 reports URL conversions failing against endpoints that reject requests without a browser-style User-Agent, even though the same URLs work in a normal browser.

Fix

MarkItDown now includes a browser-like User-Agent header on its HTTP fetch path before passing the response through the normal conversion flow.
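
As a rough illustration of the idea, and not the code in this PR, the sketch below attaches a browser-like User-Agent to a GET request using only the standard library. The `BROWSER_USER_AGENT` string and the `build_request` and `fetch` helpers are hypothetical names chosen for this example; the exact header value the PR sends is not stated here, and MarkItDown's own HTTP client may differ.

```python
import urllib.request

# Hypothetical value: the PR description does not state the exact string it sends.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Return a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the body at `url`, identifying as a browser (illustrative only)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Building the request separately from sending it keeps the header logic checkable without network access, which is the property a regression test for this change would rely on.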

Test Plan

  • python3 -m pytest -q packages/markitdown/tests/test_module_misc.py -k http_uri_uses_browser_user_agent
  • python3 -m pytest -q packages/markitdown/tests/test_module_misc.py
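
A regression test of this kind can be sketched with a mocked session, so no real request is made. Everything below is an assumption for illustration: `fetch` is a hypothetical stand-in for the library's fetch path, `USER_AGENT` is a placeholder value, and the actual test in `test_module_misc.py` may be structured differently.

```python
from unittest.mock import MagicMock

# Placeholder value for illustration; not the string used by the PR.
USER_AGENT = "Mozilla/5.0 (compatible; example)"


def fetch(session, url):
    """Hypothetical stand-in for the fetch path: GET with a User-Agent, streamed."""
    response = session.get(url, headers={"User-Agent": USER_AGENT}, stream=True)
    response.raise_for_status()
    return response


def test_http_uri_uses_browser_user_agent():
    # The mock records the call instead of touching the network.
    session = MagicMock()
    fetch(session, "https://example.com/doc.html")

    session.get.assert_called_once()
    _, kwargs = session.get.call_args
    assert kwargs["headers"]["User-Agent"].startswith("Mozilla/5.0")
    # The streaming download flow should remain in place.
    assert kwargs["stream"] is True
```

Asserting on the recorded call arguments checks both claims in the summary: the header is set, and the streaming flag is unchanged.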

Closes #1467

Development

Successfully merging this pull request may close these issues.

Add browser User-Agent header to HTTP requests to support bot-protected APIs

2 participants