Problems downloading large files (time-out) #2

@cerebrate

This seems to happen with particularly large files (430+ pages, all image-based):

❯ python scribd-downloader.py
Input link Scribd: https://www.scribd.com/document/[REDACTED]
Link embed: https://www.scribd.com/embeds/[REDACTED]/content
Output filename: [REDACTED].pdf

🚀 Starting headless Chrome browser...
✅ Cookie dialogs hidden
📄 Found 433 pages, scrolling...
   Scrolled 10/433 pages...
   Scrolled 20/433 pages...
   Scrolled 30/433 pages...
   Scrolled 40/433 pages...
   Scrolled 50/433 pages...
   Scrolled 60/433 pages...
   Scrolled 70/433 pages...
   Scrolled 80/433 pages...
   Scrolled 90/433 pages...
   Scrolled 100/433 pages...
   Scrolled 110/433 pages...
   Scrolled 120/433 pages...
   Scrolled 130/433 pages...
   Scrolled 140/433 pages...
   Scrolled 150/433 pages...
   Scrolled 160/433 pages...
   Scrolled 170/433 pages...
   Scrolled 180/433 pages...
   Scrolled 190/433 pages...
   Scrolled 200/433 pages...
   Scrolled 210/433 pages...
   Scrolled 220/433 pages...
   Scrolled 230/433 pages...
   Scrolled 240/433 pages...
   Scrolled 250/433 pages...
   Scrolled 260/433 pages...
   Scrolled 270/433 pages...
   Scrolled 280/433 pages...
   Scrolled 290/433 pages...
   Scrolled 300/433 pages...
   Scrolled 310/433 pages...
   Scrolled 320/433 pages...
   Scrolled 330/433 pages...
   Scrolled 340/433 pages...
   Scrolled 350/433 pages...
   Scrolled 360/433 pages...
   Scrolled 370/433 pages...
   Scrolled 380/433 pages...
   Scrolled 390/433 pages...
   Scrolled 400/433 pages...
   Scrolled 410/433 pages...
   Scrolled 420/433 pages...
   Scrolled 430/433 pages...
✅ All 433 pages loaded
✅ Top toolbar removed
✅ Bottom toolbar removed
✅ Cleaned 1 scroll containers
✅ Print CSS injected

📥 Saving PDF as: [REDACTED].pdf
   Page size: Executive (7.25" x 10.5")
   Margins: None
   Headers/Footers: Disabled
❌ Error saving PDF: HTTPConnectionPool(host='localhost', port=58181): Read timed out. (read timeout=120)
⚠️ Auto-save failed. Opening print dialog as fallback...
Traceback (most recent call last):
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/connectionpool.py", line 534, in _make_request
    response = conn.getresponse()
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/connection.py", line 571, in getresponse
    httplib_response = super().getresponse()
  File "/usr/lib/python3.13/http/client.py", line 1450, in getresponse
    response.begin()
    ~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/http/client.py", line 336, in begin
    version, status, reason = self._read_status()
                              ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/http/client.py", line 297, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
           ~~~~~~~~~~~~~~~~~~~~^^^
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/wrk/avatar/src/scribd-downloader/scribd-downloader.py", line 448, in <module>
    driver.execute_script("window.print();")
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/selenium/webdriver/remote/webdriver.py", line 518, in execute_script
    return self.execute(command, {"script": script, "args": converted_args})["value"]
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
    response = cast(RemoteConnection, self.command_executor).execute(driver_command, params)
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/selenium/webdriver/remote/remote_connection.py", line 406, in execute
    return self._request(command_info[0], url, body=data)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/selenium/webdriver/remote/remote_connection.py", line 430, in _request
    response = self._conn.request(method, url, body=body, headers=headers, timeout=self._client_config.timeout)
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/_request_methods.py", line 143, in request
    return self.request_encode_body(
           ~~~~~~~~~~~~~~~~~~~~~~~~^
        method, url, fields=fields, headers=headers, **urlopen_kw
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/_request_methods.py", line 278, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/poolmanager.py", line 457, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
        method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
    )
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/util/retry.py", line 490, in increment
    raise reraise(type(error), error, _stacktrace)
          ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/util/util.py", line 39, in reraise
    raise value
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
        conn,
    ...<10 lines>...
        **response_kw,
    )
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/avatar/src/scribd-downloader/lib/python3.13/site-packages/urllib3/connectionpool.py", line 367, in _raise_timeout
    raise ReadTimeoutError(
        self, url, f"Read timed out. (read timeout={timeout_value})"
    ) from err
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=58181): Read timed out. (read timeout=120)
[1]    312770 exit 1     python scribd-downloader.py
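
A possible workaround, offered as a sketch rather than a tested fix: the traceback shows Selenium passing `timeout=self._client_config.timeout` on every request to the local driver, and the failure is a 120-second read timeout. Raising that value before the PDF step might give Chrome enough time to render all 433 pages. The helper name `raise_command_timeout` and the 600-second figure are hypothetical, and the public `client_config` attribute is an assumption about newer Selenium releases; only `_client_config.timeout` is taken from the traceback above.

```python
def raise_command_timeout(driver, seconds):
    """Best-effort: raise the read timeout Selenium uses when talking to the driver.

    Returns True if a timeout attribute was found and updated, False otherwise.
    """
    executor = driver.command_executor
    # Assumption: newer Selenium exposes `client_config`; the traceback only
    # confirms the private `_client_config` attribute, so try both.
    config = getattr(executor, "client_config", None)
    if config is None:
        config = getattr(executor, "_client_config", None)
    if config is None or not hasattr(config, "timeout"):
        return False
    config.timeout = seconds
    return True


# Hypothetical usage inside scribd-downloader.py, before the PDF is saved:
# if not raise_command_timeout(driver, 600):
#     print("Could not raise the Selenium timeout; large documents may still fail")
```

Because the timeout is read on each request, changing it on the live driver should apply to the next command. Note that the fallback `driver.execute_script("window.print();")` goes through the same connection, so it hits the same limit and currently ends the script with an uncaught exception.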
