[🐛 Bug]: OSError Text file busy on .cache/selenium #13511

@anteph

Description

What happened?

Hello!

I'm using Selenium with Chrome driver to download some web pages. I'm running the browser in headless mode.
The code is very basic, roughly:

from selenium import webdriver

...
driver = webdriver.Chrome()

driver.get("...some url")

driver.quit()

So, nothing special about it I believe.

The part of the application that uses Selenium runs on Celery. The Celery worker uses a fork model, meaning that I have a couple of processes (not threads) in each worker.

Sporadically, I've been seeing an OSError: text file busy on ~/.cache/selenium/chromedriver/linux64/121.0.6167.85/chromedriver.

The issue always occurs on the instantiation of the driver:
driver = webdriver.Chrome()
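
Since the failure is sporadic and always surfaces at instantiation, one possible stopgap is to retry the call when it fails with this specific error. This is only a minimal sketch, not a fix for the underlying cause: `create_with_retry` and its parameters are hypothetical names, and it assumes the exception reaches the caller as an `OSError` whose `errno` is `ETXTBSY`.

```python
import errno
import time


def create_with_retry(factory, attempts=5, delay=0.2):
    """Call factory(), retrying only on OSError with errno ETXTBSY.

    `factory` is any zero-argument callable that builds the driver;
    the name and defaults here are illustrative assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except OSError as exc:
            # Re-raise anything that is not "Text file busy", and give up
            # once the last attempt has failed.
            if exc.errno != errno.ETXTBSY or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff
```

In the snippet above, this would be used as `driver = create_with_retry(webdriver.Chrome)`.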

A couple of observations:

  • The selenium import happens before Celery forks to spawn the worker processes. However, I've been through the selenium module and didn't see any operation performed at import time that could cause issues. In other words, it seems fork-safe.
  • Every time the logic to download a page runs, I instantiate a new webdriver object. This means that no driver object is shared among processes, nor among different executions of the application logic (I've read here that Selenium is not thread safe but that it is OK to create separate driver instances for different threads, so I assumed it was also safe to do so for processes).
  • This issue usually happens under load and, I believe, when a new pod is spawned. I think it usually happens once in the lifetime of a given pod and never repeats on that pod.

Given that this seems to happen only once, when a new pod launches under load, and that it happens on the instantiation of the driver, my suspicion is that there is a race condition in Selenium Manager. As far as I understand, Selenium Manager is the piece responsible for checking whether the Chrome Driver executable is already in the cache and, if not, copying it there. Maybe two processes look at the folder at the same time, see no executable, and both try to copy it there, causing the issue?
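
If that hypothesis is right, serializing driver creation across processes should avoid the collision. Below is a minimal sketch of such a workaround using an advisory file lock. It does not reflect how Selenium Manager works internally; `LOCK_PATH`, `interprocess_lock`, and `create_driver_serialized` are hypothetical names, and the lock file location is an arbitrary choice.

```python
import fcntl
import os
from contextlib import contextmanager

# Hypothetical lock file location; any path writable by all workers would do.
LOCK_PATH = os.path.join(os.path.expanduser("~"), ".cache", "selenium-driver.lock")


@contextmanager
def interprocess_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def create_driver_serialized(factory, path=LOCK_PATH):
    """Run factory() while holding the lock, so only one process at a time
    can be creating a driver (and possibly populating the driver cache)."""
    with interprocess_lock(path):
        return factory()
```

With this in place, the instantiation would become `driver = create_driver_serialized(webdriver.Chrome)`. The lock is held only while the driver is being created, so page downloads themselves still run in parallel.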

I've also seen this issue recently when running some of my unit tests on the same logic from above (although in this case I'm directly testing the function that downloads the pages, not the Celery worker). But again, it seems to be a parallelization-related issue.

I've seen a couple of closed issues which mention a similar text file busy error, but they are closed by now and I think they are not exactly the same use case, but I apologize if I misunderstood and this is related.

Thank you :)

How can we reproduce the issue?

I cannot reproduce it myself; it happens very sporadically, as described above, and usually under load.

Relevant log output

I'm not collecting any logs, but I think the exception mentioned above is the relevant output.

Operating System

Linux - Debian 11 (running in Docker)

Selenium version

Python - 4.17.2

What are the browser(s) and version(s) where you see this issue?

Chrome 121.0.6167.85

What are the browser driver(s) and version(s) where you see this issue?

Chrome Driver 121.0.6167.85

Are you using Selenium Grid?

No

Metadata

Assignees

No one assigned

    Labels

    C-rust: Rust code is mostly Selenium Manager
    I-defect: Something is not working as intended

    Type

    No type

    Projects

    Status

    Done

    Relationships

    None yet

    Development

    No branches or pull requests