@szajbus (Contributor) commented Aug 21, 2025

Problem

When multiple concurrent processes read the same S3 file (or HTTP URL), they all use the same deterministic temporary filename, derived from the S3 object hash (or the HTTP URL). This causes a race condition: one process can delete the temp file after reading it while another is still trying to read it, resulting in "No such file or directory" errors.

See below for a script to reproduce.
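Separately from the full reproduction script, the failing interleaving can be illustrated with local files only. This is a minimal sketch, not Explorer's code: the `SharedTempSketch` module, its functions, and the file name are hypothetical, and the two readers run one after the other to make the ordering deterministic.

```elixir
# Hypothetical stand-in for "download to a temp file, read it, then clean up".
defmodule SharedTempSketch do
  # Simulates a download that writes to a deterministic, shared temp path.
  def download(path), do: File.write!(path, "a,b\n1,2\n")

  # Reads the temp file and removes it afterwards.
  def read_and_cleanup(path) do
    result = File.read(path)
    File.rm(path)
    result
  end
end

# Both readers derive the same path because it depends only on the source.
path = Path.join(System.tmp_dir!(), "shared-temp-sketch.csv")

# Both "downloads" finish before either reader cleans up.
SharedTempSketch.download(path)
SharedTempSketch.download(path)

# The first reader succeeds and deletes the file; the second finds it gone.
first = SharedTempSketch.read_and_cleanup(path)
second = SharedTempSketch.read_and_cleanup(path)

IO.inspect(first, label: "first reader")
IO.inspect(second, label: "second reader")
```

The second read returns `{:error, :enoent}`, which is the POSIX error behind the "No such file or directory" message.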

Solution

Modified Explorer.PolarsBackend.Shared.build_path_for_entry/1 to append a random suffix to the temporary filename, ensuring each download gets a unique path.
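The idea can be sketched as follows. This is an illustration under assumptions, not the code merged in this PR: the `TempPathSketch` module, the `unique_temp_path/1` function, the `explorer-` prefix, and the choice of hash and suffix length are all hypothetical.

```elixir
# Hypothetical helper; not Explorer's build_path_for_entry/1.
defmodule TempPathSketch do
  # Builds a temp path from a stable hash of the source plus a random suffix,
  # so that two downloads of the same source never share a file.
  def unique_temp_path(source) when is_binary(source) do
    # Deterministic part: identifies the source (assumed to be SHA-256 here).
    hash =
      :crypto.hash(:sha256, source)
      |> Base.url_encode64(padding: false)

    # Random part: 8 random bytes, encoded to be safe in file names.
    suffix =
      :crypto.strong_rand_bytes(8)
      |> Base.url_encode64(padding: false)

    Path.join(System.tmp_dir!(), "explorer-#{hash}-#{suffix}")
  end
end

a = TempPathSketch.unique_temp_path("s3://my-bucket/path/to.csv")
b = TempPathSketch.unique_temp_path("s3://my-bucket/path/to.csv")

IO.inspect(a != b, label: "paths differ")
```

With a per-download suffix, deleting one temp file after reading it cannot affect another process that is reading the same source.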

Script to reproduce the problem

Mix.install([
  {:explorer, "~> 0.11.0"}
])

defmodule BugRepro do
  def run do
    # Required env:
    #   S3_CSV_URL (e.g. s3://my-bucket/path/to.csv)
    #   AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
    # Optional:
    #   AWS_REGION (default: us-east-1), S3_ENDPOINT (default: https://s3.amazonaws.com)
    #   N (default: 20 concurrent tasks)
    s3_url = System.fetch_env!("S3_CSV_URL")

    config = [
      access_key_id: System.fetch_env!("AWS_ACCESS_KEY_ID"),
      secret_access_key: System.fetch_env!("AWS_SECRET_ACCESS_KEY"),
      region: System.get_env("AWS_REGION") || "us-east-1",
      endpoint: System.get_env("S3_ENDPOINT") || "https://s3.amazonaws.com"
    ]

    n = String.to_integer(System.get_env("N", "20"))
    parent = self()

    tasks =
      for i <- 1..n do
        Task.async(fn ->
          send(parent, {:ready, self(), i})
          receive do :go -> :ok end
          {i, Explorer.DataFrame.from_csv(s3_url, config: config)}
        end)
      end

    for _ <- 1..n do
      receive do {:ready, _pid, _i} -> :ok end
    end

    Enum.each(tasks, fn %Task{pid: pid} -> send(pid, :go) end)

    tasks
    |> Enum.map(&Task.await(&1, :infinity))
    |> Enum.each(fn {i, res} -> IO.inspect(res, label: "task #{i}") end)
  end
end

BugRepro.run()

@josevalim josevalim merged commit 0f915d1 into elixir-explorer:main Aug 21, 2025
3 checks passed
@josevalim
Contributor

💚 💙 💜 💛 ❤️
