Add API for listening to concurrent mix compilations #13896

jonatanklosko · 2024-10-11T14:21:38Z

This introduces the Mix listeners API. A project may specify listeners as follows:

def project do
  [
    ...,
    listeners: [SomeDep.MixListener]
  ]
end

or

config :mix, :listeners, [SomeDep.MixListener]

where each listener is a child spec. A listener is started on Mix startup (after deps.loadpaths) and receives events as messages. Currently the only message is {:modules_compiled, info}, where info.modules_diff specifies which modules have been added/changed/removed. For now, listeners must come from dependencies or already be in the code path, since we start listeners right after deps.loadpaths.
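A minimal sketch of what such a listener could look like, assuming it is a plain GenServer. The `events/1` helper is hypothetical and only there for inspection, and the only message assumed is the `{:modules_compiled, info}` tuple described above:

```elixir
defmodule SomeDep.MixListener do
  # `use GenServer` defines child_spec/1, so the bare module name can be
  # used wherever a child spec is expected.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Hypothetical helper: returns the events received so far, oldest first.
  def events(server \\ __MODULE__), do: GenServer.call(server, :events)

  @impl true
  def init(_opts), do: {:ok, []}

  @impl true
  def handle_call(:events, _from, events) do
    {:reply, Enum.reverse(events), events}
  end

  # Events arrive as regular messages, so they are handled in handle_info/2.
  @impl true
  def handle_info({:modules_compiled, info}, events) do
    {:noreply, [info | events]}
  end

  # Ignore anything else, since more event types may be added later.
  def handle_info(_message, events), do: {:noreply, events}
end
```

A catch-all `handle_info/2` clause keeps the listener from logging errors if new events are introduced.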

One immediate use case that this PR implements is IEx. Up to this point, IEx didn't know when mix compile was called separately, so a subsequent recompile inside IEx wouldn't reflect the newly compiled changes. The new IEx.MixListener listens to compilations and accumulates modules to be purged. Then, on recompile we purge these modules before invoking compilation. In the future we can have a mode where the modules are reloaded immediately.
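The accumulate-then-purge idea can be sketched roughly as below. This is not the actual IEx.MixListener code: the module name `PurgeSketch` is hypothetical, the `modules_diff` shape (with `changed` and `removed` lists) is assumed from the diff in this PR, and purging only changed and removed modules is an assumption of the sketch.

```elixir
defmodule PurgeSketch do
  use Agent

  # The state is the set of modules waiting to be purged.
  def start_link(_opts \\ []) do
    Agent.start_link(fn -> MapSet.new() end, name: __MODULE__)
  end

  # Called for every {:modules_compiled, info} event. Newly added modules
  # are not loaded yet, so only changed and removed ones are remembered.
  def track(%{modules_diff: diff}) do
    modules = MapSet.new(diff.changed ++ diff.removed)
    Agent.update(__MODULE__, &MapSet.union(&1, modules))
  end

  # Called right before recompiling: purges everything accumulated so far
  # and returns the list of purged modules.
  def purge_pending do
    modules =
      Agent.get_and_update(__MODULE__, fn pending ->
        {MapSet.to_list(pending), MapSet.new()}
      end)

    for module <- modules do
      :code.purge(module)
      :code.delete(module)
    end

    modules
  end
end
```

Purging lazily, on recompile, means a running IEx session is not disturbed in the middle of evaluating code that still uses the old module versions.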

Further use cases include phoenix_live_reload and the language server that can be made aware of external compilations.

Demo

compilenotify.mp4

Implementation

I added Mix.Sync.PubSub with a pub/sub based on TCP and directory listing (similar concepts to the lock implementation). I also moved the lock to Mix.Sync.Lock.
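To illustrate the general idea of a pub/sub built on TCP and directory listing, here is a rough sketch. It is not the Mix.Sync.PubSub implementation: the module name, the `port_<n>` file naming, and the use of `packet: 4` framing are all assumptions made for the example.

```elixir
defmodule DirPubSubSketch do
  @loopback {127, 0, 0, 1}
  # packet: 4 frames every message with a 4-byte length prefix.
  @socket_opts [:binary, packet: 4, active: false]

  # Subscriber side: listen on an OS-assigned port and advertise it by
  # creating a file named after the port inside `dir`.
  def subscribe(dir) do
    File.mkdir_p!(dir)
    {:ok, listen_socket} = :gen_tcp.listen(0, @socket_opts ++ [ip: @loopback])
    {:ok, port} = :inet.port(listen_socket)
    File.touch!(Path.join(dir, "port_#{port}"))
    listen_socket
  end

  # Subscriber side: accept one connection and decode the message sent on it.
  def receive_message(listen_socket, timeout \\ 5_000) do
    with {:ok, socket} <- :gen_tcp.accept(listen_socket, timeout) do
      result = :gen_tcp.recv(socket, 0, timeout)
      :gen_tcp.close(socket)
      with {:ok, binary} <- result, do: {:ok, :erlang.binary_to_term(binary)}
    end
  end

  # Publisher side: list the directory to discover subscribers, then send
  # the encoded message to each port. Returns the number of successful sends.
  def broadcast(dir, message) do
    binary = :erlang.term_to_binary(message)

    ports =
      case File.ls(dir) do
        {:ok, names} -> for "port_" <> port <- names, do: String.to_integer(port)
        {:error, _reason} -> []
      end

    Enum.count(ports, fn port ->
      case :gen_tcp.connect(@loopback, port, @socket_opts, 1_000) do
        {:ok, socket} ->
          result = :gen_tcp.send(socket, binary)
          :gen_tcp.close(socket)
          result == :ok

        # Likely a stale file left by a dead subscriber: remove it and move on.
        {:error, _reason} ->
          File.rm(Path.join(dir, "port_#{port}"))
          false
      end
    end)
  end
end
```

The directory acts as the registry of subscribers, so publishers from separate OS processes can find them without any shared VM state.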

Then, we now have a Mix.PubSub supervisor where we start a single subscriber process and a child supervisor with all the listeners. Whenever the subscriber receives a message, it sends it to all listeners.
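The supervision layout described here can be sketched as follows. The module names under `FanOutSketch` are hypothetical and the real Mix.PubSub modules may be structured differently; the sketch only shows one subscriber forwarding each message to every child of a listener supervisor.

```elixir
defmodule FanOutSketch.Subscriber do
  use GenServer

  def start_link(listener_sup) do
    GenServer.start_link(__MODULE__, listener_sup, name: __MODULE__)
  end

  @impl true
  def init(listener_sup), do: {:ok, listener_sup}

  # Forward every incoming message to each listener that is currently running.
  @impl true
  def handle_info(message, listener_sup) do
    for {_id, pid, _type, _modules} <- Supervisor.which_children(listener_sup),
        is_pid(pid) do
      send(pid, message)
    end

    {:noreply, listener_sup}
  end
end

defmodule FanOutSketch do
  # Top-level supervisor: a child supervisor holding all the listeners,
  # followed by the single subscriber process.
  def start_link(listeners) do
    listener_sup_opts = [strategy: :one_for_one, name: FanOutSketch.Listeners]

    children = [
      %{
        id: FanOutSketch.Listeners,
        type: :supervisor,
        start: {Supervisor, :start_link, [listeners, listener_sup_opts]}
      },
      {FanOutSketch.Subscriber, FanOutSketch.Listeners}
    ]

    # :rest_for_one restarts the subscriber if the listener supervisor restarts.
    Supervisor.start_link(children, strategy: :rest_for_one)
  end
end
```

Keeping the listeners under their own supervisor means a crashing listener is restarted without affecting the subscriber or the other listeners.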

josevalim · 2024-10-12T09:30:13Z

lib/mix/lib/mix/tasks/compile.all.ex

      do_run(config, args)
-    end)
+    else
+      Mix.Project.with_build_lock(config, fn ->


This is an interesting change. Although do_run above does not compile any application, it still tries to load them. So there is a chance that we try to load the applications while someone is compiling, and therefore erasing modules and .app files. Or someone could be running clean. I wonder if we should just always keep the lock? If someone cleans, it will fail anyway, but at least it will do so more consistently and not load an intermediate state? (and then I also wonder if we should have a deps lock while loading deps apps).

Good point. I saw the lock message when starting a process with --no-compile in tests and I thought there's no need. I will revert.

and then I also wonder if we should have a deps lock while loading deps apps

The loading relies only on build, no? And clean already obtains build lock when deleting deps files.

josevalim · 2024-10-12T09:31:02Z

lib/mix/lib/mix/tasks/compile.elixir.ex

+          )
+
+        if modules_diff do
+          Mix.Task.Compiler.notify_modules_compiled(modules_diff)


I was going to suggest moving this call inside Mix.Compilers.Elixir but, to be honest, it is all the same. We will actually know which way is better once we do the Erlang one, because that may force us to go one way or the other. So let's keep this as is, but we may be forced to move it inside Mix.Compilers.* in a follow up PR.

For both compilers we want to send the same event, so it's effectively more generic no?

Yes, it is the same event, but I believe Mix.Compilers.Erlang may be used by other projects, so we cannot change its return type. :(

Ohh I see, I thought you meant where notify_modules_compiled is defined, but it's about where we call it. I thought this was a better place, and both Mix.Compilers.{Elixir,Erlang} are marked as private modules, but if they are effectively public, then yeah, we can do the broadcast there as well.

The Erlang one, last time I checked, was used by third party tools. The Elixir one isn't. This is why the Erlang one will be the deciding factor (for consistency), but we can do it in another PR. :)

josevalim · 2024-10-12T09:41:33Z

lib/mix/lib/mix/compilers/elixir.ex

+      changed: Map.keys(Map.intersect(compiled_modules, all_modules)),
+      removed: (all_modules_keys -- compiled_modules_keys) -- pending_modules_keys,
+      timestamp: timestamp
+    }


I am worried about this code being expensive for really large applications (20k modules). In such cases, we are changing 1-100 files, but we now need to traverse 20k to compute these statistics.

We could simplify the stats: instead of sending added/changed/removed, we just send "changed", noting that it implies added/removed/changed, and the subscriber can then look up the actual status on disk. For the language server case, this means we may look up some modules that have been removed, but that's fine. They are the minority in most cases. For IEx/Phoenix, they don't care, they will just purge all.

If we don't want a single field, we can also go with changed/removed. But I think a single field would totally work in this case. WDYT?

I've just pushed an update to never traverse all the modules. If you still prefer to have a single field, let me know :)

This looks fantastic, I believe we can ship it. Something we can also try to do is to pass a function to PubSub. If someone is listening, we compute the event and send it. If nobody is listening, we don't. WDYT?

If we do that, the function would close over all_modules and we would end up copying possibly huge amounts of data when passing it to the process, which may be worse than before, no?

We should not be copying them at all? Couldn't we get open socket instances in the GenServer client and write to them directly?

Oh, I confused this with the subscriber side. For publishing we only list the directory, so yeah, we can make the message lazy. It will be part of the Mix.Task.Compiler.notify_modules_compiled API, but that should be fine. See b824a5d.
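The lazy-message idea from this thread can be sketched as below, assuming subscribers are represented as a plain list of pids (the actual publisher discovers them differently). `LazyBroadcastSketch` is a hypothetical name used only for illustration.

```elixir
defmodule LazyBroadcastSketch do
  # Nobody is listening: return early without ever calling the function,
  # so a potentially expensive event is never computed.
  def broadcast([], _message_or_fun), do: :ok

  def broadcast(subscribers, message_or_fun) do
    # A zero-arity function is treated as a lazy message and evaluated once.
    message =
      case message_or_fun do
        lazy_message when is_function(lazy_message, 0) -> lazy_message.()
        message -> message
      end

    Enum.each(subscribers, &send(&1, message))
  end
end
```

Accepting both plain terms and functions keeps the call site simple when there is nothing expensive to compute.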

josevalim · 2024-10-12T09:45:12Z

config :mix, :listeners, [SomeDep.MixListener]

I just realized this is not going to work as expected for umbrella projects. In umbrella projects, configuration is shared, but only some apps in the umbrella may depend on SomeDep. Given that one of the reasons we chose the app environment was IEx, and we are ultimately not using it there, it probably makes sense to roll back to a :listeners key in def project.

jonatanklosko · 2024-10-14T08:06:07Z

it probably makes sense to roll back to a :listeners key in def project.

The issue with project is that the config is not easily injectable, which is what the language server would need.

josevalim · 2024-10-14T08:12:50Z

The issue with project is that the config is not easily injectable, which is what the language server would need.

Mix has some APIs for pushing overrides to config but I don't know if the language servers can use it. We can also keep both, since the app config definitely won't work for Elixir projects.

jonatanklosko · 2024-10-14T09:50:28Z

Mix has some APIs for pushing overrides to config but I don't know if the language servers can use it.

Yeah, but wherever the language server would do that may be too late, after stuff is already loaded.

josevalim

One last comment and we can ship it.

josevalim · 2024-10-14T14:25:38Z

lib/mix/lib/mix/sync/pubsub.ex

+        message =
+          case message do
+            lazy_message when is_function(lazy_message, 0) -> lazy_message.()
+            message -> message


Do we want to always pass a function, just in case?

Your call, we may use the pub/sub for other cases where there's nothing to compute, so forcing a function could be weird :)

josevalim · 2024-10-14T14:26:52Z

💚 💙 💜 💛 ❤️

ruslandoga · 2025-04-02T18:44:49Z

👋

This is probably silly, but I was working on a similar idea and noticed that beyond a certain size :gen_tcp.recv starts returning :enomem errors. Setting :buffer, :recbuf, :sndbuf to larger values doesn't seem to help.

Repro script

defmodule BigTcpTest do
  def run do
    port = 5555

    # Start a server in an async Task
    server_task =
      Task.async(fn ->
        # Listen on the chosen port
        {:ok, listen_sock} =
          :gen_tcp.listen(port, mode: :binary, packet: :raw, active: false, reuseaddr: true)

        IO.puts("Server listening on port #{port}")

        # Accept one connection
        {:ok, client_sock} = :gen_tcp.accept(listen_sock)
        IO.puts("Server accepted connection")

        # Attempt to read the entire ~100 MB in a single recv call
        case :gen_tcp.recv(client_sock, 100 * 1024 * 1024, :infinity) do
          {:ok, data} ->
            IO.puts("Server read data of length #{byte_size(data)}")
            # Echo it back
            :ok = :gen_tcp.send(client_sock, data)

          {:error, reason} ->
            IO.puts("Server failed to read: #{inspect(reason)}")
        end

        # Close sockets
        :gen_tcp.shutdown(client_sock, :read_write)
        :gen_tcp.close(client_sock)
        :gen_tcp.close(listen_sock)
      end)

    # Give the server time to start
    Process.sleep(500)

    # Connect to the server
    {:ok, sock} = :gen_tcp.connect(~c"127.0.0.1", port, [mode: :binary, packet: :raw, active: false], 5000)

    # Create 100 MB of data
    data = :crypto.strong_rand_bytes(100 * 1024 * 1024)
    IO.puts("Client generated 100MB of data")

    # Send it in one go
    :ok = :gen_tcp.send(sock, data)

    # Attempt to read the echo
    case :gen_tcp.recv(sock, byte_size(data), 15_000) do
      {:ok, ^data} ->
        IO.puts("Client received identical data back")

      {:ok, other} ->
        IO.puts("Client received different data back: #{byte_size(other)}")

      {:error, reason} ->
        IO.puts("Client failed to recv: #{inspect(reason)}")
    end

    :gen_tcp.close(sock)
    Task.await(server_task)
    IO.puts("Done.")
  end
end

BigTcpTest.run()

$ elixir big_tcp.exs
Server listening on port 5555
Server accepted connection
Client generated 100MB of data
Server failed to read: :enomem
Client failed to recv: :enomem
Done.

Reading in chunks seems to work.

Updated script

defmodule BigTcpTest do
  def run do
    port = 5555

    # Start a server in an async Task
    server_task =
      Task.async(fn ->
        # Listen on the chosen port
        {:ok, listen_sock} =
          :gen_tcp.listen(port, mode: :binary, packet: :raw, active: false, reuseaddr: true)

        IO.puts("Server listening on port #{port}")

        # Accept one connection
        {:ok, client_sock} = :gen_tcp.accept(listen_sock)
        IO.puts("Server accepted connection")

        # Read the entire 100 MB in chunks of at most 1 MB via sock_recv/4
        case sock_recv(client_sock, 100 * 1024 * 1024, :infinity) do
          {:ok, data} ->
            IO.puts("Server read data of length #{byte_size(data)}")
            # Echo it back
            :ok = :gen_tcp.send(client_sock, data)

          {:error, reason} ->
            IO.puts("Server failed to read: #{inspect(reason)}")
        end

        # Close sockets
        :gen_tcp.shutdown(client_sock, :read_write)
        :gen_tcp.close(client_sock)
        :gen_tcp.close(listen_sock)
      end)

    # Give the server time to start
    Process.sleep(500)

    # Connect to the server
    {:ok, sock} = :gen_tcp.connect(~c"127.0.0.1", port, [mode: :binary, packet: :raw, active: false], 5000)

    # Create 100 MB of data
    data = :crypto.strong_rand_bytes(100 * 1024 * 1024)
    IO.puts("Client generated 100MB of data")

    # Send it in one go
    :ok = :gen_tcp.send(sock, data)

    # Attempt to read the echo
    case sock_recv(sock, byte_size(data), 15_000) do
      {:ok, ^data} ->
        IO.puts("Client received identical data back")

      {:ok, other} ->
        IO.puts("Client received different data back: #{byte_size(other)}")

      {:error, reason} ->
        IO.puts("Client failed to recv: #{inspect(reason)}")
    end

    :gen_tcp.close(sock)
    Task.await(server_task)
    IO.puts("Done.")
  end

  @one_mb 1024 * 1024

  # for larger messages, we need to read in chunks or we get {:error, :enomem}
  defp sock_recv(socket, size, timeout, acc \\ []) do
    with {:ok, data} <- :gen_tcp.recv(socket, min(size, @one_mb), timeout) do
      acc = [acc | data]

      case size - byte_size(data) do
        0 -> {:ok, IO.iodata_to_binary(acc)}
        left -> sock_recv(socket, left, timeout, acc)
      end
    end
  end
end

BigTcpTest.run()

$ elixir big_tcp_2.exs
Server listening on port 5555
Server accepted connection
Client generated 100MB of data
Server read data of length 104857600
Client received identical data back
Done.

josevalim · 2025-04-02T21:16:01Z

Thank you for the heads up. I think we are fine as we are not sending this much data here. :)

jonatanklosko force-pushed the jk-compile-notify branch 3 times, most recently from 09509b9 to abebc9b, October 11, 2024 16:09

jonatanklosko added 2 commits October 12, 2024 01:21

Add API for listening to concurrent mix compilations

4dbd8b0

Remove --no-path-loading

437e53e

jonatanklosko force-pushed the jk-compile-notify branch from 8f81f18 to 437e53e, October 11, 2024 18:14