-
Notifications
You must be signed in to change notification settings - Fork 64
Open
Description
Update: (Splitting task into pieces per comments; this one should just focus on the high-pri short-term fix of "test_flush_logs_ipython is broken again; we need to make sure ipython unregister flush calls").
This is due to a very fast program shutdown. Or even for long-running programs, the very last few log lines will be missing. The remote processes do not have a chance to send logs back to the client. For now, many unit tests are with await asyncio.sleep(1)
as a workaround.
Pull Request resolved: #816
In ipython notebook, a cell can end fast. Yet the process can still run in the background. However, the background process will not flush logs to the existing cell anymore. The patch registers the flush function upon a cell exiting.
[monarch][logging] stress tests
diff --git a/fbcode/monarch/python/tests/python_actor_test_binary.py b/fbcode/monarch/python/tests/python_actor_test_binary.py
--- a/fbcode/monarch/python/tests/python_actor_test_binary.py
+++ b/fbcode/monarch/python/tests/python_actor_test_binary.py
@@ -8,7 +8,6 @@
import asyncio
import logging
-import sys
import click
@@ -23,16 +22,16 @@
class Printer(Actor):
def __init__(self) -> None:
self.logger: logging.Logger = logging.getLogger()
+ self.logger.setLevel(logging.INFO)
@endpoint
async def print(self, content: str) -> None:
print(f"{content}", flush=True)
- sys.stdout.flush()
- sys.stderr.flush()
async def _flush_logs() -> None:
pm = await proc_mesh(gpus=2)
+
# never flush
await pm.logging_option(aggregate_window_sec=1000)
am = await pm.spawn("printer", Printer)
@@ -41,9 +40,6 @@
for _ in range(5):
await am.print.call("has print streaming")
- # Sleep a tiny so we allow the logs to stream back to the client
- await asyncio.sleep(1)
-
@main.command("flush-logs")
def flush_logs() -> None:
diff --git a/fbcode/monarch/python/tests/test_python_actors.py b/fbcode/monarch/python/tests/test_python_actors.py
--- a/fbcode/monarch/python/tests/test_python_actors.py
+++ b/fbcode/monarch/python/tests/test_python_actors.py
@@ -725,22 +725,25 @@
# Run the binary in a separate process and capture stdout and stderr
cmd = [str(test_bin), "flush-logs"]
- process = subprocess.run(cmd, capture_output=True, timeout=60, text=True)
- # Check if the process ended without error
- if process.returncode != 0:
- raise RuntimeError(f"{cmd} ended with error code {process.returncode}. ")
+ # Stress test
+ for _ in range(20):
+ process = subprocess.run(cmd, capture_output=True, timeout=60, text=True)
- # Assertions on the captured output
- assert (
- len(
- re.findall(
- r"similar.*has print streaming",
- process.stdout,
+ # Check if the process ended without error
+ if process.returncode != 0:
+ raise RuntimeError(f"{cmd} ended with error code {process.returncode}. ")
+
+ # Assertions on the captured output
+ assert (
+ len(
+ re.findall(
+ r"10 similar log lines.*has print streaming",
+ process.stdout,
+ )
)
- )
- == 1
- ), process.stdout
+ == 1
+ ), process.stdout
class SendAlot(Actor):