Skip to content

Incomplete log lines in cases; ensure ipython unregisters flush calls #845

@johnwhumphreys

Description

@johnwhumphreys

Update: (Splitting task into pieces per comments; this one should just focus on the high-pri short-term fix of "test_flush_logs_ipython is broken again; we need to make sure ipython unregister flush calls").


This is due to a very fast program shutdown. Or even for long-running programs, the very last few log lines will be missing. The remote processes do not have a chance to send logs back to the client. For now, many unit tests are with await asyncio.sleep(1) as a workaround.

Pull Request resolved: #816

In ipython notebook, a cell can end fast. Yet the process can still run in the background. However, the background process will not flush logs to the existing cell anymore. The patch registers the flush function upon a cell exiting.

    [monarch][logging] stress tests

diff --git a/fbcode/monarch/python/tests/python_actor_test_binary.py b/fbcode/monarch/python/tests/python_actor_test_binary.py
--- a/fbcode/monarch/python/tests/python_actor_test_binary.py
+++ b/fbcode/monarch/python/tests/python_actor_test_binary.py
@@ -8,7 +8,6 @@
 
 import asyncio
 import logging
-import sys
 
 import click
 
@@ -23,16 +22,16 @@
 class Printer(Actor):
     def __init__(self) -> None:
         self.logger: logging.Logger = logging.getLogger()
+        self.logger.setLevel(logging.INFO)
 
     @endpoint
     async def print(self, content: str) -> None:
         print(f"{content}", flush=True)
-        sys.stdout.flush()
-        sys.stderr.flush()
 
 
 async def _flush_logs() -> None:
     pm = await proc_mesh(gpus=2)
+
     # never flush
     await pm.logging_option(aggregate_window_sec=1000)
     am = await pm.spawn("printer", Printer)
@@ -41,9 +40,6 @@
     for _ in range(5):
         await am.print.call("has print streaming")
 
-    # Sleep a tiny so we allow the logs to stream back to the client
-    await asyncio.sleep(1)
-
 
 @main.command("flush-logs")
 def flush_logs() -> None:
diff --git a/fbcode/monarch/python/tests/test_python_actors.py b/fbcode/monarch/python/tests/test_python_actors.py
--- a/fbcode/monarch/python/tests/test_python_actors.py
+++ b/fbcode/monarch/python/tests/test_python_actors.py
@@ -725,22 +725,25 @@
 
     # Run the binary in a separate process and capture stdout and stderr
     cmd = [str(test_bin), "flush-logs"]
-    process = subprocess.run(cmd, capture_output=True, timeout=60, text=True)
 
-    # Check if the process ended without error
-    if process.returncode != 0:
-        raise RuntimeError(f"{cmd} ended with error code {process.returncode}. ")
+    # Stress test
+    for _ in range(20):
+        process = subprocess.run(cmd, capture_output=True, timeout=60, text=True)
 
-    # Assertions on the captured output
-    assert (
-        len(
-            re.findall(
-                r"similar.*has print streaming",
-                process.stdout,
+        # Check if the process ended without error
+        if process.returncode != 0:
+            raise RuntimeError(f"{cmd} ended with error code {process.returncode}. ")
+
+        # Assertions on the captured output
+        assert (
+            len(
+                re.findall(
+                    r"10 similar log lines.*has print streaming",
+                    process.stdout,
+                )
             )
-        )
-        == 1
-    ), process.stdout
+            == 1
+        ), process.stdout
 
 
 class SendAlot(Actor):

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions