Commit 31634b8

VieEeEw authored and pytorchmergebot committed
[fr] Added protection against missing stack frames in fr cont. (pytorch#150133)
Summary: Previously we had D70358287, which didn't fully resolve the issue.

Test Plan:

# FR
`buck2 run @//mode/opt //caffe2/fb/flight_recorder:fr_trace -- --mast_job_id f710320638-TrainingApplication --mast_job_version 0 --mast_job_attempt 0 --bucket tlcm_log_blob --world_size 128 --dump_file_name_offset 0 --allow-incomplete-ranks`
Confirm no error.

# FR analyzer
`buck2 run @//mode/opt //investigations/dr_patternson/analyzers/ai_observability:ai_observability-all-analyzers-cli -- flight_recorder_analyzer --mast_job_name f710320638-TrainingApplication --mast_job_version 0 --mast_job_attempt 0`
Confirm no error.

Differential Revision: D71998980

Pull Request resolved: pytorch#150133
Approved by: https://github.com/fduwjj
1 parent 827b730 commit 31634b8

1 file changed, +2 -2 lines changed

tools/flight_recorder/components/types.py

Lines changed: 2 additions & 2 deletions
@@ -224,7 +224,7 @@ def __init__(self, entry: dict[str, Any], expected_ranks: set[int]) -> None:
         self.input_sizes = entry["input_sizes"]
         self.output_sizes = entry["output_sizes"]
         self.collective_state = entry["state"]
-        self.collective_frames = entry["frames"]
+        self.collective_frames = entry.get("frames", [])
         self.expected_ranks = expected_ranks
         self.missing_ranks: set[int]
         self.input_numel: int
@@ -316,7 +316,7 @@ def to_collective(
                 output_sizes=entry["output_sizes"],
                 expected_ranks=self.expected_ranks,
                 collective_state=entry["state"],
-                collective_frames=entry["frames"],
+                collective_frames=entry.get("frames", []),
                 type_of_mismatch=error,
             )
         return Collective(
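
Both hunks replace a direct `entry["frames"]` lookup with `entry.get("frames", [])`. The following is a minimal sketch of the behavioral difference, not code from the repository: `read_frames` and the two sample entries are hypothetical, and real flight recorder entries carry more fields than shown here.

```python
from typing import Any


def read_frames(entry: dict[str, Any]) -> list[Any]:
    # Hypothetical helper mirroring the patched lookup: fall back to an
    # empty list when the dump entry has no "frames" key at all.
    return entry.get("frames", [])


# Made-up entries for illustration only.
complete = {"state": "completed", "frames": [{"name": "all_reduce", "line": 42}]}
partial = {"state": "scheduled"}  # no "frames" key recorded

print(read_frames(complete))  # the recorded frame list
print(read_frames(partial))   # empty list instead of an exception

# The pre-patch form raises KeyError on the same entry.
try:
    partial["frames"]
except KeyError as exc:
    print(f"KeyError: {exc}")
```

Note that `dict.get` only substitutes the default when the key is absent. An entry that stores `"frames": None` would still yield `None`, so this change guards against a missing key and not against a null value.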
