Skip to content

Conversation

angelosilvestre
Copy link
Contributor

[Feature] Add scene comparison report. (Resolves #43)

This PR introduces the GoldenSceneReport, which holds information necessary to provide a human readable report.

Reports the following information:

  • Number of failed/passed items
  • Number of missing/extra candidates
  • The description of each item
  • The failure description of each failed item
  • The description of each missing candidate
  • The description of each extra candidate

for (final mismatch in mismatches.mismatches.values) {
FtgLog.pipeline.fine(" - ${mismatch.golden?.id ?? mismatch.screenshot?.id}: $mismatch");
switch (mismatch) {
case MissingGoldenMismatch(screenshot: null):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more convenient to have different classes for each of these situations? I don't really care either way. The whole API surface is up for debate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that different classes would make it clearer. Updated.

}
}
await tester.runAsync(() async {
final goldenWidth = mismatch.golden!.image.width;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stuff inside the tester.runAsync() should probably be a standalone behavior somewhere. Maybe inside of a failure_scene.dart, where we'll eventually implement entire scenes of failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

});
}

throw Exception("Goldens failed with ${mismatches.mismatches.length} mismatch(es)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please verify how tests usually output failure info. Do they run a regular print followed by an exception? Or does the exception typically contain all intended output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They just throw a TestFailure. I modified it to call fail, which is the same that is called when an expect fails.


/// A report of a golden scene test.
///
/// Holds information to display the results of a golden scene test.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc line is redundant.

Some things worth pointing out would include the fact that this reports on individual success and failures for each golden in the scene, as well as goldens that have no candidate, and candidates that have no corresponding golden.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

});

/// The human readable description of the scene.
final String sceneDescription;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feeling is that we should probably reference the entire golden scene metadata here, allowing the report to pull out whatever it wants from that.

For example, it's very likely that when we report a failure, we'll want to check the platform that generated the golden, and the platform that generated the candidate, and report that difference to the user, if it exists. That data will be available in the overall metadata. But if we go property by property then we've gotta keep piping things in here.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to reference the metadata. But it doesn't look like the GoldenSceneMetadata holds the scene description, so we would need to keep it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that change is only on my branches, but that metadata should definitely have the description. We should just add that, rather than avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the description to the metadata class. The goldens will need to be re-generated afterwards to store it.

final List<MissingGoldenMismatch> extraCandidates;

/// The total number of successful [items] in the scene.
final int totalPassed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense for these totals to be synthesized properties? Otherwise, it seems like there's a possible bug where the numbers passed in don't match the actual item, missingCandidates, and extraCandidates in the scene.

I'm also wondering if "passed" and "failed" is a sufficient variety of statuses when we're talking about totals. Is an extra candidate a failure? If so, you could end up with 20 failures in a scene with 5 goldens. At a minimum, we should think that through and ensure we're reporting info that's likely to be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed them to be synthesized properties.

I think a failure should be counted once per screenshot that was found in both the original golden and the candidate widget tree. I think we shouldn't treat a missing/extra candidate as a regular failure. Keeping them separate makes it clearer in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Please be sure to document that in the Dart Docs. I haven't looked at the updated code yet.

});

factory GoldenReportItem.success({
required String description,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a single required property, we probably don't need to name it.

Though I'm also wondering what value the description has. Is there a reason that users would want the description and nothing else? For example, we haven't passed the golden id in here, so how do we know which golden we're talking about?

Should the GoldenReport take the entire GoldenMetadata instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


/// The details of the golden check for this item.
///
/// Might contain both successful and failed checks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this contain successes and failures? Isn't this class for a single golden?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to be used for the cases you mentioned where we want to perform invisible checks. That way, a single golden can have a pixel check that succeeded and other checks that failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably just ignore that for the moment, because we don't have sufficient API support to even generate those things.

That said, I'm thinking about things like focus and semantics as "layers" in a golden. I think it's OK for us to clearly differentiate between mismatched pixels vs mismatches in some arbitrary set of layers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


final Map<String, GoldenImage> imagesById;

final GoldenSceneMetadata metadata;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be here. I assume this was added to make the scene metadata available somewhere that we have a collection. Currently these are separate concepts. Perhaps later we'll merge them both into GoldenSceneMetadata, but for now we should respect the barrier.

Is there a reasonable way to get the scene metadata where you need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

// The golden check passed.
items.add(
GoldenReport.success(
metadata.images.where((image) => image.id == screenshotId).first,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this searching for here? It's not obvious to me...

Also, it looks like this image is passed into both possible execution paths. Any reason not to look this up before the if-statement so that we don't risk screwing up the selection in one branch but not the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

/// The total number of successful [items] in the scene.
int get totalPassed => items.where((e) => e.status == GoldenTestStatus.success).length;

/// The total number of failed [items] in the scene.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Dart Doc still needs clarity about what is considered a failure, per our previous conversation about missing goldens and missing candidates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

int get totalFailed => items.where((e) => e.status == GoldenTestStatus.failure).length;
}

/// An item in a golden scene report.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of the term "item" is cyclical. It also no longer matches the name of the class. I believe this class is "A report of success or failure for a single golden within a scene."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@matthew-carroll matthew-carroll marked this pull request as ready for review June 20, 2025 23:22
@matthew-carroll matthew-carroll merged commit c919d19 into main Jun 20, 2025
1 check passed
@matthew-carroll matthew-carroll deleted the 43_golden-scene-report branch June 20, 2025 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] - Thoroughly report scene comparison when a test fails
2 participants