@mwitiderrick (Contributor) commented Jul 14, 2025

netlify bot commented Jul 14, 2025

Deploy Preview for condescending-goldwasser-91acf0 ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | a71fb1c |
| 🔍 Latest deploy log | https://app.netlify.com/projects/condescending-goldwasser-91acf0/deploys/687773898c348d0008633e0a |
| 😎 Deploy Preview | https://deploy-preview-1782--condescending-goldwasser-91acf0.netlify.app |

We can also calculate embeddings for titles instead of images, or even for both of them to find more errors.

- {{< figure src=https://storage.googleapis.com/demo-dataset-quality-public/article/category_vs_name_and_image_transparent.png caption="Category vs. Title and Image" >}}
+ {{< figure src=/articles_data/dataset-quality/category_vs_name_and_image_transparent.png caption="Category vs. Title and Image" >}}
Member

This image is missing.

Member

I think both groups of items (Closest and Furthest) should be positioned the same way relative to the central dashed line. Currently, the "Closest" items sit around the middle of the left-hand side of the image, while the "Furthest" items are right next to the line. I also think the version with real photos was easier to understand, and we need to discuss how to present it again.

Member

There are some differences compared to the original image:

  1. Positioning of the "References" and "Outliers" boxes in the "Single Beds Dataset" is not consistent. The left and right margins, respectively, should be identical.
  2. Arrows connecting the dataset with encoders and then with "Embedding Space" suggest some connection, but those are two separate pipelines and should be visually separated here, too.
  3. "Embedding Space" -> "Embeddings Space"

@mwitiderrick changed the title from "update images" to "Finding errors in datasets with Similarity Search - update images" on Jul 16, 2025
@mwitiderrick
Contributor Author

@kacperlukawski updated

Member

@generall left a comment

The updated images lost their explainability.

4 participants