Contributor

@k-jamroz k-jamroz commented Feb 28, 2025

First in a series of AI-related examples. This example demonstrates the idea of similarity search, implemented in a straightforward, simple, but not the most efficient way. It is the example code for an upcoming blog post.
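
For readers of this thread, here is a minimal sketch of what a straightforward but inefficient similarity search can look like: score every stored vector against the query and keep the best k. This is not the code from the PR. The class and method names are hypothetical, a plain `Map` stands in for whatever storage the example uses, and cosine similarity is an assumed metric.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the class added by this PR.
public class BruteForceSimilaritySearch {

    /** A scored match: the key of a stored vector and its similarity to the query. */
    record Match(String key, double score) { }

    /** Cosine similarity of two equal-length vectors; 0 if either vector has zero norm. */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vector dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Scans every entry, scores it against the query, and returns the k best matches, best first. */
    static List<Match> topK(Map<String, float[]> vectors, float[] query, int k) {
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, float[]> e : vectors.entrySet()) {
            matches.add(new Match(e.getKey(), cosineSimilarity(e.getValue(), query)));
        }
        // Sorting all scores is O(n log n); acceptable for a sketch, wasteful for large data sets.
        matches.sort(Comparator.comparingDouble(Match::score).reversed());
        return matches.subList(0, Math.min(k, matches.size()));
    }

    public static void main(String[] args) {
        Map<String, float[]> vectors = Map.of(
                "cars", new float[]{1f, 0f},
                "boats", new float[]{0f, 1f},
                "trucks", new float[]{0.9f, 0.1f});
        System.out.println(topK(vectors, new float[]{1f, 0f}, 2));
    }
}
```

The full scan touches every vector on every query, which is why this approach is simple but does not scale the way an index-based nearest-neighbour search would.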

Fixes https://hazelcast.atlassian.net/browse/AI-298

Note to reviewers: the PR contains quite large files (~80MB) - movie data.

Checklist:

  • Request reviewers if possible
  • README.md or README.adoc is created with a description of what the example does and how to run it
  • Add tests that reproduce the associated tutorial or similar scenario

Contributor

@TomaszGaweda TomaszGaweda left a comment

Two nits, overall good job :)

// transform user input to vector
float[] query = embeddingModel.embed(userInput).content().vector();

// find & output top 10 similar matches of plot summary to given text
Contributor

Maybe we could do 2 in 1 (via some flag, etc.), so the code wouldn't be commented out?

Contributor Author

@k-jamroz k-jamroz Mar 14, 2025

VectorPredicates.nearestNeighbours is not yet available even in 6.0-SNAPSHOT, so the example would not compile

Contributor

ok, then for now it's better to remove the commented code, as someone may try to use it

Contributor Author

OK, that will probably be less confusing: a82b3b5

@k-jamroz k-jamroz requested a review from TomaszGaweda March 14, 2025 18:06
Contributor

@TomaszGaweda TomaszGaweda left a comment

A few comments, but I don't think a re-review will be needed, so approving in advance :) Good job!

+ "took " + Timer.secondsElapsed(start) + " seconds");
}

public static class MovieMetadata implements Serializable {
Contributor

record?

Contributor Author

The vector field is mutable. This could be avoided, but in this case it is more a matter of taste; both solutions would be equally simple.
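
To illustrate the trade-off discussed here, a hedged sketch of how a record could hold an array without exposing mutable state. The component names (`title`, `vector`) and the wrapper class are hypothetical, since the actual fields of `MovieMetadata` are not shown in this thread; the defensive copies are one possible way to avoid the mutability, not necessarily the one the author had in mind.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical illustration only; the component names are assumptions.
public class MovieMetadataRecordSketch {

    record MovieMetadata(String title, float[] vector) implements Serializable {

        // Compact constructor: copy the array so later changes by the caller do not leak in.
        MovieMetadata {
            vector = vector.clone();
        }

        // Hand out a copy instead of the internal array.
        @Override
        public float[] vector() {
            return vector.clone();
        }

        // The generated equals compares array components by reference, so compare contents instead.
        @Override
        public boolean equals(Object o) {
            return o instanceof MovieMetadata other
                    && Objects.equals(title, other.title)
                    && Arrays.equals(vector, other.vector);
        }

        @Override
        public int hashCode() {
            return 31 * Objects.hashCode(title) + Arrays.hashCode(vector);
        }
    }

    public static void main(String[] args) {
        float[] v = {0.1f, 0.2f};
        MovieMetadata m = new MovieMetadata("Cars", v);
        v[0] = 42f;                        // mutating the caller's array...
        System.out.println(m.vector()[0]); // ...does not change the record's copy
    }
}
```

The copies cost extra allocations per access, which may matter when many vectors are scanned, so a plain class with a mutable field remains a reasonable choice.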

TextSimilaritySearchImap.offloaded = false;
TextSimilaritySearchImap.main(new String[]{"200% cars " + System.lineSeparator()});
}
} No newline at end of file
Contributor

nit, line endings:

Suggested change
}
}

*/
public class TextSimilaritySearchImap {

static boolean offloaded = true;
Contributor

Can you write a quick comment on what "offloaded" means in this particular case? It will be easier for people to understand.

@k-jamroz k-jamroz enabled auto-merge (squash) March 20, 2025 17:10
@k-jamroz k-jamroz merged commit 80ca73c into hazelcast:master Mar 20, 2025
3 checks passed