inference/core/workflows/core_steps/math/cosine_similarity/v1.py
58 additions & 8 deletions
@@ -16,11 +16,53 @@
 )
 
 LONG_DESCRIPTION="""
-Calculate the cosine similarity between two embeddings.
+Calculate the cosine similarity between two embedding vectors by computing the cosine of the angle between them, measuring directional similarity regardless of magnitude to enable similarity comparison, semantic matching, embedding-based search, and similarity-based filtering workflows.
 
-A cosine similarity of 1 means the two embeddings are identical,
-while a cosine similarity of 0 means the two embeddings are orthogonal.
-Greater values indicate greater similarity.
+## How This Block Works
+
+This block computes cosine similarity, a measure of similarity between two vectors based on the cosine of the angle between them. The block:
+
+1. Receives two embedding vectors from workflow steps (e.g., from CLIP, Perception Encoder, or other embedding models)
+2. Validates embedding dimensions:
+   - Ensures both embeddings have the same dimensionality (same number of elements)
+   - Raises an error if dimensions don't match
+3. Computes cosine similarity:
+   - Calculates the dot product of the two embedding vectors
+   - Computes the L2 norm (magnitude) of each embedding vector
+   - Divides the dot product by the product of the two norms: similarity = (a · b) / (||a|| × ||b||)
+   - This measures the cosine of the angle between the vectors, indicating directional similarity
+4. Returns similarity score:
+   - Outputs a similarity value ranging from -1 to 1
+   - Value of 1: Vectors point in the same direction (identical or proportional) - maximum similarity
+   - Value of 0: Vectors are orthogonal (perpendicular) - no similarity
+   - Value of -1: Vectors point in opposite directions - maximum dissimilarity
+   - Greater values (closer to 1) indicate greater similarity
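
For reference, the steps above correspond to the following minimal NumPy sketch; the function name and the use of `ValueError` on a dimension mismatch are illustrative choices, not taken from the block's actual implementation.

```python
import numpy as np


def cosine_similarity(embedding_1: list[float], embedding_2: list[float]) -> float:
    # Step 2: both embeddings must have the same number of elements.
    if len(embedding_1) != len(embedding_2):
        raise ValueError(
            f"Embedding dimensions do not match: {len(embedding_1)} vs {len(embedding_2)}"
        )
    a = np.asarray(embedding_1, dtype=float)
    b = np.asarray(embedding_2, dtype=float)
    # Step 3: dot product divided by the product of the L2 norms,
    # i.e. similarity = (a . b) / (||a|| * ||b||).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```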
+
+Cosine similarity is magnitude-invariant, meaning it measures similarity in direction rather than size. Two vectors that point in the same direction will have high cosine similarity even if they have different magnitudes. This makes it ideal for comparing embeddings where magnitude may vary but semantic meaning (direction) is what matters.
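
A quick check of the magnitude-invariance property and the landmark values, reusing the `cosine_similarity` sketch above (printed values are approximate):

```python
# Doubling a vector changes its magnitude but not its direction,
# so the similarity stays at (approximately) 1.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~1.0, same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))             # 0.0, orthogonal
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))            # -1.0, opposite direction
```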
+
+## Common Use Cases
+
+- **Semantic Similarity Comparison**: Compare semantic similarity between images, text, or other data types using embeddings (e.g., compare image embeddings, match text to images, find similar content), enabling similarity comparison workflows
+- **Embedding-Based Search**: Use similarity scores for embedding-based search and retrieval (e.g., find similar images, search by embedding similarity, retrieve similar content), enabling embedding search workflows
+- **Cross-Modal Matching**: Match embeddings across different modalities (e.g., match images to text, find images matching text descriptions, match text to images), enabling cross-modal matching workflows
+- **Similarity-Based Filtering**: Filter data based on similarity thresholds (e.g., filter similar items, find duplicates using similarity, identify near-duplicates), enabling similarity filtering workflows
+- **Content Recommendation**: Use similarity scores for content recommendation and matching (e.g., recommend similar content, match related items, suggest similar products), enabling recommendation workflows
+- **Quality Control and Validation**: Validate embeddings or compare embeddings for quality control (e.g., validate embedding quality, compare embeddings for consistency, check embedding similarity), enabling quality control workflows
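
As a sketch of the filtering and search use cases, the snippet below keeps only candidates whose similarity to a query embedding meets a threshold; the names and the 0.9 threshold are arbitrary illustrations, and it reuses the `cosine_similarity` sketch above.

```python
def filter_by_similarity(
    query: list[float],
    candidates: dict[str, list[float]],
    threshold: float = 0.9,
) -> dict[str, float]:
    # Keep candidates whose cosine similarity to the query meets the threshold,
    # e.g. for near-duplicate detection or embedding-based retrieval.
    return {
        name: score
        for name, embedding in candidates.items()
        if (score := cosine_similarity(query, embedding)) >= threshold
    }
```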
+
+## Connecting to Other Blocks
+
+This block receives embeddings from embedding model blocks and produces similarity scores:
+
+- **After embedding model blocks** (CLIP, Perception Encoder, etc.) to compare embeddings (e.g., compare image and text embeddings, compare multiple embeddings, compute similarity scores), enabling embedding-to-similarity workflows
+- **Before logic blocks** like Continue If to use similarity scores in conditions (e.g., continue if similarity exceeds threshold, filter based on similarity, make decisions using similarity), enabling similarity-based decision workflows
+- **Before filtering blocks** to filter based on similarity (e.g., filter by similarity threshold, remove low-similarity items, keep high-similarity matches), enabling similarity-to-filter workflows
+- **Before data storage blocks** to store similarity scores (e.g., store similarity metrics, log similarity comparisons, save similarity results), enabling similarity storage workflows
+- **Before notification blocks** to send similarity-based alerts (e.g., notify on high similarity matches, alert on similarity changes, send similarity reports), enabling similarity notification workflows
+- **In workflow outputs** to provide similarity scores as final output (e.g., similarity comparison outputs, matching results, similarity metrics), enabling similarity output workflows
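
To illustrate the wiring described above, here is a sketched workflow definition (expressed as a Python dict) that feeds two embedding outputs into this block; the block type identifiers, selector strings, and output field names are assumptions inferred from the file path and common naming conventions, so check the block documentation for the exact values.

```python
# Sketch only: type strings, selectors, and output names are assumed, not verified.
workflow_definition = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "prompt"},
    ],
    "steps": [
        # Assumed CLIP embedding steps producing image and text embeddings.
        {"type": "roboflow_core/clip@v1", "name": "image_embedding", "data": "$inputs.image"},
        {"type": "roboflow_core/clip@v1", "name": "text_embedding", "data": "$inputs.prompt"},
        # This block: cosine similarity between the two embeddings.
        {
            "type": "roboflow_core/cosine_similarity@v1",
            "name": "similarity",
            "embedding_1": "$steps.image_embedding.embedding",
            "embedding_2": "$steps.text_embedding.embedding",
        },
    ],
    "outputs": [
        # Assumed output field name on the cosine similarity step.
        {"type": "JsonField", "name": "similarity", "selector": "$steps.similarity.similarity"},
    ],
}
```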
+
+## Requirements
+
+This block requires two embedding vectors with the same dimensionality (same number of elements). Embeddings can be from any embedding model (CLIP, Perception Encoder, etc.) and can represent images, text, or other data types. The embeddings are passed as lists of floats. The block computes cosine similarity using the dot product divided by the product of L2 norms, producing a similarity score between -1 and 1. Values closer to 1 indicate greater similarity, values closer to 0 indicate orthogonal vectors, and values closer to -1 indicate opposite directions.
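
Illustrating the same-dimensionality requirement with the `cosine_similarity` sketch above (the actual block raises its own error on a mismatch):

```python
try:
    # 3-element vs 4-element embeddings: dimensions do not match.
    cosine_similarity([0.1, 0.2, 0.3], [0.1, 0.2, 0.3, 0.4])
except ValueError as error:
    # The mismatch is rejected before any similarity is computed.
    print(error)
```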
 """
@@ -43,12 +85,20 @@ class BlockManifest(WorkflowBlockManifest):
description="First embedding vector to compare. Must have the same dimensionality (same number of elements) as embedding_2. Can be from any embedding model (CLIP, Perception Encoder, etc.) and can represent images, text, or other data types. Embedding vectors are lists of floats representing high-dimensional feature representations.",
description="Second embedding vector to compare. Must have the same dimensionality (same number of elements) as embedding_1. Can be from any embedding model (CLIP, Perception Encoder, etc.) and can represent images, text, or other data types. Embedding vectors are lists of floats representing high-dimensional feature representations. The cosine similarity measures the similarity between embedding_1 and embedding_2.",