This project presents a scalable, weakly supervised pipeline for logo recognition and visual similarity search. The system handles large-scale, unlabeled logo datasets by combining synthetic data generation, multimodal embeddings, and deep metric learning with a Triplet Network.
The core idea is to learn an embedding space where similar logos are projected close to each other, enabling high-performance retrieval even for previously unseen logos.
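To make the "similar logos end up close together" objective concrete, here is a minimal sketch of a triplet loss on L2-normalized vectors. It is written in plain Python so it runs without dependencies; the project itself presumably uses a deep learning framework. The function names and the `margin=0.2` default are illustrative assumptions, not values taken from this repository.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an assumption made for this sketch.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)
```

The loss is zero once the positive is closer to the anchor than the negative by at least the margin, so training only pushes on triplets that still violate that ordering.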
The pipeline consists of the following major stages:
- Synthetic Dataset Creation:
  Over 1.7 million logo images are generated along with text prompts describing their style and concept.
- Multimodal Embedding Construction:
  Visual features are extracted via a frozen ResNet50; text prompts are embedded using MiniLM. Both are concatenated to form a 2432D vector.
- Clustering:
  UMAP reduces the embedding to 600D, followed by HDBSCAN clustering to derive over 209,000 pseudo-categories as weak labels.
- Triplet Network Training:
  Triplets are sampled using pseudo-labels. The model is trained to learn a 256D normalized embedding via Triplet Loss.
- Inference and Retrieval:
  Embeddings from the trained model are queried against a FAISS index to retrieve similar logos.
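Two of the stages above can be sketched in a few lines: building the 2432D multimodal vector, and sampling triplets from pseudo-labels. This is a dependency-free illustration, not the project's actual code. The split into 2048 visual and 384 text dimensions is inferred from the 2432D total (2048 is the pooled ResNet50 feature size), and skipping label `-1` (the label HDBSCAN gives to noise points) is an assumption of this sketch. The function names are hypothetical.

```python
import random
from collections import defaultdict

VISUAL_DIM = 2048  # pooled ResNet50 feature size
TEXT_DIM = 384     # assumed MiniLM embedding size (2048 + 384 = 2432)


def concat_embedding(visual, text):
    """Concatenate one visual and one text vector into a multimodal vector."""
    if len(visual) != VISUAL_DIM or len(text) != TEXT_DIM:
        raise ValueError("unexpected embedding size")
    return list(visual) + list(text)


def sample_triplet(labels, rng):
    """Pick (anchor, positive, negative) indices from pseudo-labels.

    Anchor and positive share a cluster; the negative comes from another one.
    Items labeled -1 (HDBSCAN noise) are skipped, an assumption of this sketch.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != -1:
            clusters[label].append(idx)
    # Only clusters with at least two members can supply an anchor/positive pair.
    usable = [c for c, members in clusters.items() if len(members) >= 2]
    if not usable or len(clusters) < 2:
        raise ValueError("need a cluster with 2+ items and a second cluster")
    pos_cluster = rng.choice(usable)
    anchor, positive = rng.sample(clusters[pos_cluster], 2)
    neg_cluster = rng.choice([c for c in clusters if c != pos_cluster])
    negative = rng.choice(clusters[neg_cluster])
    return anchor, positive, negative
```

For example, `sample_triplet([0, 0, 1, -1, 1], random.Random(0))` returns three indices where the first two share a pseudo-label and the third belongs to a different cluster. Random sampling is the simplest strategy; harder negatives (for instance, from nearby clusters) are a common refinement but are not shown here.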
---
config:
  theme: redux
---
flowchart TB
%%== Simplified Inference Architecture: TripletNet Retrieval ==%%
A@{shape: lean-r, label: "User Input:\nQuery Logo Image"}
B@{shape: subproc, label: "Trained TripletNet\n(CNN + Projection + Norm)"}
C@{shape: rect, label: "Query Embedding (Vector)"}
D@{shape: cyl, label: "Embeddings Database\n(Precomputed Vectors)"}
E@{shape: subproc, label: "FAISS Index"}
C --> F@{shape: diamond, label: "Find Top-K Nearest Neighbors"}
F --> G@{shape: rect, label: "Retrieve Matching Logos"}
G --> H@{shape: curv-trap, label: "Display Results to User\n(Grid or Ranked View)"}
H --> I@{shape: dbl-circ, label: "End of Inference"}
A --> B --> C
D --> E
C --> E
E --> F
classDef inputStyle fill:#e1f5fe,stroke:#0288d1,stroke-width:1.5px;
classDef modelStyle fill:#ede7f6,stroke:#512da8,stroke-width:1.5px;
classDef embedStyle fill:#fff3e0,stroke:#f57c00,stroke-width:1.5px;
classDef dbStyle fill:#f9f,stroke:#333,stroke-width:2px;
classDef searchStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:1.5px;
classDef displayStyle fill:#fff8e1,stroke:#f9a825,stroke-width:2px;
class A inputStyle;
class B modelStyle;
class C embedStyle;
class D,E dbStyle;
class F,G searchStyle;
class H,I displayStyle;
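The retrieval step in the diagram (query embedding, index lookup, top-K neighbors) can be illustrated with a brute-force search. This sketch uses plain Python as a stand-in for the FAISS index, so it only shows the ranking logic, not FAISS's API or performance. Ranking by cosine similarity is an assumption that fits the normalized embeddings described above; the `top_k` name and its default `k` are hypothetical.

```python
import heapq
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query, database, k=5):
    """Return [(index, cosine similarity)] for the k most similar vectors.

    Brute-force stand-in for an index lookup: every stored vector is scored.
    """
    q = l2_normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, l2_normalize(vec))), idx)
        for idx, vec in enumerate(database)
    )
    best = heapq.nlargest(k, scored)  # highest similarity first
    return [(idx, score) for score, idx in best]
```

With a database of precomputed embeddings, `top_k(query_embedding, database, k=10)` returns the indices of the ten closest logos, which map back to the images shown to the user. At the scale described here, an approximate or compressed index replaces this linear scan.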
Loss curves during training reveal convergence behavior for each backbone:
This project demonstrates the potential of combining synthetic data, unsupervised clustering, and triplet-based metric learning to build a practical, label-free logo similarity system that can scale to millions of images.
For the full documentation and the academic report, see the docs/ directory.




