emb-explorer

emb-explorer is a Streamlit-based visual exploration and clustering tool for image datasets and pre-calculated image embeddings.

🎯 Demo Screenshots

📊 Embed & Explore Images
Embedding Interface: Embed your images using pre-trained models
Cluster Summary: Analyze clustering results and representative images

🔍 Explore Pre-calculated Embeddings
Smart Filtering: Apply filters to pre-calculated embeddings
Interactive Exploration: Explore clusters with interactive visualization
Taxonomy Tree Navigation: Browse hierarchical taxonomy structure

Features

Embed & Explore Images from Upload

Batch Image Embedding: Efficiently embed large collections of images using pretrained models (e.g., CLIP, BioCLIP) on CPU or, preferably, GPU, with customizable batch size and parallelism.
Clustering: Reduces embedding vectors to 2D using PCA, t-SNE, or UMAP, performs K-Means clustering, and displays the results as an interactive scatter plot. Click on data points to preview images and details.
Cluster-Based Repartitioning: Copy/repartition images into cluster-specific folders with a single click. Generates a summary CSV for downstream use.
Clustering Summary: Displays cluster sizes, variances, and representative images for each cluster, helping you evaluate clustering quality.
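The clustering and summary steps above can be illustrated with a small, dependency-free sketch. It runs Lloyd's K-Means on points that are assumed to be already reduced to 2D, then reports each cluster's size, variance (here defined as the mean squared distance to the centroid), and a representative point (the member closest to the centroid). The helper names `kmeans` and `summarize_clusters`, the deterministic first-k initialization, and this definition of variance are assumptions for illustration; emb-explorer's own code likely relies on optimized libraries instead (the GPU extra mentions cuML).

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Hypothetical helper: Lloyd's K-Means on 2D points.

    Centroids start at the first k points, so results are deterministic.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


def summarize_clusters(points: Sequence[Point], labels: List[int], centroids: List[Point]) -> Dict[int, dict]:
    """Hypothetical helper: size, variance, and representative index per cluster."""
    summary: Dict[int, dict] = {}
    for c, centroid in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        sq = [math.dist(points[i], centroid) ** 2 for i in idx]
        summary[c] = {
            "size": len(idx),
            "variance": sum(sq) / len(idx),
            # The member nearest the centroid stands in for the whole cluster.
            "representative": idx[sq.index(min(sq))],
        }
    return summary


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    labels, centroids = kmeans(pts, k=2)
    print(labels, centroids)
    print(summarize_clusters(pts, labels, centroids))
```

Picking the member nearest the centroid, rather than the centroid itself, means the representative always corresponds to a real image that can be displayed.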

Explore Pre-computed Embeddings

Parquet File Support: Load precomputed embeddings with associated metadata from Parquet files. Compatible with various embedding formats and metadata schemas.
Advanced Filtering: Filter datasets by taxonomic hierarchy, source datasets, and custom metadata fields. Combine multiple filter criteria for precise data selection.
Clustering: Reduce embedding vectors to 2D using PCA, UMAP, or t-SNE, perform K-Means clustering, and display the results as an interactive scatter plot. Click on points to preview images and explore metadata details.
Taxonomy Tree Navigation: Browse hierarchical biological classifications with interactive tree view. Expand and collapse taxonomic nodes to explore at different classification levels.
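Filtering and taxonomy browsing can be pictured as two operations over metadata rows: keep the rows that match every criterion, then nest the survivors by taxonomic rank with a count at each node. The sketch below assumes each row is a plain dict with hypothetical rank columns (`kingdom` through `species`) and a hypothetical `source` column; the actual Parquet schema, and how the app builds its tree view, may differ.

```python
# Hypothetical rank columns; the real Parquet schema may use different names.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def filter_records(records, **criteria):
    """Keep records whose fields equal every given criterion (logical AND)."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def build_taxonomy_tree(records, ranks=RANKS):
    """Nest records by rank; every node stores a record count and its children."""
    root = {"count": 0, "children": {}}
    for r in records:
        node = root
        node["count"] += 1
        for rank in ranks:
            name = r.get(rank)
            if name is None:  # stop descending at the first missing rank
                break
            node = node["children"].setdefault(name, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def print_tree(node, depth=0, max_depth=2):
    """Print nodes down to max_depth, like expanding the tree that many levels."""
    for name, child in sorted(node["children"].items()):
        print("  " * depth + f"{name} ({child['count']})")
        if depth + 1 < max_depth:
            print_tree(child, depth + 1, max_depth)


if __name__ == "__main__":
    rows = [
        {"kingdom": "Animalia", "phylum": "Chordata", "class": "Aves", "source": "A"},
        {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia", "source": "B"},
        {"kingdom": "Plantae", "phylum": "Tracheophyta", "class": "Magnoliopsida", "source": "A"},
    ]
    print(len(filter_records(rows, source="A")), "rows from source A")
    print_tree(build_taxonomy_tree(rows), max_depth=2)
```

With these three sample rows, the tree printout lists `Animalia (2)` with `Chordata (2)` beneath it, followed by `Plantae (1)` with `Tracheophyta (1)`.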

Installation

uv is a fast Python package installer and resolver. Install uv first if you haven't already:

```
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then install the project:

```
# Clone the repository
git clone https://github.com/Imageomics/emb-explorer.git
cd emb-explorer

# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .
```

GPU Support (Optional)

For GPU acceleration, you'll need CUDA 12.0+ installed on your system.

```
# Full GPU support with RAPIDS (cuDF + cuML)
uv pip install -e ".[gpu]"

# Minimal GPU support (PyTorch + FAISS only)
uv pip install -e ".[gpu-minimal]"
```
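After installing a GPU extra, it can help to confirm that PyTorch actually sees a CUDA device. A minimal check, assuming PyTorch is the framework in use; `gpu_available` is a hypothetical helper, not part of emb-explorer, and it returns `False` instead of raising when PyTorch is not installed:

```python
import importlib.util


def gpu_available() -> bool:
    """Hypothetical helper: True if PyTorch is importable and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed in this environment
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("CUDA available:", gpu_available())
```

`torch.cuda.is_available()` only says whether this PyTorch build can use a device; `torch.version.cuda` reports the CUDA version the build targets, which you can compare against the 12.0+ requirement.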

Development

```
# Install with development tools
uv pip install -e ".[dev]"
```

Usage

Running the Application

```
# Activate virtual environment (if not already activated)
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Run the Streamlit app
streamlit run app.py
```

An example dataset (example_1k.parquet) is provided in the data/ folder for testing the pre-calculated embedding features. It contains metadata and BioCLIP 2 embeddings for a 1,000-image subset of TreeOfLife-200M.

Command Line Tools

The project also provides command-line utilities:

```
# List all available models
python list_models.py --format table

# List models in JSON format
python list_models.py --format json --pretty

# List models as names only
python list_models.py --format names

# Get help for the list models command
python list_models.py --help
```

Running on Remote Compute Nodes

If running the app on a remote compute node (e.g., HPC cluster), you'll need to set up port forwarding to access the Streamlit interface from your local machine.

1. Start the app on the compute node:
```
# On the remote compute node
streamlit run app.py
```
Note the port number (default: 8501) and the compute node's hostname.

2. Set up SSH port forwarding from your local machine:
```
# From your local machine
ssh -N -L 8501:<COMPUTE_NODE>:8501 <USERNAME>@<LOGIN_NODE>
```
Example:
```
ssh -N -L 8501:c0828.ten.osc.edu:8501 <USERNAME>@cardinal.osc.edu
```
Replace:
- <COMPUTE_NODE> with the actual compute node hostname (e.g., c0828.ten.osc.edu)
- <USERNAME> with your username
- <LOGIN_NODE> with the login node address (e.g., cardinal.osc.edu)

3. Access the app: open your web browser and navigate to http://localhost:8501.

The -N flag prevents SSH from executing remote commands, and -L sets up the local port forwarding.
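To confirm the tunnel from your local machine, you can attempt a TCP connection to the forwarded port. A minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper, and its defaults assume Streamlit's default port 8501:

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8501, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Local tunnel endpoint is listening: open http://localhost:8501")
    else:
        print("Nothing is listening on localhost:8501; check the ssh -L command")
```

A successful connection only shows that the local end of the tunnel is listening: ssh accepts connections on the forwarded port even if Streamlit is not running on the compute node, so a browser error at this point still points to the remote side.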

Notes on Implementation

More notes on different implementation methods and approaches are available in the implementation summary doc.

