Commit a7baf8f

Merge pull request #7 from waggle-sensor/cloudbench
Cloudbench
2 parents 19d4e5a + 7f8d9e4 commit a7baf8f

File tree: 19 files changed, +15691 −10 lines changed

Readme.md

Lines changed: 8 additions & 2 deletions
```diff
@@ -159,6 +159,10 @@ kubectl kustomize nrp-dev -o sage-image-search-dev.yaml or kubectl kustomize nrp
   - https://huggingface.co/datasets/sagecontinuum/INQUIRE-Benchmark-small
   - https://huggingface.co/datasets/sagecontinuum/FireBench
   - ...
+- [ ] look into using text encoders only to see if just using caption-query comparisons can be enough or improve retrieval with embeddings. Essentially the image will NOT be embedded in the same vector space as the captions anymore.
+  - embeddinggemma model: https://huggingface.co/google/embeddinggemma-300m
+  - E5-mistral-7b-instruct: https://huggingface.co/intfloat/e5-mistral-7b-instruct
+    - this is hosted by NRP so it will be easy to use.
 - [ ] Benchmark Milvus@NRP
   - using...
   - https://huggingface.co/datasets/sagecontinuum/INQUIRE-Benchmark-small
@@ -167,6 +171,10 @@ kubectl kustomize nrp-dev -o sage-image-search-dev.yaml or kubectl kustomize nrp
 - [ ] switch to reranking with Clip DFN5B-CLIP-ViT-H-14-378
   - before making the switch permanent run the benchmarking suite to see if there are any regressions
   - firebench results show that it is better than the current reranker model (ms-marco-MiniLM-L6-v2)
+- [ ] look into MMR (maximal marginal relevance) to see if it can improve the reranking performance or to implement it as a "toggle" to apply it only to certain queries.
+  - https://milvus.io/ai-quick-reference/how-is-diversity-in-search-results-achieved
+- [ ] Integrate ShieldGemma 2 to implement policies and mark images as yes/no if the image violates the policy
+  - [ShieldGemma 2 Model Card](https://ai.google.dev/gemma/docs/shieldgemma/model_card_2)
 - [ ] add a heartbeat metric for Sage Object Storage (nrdstor)
   - specifically here in the code: https://github.com/waggle-sensor/sage-nrp-image-search/blob/main/weavloader/processing.py#L159
 - [ ] add a metric to count the images that have been indexed into the vectordb
@@ -218,6 +226,4 @@ kubectl kustomize nrp-dev -o sage-image-search-dev.yaml or kubectl kustomize nrp
 - Incremental Update Latency
   - Time between new image upload and being searchable
 - examples here: https://chatgpt.com/c/684b1286-1144-8003-8a20-85a1045375c3
-- [ ] Integrate ShieldGemma 2 to implement policies and mark images as yes/no if the image violates the policy
-  - [ShieldGemma 2 Model Card](https://ai.google.dev/gemma/docs/shieldgemma/model_card_2)
 - [ ] turn on batching for triton and utilize it in weavloader
```
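The MMR idea on the todo list above (greedily pick results that are relevant to the query but dissimilar to results already selected) can be sketched in a few lines. This is a generic illustration over plain vectors, not Milvus or Weaviate API code; the `lambda_` trade-off parameter is an assumption:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, candidates, k, lambda_=0.7):
    """Maximal marginal relevance: candidates is a list of (id, vector);
    returns k ids, each chosen to maximize
    lambda_ * relevance(query) - (1 - lambda_) * redundancy(selected)."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best, best_score = None, -float("inf")
        for cid, vec in remaining:
            relevance = cosine(query_vec, vec)
            # redundancy = similarity to the closest already-selected result
            redundancy = max((cosine(vec, sv) for _, sv in selected), default=0.0)
            score = lambda_ * relevance - (1 - lambda_) * redundancy
            if score > best_score:
                best, best_score = (cid, vec), score
        selected.append(best)
        remaining.remove(best)
    return [cid for cid, _ in selected]
```

With a low `lambda_`, a near-duplicate of an already-selected result is demoted below a less relevant but diverse one, which is the "toggle" behavior the todo item describes.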
Lines changed: 24 additions & 0 deletions
```dockerfile
# CloudBench Benchmark Job Dockerfile
# Combined Dockerfile for running both data loading and evaluation

ARG PYTHON_VERSION=3.11-slim
FROM python:${PYTHON_VERSION}

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    procps \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Run combined benchmark script
CMD ["python", "run_benchmark.py"]
```
Lines changed: 29 additions & 0 deletions
```makefile
# CloudBench Benchmark Makefile
# This file sets CloudBench-specific variables and includes the base benchmarking Makefile

# ============================================================================
# Required Variables (must be set for base Makefile)
# ============================================================================
BENCHMARK_NAME := cloudbench
DOCKERFILE_JOB := Dockerfile.job
RESULTS_FILES := image_search_results.csv query_eval_metrics.csv
ENV ?= dev
ifeq ($(ENV),prod)
KUSTOMIZE_DIR := ../../kubernetes/Cloudbench/nrp-prod
else
KUSTOMIZE_DIR := ../../kubernetes/Cloudbench/nrp-dev
endif

# ============================================================================
# Optional Variables (can be overridden)
# ============================================================================
KUBECTL_NAMESPACE := sage
KUBECTL_CONTEXT ?= nautilus
REGISTRY := gitlab-registry.nrp-nautilus.io/ndp/sage/nrp-image-search
JOB_TAG ?= latest

# Local run script
RUN_SCRIPT := run_benchmark.py

# Include the base Makefile (after setting variables)
include ../Makefile
```
Lines changed: 90 additions & 0 deletions
# CloudBench Benchmark

This benchmark uses [CloudBench](https://huggingface.co/datasets/sagecontinuum/CloudBench) with Weaviate as the vector database for evaluating text-to-image retrieval in cloud and atmospheric science. CloudBench is a benchmark dataset for cloud image retrieval: natural language queries paired with images and binary relevance labels.

## Dataset

- **Source**: [sagecontinuum/CloudBench](https://huggingface.co/datasets/sagecontinuum/CloudBench) on Hugging Face
- **Contents**: Query–image pairs with relevance labels (0 = not relevant, 1 = relevant), plus metadata (cloud_coverage, viewpoint, lighting, confounder_type, occlusion_present, multiple_cloud_types, horizon_visible, ground_visible, sun_visible, precipitation_visible, overcast, multiple_layers, storm_visible)
- **Split**: The dataset provides a single `train` split (~4.6k rows)

## Usage

This benchmark is intended to be used with [Sage Image Search](../../../kubernetes/base/). The Makefile references components deployed there and runs the CloudBench benchmark job.

## Running the Benchmark

### Prerequisites

- **Kubernetes cluster** access with `kubectl` configured
- **kustomize** (or kubectl with kustomize support)
- **Docker** for building images
- **Weaviate and Triton** deployed (e.g. from `kubernetes/nrp-dev` or `kubernetes/nrp-prod`)

### Steps

1. **Deploy Sage Image Search infrastructure** (from the main `kubernetes` directory):
   ```bash
   kubectl apply -k nrp-dev   # or nrp-prod
   ```

2. **Build and push the benchmark image**:
   ```bash
   cd benchmarking/benchmarks/Cloudbench
   make build
   docker push <registry>/benchmark-cloudbench-job:latest
   ```

3. **Run the CloudBench benchmark** (loads data and evaluates):
   ```bash
   make run    # defaults to dev environment
   make logs   # monitor progress
   ```
   This loads `sagecontinuum/CloudBench` into Weaviate, runs the evaluation, and saves results.

4. **Run locally (development)**:
   ```bash
   make run-local
   ```
   Uses port-forwarding to Weaviate and Triton.

### Results

After a run, three files are produced:

- **`image_search_results.csv`**: Metadata of the images returned for each query
- **`query_eval_metrics.csv`**: Evaluation metrics (NDCG, precision, recall, etc.) per query
- **`config_values.csv`**: Configuration used for the run (`config.to_csv()`)

Results are written to `/app/results` in Kubernetes (with a volume mount) or to the current directory when using `make run-local`. Optional S3 upload uses paths like `{S3_PREFIX}/{timestamp}/…`.
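`query_eval_metrics.csv` includes NDCG per query. For binary relevance labels like CloudBench's, NDCG@k reduces to the standard formula below; this is a minimal sketch of that formula, not the actual imsearch_eval implementation:

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for binary labels given in ranked order, e.g. [1, 0, 1].
    DCG discounts each relevant hit by log2 of its rank; dividing by the
    ideal DCG (all relevant items ranked first) normalizes to [0, 1]."""
    def dcg(labels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels))
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; pushing the only relevant image from rank 1 to rank 2 drops NDCG@2 to about 0.63.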
## Environment Variables

- **CLOUDBENCH_DATASET**: HuggingFace dataset name (default: `sagecontinuum/CloudBench`)
- **COLLECTION_NAME**: Weaviate collection name (default: `CloudBench`)
- **SAMPLE_SIZE**: Number of samples (0 = use full dataset)
- **SEED**, **HF_TOKEN**, **WORKERS**, **IMAGE_BATCH_SIZE**, **QUERY_BATCH_SIZE**: Data and processing
- **QUERY_METHOD**, **TARGET_VECTOR**, **RESPONSE_LIMIT**: Query and retrieval
- See `config.py` for the full list (Weaviate, Triton, S3, etc.).
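With `QUERY_METHOD=clip_hybrid_query`, an alpha weight (`QUERY_ALPHA` in `config.py`) blends the vector score against the keyword score. As a toy illustration of that kind of score fusion, with min-max normalization as a simplifying assumption (Weaviate's actual hybrid fusion differs in detail):

```python
def normalize(scores):
    """Min-max normalize a list of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    return [1.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def hybrid_scores(vector_scores, keyword_scores, alpha=0.4):
    """Blend per-result vector and keyword (BM25-style) scores.
    alpha=1.0 -> pure vector search; alpha=0.0 -> pure keyword search."""
    v, kw = normalize(vector_scores), normalize(keyword_scores)
    return [alpha * a + (1 - alpha) * b for a, b in zip(v, kw)]
```

At the default `alpha=0.4`, keyword (caption) matches carry slightly more weight than the vector similarity.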
## Citation

If you use CloudBench, cite the dataset:

```bibtex
@misc{cloudbench_2026,
  author      = { Sage Continuum and Francisco Lozano },
  affiliation = { Northwestern University },
  title       = { CloudBench },
  year        = 2026,
  url         = { https://huggingface.co/datasets/sagecontinuum/CloudBench },
  doi         = { 10.57967/hf/7784 },
  publisher   = { Hugging Face }
}
```

## References

- [CloudBench on Hugging Face](https://huggingface.co/datasets/sagecontinuum/CloudBench)
- [Weaviate: NDCG and retrieval evaluation](https://weaviate.io/blog/retrieval-evaluation-metrics#normalized-discounted-cumulative-gain-ndcg)
- [imsearch_eval](https://github.com/waggle-sensor/imsearch_eval) framework
Lines changed: 36 additions & 0 deletions
```python
"""CloudBench benchmark dataset implementation."""
from imsearch_eval.adapters.huggingface import HuggingFaceDataset


class CloudBench(HuggingFaceDataset):
    """Benchmark dataset class for CloudBench (cloud/atmospheric image retrieval)."""

    def get_query_column(self) -> str:
        """Get the name of the column containing the query text."""
        return "query_text"

    def get_query_id_column(self) -> str:
        """Get the name of the column containing the query ID."""
        return "query_id"

    def get_relevance_column(self) -> str:
        """Get the name of the column containing relevance labels (0=not relevant, 1=relevant)."""
        return "relevance_label"

    def get_metadata_columns(self) -> list:
        """Get optional metadata columns to include in evaluation stats."""
        return [
            "cloud_coverage",
            "viewpoint",
            "lighting",
            "confounder_type",
            "occlusion_present",
            "multiple_cloud_types",
            "horizon_visible",
            "ground_visible",
            "sun_visible",
            "precipitation_visible",
            "overcast",
            "multiple_layers",
            "storm_visible",
        ]
```
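The metadata columns exposed above let evaluation stats be sliced per attribute (e.g. comparing precision for overcast vs. clear scenes). A sketch of per-slice precision over result rows; the flat dict row shape here is a hypothetical illustration, not the framework's actual output format:

```python
from collections import defaultdict

def precision_by(rows, column):
    """Compute precision per value of a metadata column.
    rows: dicts carrying a binary 'relevance_label' plus metadata fields."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1                     # results retrieved in this slice
        hits[key] += row["relevance_label"]  # of which relevant
    return {key: hits[key] / totals[key] for key in totals}
```

Slicing like this is what makes confounder-style columns (`confounder_type`, `occlusion_present`) useful: a retrieval regression often shows up in one slice before it moves the aggregate metric.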
Lines changed: 131 additions & 0 deletions
```python
"""CloudBench-specific configuration/hyperparameters."""

import os
from weaviate.classes.config import VectorDistances, Configure
from weaviate.collections.classes.config_vector_index import VectorFilterStrategy

from imsearch_eval.framework.interfaces import Config


class CloudBenchConfig(Config):
    """Configuration for CloudBench benchmark (cloud/atmospheric image retrieval)."""

    def __init__(self):
        """Initialize CloudBench configuration."""
        # dataset parameters
        self.cloudbench_dataset = os.environ.get(
            "CLOUDBENCH_DATASET", "sagecontinuum/CloudBench"
        )
        self.sample_size = int(os.environ.get("SAMPLE_SIZE", 0))
        self.seed = int(os.environ.get("SEED", 42))
        self._hf_token = os.environ.get("HF_TOKEN", "")

        # Upload parameters
        self._upload_to_s3 = os.environ.get("UPLOAD_TO_S3", "false").lower() == "true"
        self._s3_bucket = os.environ.get("S3_BUCKET", "sage_imsearch")
        self._s3_prefix = os.environ.get("S3_PREFIX", "dev-metrics/cloudbench")
        self._s3_endpoint = os.environ.get(
            "S3_ENDPOINT", "http://rook-ceph-rgw-nautiluss3.rook"
        )
        self._s3_access_key = os.environ.get("S3_ACCESS_KEY", "")
        self._s3_secret_key = os.environ.get("S3_SECRET_KEY", "")
        self._s3_secure = os.environ.get("S3_SECURE", "false").lower() == "true"
        self._image_results_file = os.environ.get(
            "IMAGE_RESULTS_FILE", "image_search_results.csv"
        )
        self._query_eval_metrics_file = os.environ.get(
            "QUERY_EVAL_METRICS_FILE", "query_eval_metrics.csv"
        )
        self._config_values_file = os.environ.get(
            "CONFIG_VALUES_FILE", "config_values.csv"
        )

        # Weaviate parameters
        self._weaviate_host = os.environ.get("WEAVIATE_HOST", "127.0.0.1")
        self._weaviate_port = os.environ.get("WEAVIATE_PORT", "8080")
        self._weaviate_grpc_port = os.environ.get("WEAVIATE_GRPC_PORT", "50051")
        self._collection_name = os.environ.get("COLLECTION_NAME", "CloudBench")

        # model provider parameters
        self._llm_model_provider = os.environ.get(
            "LLM_MODEL_PROVIDER", "triton"
        ).lower()

        # Triton parameters
        self._triton_host = os.environ.get("TRITON_HOST", "triton")
        self._triton_port = os.environ.get("TRITON_PORT", "8001")

        # Workers parameters
        self._workers = int(os.environ.get("WORKERS", 5))
        self._image_batch_size = int(os.environ.get("IMAGE_BATCH_SIZE", 25))
        self._query_batch_size = int(os.environ.get("QUERY_BATCH_SIZE", 5))

        # Logging parameters
        self._log_level = os.environ.get("LOG_LEVEL", "INFO").upper()

        # Weaviate HNSW hyperparameters
        self.hnsw_dist_metric = getattr(
            VectorDistances, os.environ.get("HNSW_DIST_METRIC", "COSINE").upper()
        )
        self.hnsw_ef = int(os.environ.get("HNSW_EF", -1))
        self.hnsw_ef_construction = int(os.environ.get("HNSW_EF_CONSTRUCTION", 100))
        self.hnsw_maxConnections = int(os.environ.get("HNSW_MAX_CONNECTIONS", 50))
        self.hsnw_dynamicEfMax = int(os.environ.get("HNSW_DYNAMIC_EF_MAX", 500))
        self.hsnw_dynamicEfMin = int(os.environ.get("HNSW_DYNAMIC_EF_MIN", 200))
        self.hnsw_ef_factor = int(os.environ.get("HNSW_EF_FACTOR", 20))
        self.hsnw_filterStrategy = getattr(
            VectorFilterStrategy,
            os.environ.get("HNSW_FILTER_STRATEGY", "ACORN").upper(),
        )
        self.hnsw_flatSearchCutoff = int(
            os.environ.get("HNSW_FLAT_SEARCH_CUTOFF", 40000)
        )
        self.hnsw_vector_cache_max_objects = int(
            os.environ.get("HNSW_VECTOR_CACHE_MAX_OBJECTS", 1e12)
        )
        self.hnsw_quantizer = Configure.VectorIndex.Quantizer.pq(
            training_limit=int(
                os.environ.get("HNSW_QUANTIZER_TRAINING_LIMIT", 500000)
            )
        )

        # Query parameters
        self.query_method = os.environ.get("QUERY_METHOD", "clip_hybrid_query")
        self.target_vector = os.environ.get("TARGET_VECTOR", "clip")
        self.response_limit = int(os.environ.get("RESPONSE_LIMIT", 50))
        self.advanced_query_parameters = {
            "alpha": float(os.environ.get("QUERY_ALPHA", 0.4)),
            "query_properties": ["caption"],
            "autocut_jumps": int(os.environ.get("AUTOCUT_JUMPS", 0)),
            "rerank_prop": os.environ.get("RERANK_PROP", "caption"),
            "clip_alpha": float(os.environ.get("CLIP_ALPHA", 0.7)),
        }

        # Caption prompts (same as Firebench)
        default_prompt = """
role:
You are a world-class Scientific Image Captioning Expert.

context:
You will be shown a scientific image captured by edge devices. Your goal is to analyze its content and significance in detail.

task:
Generate exactly one scientifically detailed caption that accurately describes what is visible in the image and its scientific relevance.
Make it as detailed as possible. Also extract text and numbers from the images.

constraints:
- Only return:
  1. A single caption.
  2. A list of 15 keywords relevant to the image.
- Do not include any additional text, explanations, or formatting.

format:
caption: <your_scientific_caption_here>
keywords: <keyword1>, <keyword2>, ...
"""
        self.gemma3_prompt = os.environ.get("GEMMA3_PROMPT", default_prompt)

    @staticmethod
    def is_nrp_key_set():
        """Check if NRP API key is set."""
        if os.environ.get("NRP_API_KEY", "") == "":
            raise ValueError("NRP_API_KEY is not set")
```
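The `getattr(VectorDistances, ...)` pattern in the config above raises a bare `AttributeError` when an env var holds a typo. A defensive variant is sketched below; `VectorDistance` here is a stand-in enum for illustration, not the real Weaviate class, and `env_enum` is a hypothetical helper:

```python
import enum
import os

class VectorDistance(enum.Enum):
    """Stand-in for a library enum such as weaviate's VectorDistances."""
    COSINE = "cosine"
    DOT = "dot"
    L2_SQUARED = "l2-squared"

def env_enum(name, enum_cls, default):
    """Resolve an environment string to an enum member, raising a clear
    error that lists the valid names instead of a bare AttributeError."""
    value = os.environ.get(name, default).upper().replace("-", "_")
    try:
        return enum_cls[value]
    except KeyError:
        valid = ", ".join(m.name for m in enum_cls)
        raise ValueError(f"{name}={value!r} is not one of: {valid}")
```

The same guard applies to the `HNSW_FILTER_STRATEGY` lookup, where a misspelled value would otherwise only fail at config-construction time with an unhelpful message.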
