Conversation
Thank you for your submission! We ask that you sign our Developer Certificate of Origin (DCO) before we can accept your contribution. You can sign the DCO by adding a comment below using this text: "I have read the DCO document and I hereby sign the DCO." You can re-trigger this bot by commenting "recheck" in this Pull Request. (Posted by the DCO Assistant Lite bot.)
Greptile Summary

This PR introduces Elorian, a new multimodal VLM evaluation pipeline built on top of DataDesigner.

Key issues found:
| Filename | Overview |
|---|---|
| elorian/judges.py | LLM judge registry; contains two P1 issues: docstring example uses hyphenated alias and provider-prefixed model_id, and build_prompt uses str.format() which raises KeyError on custom templates with curly braces |
| elorian/pipeline.py | Core pipeline orchestrator; global monkey-patch on ImageContext applied at import time and unconditional os.environ mutation in _get_data_designer() are process-wide side effects; also minor describe() truncation bug |
| elorian/run_example.py | Working example script; commented-out judge block contains a hyphenated alias and a doubly-prefixed model_id that would fail at runtime if uncommented; also has an unused import |
| elorian/models.py | VLM model registry and provider management; clean implementation with clear separation between ModelSpec, ModelRegistry, and PROVIDERS dict |
| elorian/image_utils.py | Image loading utilities; module docstring claims URL loading support but no such function is implemented; otherwise straightforward PIL/HuggingFace loaders |
| elorian/__init__.py | Package init with clean `__all__` exports; no issues |
| elorian/README.md | Comprehensive documentation for the Elorian pipeline; well-structured with good examples; consistently enforces the underscore alias rule |
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant Pipeline as MultimodalEvalPipeline
    participant ImgUtils as image_utils
    participant ModelReg as ModelRegistry
    participant JudgeReg as JudgeRegistry
    participant DD as DataDesigner
    User->>ImgUtils: load_images_from_hf_dataset()
    ImgUtils-->>User: seed_df (uuid, base64_image, label)
    User->>Pipeline: __init__(seed_df, model_registry, judge_registry)
    User->>Pipeline: preview(num_records)
    Pipeline->>ModelReg: to_model_configs()
    ModelReg-->>Pipeline: [ModelConfig(claude_vision), ModelConfig(gpt4o_vision)]
    Pipeline->>JudgeReg: specs
    JudgeReg-->>Pipeline: [JudgeSpec(judge_claude)]
    Pipeline->>Pipeline: _build_config()
    Note over Pipeline: Adds seed dataset<br/>Adds response_{alias} columns (LLMTextColumnConfig)<br/>Adds eval_{alias} columns (LLMJudgeColumnConfig)
    Pipeline->>Pipeline: _get_data_designer()
    Note over Pipeline: Sets DATA_DESIGNER_MODEL_BACKEND=litellm_bridge
    Pipeline->>DD: validate(builder)
    Pipeline->>DD: preview(builder, num_records)
    DD-->>User: PreviewResults (response_claude_vision, response_gpt4o_vision, eval_judge_claude)
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: elorian/judges.py
Line: 9-13
Comment:
**Docstring example contradicts usage rules on two counts**
The docstring example at the top of the file violates the rules established elsewhere:
1. `alias="my-judge"` uses a hyphen. The README and inline comments throughout the codebase explicitly state that aliases **must use underscores**, because hyphens break Jinja2 template rendering (e.g., `{{ response_my-judge }}` is parsed as `{{ response_my }} - judge`). This example will silently produce broken column references at runtime.
2. `model_id="anthropic/claude-sonnet-4-20250514"` includes the provider prefix. However, `JudgeSpec.to_model_config()` already passes `provider=self.provider` to `dd.ModelConfig`, so including the prefix in `model_id` would result in a doubly-prefixed LiteLLM call (`anthropic/anthropic/claude-sonnet-4-20250514`), causing an API routing failure. The actual registered default (`judge_claude`) correctly uses `model_id="claude-sonnet-4-20250514"` without a prefix.
```suggestion
registry = JudgeRegistry()
registry.register(JudgeSpec(
alias="my_judge",
model_id="claude-sonnet-4-20250514",
scores=[...],
prompt_template="...",
))
```
How can I resolve this? If you propose a fix, please make it concise.
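Since Jinja2 variable names follow Python identifier rules, a lightweight guard can reject hyphenated aliases at registration time. This is a sketch with a hypothetical helper name (`validate_alias`), not part of the PR:

```python
def validate_alias(alias: str) -> str:
    """Reject aliases that would break Jinja2 column references.

    Jinja2 variable names follow Python identifier rules, so a
    hyphenated alias like "my-judge" parses as a subtraction
    expression ({{ response_my - judge }}).
    """
    if not alias.isidentifier():
        raise ValueError(
            f"alias {alias!r} is not a valid identifier; "
            "use underscores instead of hyphens"
        )
    return alias

print(validate_alias("my_judge"))  # accepted unchanged
# validate_alias("my-judge")       # would raise ValueError
```

Calling this inside `JudgeRegistry.register()` would turn the silent runtime breakage into an immediate, actionable error.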
---
This is a comment left during a code review.
Path: elorian/judges.py
Line: 117-132
Comment:
**`str.format()` raises `KeyError` for custom templates containing curly braces**
`self.prompt_template.format(responses_block=responses_block)` will raise a `KeyError` for any custom `prompt_template` that contains `{...}` placeholders other than `{responses_block}` — for example a template that includes a JSON snippet like `{"key": "value"}` or a named slot like `{task_description}`. This is a common pattern in judge prompts.
The fix is to use a single `.replace()` call, which is immune to stray braces:
```python
def build_prompt(self, model_aliases: list[str]) -> str:
lines = []
for alias in model_aliases:
col_name = f"response_{alias}"
lines.append(f"--- Response from {alias} ---\n{{{{ {col_name} }}}}")
responses_block = "\n\n".join(lines)
return self.prompt_template.replace("{responses_block}", responses_block)
```
How can I resolve this? If you propose a fix, please make it concise.
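A minimal repro of the failure mode, using a stand-in template that embeds a JSON snippet:

```python
template = 'Return JSON like {"score": 1}.\n\n{responses_block}'

# str.format() parses every {...} as a replacement field, so the JSON
# braces are treated as a field named '"score"' and raise KeyError.
try:
    template.format(responses_block="RESPONSES")
except KeyError as exc:
    print("format() failed on stray braces:", exc)

# str.replace() only substitutes the literal marker and leaves all
# other braces untouched.
print(template.replace("{responses_block}", "RESPONSES"))
```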
---
This is a comment left during a code review.
Path: elorian/pipeline.py
Line: 51-52
Comment:
**Global monkey-patch applied at module import time**
`ImageContext._format_base64_context` is patched unconditionally the moment `elorian.pipeline` is imported. Any other code in the same process that imports `data_designer.config.models.ImageContext` — including upstream DataDesigner code and other tests — will silently receive the patched behavior. If the upstream bug is ever fixed, this patch will silently override the fix.
Consider making the patch opt-in (e.g., apply it only inside `_get_data_designer()`) or, better, contributing the fix upstream and removing the patch entirely once it lands.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: elorian/pipeline.py
Line: 143-151
Comment:
**`os.environ` mutation is a permanent, process-wide side effect**
Setting `os.environ["DATA_DESIGNER_MODEL_BACKEND"]` inside `_get_data_designer()` permanently overrides the environment for the entire process — including any other code that creates a `DataDesigner` instance later, and any subprocess spawned after the call. This can cause hard-to-debug behaviour when the pipeline is used as a library.
Consider passing the backend configuration through the `DataDesigner` constructor instead, or at least restoring the original value:
```python
_prev = os.environ.get("DATA_DESIGNER_MODEL_BACKEND")
os.environ["DATA_DESIGNER_MODEL_BACKEND"] = "litellm_bridge"
try:
return DataDesigner(...)
finally:
if _prev is None:
os.environ.pop("DATA_DESIGNER_MODEL_BACKEND", None)
else:
os.environ["DATA_DESIGNER_MODEL_BACKEND"] = _prev
```
How can I resolve this? If you propose a fix, please make it concise.
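The save/restore pattern generalizes to a small context manager (hypothetical helper, not in the PR). Note the caveat: if `DataDesigner` reads the variable lazily after construction, restoring it early could still misroute, which is why passing the backend through the constructor remains the preferred fix:

```python
import os
from contextlib import contextmanager

@contextmanager
def scoped_env(key: str, value: str):
    """Set an environment variable for the duration of the block,
    then restore whatever value (or absence) was there before."""
    prev = os.environ.get(key)
    os.environ[key] = value
    try:
        yield
    finally:
        if prev is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = prev
```

Usage would look like `with scoped_env("DATA_DESIGNER_MODEL_BACKEND", "litellm_bridge"): return DataDesigner(...)`.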
---
This is a comment left during a code review.
Path: elorian/pipeline.py
Line: 192-194
Comment:
**`...` is always appended even when the prompt is shorter than 80 characters**
`self.prompt[:80] + "..."` unconditionally appends the ellipsis. For a short custom prompt (e.g., `"Describe this image."`), the output reads `Prompt: Describe this image....` which is misleading.
```suggestion
lines.append(f"Prompt: {self.prompt[:80]}{'...' if len(self.prompt) > 80 else ''}")
```
How can I resolve this? If you propose a fix, please make it concise.
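If truncation is needed in more than one place, a tiny helper (hypothetical name) keeps the conditional in one spot:

```python
def truncate(text: str, limit: int = 80) -> str:
    """Return text unchanged if it fits, otherwise cut it to
    limit characters and append an ellipsis."""
    return text if len(text) <= limit else text[:limit] + "..."

print(truncate("Describe this image."))  # short prompt, no ellipsis
print(truncate("x" * 100))               # cut to 80 chars plus "..."
```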
---
This is a comment left during a code review.
Path: elorian/image_utils.py
Line: 1-8
Comment:
**Module docstring lists URL loading as supported, but no such function exists**
The module docstring lists `- A list of image URLs` as a supported source, but there is no `load_images_from_urls()` function in the file. Either remove the bullet point from the docstring, or add a note that URL loading is not yet implemented, to avoid confusion for users reading the docs.
```suggestion
"""Utilities for loading and processing images into DataDesigner seed datasets.
Supports loading from:
- A local directory of image files
- A HuggingFace dataset with an image column
- A list of PIL images
"""
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: elorian/run_example.py
Line: 60-66
Comment:
**Commented example has a hyphenated alias and a doubly-prefixed `model_id`**
If a user uncomments this block as-is, two bugs will surface:
1. `alias="judge-gpt4o"` — the hyphen violates the underscore-only rule and will produce a broken Jinja2 column reference (`{{ eval_judge-gpt4o }}`).
2. `model_id="openai/gpt-4o"` — this includes the provider prefix. Because `JudgeSpec.to_model_config()` also sets `provider="openai"` (the default), LiteLLM will receive `openai/gpt-4o` with `provider=openai`, which may be interpreted as `openai/openai/gpt-4o` depending on routing, causing an API error.
```suggestion
# judge_registry.register(
# JudgeSpec(
# alias="judge_gpt4o",
# model_id="gpt-4o",
# provider="openai",
# description="GPT-4o as LLM judge",
# )
# )
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: elorian/run_example.py
Line: 13
Comment:
**Unused import**
`import data_designer.config as dd` is only referenced inside the commented-out block (for `dd.ModelProvider`). As long as that block remains commented out, `dd` is never used and this import is dead code.
```suggestion
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Merge branch 'main' into vllm_example"
Closing this PR. @ZhenXu587 if you have a bug or issue to raise, please feel free to submit one in Issues. Presently, this PR adds an entirely new project into the repo. It seems more appropriate to boot up your own project and utilize NDD as a dependency than injecting the project into NDD.