
Commit 926ca16

feat: add llm_factory and embedding_factory (#2112)
This PR introduces `llm_factory` and `embedding_factory` to provide a unified interface for creating LLM and embedding instances across multiple providers.

### What's new

- `llm_factory` for creating LLM instances
- `embedding_factory` for creating embedding instances
- Support for OpenAI, Google, and LiteLLM providers
- Consistent sync/async interface across all providers

### Usage

```python
from ragas_experimental import llm_factory, embedding_factory
from litellm import acompletion, completion, embedding, aembedding
from openai import OpenAI, AsyncOpenAI
from pydantic import BaseModel

# Create LLM instance
llm = llm_factory("litellm/openai/gpt-4o", client=completion)  # use acompletion if you want async
llm = llm_factory("openai/gpt-4o", client=OpenAI)  # use AsyncOpenAI if you want async

# Generate with structured output
class HelloWorld(BaseModel):
    text: str

llm.generate("hai", HelloWorld)
# Returns: HelloWorld(text='Hello! How can I assist you today?')

# Create embedding instance
emb = embedding_factory("litellm/openai/text-embedding-3-small", client=embedding)
emb = embedding_factory("litellm/openai/text-embedding-3-small", client=OpenAI)

# Async embedding
await emb.aembed_text("hello")
# Returns: 1536-dimensional vector
```

### Benefits

- Seamless switching between providers
- Consistent API for both LLMs and embeddings
- Built-in support for structured outputs with Pydantic
- Full async support

This provides a clean abstraction layer for working with different AI providers in the ragas experimental framework.
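To make the provider-switching claim concrete, here is a minimal sketch built only from the calls shown above. The model strings and the `Answer` model are illustrative, and the client values follow the usage examples in this description rather than any documented signature:

```python
from litellm import completion
from openai import OpenAI
from pydantic import BaseModel

from ragas_experimental import llm_factory


class Answer(BaseModel):
    text: str


# Per the usage examples above, the same calling code works whether the
# model is routed through LiteLLM or hit via the OpenAI client directly.
for model_id, client in [
    ("litellm/openai/gpt-4o", completion),  # LiteLLM completion function
    ("openai/gpt-4o", OpenAI),              # OpenAI client, as in the description
]:
    llm = llm_factory(model_id, client=client)
    result = llm.generate("Say hello in one short sentence.", Answer)
    print(model_id, "->", result.text)
```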
1 parent 60b9e7c commit 926ca16

File tree

26 files changed: +1578, -131 lines changed


.github/workflows/claude-code.yaml

Lines changed: 22 additions & 14 deletions
@@ -1,4 +1,4 @@
-name: Claude Code Assistant
+name: Claude PR Assistant
 
 on:
   issue_comment:
@@ -10,21 +10,29 @@ on:
   pull_request_review:
     types: [submitted]
 
-permissions:
-  contents: write
-  issues: write
-  pull-requests: write
-  id-token: write
-
 jobs:
-  claude-response:
-    name: Claude Code Response
+  claude-code-action:
+    if: |
+      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
+      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
+      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
+      (github.event_name == 'issues' && contains(github.event.issue.body, '@claude'))
     runs-on: ubuntu-latest
-    timeout-minutes: 30
-    if: contains(github.event.comment.body, '@claude') || github.event_name == 'issues' || github.event_name == 'pull_request_review'
+    permissions:
+      contents: read
+      pull-requests: read
+      issues: read
+      id-token: write
    steps:
-      - name: Claude Code Action
-        uses: anthropics/claude-code-action@v1
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 1
+
+      - name: Run Claude PR Action
+        uses: anthropics/claude-code-action@beta
         with:
           anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
-          github_token: ${{ secrets.GITHUB_TOKEN }}
+          # Or use OAuth token instead:
+          # claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+          timeout_minutes: "60"

docs/experimental/tutorials/agent.md

Lines changed: 2 additions & 2 deletions
@@ -41,8 +41,8 @@ df.to_csv("datasets/test_dataset.csv", index=False)
 To evaluate the performance of our agent, we will define a non llm metric that compares if our agent's output is within a certain tolerance of the expected output and outputs 1/0 based on it.
 
 ```python
-from ragas_experimental.metric import numeric_metric
-from ragas_experimental.metric.result import MetricResult
+from ragas_experimental.metrics import numeric_metric
+from ragas_experimental.metrics.result import MetricResult
 
 @numeric_metric(name="correctness")
 def correctness_metric(prediction: float, actual: float):
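As a quick illustration of the renamed import path in use (not the tutorial's actual implementation), a minimal `numeric_metric` might look like the sketch below; the tolerance value and the bare-float return are assumptions, and the tutorial may wrap the score in `MetricResult` instead:

```python
from ragas_experimental.metrics import numeric_metric


@numeric_metric(name="correctness")
def correctness_metric(prediction: float, actual: float):
    # Hypothetical tolerance check: 1.0 if the agent's answer is close
    # enough to the expected value, else 0.0. The real tutorial body
    # (and whether it returns a MetricResult) is not shown in this hunk.
    tolerance = 1e-3
    return 1.0 if abs(prediction - actual) <= tolerance else 0.0
```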

docs/experimental/tutorials/prompt.md

Lines changed: 2 additions & 2 deletions
@@ -30,8 +30,8 @@ pd.DataFrame(samples).to_csv("datasets/test_dataset.csv", index=False)
 Now we need to have a way to measure the performance of our prompt in this task. We will define a metric that will compare the output of our prompt with the expected output and outputs pass/fail based on it.
 
 ```python
-from ragas_experimental.metric import discrete_metric
-from ragas_experimental.metric.result import MetricResult
+from ragas_experimental.metrics import discrete_metric
+from ragas_experimental.metrics.result import MetricResult
 
 @discrete_metric(name="accuracy", values=["pass", "fail"])
 def my_metric(prediction: str, actual: str):

docs/experimental/tutorials/rag.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ pd.DataFrame(samples).to_csv("datasets/test_dataset.csv", index=False)
 To evaluate the performance of our RAG system, we will define a llm based metric that compares the output of our RAG system with the grading notes and outputs pass/fail based on it.
 
 ```python
-from ragas_experimental.metric import DiscreteMetric
+from ragas_experimental.metrics import DiscreteMetric
 my_metric = DiscreteMetric(
     name="correctness",
     prompt = "Check if the response contains points mentioned from the grading notes and return 'pass' or 'fail'.\nResponse: {response} Grading Notes: {grading_notes}",

docs/experimental/tutorials/workflow.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ pd.DataFrame(dataset_dict).to_csv("datasets/test_dataset.csv", index=False)
 To evaluate the performance of our workflow, we will define a llm based metric that compares the output of our workflow with the pass criteria and outputs pass/fail based on it.
 
 ```python
-from ragas_experimental.metric import DiscreteMetric
+from ragas_experimental.metrics import DiscreteMetric
 
 my_metric = DiscreteMetric(
     name="response_quality",

experimental/ragas_examples/agent_evals/evals.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 from ragas_experimental import Dataset, experiment
-from ragas_experimental.metric.numeric import numeric_metric
-from ragas_experimental.metric.result import MetricResult
+from ragas_experimental.metrics.numeric import numeric_metric
+from ragas_experimental.metrics.result import MetricResult
 from .agent import get_default_agent
 
 math_agent = get_default_agent()

experimental/ragas_examples/prompt_evals/evals.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 from ragas_experimental import Dataset, experiment
-from ragas_experimental.metric.result import MetricResult
-from ragas_experimental.metric.discrete import discrete_metric
+from ragas_experimental.metrics.result import MetricResult
+from ragas_experimental.metrics.discrete import discrete_metric
 
 from .prompt import run_prompt

experimental/ragas_examples/rag_eval/evals.py

Lines changed: 3 additions & 3 deletions
@@ -1,13 +1,13 @@
 from ragas_experimental import Dataset, experiment
-from ragas_experimental.metric import DiscreteMetric
+from ragas_experimental.metrics import DiscreteMetric
 from openai import OpenAI
-from ragas_experimental.llms import ragas_llm
+from ragas_experimental.llms import llm_factory
 import os
 from .rag import default_rag_client
 
 openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
 rag_client = default_rag_client(llm_client=openai_client)
-llm = ragas_llm("openai","gpt-4o", openai_client)
+llm = llm_factory("openai","gpt-4o", openai_client)
 
 def load_dataset():
experimental/ragas_examples/workflow_eval/evals.py

Lines changed: 3 additions & 3 deletions
@@ -1,13 +1,13 @@
 import os
 from openai import OpenAI
 from ragas_experimental import Dataset, experiment
-from ragas_experimental.metric import DiscreteMetric
-from ragas_experimental.llms import ragas_llm
+from ragas_experimental.metrics import DiscreteMetric
+from ragas_experimental.llms import llm_factory
 from .workflow import default_workflow_client
 
 
 workflow_client = default_workflow_client()
-llm = ragas_llm("openai", "gpt-4o", OpenAI(api_key=os.environ.get("OPENAI_API_KEY")))
+llm = llm_factory("openai", "gpt-4o", OpenAI(api_key=os.environ.get("OPENAI_API_KEY")))
experimental/ragas_experimental/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -13,5 +13,7 @@
 
 from ragas_experimental.dataset import Dataset
 from ragas_experimental.experiment import experiment, Experiment
+from ragas_experimental.llms import llm_factory
+from ragas_experimental.embeddings import embedding_factory
 
-__all__ = ["Dataset", "experiment", "Experiment"]
+__all__ = ["Dataset", "experiment", "Experiment", "llm_factory", "embedding_factory"]
