explodinggradients
diff --git a/docs/howtos/integrations/index.md b/docs/howtos/integrations/index.md
Lines changed: 2 additions & 1 deletion
diff --git a/docs/howtos/integrations/llama_stack.md b/docs/howtos/integrations/llama_stack.md
Lines changed: 359 additions & 0 deletions
diff --git a/mkdocs.yml b/mkdocs.yml
Lines changed: 1 addition & 0 deletions
diff --git a/src/ragas/embeddings/base.py b/src/ragas/embeddings/base.py
Lines changed: 1 addition & 1 deletion
@@ -9,9 +9,10 @@ happy to look into it 🙂
 ## Frameworks
 
 - [Amazon Bedrock](./amazon_bedrock.md) - Amazon Bedrock is a managed framework for building, deploying, and scaling intelligent agents and integrated AI solutions; more information can be found [here](https://aws.amazon.com/bedrock/).
+- [Haystack](./haystack.md) - Haystack is an LLM orchestration framework for building customizable, production-ready LLM applications; more information can be found [here](https://haystack.deepset.ai/).
 - [Langchain](./langchain.md) - Langchain is a framework for building LLM applications, more information can be found [here](https://www.langchain.com/).
 - [LlamaIndex](./_llamaindex.md) - LlamaIndex is a framework for building RAG applications, more information can be found [here](https://www.llamaindex.ai/).
-- [Haystack](./haystack.md) - Haystack is a LLM orchestration framework to build customizable, production-ready LLM applications, more information can be found [here](https://haystack.deepset.ai/).
+- [LlamaStack](./llama_stack.md) - LlamaStack is a unified framework by Meta for building and deploying generative AI apps across local, cloud, and mobile; more information can be found [here](https://llama-stack.readthedocs.io/en/latest/).
 - [R2R](./r2r.md) - R2R is an all-in-one solution for AI Retrieval-Augmented Generation (RAG) with production-ready features, more information can be found [here](https://r2r-docs.sciphi.ai/introduction)
 - [Swarm](./swarm_agent_evaluation.md) - Swarm is a framework for orchestrating multiple AI agents, more information can be found [here](https://github.com/openai/swarm).
 
 
@@ -0,0 +1,359 @@
+# Evaluating LlamaStack Web Search Groundedness with Llama 4
+
+In this tutorial, we will measure the groundedness of responses generated by LlamaStack's web search agent. [LlamaStack](https://llama-stack.readthedocs.io/en/latest/) is an open-source framework maintained by Meta that streamlines the development and deployment of applications powered by large language models. The evaluation uses Ragas metrics, with Meta Llama 4 Maverick as the judge.
+
+## Setting Up and Running a LlamaStack Server
+
+The following commands install the dependencies needed to run the LlamaStack server with the Together inference provider.
+
+With conda:
+```shell
+!pip install ragas langchain-together uv 
+!uv run --with llama-stack llama stack build --template together --image-type conda
+```
+
+With venv:
+```shell
+!pip install ragas langchain-together uv 
+!uv run --with llama-stack llama stack build --template together --image-type venv
+```
+
+
+```python
+import os
+import subprocess
+
+
+def run_llama_stack_server_background():
+    log_file = open("llama_stack_server.log", "w")
+    process = subprocess.Popen(
+        "uv run --with llama-stack llama stack run together --image-type venv",
+        shell=True,
+        stdout=log_file,
+        stderr=log_file,
+        text=True,
+    )
+
+    print(f"Starting LlamaStack server with PID: {process.pid}")
+    return process
+
+
+def wait_for_server_to_start():
+    import requests
+    from requests.exceptions import ConnectionError
+    import time
+
+    url = "http://0.0.0.0:8321/v1/health"
+    max_retries = 30
+    retry_interval = 1
+
+    print("Waiting for server to start", end="")
+    for _ in range(max_retries):
+        try:
+            response = requests.get(url)
+            if response.status_code == 200:
+                print("\nServer is ready!")
+                return True
+        except ConnectionError:
+            pass
+        # Not ready yet (connection refused or non-200 status): wait before retrying
+        print(".", end="", flush=True)
+        time.sleep(retry_interval)
+
+    print("\nServer failed to start after", max_retries * retry_interval, "seconds")
+    return False
+
+
+# use this helper if needed to kill the server
+def kill_llama_stack_server():
+    # Kill any existing llama stack server processes
+    os.system(
+        "ps aux | grep -v grep | grep llama_stack.distribution.server.server | awk '{print $2}' | xargs kill -9"
+    )
+```
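A possible alternative to the `ps | grep | xargs kill -9` helper above is to stop the server through the `Popen` handle that `run_llama_stack_server_background()` already returns. The sketch below is a minimal illustration, not part of this PR: `stop_process` is a hypothetical name, and it assumes the command is launched as an argument list, since with `shell=True` the handle may refer to the shell rather than the server itself. A short-lived Python child stands in for the server.

```python
import subprocess
import sys


def stop_process(process: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask a child process to exit, escalating to a hard kill if it ignores the request."""
    if process.poll() is None:  # child is still running
        process.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
        try:
            process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            process.kill()  # last resort, comparable to `kill -9` on POSIX
            process.wait()
    return process.returncode


# Stand-in for the server (assumption): a child Python process that would sleep for a minute.
demo = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(demo)
print("child exited with code", exit_code)
```

Terminating by handle only affects the process this script started, whereas the `grep`-based helper kills every matching server process on the machine.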
+
+## Starting the LlamaStack Server
+
+
+```python
+server_process = run_llama_stack_server_background()
+assert wait_for_server_to_start()
+```
+```
+Starting LlamaStack server with PID: 95508
+Waiting for server to start....
+Server is ready!
+```
+
+
+## Building a Search Agent
+
+
+```python
+from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
+
+client = LlamaStackClient(
+    base_url="http://0.0.0.0:8321",
+)
+
+agent = Agent(
+    client,
+    model="meta-llama/Llama-3.1-8B-Instruct",
+    instructions="You are a helpful assistant. Use web search tool to answer the questions.",
+    tools=["builtin::websearch"],
+)
+user_prompts = [
+    "In which major did Demis Hassabis complete his undergraduate degree? Search the web for the answer.",
+    "Ilya Sutskever is one of the key figures in AI. From which institution did he earn his PhD in machine learning? Search the web for the answer.",
+    "Sam Altman, widely known for his role at OpenAI, was born in which American city? Search the web for the answer.",
+]
+
+session_id = agent.create_session("test-session")
+
+
+for prompt in user_prompts:
+    response = agent.create_turn(
+        messages=[
+            {
+                "role": "user",
+                "content": prompt,
+            }
+        ],
+        session_id=session_id,
+    )
+    for log in AgentEventLogger().log(response):
+        log.print()
+```
+
+Now, let's look more closely at the agent's execution steps to see how well it performs.
+
+
+```python
+session_response = client.agents.session.retrieve(
+    session_id=session_id,
+    agent_id=agent.agent_id,
+)
+```
+
+## Evaluate Agent Responses
+
+We want to measure the groundedness of the responses generated by the LlamaStack web search agent. To do this, we need an [EvaluationDataset](../../concepts/components/eval_dataset.md) and metrics that assess how well each response is grounded in the retrieved content. Ragas provides a wide array of off-the-shelf metrics for measuring various aspects of retrieval and generation.
+
+To measure the groundedness of the responses, we will use:
+
+1. [Faithfulness](../../concepts/metrics/available_metrics/faithfulness.md)
+2. [Response Groundedness](../../concepts/metrics/available_metrics/nvidia_metrics.md#response-groundedness)
+
+### Constructing a Ragas EvaluationDataset
+
+To run evaluations with Ragas, we first create an `EvaluationDataset`.
+
+
+```python
+import json
+
+# Extract the web search results (retrieved contexts) from the trace of a single turn
+def extract_retrieved_contexts(turn_object):
+    results = []
+    for step in turn_object.steps:
+        if step.step_type == "tool_execution":
+            tool_responses = step.tool_responses
+            for response in tool_responses:
+                content = response.content
+                if content:
+                    try:
+                        parsed_result = json.loads(content)
+                        results.append(parsed_result)
+                    except json.JSONDecodeError:
+                        print("Warning: Unable to parse tool response content as JSON.")
+                        continue
+
+    retrieved_context = []
+    for result in results:
+        top_content_list = [item["content"] for item in result["top_k"]]
+        retrieved_context.extend(top_content_list)
+    return retrieved_context
+```
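The extractor above indexes `result["top_k"]` directly, so a tool response that parses as JSON but lacks that key would raise. A more tolerant variant could look like the sketch below. It assumes the same payload shape the tutorial relies on (`{"top_k": [{"content": ...}]}`); the function name `extract_contexts_tolerant` and the `fake_turn` object are hypothetical and exist only for this illustration.

```python
import json
from types import SimpleNamespace


def extract_contexts_tolerant(turn):
    """Collect search-result text from a turn, skipping payloads that lack the expected shape."""
    contexts = []
    for step in turn.steps:
        if step.step_type != "tool_execution":
            continue
        for tool_response in step.tool_responses:
            try:
                payload = json.loads(tool_response.content)
            except (TypeError, json.JSONDecodeError):
                continue  # missing or non-JSON content
            if not isinstance(payload, dict):
                continue
            for item in payload.get("top_k") or []:
                if isinstance(item, dict) and "content" in item:
                    contexts.append(item["content"])
    return contexts


# Hypothetical stand-in for a turn object, mimicking only the attributes used above.
good_payload = json.dumps({"top_k": [{"content": "first hit"}, {"content": "second hit"}]})
fake_turn = SimpleNamespace(
    steps=[
        SimpleNamespace(step_type="inference", tool_responses=[]),
        SimpleNamespace(
            step_type="tool_execution",
            tool_responses=[
                SimpleNamespace(content=good_payload),
                SimpleNamespace(content="not json"),
                SimpleNamespace(content=json.dumps({"error": "rate limited"})),
            ],
        ),
    ]
)

print(extract_contexts_tolerant(fake_turn))
```

Malformed responses are dropped silently here; logging them, as the original does for JSON errors, may be preferable when debugging an agent.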
+
+
+```python
+from ragas.dataset_schema import EvaluationDataset
+
+samples = []
+
+references = [
+    "Demis Hassabis completed his undergraduate degree in Computer Science.",
+    "Ilya Sutskever earned his PhD from the University of Toronto.",
+    "Sam Altman was born in Chicago, Illinois.",
+]
+
+for i, turn in enumerate(session_response.turns):
+    samples.append(
+        {
+            "user_input": turn.input_messages[0].content,
+            "response": turn.output_message.content,
+            "reference": references[i],
+            "retrieved_contexts": extract_retrieved_contexts(turn),
+        }
+    )
+
+ragas_eval_dataset = EvaluationDataset.from_list(samples)
+```
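The loop above pairs turns with references by index, so a session with more turns than references would fail with an `IndexError` partway through. One way to surface the mismatch up front is sketched below. `build_samples` is a hypothetical helper, the turn objects are mocked with only the attributes the tutorial reads, and the resulting dictionaries use the same four keys that are passed to `EvaluationDataset.from_list` above.

```python
from types import SimpleNamespace


def build_samples(turns, references, extract_contexts):
    """Pair each turn with its reference answer, failing early on a length mismatch."""
    if len(turns) != len(references):
        raise ValueError(
            f"got {len(turns)} turns but {len(references)} references"
        )
    return [
        {
            "user_input": turn.input_messages[0].content,
            "response": turn.output_message.content,
            "reference": reference,
            "retrieved_contexts": extract_contexts(turn),
        }
        for turn, reference in zip(turns, references)
    ]


# Hypothetical mock of a single turn; real turns come from the session response.
mock_turn = SimpleNamespace(
    input_messages=[SimpleNamespace(content="Sam Altman was born in which American city?")],
    output_message=SimpleNamespace(content="Sam Altman was born in Chicago, Illinois, USA."),
)

samples = build_samples(
    [mock_turn],
    ["Sam Altman was born in Chicago, Illinois."],
    extract_contexts=lambda turn: [],  # placeholder extractor for the illustration
)
print(samples[0]["reference"])
```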
+
+
+```python
+ragas_eval_dataset.to_pandas()
+```
+
+
+<div>
+<style scoped>
+    .dataframe tbody tr th:only-of-type {
+        vertical-align: middle;
+    }
+
+    .dataframe tbody tr th {
+        vertical-align: top;
+    }
+
+    .dataframe thead th {
+        text-align: right;
+    }
+</style>
+<table border="1">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>user_input</th>
+      <th>retrieved_contexts</th>
+      <th>response</th>
+      <th>reference</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>In which major did Demis Hassabis complete his...</td>
+      <td>[Demis Hassabis holds a Bachelor's degree in C...</td>
+      <td>Demis Hassabis completed his undergraduate deg...</td>
+      <td>Demis Hassabis completed his undergraduate deg...</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>Ilya Sutskever is one of the key figures in AI...</td>
+      <td>[Jump to content Main menu Search Donate Creat...</td>
+      <td>Ilya Sutskever earned his PhD in machine learn...</td>
+      <td>Ilya Sutskever earned his PhD from the Univers...</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>Sam Altman, widely known for his role at OpenA...</td>
+      <td>[Sam Altman | Biography, OpenAI, Microsoft, &amp; ...</td>
+      <td>Sam Altman was born in Chicago, Illinois, USA.</td>
+      <td>Sam Altman was born in Chicago, Illinois.</td>
+    </tr>
+  </tbody>
+</table>
+</div>
+
+
+
+### Setting the Ragas Metrics
+
+
+```python
+from ragas.metrics import AnswerAccuracy, Faithfulness, ResponseGroundedness
+from langchain_together import ChatTogether
+from ragas.llms import LangchainLLMWrapper
+
+llm = ChatTogether(
+    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
+)
+evaluator_llm = LangchainLLMWrapper(llm)
+
+ragas_metrics = [
+    AnswerAccuracy(llm=evaluator_llm),
+    Faithfulness(llm=evaluator_llm),
+    ResponseGroundedness(llm=evaluator_llm),
+]
+```
+
+## Evaluation
+
+Finally, let's run the evaluation.
+
+
+```python
+from ragas import evaluate
+
+results = evaluate(dataset=ragas_eval_dataset, metrics=ragas_metrics)
+results.to_pandas()
+```
+```
+Evaluating: 100%|██████████| 9/9 [00:04<00:00,  2.03it/s]
+```
+
+<div>
+<style scoped>
+    .dataframe tbody tr th:only-of-type {
+        vertical-align: middle;
+    }
+
+    .dataframe tbody tr th {
+        vertical-align: top;
+    }
+
+    .dataframe thead th {
+        text-align: right;
+    }
+</style>
+<table border="1">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>user_input</th>
+      <th>retrieved_contexts</th>
+      <th>response</th>
+      <th>reference</th>
+      <th>nv_accuracy</th>
+      <th>faithfulness</th>
+      <th>nv_response_groundedness</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>In which major did Demis Hassabis complete his...</td>
+      <td>[Demis Hassabis holds a Bachelor's degree in C...</td>
+      <td>Demis Hassabis completed his undergraduate deg...</td>
+      <td>Demis Hassabis completed his undergraduate deg...</td>
+      <td>1.0</td>
+      <td>1.0</td>
+      <td>1.00</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>Ilya Sutskever is one of the key figures in AI...</td>
+      <td>[Jump to content Main menu Search Donate Creat...</td>
+      <td>Ilya Sutskever earned his PhD in machine learn...</td>
+      <td>Ilya Sutskever earned his PhD from the Univers...</td>
+      <td>1.0</td>
+      <td>0.5</td>
+      <td>0.75</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>Sam Altman, widely known for his role at OpenA...</td>
+      <td>[Sam Altman | Biography, OpenAI, Microsoft, &amp; ...</td>
+      <td>Sam Altman was born in Chicago, Illinois, USA.</td>
+      <td>Sam Altman was born in Chicago, Illinois.</td>
+      <td>1.0</td>
+      <td>1.0</td>
+      <td>1.00</td>
+    </tr>
+  </tbody>
+</table>
+</div>
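To act on a results table like the one above, it can help to flag the rows whose scores fall below a chosen cutoff. The sketch below copies the three rows of scores from the table into plain dictionaries. The `0.8` threshold and the `flag_low_scores` helper are assumptions made for illustration, not values or functions defined by Ragas.

```python
# Scores copied from the results table above, keyed by row index.
scores = [
    {"row": 0, "nv_accuracy": 1.0, "faithfulness": 1.0, "nv_response_groundedness": 1.00},
    {"row": 1, "nv_accuracy": 1.0, "faithfulness": 0.5, "nv_response_groundedness": 0.75},
    {"row": 2, "nv_accuracy": 1.0, "faithfulness": 1.0, "nv_response_groundedness": 1.00},
]

THRESHOLD = 0.8  # hypothetical cutoff; choose one that suits the application


def flag_low_scores(rows, threshold):
    """Return {row index: {metric: score}} for every metric scoring below the threshold."""
    flagged = {}
    for row in rows:
        low = {
            name: value
            for name, value in row.items()
            if name != "row" and value < threshold
        }
        if low:
            flagged[row["row"]] = low
    return flagged


flagged = flag_low_scores(scores, THRESHOLD)
print(flagged)
```

With these numbers only row 1 (the Ilya Sutskever question) is flagged: its answer matches the reference, yet the faithfulness and groundedness scores suggest that not every claim in the response is supported by the retrieved contexts.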
+
+
+```python
+kill_llama_stack_server()
+```
@@ -116,6 +116,7 @@ nav:
           - LangGraph: howtos/integrations/_langgraph_agent_evaluation.md
           - LangSmith: howtos/integrations/langsmith.md
           - LlamaIndex: howtos/integrations/_llamaindex.md
+          - LlamaStack: howtos/integrations/llama_stack.md
           - R2R: howtos/integrations/r2r.md
           - Swarm: howtos/integrations/swarm_agent_evaluation.md
       - Migrations:
 
@@ -222,7 +222,7 @@ def __post_init__(self):
         super().__init__(cache=self.cache)
         try:
             import sentence_transformers
-            from transformers import AutoConfig
+            from transformers import AutoConfig  # type: ignore
             from transformers.models.auto.modeling_auto import (
                 MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES,
             )