
Commit 1919631

daisyfaithauma and hyperlint-ai[bot] authored and committed
[AIG]Evaluations and Logging (#17134)
* Evaluations and Logging
* Update src/content/docs/ai-gateway/observability/evaluations/set-up-evaluations.mdx
* Links

Co-authored-by: hyperlint-ai[bot] <154288675+hyperlint-ai[bot]@users.noreply.github.com>
1 parent eb9081a · commit 1919631

7 files changed: +142 −15 lines changed

src/content/docs/ai-gateway/observability/analytics.mdx

Lines changed: 1 addition & 10 deletions
@@ -1,5 +1,5 @@
 ---
-title: Analytics and logging
+title: Analytics
 pcx_content_type: reference
 ---
 
@@ -42,12 +42,3 @@ curl https://api.cloudflare.com/client/v4/graphql \
 ```
 
 </TabItem> </Tabs>
-
-:::note[Note]
-
-The cost metric is an estimation based on the number of tokens sent and received in requests. While this metric can help you monitor and predict cost trends, refer to your provider’s dashboard for the most accurate cost details.
-
-:::
-
-## Logging
-
-Your AI Gateway dashboard also shows real-time logs of individual requests, such as the prompt, response, provider, timestamps, and whether the request was successful, cached, or if there was an error. These logs now persist and can store up to 10,000 logs per gateway for better observability and analysis.

src/content/docs/ai-gateway/observability/costs.mdx

Lines changed: 2 additions & 5 deletions
@@ -1,9 +1,10 @@
 ---
 title: Costs
 pcx_content_type: reference
+sidebar:
+  order: 2
 ---
 
-
 ## Supported Providers
 
 AI Gateway currently supports cost metrics from the following providers:
@@ -28,7 +29,6 @@ The cost metric is an **estimation** based on the number of tokens sent and rece
 
 :::caution[Caution]
 
-
 Providers may introduce new models or change their pricing. If you notice outdated cost data or are using a model not yet supported by our cost tracking, please [submit a request](https://forms.gle/8kRa73wRnvq7bxL48)
 
 :::
@@ -37,6 +37,3 @@ Providers may introduce new models or change their pricing. If you notice outdat
 
 AI Gateway allows users to set custom costs when operating under special pricing agreements or negotiated rates. Custom costs can be applied at the request level, and when applied, they will override the default or public model costs.
 For more information on configuration of custom costs, please visit the [Custom Costs](/ai-gateway/configuration/custom-costs/) configuration page.
-
-
-
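As a rough sketch of the request-level custom costs described above: the `cf-aig-custom-cost` header and its `per_token_in`/`per_token_out` fields are assumptions taken from the linked Custom Costs configuration page, and the prices shown are placeholders, not real rates.

```bash
# Sketch: apply a negotiated per-token rate to a single request so that
# this request's cost metric is calculated from the supplied prices
# instead of the public model pricing. Header name and fields are assumed
# from the Custom Costs configuration page.
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-custom-cost: {"per_token_in": 0.000001, "per_token_out": 0.000002}' \
  --data '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is Cloudflare?"}]
  }'
```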
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
---
title: Evaluations
pcx_content_type: navigation
order: 1
---

Understanding your application's performance is essential for optimization. Developers often have different priorities, and finding the optimal solution involves balancing key factors such as cost, latency, and accuracy. Some prioritize low-latency responses, while others focus on accuracy or cost-efficiency.

AI Gateway's Evaluations provide the data needed to make informed decisions on how to optimize your AI application. Whether it's adjusting the model, provider, or prompt, this feature delivers insights into key metrics around performance, speed, and cost. It empowers developers to better understand their application's behavior, ensuring improved accuracy, reliability, and customer satisfaction.

Evaluations use datasets, which are collections of logs stored for analysis. You can create datasets by applying filters in the Logs tab, which help narrow down specific logs for evaluation.

Our first step toward comprehensive AI evaluations starts with human feedback (currently in open beta). We will continue to build and expand AI Gateway with additional evaluators.

[Learn how to set up an evaluation](/ai-gateway/observability/evaluations/set-up-evaluations/), including creating datasets, selecting evaluators, and running the evaluation process.
src/content/docs/ai-gateway/observability/evaluations/set-up-evaluations.mdx

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
---
pcx_content_type: how-to
title: Set up Evaluations
sidebar:
  order: 2
---

This guide walks you through the process of setting up an evaluation in AI Gateway. These steps are done in the [Cloudflare dashboard](https://dash.cloudflare.com/).

## 1. Select or create a dataset

Datasets are collections of logs stored for analysis that can be used in an evaluation. You can create datasets by applying filters in the Logs tab. Datasets will update automatically based on the set filters.

### Set up a dataset from the Logs tab

1. Apply filters to narrow down your logs. Filter options include provider, number of tokens, request status, and more.
2. Select **Create Dataset** to store the filtered logs for future analysis.

You can manage datasets by selecting **Manage datasets** from the Logs tab.

:::note[Note]

Please keep in mind that datasets currently use `AND` joins, so there can only be one item per filter (for example, one model or one provider). Future updates will allow more flexibility in dataset creation.

:::

## 2. Select evaluators

After creating a dataset, choose the evaluation parameters:

- Cost: Calculates the average cost of inference requests within the dataset (only for requests with [cost data](/ai-gateway/observability/costs/)).
- Speed: Calculates the average duration of inference requests within the dataset.
- Performance:
  - Human feedback: Measures performance based on human feedback, calculated as the percentage of thumbs-up ratings on the logs, annotated from the Logs tab.

:::note[Note]

Additional evaluators will be introduced in future updates to expand performance analysis capabilities.

:::

## 3. Name, review, and run the evaluation

1. Create a unique name for your evaluation to reference it in the dashboard.
2. Review the selected dataset and evaluators.
3. Select **Run** to start the process.

## 4. Review and analyze results

Evaluation results will appear in the Evaluations tab. The results show the status of the evaluation (for example, in progress, completed, or error). Metrics for the selected evaluators will be displayed, excluding any logs with missing fields. You will also see the number of logs used to calculate each metric.

While datasets automatically update based on filters, evaluations do not. You will have to create a new evaluation if you want to evaluate new logs.

Use these insights to optimize based on your application's priorities. Based on the results, you may choose to:

- Change the model or [provider](/ai-gateway/providers/)
- Adjust your prompts
- Explore further optimizations, such as setting up [Retrieval Augmented Generation (RAG)](/reference-architecture/diagrams/ai/ai-rag/)
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
---
pcx_content_type: reference
title: Logging
sidebar:
  badge:
    text: Beta
---

import { Render } from "~/components";

Logging is a fundamental building block for application development. Logs provide insights during the early stages of development and are often critical to understanding issues occurring in production.

Your AI Gateway dashboard shows logs of individual requests, including the user prompt, model response, provider, timestamp, request status, token usage, cost, and duration. These logs persist, giving you the flexibility to store them for your preferred duration and do more with valuable request data.

You can store up to 10 million logs per gateway. If your limit is reached, new logs will stop being saved. To continue saving logs, you must delete older logs to free up space for new logs.

To learn more about your plan limits, refer to [Pricing](/ai-gateway/pricing/).

## Default configuration

Logs, which include metrics as well as request and response data, are enabled by default for each gateway. This logging behavior is applied uniformly to all requests in the gateway. If you are concerned about privacy or compliance and want to turn log collection off, you can go to settings and opt out of logs. If you need to modify the log settings for specific requests, you can override this setting on a per-request basis.

<Render file="logging" />

:::note

To export logs using [Logpush](/ai-gateway/observability/logging/logpush), you must have logs turned on for the gateway.

:::

## Per-request logging

To override the default logging behavior set in the settings tab, you can define headers on a per-request basis.

## Collect logs (`cf-aig-collect-log`)

The `cf-aig-collect-log` header allows you to bypass the default log setting for the gateway. If the gateway is configured to save logs, the header will exclude the log for that specific request. Conversely, if logging is disabled at the gateway level, this header will save the log for that request.

In the example below, we use `cf-aig-collect-log` to bypass the default setting to avoid saving the log.

```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-collect-log: false' \
  --data '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "What is the email address and phone number of user123?"
      }
    ]
  }'
```
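Conversely, a minimal sketch of the opposite case, assuming log collection has been turned off in the gateway settings: setting the header to `true` saves the log for just this request.

```bash
# Sketch: gateway-level logging is off, but persist the log for this one request.
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-collect-log: true' \
  --data '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize the latest deployment logs."}]
  }'
```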
File renamed without changes.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
---
{}
---

To change the default log configuration in the dashboard:

1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/) and select your account.
2. Go to **AI** > **AI Gateway**.
3. Select **Settings**.
4. Change the **Logs** setting to your preference.
