Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 45 additions & 4 deletions content/en/llm_observability/experiments/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -576,6 +576,8 @@ List all projects, sorted by creation date. The most recently created projects a
| ---- | ---- | --- |
| `filter[id]` | string | The ID of a project to search for. |
| `filter[name]` | string | The name of a project to search for. |
| `filter[is_deleted]` | boolean | Filter for deleted projects. |
| `include[user_data]` | boolean | Include user data in the response. |
| `page[cursor]` | string | List results with a cursor provided in the previous query. |
| `page[limit]` | int | Limits the number of results. |

Expand Down Expand Up @@ -605,6 +607,7 @@ Create a project. If there is an existing project with the same name, the API re

| Field | Type | Description |
| ---- | ---- | ---- |
| `ml_app` | string | ML app identifier. |
| `name` (_required_) | string | Unique project name. |
| `description` | string | Project description. |

Expand All @@ -628,6 +631,7 @@ Partially update a project object. Specify the fields to update in the payload.

| Field | Type | Description |
| ---- | ---- | ---- |
| `ml_app` | string | ML app identifier. |
| `name` | string | Unique project name. |
| `description` | string | Project description. |

Expand All @@ -636,10 +640,13 @@ Partially update a project object. Specify the fields to update in the payload.
| Field | Type | Description |
| ---- | ---- | ---- |
| `id` | UUID | Unique project ID. Set at the top level `id` field within the [Data](#object-data) object. |
| `ml_app` | string | ML app identifier. |
| `name` | string | Unique project name. |
| `description` | string | Project description. |
| `created_at` | timestamp | Timestamp representing when the resource was created. |
| `updated_at` | timestamp | Timestamp representing when the resource was last updated. |
| `deleted_at` | timestamp | Timestamp representing when the resource was deleted (soft delete). |
| `author` | object | User who created the project. |

{{% /collapse-content %}}

Expand All @@ -651,7 +658,7 @@ Delete one or more projects.

| Field | Type | Description |
| ---- | ---- | ---- |
| `project_ids` (_required_) | []UUID | List of project IDs to delete. |
| `project_ids` (_required_) | array of strings | List of project IDs to delete. |

**Response**

Expand All @@ -673,6 +680,8 @@ List all datasets, sorted by creation date. The most recently-created datasets a
| ---- | ---- | --- |
| `filter[name]` | string | The name of a dataset to search for. |
| `filter[id]` | string | The ID of a dataset to search for. |
| `filter[is_deleted]` | boolean | Filter for deleted datasets. |
| `include[user_data]` | boolean | Include user data in the response. |
| `page[cursor]` | string | List results with a cursor provided in the previous query. |
| `page[limit]` | int | Limits the number of results. |

Expand All @@ -687,12 +696,16 @@ List all datasets, sorted by creation date. The most recently-created datasets a
| Field | Type | Description |
| ---- | ---- | ---- |
| `id` | string | Unique dataset ID. Set at the top level `id` field within the [Data](#object-data) object. |
| `project_id` | string | Unique project ID. |
| `name` | string | Unique dataset name. |
| `description` | string | Dataset description. |
| `metadata` | json | Arbitrary key-value metadata associated with the dataset. |
| `current_version` | int | The current version number of the dataset. Versions start at 0 and increment when records are added or modified. |
| `dataset_type` | string | Type of dataset. |
| `created_at` | timestamp | Timestamp representing when the resource was created. |
| `updated_at` | timestamp | Timestamp representing when the resource was last updated. |
| `deleted_at` | timestamp | Timestamp representing when the resource was deleted (soft delete). |
| `author` | object | User who created the dataset. |

{{% /collapse-content %}}

Expand All @@ -707,6 +720,7 @@ Create a dataset. If there is an existing dataset with the same name, the API re
| `name` (_required_) | string | Unique dataset name. |
| `description` | string | Dataset description. |
| `metadata` | json | Arbitrary key-value metadata associated with the dataset. |
| `dataset_type` | string | Type of dataset. |

**Response**

Expand Down Expand Up @@ -751,11 +765,18 @@ List all dataset records, sorted by creation date. The most recently-created rec
| ---- | ---- | ---- |
| `id` | string | Unique record ID. |
| `dataset_id` | string | Unique dataset ID. |
| `span_id` | string | Associated span ID. |
| `trace_id` | string | Associated trace ID. |
| `input` | any (string, number, Boolean, object, array) | Data that serves as the starting point for an experiment. |
| `expected_output` | any (string, number, Boolean, object, array) | Expected output. |
| `metadata` | json | Arbitrary key-value metadata associated with the record. |
| `created_at` | timestamp | Timestamp representing when the resource was created. |
| `updated_at` | timestamp | Timestamp representing when the resource was last updated. |
| `deleted_at` | timestamp | Timestamp representing when the resource was deleted (soft delete). |
| `ttl` | string | Time-to-live for the record. |
| `version` | int | Record version number. |
| `author` | object | User who created the record. |
| `_dd` | object | Internal Datadog attributes including content preview metadata. |

{{% /collapse-content %}}

Expand All @@ -769,12 +790,14 @@ Appends records for a given dataset.
| ---- | ---- | --- |
| `deduplicate` | bool | If `true`, deduplicates appended records. Defaults to `true`. |
| `records` (_required_) | [][RecordReq](#object-recordreq) | List of records to create. |
| `create_new_version` | bool | If `true`, creates a new dataset version. |

#### Object: RecordReq

| Field | Type | Description |
| ---- | ---- | ---- |
| `input` (_required_) | any (string, number, Boolean, object, array) | Data that serves as the starting point for an experiment. |
| `id` | string | Optional record ID. |
| `input` | any (string, number, Boolean, object, array) | Data that serves as the starting point for an experiment. |
| `expected_output` | any (string, number, Boolean, object, array) | Expected output. |
| `metadata` | json | Arbitrary key-value metadata associated with the record. |

Expand Down Expand Up @@ -902,12 +925,19 @@ List all experiments, sorted by creation date. The most recently-created experim
| `id` | UUID | Unique experiment ID. Set at the top level `id` field within the [Data](#object-data) object. |
| `project_id` | string | Unique project ID. |
| `dataset_id` | string | Unique dataset ID. |
| `dataset_version` | int | Dataset version number. |
| `dataset_name` | string | Dataset name. |
| `experiment` | string | Experiment identifier. |
| `name` | string | Unique experiment name. |
| `description` | string | Experiment description. |
| `metadata` | json | Arbitrary key-value metadata associated with the experiment. |
| `aggregate_data` | json | Aggregated experiment data. |
| `run_count` | int | Number of experiment runs. |
| `config` | json | Configuration used when creating the experiment. |
| `created_at` | timestamp | Timestamp representing when the resource was created. |
| `updated_at` | timestamp | Timestamp representing when the resource was last updated. |
| `deleted_at` | timestamp | Timestamp representing when the resource was deleted (soft delete). |
| `author` | object | User who created the experiment. |

{{% /collapse-content %}}

Expand All @@ -927,6 +957,7 @@ Create an experiment. If there is an existing experiment with the same name, the
| `ensure_unique` | bool | If `true`, Datadog generates a new experiment with a unique name in the case of a conflict. Default is `true`. |
| `metadata` | json | Arbitrary key-value metadata associated with the experiment. |
| `config` | json | Configuration used when creating the experiment. |
| `run_count` | int | Number of runs for the experiment. |

**Response**

Expand Down Expand Up @@ -954,6 +985,8 @@ Partially update an experiment object. Specify the fields to update in the paylo
| ---- | ---- | ---- |
| `name` | string | Unique experiment name. |
| `description` | string | Experiment description. |
| `dataset_id` | string | Unique dataset ID. |
| `metadata` | json | Arbitrary key-value metadata associated with the experiment. |

**Response**

Expand Down Expand Up @@ -1004,8 +1037,10 @@ Push events (spans and metrics) for an experiment.
| ---- | ---- | ---- |
| `trace_id` | string | Trace ID. |
| `span_id` | string | Span ID. |
| `parent_id` | string | Parent span ID. |
| `project_id` | string | Project ID. |
| `dataset_id` | string | Dataset ID. |
| `dataset_record_id` | string | Dataset record ID associated with this span. |
| `name` | string | Span name (for example, task name). |
| `start_ns` | number | Span start time in nanoseconds. |
| `duration` | number | Span duration in nanoseconds. |
Expand All @@ -1015,19 +1050,25 @@ Push events (spans and metrics) for an experiment.
| `meta.output` | json | Output payload associated with the span. |
| `meta.expected_output` | json | Expected output for the span. |
| `meta.error` | object | Error details: `message`, `stack`, `type`. |
| `meta.span` | object | Span-specific metadata (for example, `kind`). |
| `meta.metadata` | json | Arbitrary key-value metadata. |

#### Object: Metric

| Field | Type | Description |
| ---- | ---- | ---- |
| `id` | string | Metric ID (internally generated UUID). |
| `span_id` | string | Associated span ID. |
| `metric_type` | string | Metric type. One of: `score`, `categorical`. |
| `metric_type` | string | Metric type. One of: `score`, `categorical`, `boolean`. |
| `timestamp_ms` | number | UNIX timestamp in milliseconds. |
| `label` | string | Metric label (evaluator name). |
| `score_value` | number | Score value (when `metric_type` is `score`). |
| `categorical_value` | string | Categorical value (when `metric_type` is `categorical`). |
| `boolean_value` | boolean | Boolean value (when `metric_type` is `boolean`). |
| `metric_source` | string | Source of the metric (for example, `custom`, `summary`). |
| `eval_metric_type` | string | Type of evaluation metric. |
| `metadata` | json | Arbitrary key-value metadata associated with the metric. |
| `error.message` | string | Optional error message for the metric. |
| `error` | object | Error details: `message`, `stack`, `type`. |

**Response**

Expand Down
27 changes: 22 additions & 5 deletions content/en/llm_observability/instrumentation/api.md
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,8 @@ If the request is successful, the API responds with a 202 network code and an em
| messages| [Message](#message) | List of messages. This should only be used for LLM spans. |
| documents| [Document](#document) | List of documents. This should only be used as the output for retrieval spans |
| prompt | [Prompt](#prompt) | Structured prompt metadata that includes the template and variables used for the LLM input. This should only be used for input IO on LLM spans. |
| embedding | []float | Embedding vector representation. **Only valid for embedding spans.** |
| parameters | object | Parameters used for the LLM request or response. **Only valid for LLM spans.** |


**Note**: When only `input.messages` is set for an LLM span, Datadog infers `input.value` from `input.messages` and uses the following inference logic:
Expand All @@ -166,6 +168,8 @@ If the request is successful, the API responds with a 202 network code and an em
|----------------------|--------|--------------------------|
| content [*required*] | string | The body of the message. |
| role | string | The role of the entity. |
| tool_calls | []object | List of tool calls made by the LLM. |
| tool_results | []object | List of tool results returned to the LLM. |

#### Document
| Field | Type | Description |
Expand All @@ -184,6 +188,7 @@ If the request is successful, the API responds with a 202 network code and an em
| Field | Type | Description |
|----------------------|--------|--------------------------|
| id | string | Logical identifier for this prompt template. Should be unique per `ml_app`. |
| name | string | Human-readable name for the prompt. |
| version | string | Version tag for the prompt (for example, "1.0.0"). If not provided, LLM Observability automatically generates a version by computing a hash of the template content. |
| template | string | Single string template form. Use placeholder syntax (like `{{variable_name}}`) to embed variables. This should not be set with `chat_template`. |
| chat_template | [[Message]](#message) | Multi-message template form. Use placeholder syntax (like `{{variable_name}}`) to embed variables in message content. This should not be set with `template`. |
Expand Down Expand Up @@ -219,10 +224,14 @@ If the request is successful, the API responds with a 202 network code and an em
| Field | Type | Description |
|-------------|-------------------|--------------|
| kind [*required*] | string | The [span kind][2]: `"agent"`, `"workflow"`, `"llm"`, `"tool"`, `"task"`, `"embedding"`, or `"retrieval"`. |
| model_name | string | The name of the model used. **Only valid for LLM spans.** |
| model_provider | string | The provider of the model. **Only valid for LLM spans.** |
| model_version | string | The version of the model. **Only valid for LLM spans.** |
| embedding_for_prompt_idx | integer | Index of the prompt for which the embedding is generated. **Only valid for embedding spans.** |
| error | [Error](#error) | Error information on the span. |
| input | [IO](#io) | The span's input information. |
| output | [IO](#io) | The span's output information. |
| metadata | Dict[key (string), value] where the value is a float, bool, or string | Data about the span that is not input or output related. Use the following metadata keys for LLM spans: `temperature`, `max_tokens`, `model_name`, and `model_provider`. |
| metadata | Dict[key (string), value] where the value is a float, bool, or string | Data about the span that is not input or output related. Use the following metadata keys for LLM spans: `temperature`, `max_tokens`. |

#### Metrics
| Field | Type | Description |
Expand Down Expand Up @@ -251,6 +260,9 @@ If the request is successful, the API responds with a 202 network code and an em
| duration [*required*] | float64 | The span's duration in nanoseconds. |
| meta [*required*] | [Meta](#meta) | The core content relative to the span. |
| status | string | Error status (`"ok"` or `"error"`). Defaults to `"ok"`. |
| service | string | The service name associated with the span. |
| ml_app | string | The ML application name. Overrides the top-level `ml_app` field. |
| ml_app_version | string | The ML application version. |
| apm_trace_id | string | The ID of the associated APM trace. Defaults to match the `trace_id` field. |
| metrics | [Metrics](#metrics) | Datadog metrics to collect. |
| session_id | string | The span's `session_id`. Overrides the top-level `session_id` field. |
Expand All @@ -266,6 +278,7 @@ If the request is successful, the API responds with a 202 network code and an em
| Field | Type | Description |
|----------|---------------------|--------------|
| ml_app [*required*] | string | The name of your LLM application. See [Application naming guidelines](#application-naming-guidelines). |
| ml_app_version | string | The version of your LLM application. |
| spans [*required*] | [[Span](#span)] | A list of spans. |
| tags | [[Tag](#tag)] | A list of top-level tags to apply to each span. |
| session_id | string | The session the list of spans belongs to. Can be overridden or set on individual spans as well. |
Expand Down Expand Up @@ -464,8 +477,11 @@ Evaluations must be joined to a unique span. You can identify the target span us
|--------------------------------------------------------------------|---------------------|--------------------------------------------------------------------------------------------------------|
| ID | string | Evaluation metric UUID (generated upon submission). |
| join_on [*required*] | [[JoinOn](#joinon)] | How the evaluation is joined to a span. |
| trace_id | string | The trace ID of the span associated with this evaluation. |
| span_id | string | The span ID of the span associated with this evaluation. |
| timestamp_ms [*required*] | int64 | A UTC UNIX timestamp in milliseconds representing the time the request was sent. |
| ml_app [*required*] | string | The name of your LLM application. See [Application naming guidelines](#application-naming-guidelines). |
| ml_app_version | string | The version of your LLM application. |
| metric_type [*required*] | string | The type of evaluation: `"categorical"`, `"score"`, or `"boolean"`. |
| label [*required*] | string | The unique name or label for the provided evaluation . |
| categorical_value [*required if the metric_type is "categorical"*] | string | A string representing the category that the evaluation belongs to. |
Expand All @@ -474,6 +490,7 @@ Evaluations must be joined to a unique span. You can identify the target span us
| assessment | string | An assessment of this evaluation. Accepted values are `pass` and `fail`. |
| reasoning | string | A text explanation of the evaluation result. |
| tags | [[Tag](#tag)] | A list of tags to apply to this particular evaluation metric. |
| metadata | object | Arbitrary key-value metadata associated with the evaluation. |

#### JoinOn

Expand All @@ -486,15 +503,15 @@ Evaluations must be joined to a unique span. You can identify the target span us

| Field | Type | Description |
|------------|-----------------|--------------|
| span_id | string | The span ID of the span that this evaluation is associated with. |
| trace_id | string | The trace ID of the span that this evaluation is associated with. |
| span_id [*required*] | string | The span ID of the span that this evaluation is associated with. |
| trace_id [*required*] | string | The trace ID of the span that this evaluation is associated with. |

#### TagContext

| Field | Type | Description |
|------------|-----------------|--------------|
| key | string | The tag key name. This must be the same key used when setting the tag on the span. |
| value | string | The tag value. This value must match exactly one span with the specified tag key/value pair. |
| key [*required*] | string | The tag key name. This must be the same key used when setting the tag on the span. |
| value [*required*] | string | The tag value. This value must match exactly one span with the specified tag key/value pair. |


#### EvalMetricsRequestData
Expand Down
Loading