@@ -22,32 +22,31 @@ pip install elastic-opentelemetry-instrumentation-openai

This instrumentation supports *zero-code* / *autoinstrumentation*:

```
opentelemetry-instrument python use_openai.py
```

You can see telemetry from this package if an OpenTelemetry collector is running, for example
one set up as documented in the root [examples](../../examples/) folder.

Set up a virtual environment with this package, the dependencies it requires,
and `dotenv` (a portable way to load environment variables):

```
python3 -m venv .venv
source .venv/bin/activate
pip install -r test-requirements.txt
pip install "python-dotenv[cli]"
```

Run a script with `opentelemetry-instrument` to apply the instrumentation. [ollama.env](ollama.env)
includes variables that point to Ollama instead of OpenAI, which lets you run
the examples without a cloud account:

```
dotenv -f ollama.env run -- \
opentelemetry-instrument python examples/chat.py
```

You can record more information about prompts as log events by enabling content capture:

```
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true dotenv -f ollama.env run -- \
opentelemetry-instrument python examples/chat.py
```

### Instrumentation-specific environment variable configuration
@@ -110,20 +109,22 @@ response without querying the LLM.

### Azure OpenAI Environment Variables

The `AzureOpenAI` client extends `OpenAI` with parameters specific to the Azure OpenAI Service.

* `AZURE_OPENAI_ENDPOINT` - "Azure OpenAI Endpoint" in https://oai.azure.com/resource/overview
  * It should look like `https://<your-resource-name>.openai.azure.com/`
* `AZURE_OPENAI_API_KEY` - "API key 1 (or 2)" in https://oai.azure.com/resource/overview
  * It should be a hex string like `abc01...`
* `OPENAI_API_VERSION` - "Inference version" from https://learn.microsoft.com/en-us/azure/ai-services/openai/api-version-deprecation
  * It should look like `2024-10-01-preview`
* `TEST_CHAT_MODEL` - the "Name" of a deployment in https://oai.azure.com/resource/deployments for a
  model that supports tool calling, such as "gpt-4o-mini".
* `TEST_EMBEDDINGS_MODEL` - the "Name" of a deployment in https://oai.azure.com/resource/deployments for a
  model that supports embeddings, such as "text-embedding-3-small".

Note: Azure resolves the model parameter of a chat completion or embeddings request to the
deployment with an identical name. As deployment names are arbitrary, they may have no
correlation with a real model like `gpt-4o`.
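
With those variables set, a client can be constructed without hard-coding credentials. Below is
a minimal sketch, assuming the `openai` Python SDK (whose `AzureOpenAI` client reads
`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION` from the environment)
and the `TEST_CHAT_MODEL` variable described above:

```python
import os

import openai

# AzureOpenAI picks up AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY and
# OPENAI_API_VERSION from the environment when they are not passed explicitly.
client = openai.AzureOpenAI()

messages = [
    {
        "role": "user",
        "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
    }
]

# For Azure, the "model" argument is really the deployment name (TEST_CHAT_MODEL).
chat_completion = client.chat.completions.create(
    model=os.environ["TEST_CHAT_MODEL"],
    messages=messages,
)
print(chat_completion.choices[0].message.content)
```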

## License

@@ -0,0 +1,23 @@
import os

import openai

CHAT_MODEL = os.environ.get("TEST_CHAT_MODEL", "gpt-4o-mini")


def main():
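    # openai.Client() reads configuration such as OPENAI_API_KEY and
    # OPENAI_BASE_URL from the environment.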
client = openai.Client()

messages = [
{
"role": "user",
"content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
}
]

chat_completion = client.chat.completions.create(model=CHAT_MODEL, messages=messages)
print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
main()
@@ -0,0 +1,49 @@
import os

import numpy as np
import openai

EMBEDDINGS_MODEL = os.environ.get("TEST_EMBEDDINGS_MODEL", "text-embedding-3-small")


def main():
client = openai.Client()

    products = [
        "Search: Ingest your data, and explore Elastic's machine learning and retrieval augmented generation (RAG) capabilities.",
        "Observability: Unify your logs, metrics, traces, and profiling at scale in a single platform.",
        "Security: Protect, investigate, and respond to cyber threats with AI-driven security analytics.",
        "Elasticsearch: Distributed, RESTful search and analytics.",
        "Kibana: Visualize your data. Navigate the Stack.",
        "Beats: Collect, parse, and ship in a lightweight fashion.",
        "Connectors: Connect popular databases, file systems, collaboration tools, and more.",
        "Logstash: Ingest, transform, enrich, and output.",
    ]

# Generate embeddings for each product. Keep them in an array instead of a vector DB.
product_embeddings = []
for product in products:
product_embeddings.append(create_embedding(client, product))

query_embedding = create_embedding(client, "What can help me connect to a database?")

# Calculate cosine similarity between the query and document embeddings
similarities = []
for product_embedding in product_embeddings:
similarity = np.dot(query_embedding, product_embedding) / (
np.linalg.norm(query_embedding) * np.linalg.norm(product_embedding)
)
similarities.append(similarity)

# Get the index of the most similar document
most_similar_index = np.argmax(similarities)

print(products[most_similar_index])


def create_embedding(client, text):
return client.embeddings.create(input=[text], model=EMBEDDINGS_MODEL, encoding_format="float").data[0].embedding


if __name__ == "__main__":
main()
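
Assuming the script above is saved as `examples/embeddings.py` next to the chat example (its
path is not shown in this diff), it can be run the same way:

```
dotenv -f ollama.env run -- \
opentelemetry-instrument python examples/embeddings.py
```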
@@ -0,0 +1,10 @@
# Env to run the integration tests against a local Ollama.
OPENAI_BASE_URL=http://127.0.0.1:11434/v1
OPENAI_API_KEY=notused

# These models may be substituted in the future with newer variants that are
# inexpensive to run.
TEST_CHAT_MODEL=qwen2.5:0.5b
TEST_EMBEDDINGS_MODEL=all-minilm:33m

OTEL_SERVICE_NAME=elastic-opentelemetry-instrumentation-openai
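
# Note (assumes a local Ollama install): pull these models first, e.g.
#   ollama pull qwen2.5:0.5b
#   ollama pull all-minilm:33m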
@@ -0,0 +1,191 @@
import os
from dataclasses import dataclass
from typing import Optional

import openai
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.metrics import Histogram
from vcr.unittest import VCRMixin

OPENAI_API_KEY = "test_openai_api_key"
OPENAI_ORG_ID = "test_openai_org_key"
OPENAI_PROJECT_ID = "test_openai_project_id"

# Use the same model for tools as for chat completion
LOCAL_MODEL = "qwen2.5:0.5b"


@dataclass
class OpenAIEnvironment:
# TODO: add system
operation_name: str = "chat"
model: str = "gpt-4o-mini"
response_model: str = "gpt-4o-mini-2024-07-18"
server_address: str = "api.openai.com"
server_port: int = 443


class OpenaiMixin(VCRMixin):
def _get_vcr_kwargs(self, **kwargs):
"""
This scrubs sensitive data and gunzips bodies when in recording mode.
Without this, you would leak cookies and auth tokens in the cassettes.
Also, depending on the request, some responses would be binary encoded
while others plain json. This ensures all bodies are human-readable.
"""
return {
"decode_compressed_response": True,
"filter_headers": [
("authorization", "Bearer " + OPENAI_API_KEY),
("openai-organization", OPENAI_ORG_ID),
("openai-project", OPENAI_PROJECT_ID),
("cookie", None),
],
"before_record_response": self.scrub_response_headers,
}

@staticmethod
def scrub_response_headers(response):
"""
This scrubs sensitive response headers. Note they are case-sensitive!
"""
response["headers"]["openai-organization"] = OPENAI_ORG_ID
response["headers"]["Set-Cookie"] = "test_set_cookie"
return response

@classmethod
def setup_client(cls):
        # Control the arguments, falling back to test defaults when env vars are unset
return openai.Client(
api_key=os.getenv("OPENAI_API_KEY", OPENAI_API_KEY),
organization=os.getenv("OPENAI_ORG_ID", OPENAI_ORG_ID),
project=os.getenv("OPENAI_PROJECT_ID", OPENAI_PROJECT_ID),
max_retries=1,
)

@classmethod
def setup_environment(cls):
return OpenAIEnvironment()

@classmethod
def setUpClass(cls):
cls.client = cls.setup_client()
cls.openai_env = cls.setup_environment()

def setUp(self):
super().setUp()
OpenAIInstrumentor().instrument()

def tearDown(self):
super().tearDown()
OpenAIInstrumentor().uninstrument()

def assertOperationDurationMetric(self, metric: Histogram):
self.assertEqual(metric.name, "gen_ai.client.operation.duration")
self.assert_metric_expected(
metric,
[
self.create_histogram_data_point(
count=1,
sum_data_point=0.006543334107846022,
max_data_point=0.006543334107846022,
min_data_point=0.006543334107846022,
attributes={
"gen_ai.operation.name": self.openai_env.operation_name,
"gen_ai.request.model": self.openai_env.model,
"gen_ai.response.model": self.openai_env.response_model,
"gen_ai.system": "openai",
"server.address": self.openai_env.server_address,
"server.port": self.openai_env.server_port,
},
),
],
est_value_delta=0.2,
)

    def assertErrorOperationDurationMetric(self, metric: Histogram, attributes: dict, data_point: Optional[float] = None):
self.assertEqual(metric.name, "gen_ai.client.operation.duration")
default_attributes = {
"gen_ai.operation.name": self.openai_env.operation_name,
"gen_ai.request.model": self.openai_env.model,
"gen_ai.system": "openai",
"error.type": "APIConnectionError",
"server.address": "localhost",
"server.port": 9999,
}
if data_point is None:
data_point = 0.8643839359283447
self.assert_metric_expected(
metric,
[
self.create_histogram_data_point(
count=1,
sum_data_point=data_point,
max_data_point=data_point,
min_data_point=data_point,
attributes={**default_attributes, **attributes},
),
],
est_value_delta=0.5,
)

def assertTokenUsageInputMetric(self, metric: Histogram, input_data_point=4):
self.assertEqual(metric.name, "gen_ai.client.token.usage")
self.assert_metric_expected(
metric,
[
self.create_histogram_data_point(
count=1,
sum_data_point=input_data_point,
max_data_point=input_data_point,
min_data_point=input_data_point,
attributes={
"gen_ai.operation.name": self.openai_env.operation_name,
"gen_ai.request.model": self.openai_env.model,
"gen_ai.response.model": self.openai_env.response_model,
"gen_ai.system": "openai",
"server.address": self.openai_env.server_address,
"server.port": self.openai_env.server_port,
"gen_ai.token.type": "input",
},
),
],
)

def assertTokenUsageMetric(self, metric: Histogram, input_data_point=24, output_data_point=4):
self.assertEqual(metric.name, "gen_ai.client.token.usage")
self.assert_metric_expected(
metric,
[
self.create_histogram_data_point(
count=1,
sum_data_point=input_data_point,
max_data_point=input_data_point,
min_data_point=input_data_point,
attributes={
"gen_ai.operation.name": self.openai_env.operation_name,
"gen_ai.request.model": self.openai_env.model,
"gen_ai.response.model": self.openai_env.response_model,
"gen_ai.system": "openai",
"server.address": self.openai_env.server_address,
"server.port": self.openai_env.server_port,
"gen_ai.token.type": "input",
},
),
self.create_histogram_data_point(
count=1,
sum_data_point=output_data_point,
max_data_point=output_data_point,
min_data_point=output_data_point,
attributes={
"gen_ai.operation.name": self.openai_env.operation_name,
"gen_ai.request.model": self.openai_env.model,
"gen_ai.response.model": self.openai_env.response_model,
"gen_ai.system": "openai",
"server.address": self.openai_env.server_address,
"server.port": self.openai_env.server_port,
"gen_ai.token.type": "output",
},
),
],
)