Open
46 commits
fa3bfce
create skeleton of OTel
janan07 Jan 20, 2026
8b871cb
add titles
janan07 Jan 20, 2026
11e7bbd
add initial draft of intro
janan07 Jan 20, 2026
57ed363
grammar fix
janan07 Jan 20, 2026
bd31f03
Add content for Resource Attributes and Enabling Observability
janan07 Jan 21, 2026
01ca53b
add content to enable observability
janan07 Jan 21, 2026
c3ac1fb
add considerations for config service attributes
janan07 Jan 22, 2026
6af333e
add req service attribute list
janan07 Jan 22, 2026
fea7e6e
add links to outline and reorder using section per Richard's feedback
janan07 Jan 22, 2026
3915103
add details
janan07 Jan 22, 2026
88763be
initial draft of process and properties for acquiring z/OS Attributes
janan07 Jan 22, 2026
7b62f21
remove unneeded file
janan07 Jan 22, 2026
f3536e2
cange overview title and restructure outline
janan07 Jan 22, 2026
2623606
reorder attribute categories
janan07 Jan 22, 2026
c4c3cd2
add details about signals
janan07 Jan 23, 2026
66aa80d
add deployment attributes info
janan07 Jan 23, 2026
bff2796
add comment
janan07 Jan 23, 2026
89c6c2d
Update enabling-observability-in-zowe.yaml.md
janan07 Jan 26, 2026
98fdae4
refactor intro and move examples to Using OTel
janan07 Jan 26, 2026
80f1629
add Otel install files to sidebars.js
janan07 Jan 26, 2026
76ef864
add using OTel metrics and sub-topics to sidebar
janan07 Jan 26, 2026
1cc90f5
fix syntax
janan07 Jan 26, 2026
2b4459c
fix syntax
janan07 Jan 26, 2026
f67b276
address comments
janan07 Jan 27, 2026
d71bd68
add inline note to dev
janan07 Jan 27, 2026
38d7ae4
refactor descriptions
janan07 Jan 27, 2026
e7cc615
add info note for explanation
janan07 Jan 27, 2026
770f30f
add OTel link
janan07 Jan 27, 2026
d90a030
improve note
janan07 Jan 27, 2026
1ffd218
fix syntax
janan07 Jan 27, 2026
4caee5c
add links to uis and fix formatting
janan07 Jan 27, 2026
daf14f1
fix sidebar in using
janan07 Jan 27, 2026
bfed466
improve service attribute topic
janan07 Jan 27, 2026
5f65bde
improve zos attribute content
janan07 Jan 27, 2026
21c6c52
updates to overview topic
janan07 Jan 27, 2026
3604057
remove config-apiml-observability-via-opentelemetry.md file
janan07 Jan 28, 2026
ab193e5
fix service.instance.id definition
janan07 Jan 28, 2026
a23961a
refactor from comments
janan07 Jan 28, 2026
4065e1d
remove note
janan07 Jan 28, 2026
ab2dc2d
remove otel overview from sidebar to fix build
janan07 Jan 28, 2026
cee746c
remove broken link to overview article no longer in this PR
janan07 Jan 28, 2026
ce7dc2d
improve how discovery service works description
janan07 Jan 29, 2026
98ccea6
address Richard's comments
janan07 Feb 2, 2026
6add9b6
add validation procedure
janan07 Feb 2, 2026
ce359dc
fix typo
janan07 Feb 2, 2026
d312a9b
fix formatting
janan07 Feb 2, 2026
@@ -0,0 +1,16 @@
# API ML Provided Observability Signals and Attributes

**TODO: Dev to provide Actual Signals and Attributes**

<!-- This could be included in this topic. Please review -->

## Custom Telemetry Template
Use this template when requesting or defining new custom metrics for the API ML:

* **Signal Type**: (Metric / Trace / Log)
* **Name**: `zowe.apiml.[component].[functional_area]`
* **Description**: What does this signal represent?
* **Required Attributes**:
  * `route.id`: Identifier of the routed service.
  * `client.id`: (Optional) The ID of the consuming application.
  * `zos.smf.id`: Automatically inherited from the Resource.
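The naming pattern in the template can be checked mechanically. The following sketch builds and validates names of the form `zowe.apiml.[component].[functional_area]` and reports missing required attributes. The helper names and the allowed character set (lowercase letters, digits, and underscores) are assumptions for illustration, not part of API ML.

```javascript
// Hypothetical helpers for the custom telemetry template.
// ASSUMPTION: each segment uses lowercase letters, digits, and underscores.
const SEGMENT = /^[a-z][a-z0-9_]*$/;

function isValidSegment(value) {
  // Guard against non-strings: RegExp.test(undefined) would test "undefined".
  return typeof value === "string" && SEGMENT.test(value);
}

// Builds zowe.apiml.[component].[functional_area], rejecting bad segments.
function buildSignalName(component, functionalArea) {
  const segments = [["component", component], ["functional_area", functionalArea]];
  for (const [label, value] of segments) {
    if (!isValidSegment(value)) {
      throw new Error(`Invalid ${label} segment: "${value}"`);
    }
  }
  return `zowe.apiml.${component}.${functionalArea}`;
}

// True when a name has exactly four dot-separated parts and the zowe.apiml prefix.
function matchesTemplate(name) {
  const parts = name.split(".");
  return (
    parts.length === 4 &&
    parts[0] === "zowe" &&
    parts[1] === "apiml" &&
    isValidSegment(parts[2]) &&
    isValidSegment(parts[3])
  );
}

// route.id is required by the template; client.id is optional and
// zos.smf.id is inherited from the Resource, so neither is checked here.
function missingRequiredAttributes(attributes) {
  return ["route.id"].filter((key) => !(key in attributes));
}
```

For example, `buildSignalName("gateway", "request_errors")` yields `zowe.apiml.gateway.request_errors`; both segment values are made up for the example.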
Comment on lines +7 to +16
Contributor

I can't tell how relevant it is. Do we provide similar templates to request new feature also for other areas?

@@ -0,0 +1,26 @@
# Configuring OpenTelemetry Deployment Attributes

Configure deployment-specific resource attributes for the Zowe API ML. These attributes allow you to categorize telemetry data based on the lifecycle stage of the application, such as distinguishing between production, staging, or development environments.

Unlike z/OS attributes, which are often discovered automatically, deployment attributes are strictly informative and are typically defined manually. These attributes do not affect the unique identity of the service but are essential for filtering and grouping data in your observability backend. Explicitly labeling your environment ensures that performance anomalies in a test environment do not trigger false alerts in production monitoring views.
Contributor

Nicely explained. Is it right to compare them to the z/os attributes when APIML is not limited to running on z/os?

Collaborator Author

Perhaps this would be better. What do you think?

While platform-specific attributes (like those for z/OS) focus on the physical execution environment and are often discovered automatically, deployment attributes describe the logical purpose of the instance. These are defined manually and are universal across all platforms where API ML runs (z/OS, Linux, or containers).

Contributor

Looks good. I would avoid the "physical execution environment" phrase, as it can be a virtual one too.


## Deployment Attribute Reference

The following attribute describes the environment of the API ML single-service deployment:

* **deployment.environment.name**
Specifies the name of the deployment environment (for example, `dev`, `test`, `staging`, or `production`).
Configuration Source: zowe.yaml
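To illustrate why this label matters for alerting, the following sketch filters telemetry records by `deployment.environment.name` and counts records per environment. The record shape (`{ resource: { attributes }, value }`) and the function names are hypothetical simplifications, not the format an observability backend or API ML uses.

```javascript
// Hypothetical, simplified record shape: { resource: { attributes }, value }.
// Records without deployment.environment.name never match a filter, which is
// why every installation should be labeled explicitly.
function forEnvironment(records, environmentName) {
  return records.filter(
    (record) =>
      record.resource !== undefined &&
      record.resource.attributes !== undefined &&
      record.resource.attributes["deployment.environment.name"] === environmentName
  );
}

// Counts records per environment so that unlabeled data stands out.
function countByEnvironment(records) {
  const counts = {};
  for (const record of records) {
    const attributes = (record.resource && record.resource.attributes) || {};
    const environment = attributes["deployment.environment.name"] || "(unlabeled)";
    counts[environment] = (counts[environment] || 0) + 1;
  }
  return counts;
}
```

A production dashboard built on `forEnvironment(records, "production")` would then ignore anomalies reported by instances labeled `test`.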

## Configuration Example in zowe.yaml

To set the deployment environment, add the `deployment.environment.name` key to the `resource.attributes` section of your zowe.yaml file.

```
zowe:
  observability:
    enabled: true
    resource:
      attributes:
        # Deployment Attribute (Manual Entry)
        deployment.environment.name: "production"
```
@@ -0,0 +1,96 @@
# Configure OpenTelemetry Service Attributes

Services are identified by the `service.name`, `service.namespace`, and `service.instance.id` properties. Together, these attributes create a unique identity for each API ML instance across your enterprise.
Collaborator

service.name, service.namespace, and service.instance.id? Trying to be consistent...


In complex mainframe environments, you may have multiple API ML installations across different Sysplexes or data centers. To monitor these effectively, you must balance Logical Grouping (viewing all API ML traffic as one functional unit) with Instance Differentiation (identifying exactly which specific Address Space is experiencing an issue).

## The Hierarchy of Identification
OpenTelemetry uses a three-tier approach to define service identity:

* **service.name** (The Service)
Identifies the logical name of the service. This value should be identical for all instances across your organization that perform the same function (e.g., `zowe-apiml`). The value is expected to be globally unique if `service.namespace` is not defined.

* **service.namespace** (The Environment/Site)
Groups services into logical sets. Use this value to distinguish between different installations, such as `sysplex-a` vs. `sysplex-b`, or `north-datacenter` vs. `south-datacenter`. `service.name` is expected to be unique within the same `service.namespace`.

* **service.instance.id** (The Unique Instance)
Identifies a specific running process or address space. This value must be globally unique for every instance. Because multiple z/OS systems can run identical job names, combine the job name with a unique identifier (such as the LPAR name or a UUID) so that the instance can be isolated during troubleshooting.
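One way to follow the job-name guidance above is to prefix the job name with the LPAR name. The following sketch assumes a `<LPAR>-<JOBNAME>` format; that format and the helper name are illustrative choices, not something API ML requires. The only rule the code enforces beyond presence is that z/OS job names are 1 to 8 characters long.

```javascript
// Hypothetical helper: builds a service.instance.id as "<LPAR>-<JOBNAME>".
// ASSUMPTION: this format is an example; API ML does not mandate it.
function buildInstanceId(lparName, jobName) {
  if (!lparName) {
    // Without the LPAR name, identical job names on different systems collide.
    throw new Error("An LPAR name is required to make the ID unique");
  }
  if (!jobName || jobName.length > 8) {
    // z/OS job names are 1 to 8 characters long.
    throw new Error(`Job name must be 1 to 8 characters: "${jobName}"`);
  }
  return `${lparName}-${jobName}`.toUpperCase();
}
```

Two systems that both run the same job name (for example, `ZWE1SV`) then report different IDs, such as `LPAR1-ZWE1SV` and `LPAR2-ZWE1SV`, where the LPAR names are placeholders.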

<!-- Should we add service.version to this list of properties? -->

## Configuration Examples

**Example 1: Single API ML Installation (High Availability)**

In this scenario, both instances share the same namespace because they belong to the same logical cluster on the same Sysplex.

| Attribute | Instance 1 | Instance 2 |
| :--- | :--- | :--- |
| **service.name** | `zowe-apiml` | `zowe-apiml` |
| **service.namespace** | `production-plex` | `production-plex` |
| **service.instance.id** | `APIML01` | `APIML02` |

**Instance 1 configuration**
```
zowe:
  components:
    api-mediation-layer:
      observability:
        enabled: true
        resource:
          attributes:
            service.name: "zowe-apiml"
            service.namespace: "production-plex"
Contributor

I suggest we put only examples of complete configuration instead of partial ones to specific attribute groups. For instance, here I would expect the 'production' to be set as deployment attribute

Collaborator Author

Are you suggesting then that we remove all configuration examples in this article or replace them with the complete configuration? If the latter, can you provide me with the complete configuration?

Contributor

If we provide only the partial samples, can we be sure that the users will be able to navigate the docs to follow the right full sample? The use-case from the sysprog's perspective is to enable observability, not configure just some subset of attributes. They need to do it either all or nothing.

We do not have the configuration yet as the implementation is still in progress.

Collaborator Author

That's fine, we can remove all of the examples and have a sample when the implementation is completed.

            service.instance.id: "APIML01"
```
**Instance 2 configuration**
```
zowe:
  components:
    api-mediation-layer:
      observability:
        enabled: true
        resource:
          attributes:
            service.name: "zowe-apiml"
            service.namespace: "production-plex"
            service.instance.id: "APIML02"
```

**Example 2: Multi-Site Deployment**

In this scenario, instances are separated by namespace to represent their physical data center locations.

| Attribute | Site 1 (Instance A) | Site 1 (Instance B) | Site 2 (Instance C) |
| :--- | :--- | :--- | :--- |
| **service.name** | `zowe-apiml` | `zowe-apiml` | `zowe-apiml` |
| **service.namespace** | `east-coast` | `east-coast` | `west-coast` |
| **service.instance.id** | `ZOWE-E1` | `ZOWE-E2` | `ZOWE-W1` |
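The balance between logical grouping and instance differentiation can be expressed with the identities from the table above. The following sketch groups instance IDs by `service.namespace` and checks that every `service.instance.id` is unique and that all instances share one `service.name`. The helper functions are illustrative only and are not part of API ML or any observability backend.

```javascript
// The identities from the multi-site table, one object per instance.
const instances = [
  { "service.name": "zowe-apiml", "service.namespace": "east-coast", "service.instance.id": "ZOWE-E1" },
  { "service.name": "zowe-apiml", "service.namespace": "east-coast", "service.instance.id": "ZOWE-E2" },
  { "service.name": "zowe-apiml", "service.namespace": "west-coast", "service.instance.id": "ZOWE-W1" },
];

// Logical grouping: instance IDs collected per service.namespace.
function groupByNamespace(identities) {
  const groups = {};
  for (const identity of identities) {
    const namespace = identity["service.namespace"];
    if (!groups[namespace]) {
      groups[namespace] = [];
    }
    groups[namespace].push(identity["service.instance.id"]);
  }
  return groups;
}

// Instance differentiation: every service.instance.id must be unique.
function hasUniqueInstanceIds(identities) {
  const ids = identities.map((identity) => identity["service.instance.id"]);
  return new Set(ids).size === ids.length;
}

// All instances that perform the same function share one service.name.
function sharesServiceName(identities) {
  return new Set(identities.map((identity) => identity["service.name"])).size === 1;
}
```

Here `groupByNamespace(instances)` places `ZOWE-E1` and `ZOWE-E2` under `east-coast` and `ZOWE-W1` under `west-coast`, while each instance remains individually addressable.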

**Site 1 (East Coast), Instance A configuration:**

```
zowe:
  components:
    api-mediation-layer:
      observability:
        enabled: true
        resource:
          attributes:
            service.name: "zowe-apiml"
            service.namespace: "east-coast"
            service.instance.id: "ZOWE-E1"
```
**Site 2 (West Coast), Instance C configuration:**
```
zowe:
  components:
    api-mediation-layer:
      observability:
        enabled: true
        resource:
          attributes:
            service.name: "zowe-apiml"
            service.namespace: "west-coast"
            service.instance.id: "ZOWE-W1"
```
@@ -0,0 +1,57 @@
# Configure OpenTelemetry z/OS Attributes

<!-- VALIDATE THIS CONTENT AFTER SUPPORT IS IMPLEMENTED. -->

z/OS-specific resource attributes for API ML provide essential mainframe context to your telemetry data, allowing you to correlate metrics, traces, and logs with specific system identifiers such as SMF IDs, Sysplex names, and LPARs. With this z/OS platform context, mainframe performance data can be integrated into distributed observability backends.

## How System Discovery Works

The z/OS attributes are primarily populated through an automated system discovery process that runs when the API ML service initializes. The integrated OpenTelemetry SDK makes platform-specific calls to read z/OS control blocks (such as the CVT, whose CVTSNAME field holds the system name, and the ECVT) and system variables.

## z/OS Attribute Reference

The following attributes are captured during system discovery to describe the mainframe environment:
Collaborator

A couple of things:

  1. Can we describe what 'system discovery' means here? What actually populates the attributes? Our junior sys prog won't understand it.
  2. Given Richard's statement on process.pid not being configured by system discovery, does this move to another section?


* **zos.smf.id**
The System Management Facilities (SMF) identifier that uniquely identifies a z/OS system within a SYSPLEX.
Configuration Source: System discovery

* **zos.sysplex.name**
The name of the SYSPLEX to which the z/OS system belongs.
Configuration Source: System discovery

* **mainframe.lpar.name**
The name of the LPAR that hosts the z/OS system.
Configuration Source: System discovery

* **os.type**
The operating system type, set to `zos`.
Configuration Source: Static

* **os.version**
The version string of the operating system (e.g., the release returned by `D IPLINFO`).
Configuration Source: System discovery

* **process.command**
The command or job name used to launch the Zowe process.
Configuration Source: System discovery

* **process.pid**
The Process Identifier, which on z/OS is set to the Address Space Identifier (ASID).
Configuration Source: System discovery

## Overriding Discovered Attributes in zowe.yaml

While the discovery process handles most identifiers automatically, you may occasionally need to override a discovered value (for example, in shared environments where you want to report a custom logical LPAR name). Specify overrides in the `resource.attributes` section of your zowe.yaml file:

```
zowe:
  observability:
    enabled: true
    resource:
      attributes:
        # Overriding discovered z/OS attributes
        zos.smf.id: "MVS1"
        zos.sysplex.name: "LOCALPLX"
        mainframe.lpar.name: "PRODLPAR"
```
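The following sketch shows one plausible way an override could combine with discovered values: attributes set in zowe.yaml replace the discovered ones, and discovered attributes that are not overridden are kept. This precedence rule is an assumption made for illustration; the actual API ML behavior is still to be validated once support is implemented.

```javascript
// ASSUMPTION: values configured in zowe.yaml take precedence over values
// found by system discovery. Not confirmed API ML behavior.
function resolveZosAttributes(discovered, configured) {
  // Object spread copies left to right, so keys from `configured` win.
  return { ...discovered, ...configured };
}

// Lists the discovered keys whose values were replaced by zowe.yaml.
function overriddenKeys(discovered, configured) {
  return Object.keys(configured).filter(
    (key) => key in discovered && discovered[key] !== configured[key]
  );
}
```

With hypothetical discovered values such as `zos.smf.id: "SYS1"` and `os.type: "zos"`, applying the override above would report `MVS1` as the SMF ID while keeping `os.type` unchanged.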
@@ -0,0 +1,58 @@
# Enabling API ML Observability in zowe.yaml

Review how to enable and configure the OpenTelemetry (OTel) integration within the Zowe API Mediation Layer (API ML) single-service deployment. Configure these parameters in `zowe.yaml` to enable API ML to export metrics, traces, and logs to an OpenTelemetry Collector.

## Configuration Overview

The observability configuration is located in the API Mediation Layer section under `components` in zowe.yaml, and contains three properties:

* **enabled**
Enables observability. Set to `true` to initialize the OpenTelemetry SDK, which starts the collection and export of telemetry from API ML.
Contributor

What the customer actually wants to do? Enable observability, OpenTelemetry or OpenTelemetry SDK?
Is it clear that enabling OpenTelemetry SDK actually enables the Observability?

Collaborator Author

fixed.


* **exporter**
Defines where the data is sent. Sub-properties of `exporter` include the following:

* **exporter.otlp.endpoint**
Contributor

Says protocol here, but URL in the description.

Collaborator Author

Fixed the definition

The URL of your OTLP-compatible collector (e.g., z-Iris or Jaeger).
Contributor

Are Z-Iris and Jaeger really OTLP compatible?

Collaborator Author

removed this mention


* **exporter.otlp.protocol**
The protocol is either `grpc` or `http/protobuf`.
**Default:** `grpc`
Contributor

Remove it. We do not have it configurable now. Can add it later if needed.

Collaborator

To note here that if we do reinstate, we have a typo with grpc and grcp mentioned.


* **resource**
Defines the identity of the telemetry producer through resource attributes.

* **resource.attributes**
A collection of key-value pairs used to identify the telemetry source. See the following sub-properties of `resource.attributes`:

* **service.name**
Contributor

@richard-salac Jan 28, 2026

Here we duplicate the explanation of service attributes that we have in a separate md file just for them. Similarly for the deployment. It will be difficult to keep them in sync over time.

Collaborator Author

If you think it's sufficient, I can link to the specific attribute article for these three resources. It just seems that if we have an article for enablement, the user should have an easy reference to what parms are being configured...

Contributor

You are right, but we do not have specific list now as some of the attributes are discovered automatically. The purpose of this PR is to create a skeleton to be updated once we get the issues implemented.

Collaborator Author

Correct. This content is a placeholder. I understand this could all change with the implementation.

The logical name of the service. This value must be the same for all instances within the same high availability (HA) deployment, and is expected to be globally unique if `service.namespace` is not defined.

* **service.namespace**
The assigned value should help distinguish a group of services, for example, by LPAR or owner team. `service.name` is expected to be unique within the same `service.namespace`.

* **deployment.environment.name**
Specifies the name of the deployment environment (for example, `dev`, `test`, `staging`, or `production`).
Configuration Source: zowe.yaml

To enable observability, configure the OpenTelemetry exporter and resource attributes within your `zowe.yaml` file with the following structure:

```
zowe:
  observability:
    enabled: true
    exporter:
      otlp:
        endpoint: "http://otel-collector.your.domain:4317"
        protocol: "grpc"
        timeout: 10000
    resource:
      attributes:
        service.name: "zowe-apiml"
        service.namespace: "finance-production"
        deployment.environment.name: "production"
```
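Before restarting API ML, it can help to sanity-check this block. The following sketch validates the structure above after it has been parsed into a plain object. The function is hypothetical and not part of API ML; it treats `protocol` as optional and assumes `service.name` should always be present, which mirrors the descriptions on this page rather than any enforced schema.

```javascript
// Hypothetical validator for a parsed `observability` block.
const ALLOWED_PROTOCOLS = new Set(["grpc", "http/protobuf"]);

function validateObservability(observability) {
  const errors = [];
  if (observability.enabled !== true) {
    return errors; // Nothing is exported when observability is disabled.
  }
  const otlp = (observability.exporter && observability.exporter.otlp) || {};
  try {
    // `new URL` throws a TypeError for missing or malformed endpoints.
    const url = new URL(otlp.endpoint);
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      errors.push(`Unsupported endpoint scheme: ${url.protocol}`);
    }
  } catch {
    errors.push(`Invalid or missing exporter.otlp.endpoint: ${otlp.endpoint}`);
  }
  // ASSUMPTION: protocol is optional and, if set, must be a known value.
  if (otlp.protocol !== undefined && !ALLOWED_PROTOCOLS.has(otlp.protocol)) {
    errors.push(`Unknown exporter.otlp.protocol: ${otlp.protocol}`);
  }
  const attributes = (observability.resource && observability.resource.attributes) || {};
  // ASSUMPTION: service.name is treated as required here.
  if (!attributes["service.name"]) {
    errors.push("resource.attributes is missing service.name");
  }
  return errors;
}
```

For the example configuration, the validator returns an empty list; a misspelled protocol such as `grcp` produces one error.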

## How the Export Works

When `enabled: true` is set, the API ML single-service deployment starts a background telemetry engine. This engine collects all signals, bundles them with the configured resource attributes, and pushes these bundles through the OTLP exporter to the specified endpoint.
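The bundling step can be pictured with the OTLP data model, where each batch carries a `resource` whose attributes are a list of key/value pairs with typed values. The following sketch builds that shape in the OTLP/JSON encoding for readability (with gRPC the same structure is sent as binary protobuf). It is an illustration of the data model only, not the code API ML runs, and it leaves the metric list empty.

```javascript
// Maps a JavaScript value onto an OTLP AnyValue in OTLP/JSON encoding.
// 64-bit integers are encoded as strings, following the proto3 JSON mapping.
function toOtlpValue(value) {
  if (typeof value === "boolean") return { boolValue: value };
  if (Number.isInteger(value)) return { intValue: String(value) };
  if (typeof value === "number") return { doubleValue: value };
  return { stringValue: String(value) };
}

// Converts flat resource attributes (as in zowe.yaml) to an OTLP Resource.
function toOtlpResource(attributes) {
  return {
    attributes: Object.entries(attributes).map(([key, value]) => ({
      key,
      value: toOtlpValue(value),
    })),
  };
}

// One bundle: the resource plus the signals collected for it.
function buildMetricsPayload(attributes, scopeMetrics = []) {
  return { resourceMetrics: [{ resource: toOtlpResource(attributes), scopeMetrics }] };
}
```

Every metric, trace, or log batch exported from the same instance carries the same resource, which is what lets a backend filter by attributes such as `service.namespace`.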

@@ -0,0 +1,15 @@
# Outline of API ML Observability Topics

The following files will be presented under Advanced server-side configuration under the **Install** tab:

* Configuring API ML Observability via OpenTelemetry
* [Configuring OpenTelemetry service attributes](configuring-otel-service-attributes.md)
* [Configuring OpenTelemetry deployment attributes](configuring-otel-deployment-attributes.md)
* [Configuring OpenTelemetry z/OS attributes](configuring-otel-zos-attributes.md)
* [Enabling Observability in zowe.yaml](enabling-observability-in-zowe.yaml.md)

The following files will be presented under Using Zowe API Mediation Layer under the **Use** tab:

* [Using your API ML OpenTelemetry metrics](using-your-otel-metrics.md)
* [API ML Provided Observability Signals and Attributes](apiml-provided-observability-signals-and-attributes.md)
* [Sample Output from API ML OpenTelemetry](sample-output-from-apiml-otel.md)
@@ -0,0 +1,2 @@
# Sample Output from API ML OpenTelemetry

@@ -0,0 +1,22 @@
# Using Your API ML OpenTelemetry Metrics

## Examples of Using Telemetry Data in API ML

How a system administrator interacts with this data depends on the visualization tool used (e.g., Grafana, Jaeger, or Broadcom WatchTower).

### Example 1: High-Level Health Monitoring (Metrics)
A system administrator views a Grafana dashboard. The administrator notices a spike in **`apiml.request.errors`**.
* **The View**: A red line graph shows a sudden jump from 0% to 15% error rate.
* **The Insight**: By filtering the dashboard using the attribute **`zos.smf.id`**, the admin realizes the errors are only occurring on **LPAR1**, while **LPAR2** remains healthy. This suggests a local configuration or connectivity issue on a specific system rather than a global software bug.
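The filtering step in this example amounts to computing the error rate per value of a resource attribute. The following sketch does that for any attribute key, such as `zos.smf.id` or `mainframe.lpar.name`. The data-point shape and the function name are hypothetical simplifications of what a dashboard tool computes internally.

```javascript
// Hypothetical data-point shape: { attributes, requests, errors }.
// Returns the error rate (errors / requests) per value of attributeKey.
function errorRateBy(points, attributeKey) {
  const totals = new Map();
  for (const point of points) {
    const group = point.attributes[attributeKey] ?? "(unset)";
    const entry = totals.get(group) || { requests: 0, errors: 0 };
    entry.requests += point.requests;
    entry.errors += point.errors;
    totals.set(group, entry);
  }
  const rates = {};
  for (const [group, { requests, errors }] of totals) {
    rates[group] = requests === 0 ? 0 : errors / requests;
  }
  return rates;
}
```

With invented counts of 30 errors in 100 requests on one system and none on the other, the combined rate is 15%, but grouping shows 30% on the first system and 0% on the second.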


### Example 2: Latency Troubleshooting (Traces)
A user reports that a specific API is "timing out." The admin finds the relevant **`traceId`** in the logs and opens it in a trace viewer.
* **The View**: A "Gantt chart" style visualization of the request.
* **The Insight**:
  * `apiml.gateway.total`: 2005ms
  * `apiml.auth.check`: 5ms
  * `apiml.backend.proxy`: 2000ms
* **The Action**: The admin sees that the Modulith itself spent only 5ms on its own logic, but waited 2 seconds for the backend mainframe service to respond. The admin can now confidently contact the specific backend service team.
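The reasoning in this example can be sketched as a breakdown of a parent span into its child spans. The following code reports each child's share of the total and any time not covered by a child span. The flat input shape is a simplification of a real trace, and the function is illustrative only.

```javascript
// Hypothetical helper: totalMs is the parent span duration, children is a
// list of { name, durationMs } for its direct child spans.
function breakdown(totalMs, children) {
  const childSum = children.reduce((sum, child) => sum + child.durationMs, 0);
  const shares = children
    .map((child) => ({
      name: child.name,
      durationMs: child.durationMs,
      share: child.durationMs / totalMs,
    }))
    .sort((a, b) => b.durationMs - a.durationMs);
  // uncoveredMs is time spent in the parent outside any child span.
  return { uncoveredMs: totalMs - childSum, slowest: shares[0], shares };
}
```

Applied to the spans above, `apiml.backend.proxy` accounts for about 99.8% of the 2005ms total, which points the investigation at the backend service.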
Contributor

These are good examples, but none of them is scheduled for implementation with the current issues. Shouldn't we rather start with what we are going to have?

Collaborator Author

Certainly. Could you please provide me with actual examples?

Contributor

The only ones we have are in the ticket ready for implementation.

Collaborator Author

Ok. I removed these examples. If necessary, I can restore them.



19 changes: 19 additions & 0 deletions sidebars.js
@@ -335,6 +335,16 @@ module.exports = {
"extend/extend-apiml/api-mediation-redis"
]
},
{
"type": "category",
"label": "Configuring API ML Observability via OpenTelemetry",
"items": [
"user-guide/api-mediation/observability/configuring-otel-service-attributes",
"user-guide/api-mediation/observability/configuring-otel-deployment-attributes",
"user-guide/api-mediation/observability/configuring-otel-zos-attributes",
"user-guide/api-mediation/observability/enabling-observability-in-zowe-yaml"
]
},
"user-guide/api-mediation/configuration-customizing-the-api-catalog-ui",
"user-guide/api-mediation/configuration-logging",
"user-guide/api-mediation/wto-message-on-startup",
@@ -527,6 +537,15 @@ module.exports = {
"user-guide/api-mediation-change-password-via-catalog",
],
},
{
type: "category",
label: "Using your API ML OpenTelemetry metrics",
link: { "type": "doc", "id": "user-guide/api-mediation/observability/using-your-otel-metrics" },
items: [
"user-guide/api-mediation/observability/apiml-provided-observability-signals-and-attributes",
"user-guide/api-mediation/observability/sample-output-from-apiml-otel"
],
},
"user-guide/api-mediation/api-mediation-update-password",
"user-guide/api-mediation/api-mediation-smf",
],