Commit bbc7ff9

Merge pull request #568 from Beckn-One/feat/observability
Feat/observability
2 parents 93cb164 + 176b8f3 commit bbc7ff9

31 files changed, +2033 -45 lines changed

CONFIG.md

Lines changed: 144 additions & 7 deletions
@@ -6,13 +6,14 @@
 3. [Top-Level Configuration](#top-level-configuration)
 4. [HTTP Configuration](#http-configuration)
 5. [Logging Configuration](#logging-configuration)
-6. [Plugin Manager Configuration](#plugin-manager-configuration)
-7. [Module Configuration](#module-configuration)
-8. [Handler Configuration](#handler-configuration)
-9. [Plugin Configuration](#plugin-configuration)
-10. [Routing Configuration](#routing-configuration)
-11. [Deployment Scenarios](#deployment-scenarios)
-12. [Configuration Examples](#configuration-examples)
+6. [Metrics Configuration](#metrics-configuration)
+7. [Plugin Manager Configuration](#plugin-manager-configuration)
+8. [Module Configuration](#module-configuration)
+9. [Handler Configuration](#handler-configuration)
+10. [Plugin Configuration](#plugin-configuration)
+11. [Routing Configuration](#routing-configuration)
+12. [Deployment Scenarios](#deployment-scenarios)
+13. [Configuration Examples](#configuration-examples)

 ---

@@ -70,6 +71,7 @@ The main configuration file follows this structure:
 ```yaml
 appName: "onix-local"
 log: {...}
+metrics: {...}
 http: {...}
 pluginManager: {...}
 modules: [...]
@@ -187,6 +189,120 @@ log:

 ---

+## Application-Level Plugins Configuration
+
+### `plugins`
+**Type**: `object`
+**Required**: No
+**Description**: Application-level plugin configurations. These plugins apply to the entire application and are shared across all modules.
+
+#### `plugins.otelsetup`
+**Type**: `object`
+**Required**: No
+**Description**: OpenTelemetry configuration controlling whether the Prometheus exporter is enabled.
+
+**Important**: This block is optional—omit it to run without telemetry. When present, the `/metrics` endpoint is exposed on a separate port (configurable via `metricsPort`) only if `enableMetrics: true`.
+
+##### Parameters:
+
+###### `id`
+**Type**: `string`
+**Required**: Yes
+**Description**: Plugin identifier. Must be `"otelsetup"`.
+
+###### `config`
+**Type**: `object`
+**Required**: Yes
+**Description**: Plugin configuration parameters.
+
+###### `config.enableMetrics`
+**Type**: `string` (boolean)
+**Required**: No
+**Default**: `"true"`
+**Description**: Enables metrics collection and the `/metrics` endpoint. Must be `"true"` or `"false"` as a string.
+
+###### `config.serviceName`
+**Type**: `string`
+**Required**: No
+**Default**: `"beckn-onix"`
+**Description**: Sets the `service.name` resource attribute.
+
+###### `config.serviceVersion`
+**Type**: `string`
+**Required**: No
+**Description**: Sets the `service.version` resource attribute.
+
+###### `config.environment`
+**Type**: `string`
+**Required**: No
+**Default**: `"development"`
+**Description**: Sets the `deployment.environment` attribute (e.g., `development`, `staging`, `production`).
+
+###### `config.metricsPort`
+**Type**: `string`
+**Required**: No
+**Default**: `"9090"`
+**Description**: Port on which the metrics HTTP server will listen. The metrics endpoint is hosted on a separate server from the main application.
+
+**Example - Enable Metrics** (matches `config/local-simple.yaml`):
+```yaml
+plugins:
+  otelsetup:
+    id: otelsetup
+    config:
+      serviceName: "beckn-onix"
+      serviceVersion: "1.0.0"
+      enableMetrics: "true"
+      environment: "development"
+      metricsPort: "9090"
+```
+
+### Accessing Metrics
+
+When `plugins.otelsetup.config.enableMetrics: "true"`, the metrics endpoint is hosted on a separate HTTP server. Scrape metrics at:
+
+```
+http://your-server:9090/metrics
+```
+
+**Note**: The metrics server runs on the port specified by `config.metricsPort` (default: `9090`), which is separate from the main application port configured in `http.port`.
+
+### Metrics Collected
+
+Metrics are organized by module for better maintainability and encapsulation:
+
+#### OTel Setup (from `otelsetup` plugin)
+- Prometheus exporter & `/metrics` endpoint on separate HTTP server
+- Go runtime instrumentation (`go_*`), resource attributes, and meter provider wiring
+
+#### Step Execution Metrics (from `telemetry` package)
+- `onix_step_executions_total`, `onix_step_execution_duration_seconds`, `onix_step_errors_total`
+
+#### Handler Metrics (from `handler` module)
+- `beckn_signature_validations_total` - Signature validation attempts
+- `beckn_schema_validations_total` - Schema validation attempts
+- `onix_routing_decisions_total` - Routing decisions taken by handler
+
+#### Cache Metrics (from `cache` plugin)
+- `onix_cache_operations_total`, `onix_cache_hits_total`, `onix_cache_misses_total`
+
+#### Plugin Metrics (from `telemetry` package)
+- `onix_plugin_execution_duration_seconds`, `onix_plugin_errors_total`
+
+#### Runtime Metrics
+- Go runtime metrics (`go_*`) and Redis instrumentation via `redisotel`
+
+Each metric includes consistent labels such as `module`, `role`, `action`, `status`, `step`, `plugin_id`, and `schema_version` to enable low-cardinality dashboards.
+
+**Note**: Metric definitions are now located in their respective modules:
+- OTel setup: `pkg/plugin/implementation/otelsetup`
+- Step metrics: `core/module/handler/step_metrics.go`
+- Handler metrics: `core/module/handler/handlerMetrics.go`
+- Cache metrics: `pkg/plugin/implementation/cache/cache_metrics.go`
+- Plugin metrics: `pkg/telemetry/pluginMetrics.go`
+
+---
+
 ## Plugin Manager Configuration

 ### `pluginManager`
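The `config` parameters documented in the hunk above are all string-typed, including the boolean and the port. As a rough illustration of what that implies for a consumer, the sketch below applies the documented defaults and rejects values outside the documented forms. It assumes the values arrive as a `map[string]string`; the `otelConfig` type, the `parseOtelConfig` function, and the error wording are hypothetical and are not taken from the `otelsetup` plugin.

```go
package main

import (
	"fmt"
	"strconv"
)

// otelConfig is a hypothetical typed view of plugins.otelsetup.config.
type otelConfig struct {
	ServiceName    string
	ServiceVersion string
	Environment    string
	EnableMetrics  bool
	MetricsPort    string
}

// parseOtelConfig applies the defaults listed in CONFIG.md and validates the
// string-encoded values. The name and error wording are illustrative only.
func parseOtelConfig(raw map[string]string) (otelConfig, error) {
	cfg := otelConfig{
		ServiceName:    "beckn-onix",
		ServiceVersion: raw["serviceVersion"], // no default is documented
		Environment:    "development",
		EnableMetrics:  true,
		MetricsPort:    "9090",
	}
	if v := raw["serviceName"]; v != "" {
		cfg.ServiceName = v
	}
	if v := raw["environment"]; v != "" {
		cfg.Environment = v
	}
	if v, ok := raw["enableMetrics"]; ok {
		// CONFIG.md allows exactly the strings "true" and "false".
		switch v {
		case "true":
			cfg.EnableMetrics = true
		case "false":
			cfg.EnableMetrics = false
		default:
			return otelConfig{}, fmt.Errorf("enableMetrics must be \"true\" or \"false\", got %q", v)
		}
	}
	if v := raw["metricsPort"]; v != "" {
		// The port stays a string in the config but must still be numeric.
		port, err := strconv.Atoi(v)
		if err != nil || port < 1 || port > 65535 {
			return otelConfig{}, fmt.Errorf("metricsPort must be a port number, got %q", v)
		}
		cfg.MetricsPort = v
	}
	return cfg, nil
}

func main() {
	cfg, err := parseOtelConfig(map[string]string{"serviceVersion": "1.0.0", "metricsPort": "9100"})
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Validating at load time, rather than when the metrics server starts, would surface a mistyped `enableMetrics` value before any listener is opened; whether the plugin actually does this is not shown in this commit.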
@@ -1045,13 +1161,18 @@ routingRules:
 - Embedded Ed25519 keys
 - Local Redis
 - Simplified routing
+- Optional metrics collection (available on separate port when enabled)

 **Use Case**: Quick local development and testing

 ```yaml
 appName: "onix-local"
 log:
   level: debug
+metrics:
+  enabled: true
+  exporterType: prometheus
+  serviceName: onix-local
 http:
   port: 8081
 modules:
@@ -1063,6 +1184,8 @@ modules:
 config: {}
 ```

+**Metrics Access**: When enabled, access metrics at `http://localhost:9090/metrics` (default metrics port, configurable via `plugins.otelsetup.config.metricsPort`)
+
 ### 2. Local Development (Vault Mode)

 **File**: `config/local-dev.yaml`
@@ -1096,10 +1219,21 @@ modules:
 - Production Redis
 - Remote plugin loading
 - Pub/Sub integration
+- OpenTelemetry metrics enabled (available on separate port, default: 9090)

 **Use Case**: Single deployment serving both roles

 ```yaml
+appName: "onix-production"
+log:
+  level: info
+  destinations:
+    - type: stdout
+metrics:
+  enabled: true
+  exporterType: prometheus
+  serviceName: beckn-onix
+  serviceVersion: "1.0.0"
 pluginManager:
   root: /app/plugins
   remoteRoot: /mnt/gcs/plugins/plugins_bundle.zip
@@ -1122,6 +1256,9 @@ modules:
 topic: bapNetworkReciever
 ```

+**Metrics Access**:
+- Prometheus scraping: `http://your-server:9090/metrics` (default metrics port, configurable via `plugins.otelsetup.config.metricsPort`)
+
 ### 4. Production BAP-Only Mode

 **File**: `config/onix-bap/adapter.yaml`

README.md

Lines changed: 46 additions & 1 deletion
@@ -64,9 +64,45 @@ The **Beckn Protocol** is an open protocol that enables location-aware, local co
 ### 📊 **Observability**
 - **Structured Logging**: JSON-formatted logs with contextual information
 - **Transaction Tracking**: End-to-end request tracing with unique IDs
-- **Metrics Support**: Performance and business metrics collection
+- **OpenTelemetry Metrics**: Pull-based metrics exposed via `/metrics`
+  - RED metrics for every module and action (rate, errors, duration)
+  - Per-step histograms with error attribution
+  - Cache, routing, plugin, and business KPIs (signature/schema validations, Beckn messages)
+  - Native Prometheus exporter with Grafana dashboards & alert rules (`monitoring/`)
+  - Opt-in: add a `plugins.otelsetup` block in your config to wire the `otelsetup` plugin; omit it to run without metrics. Example:
+
+    ```yaml
+    plugins:
+      otelsetup:
+        id: otelsetup
+        config:
+          serviceName: "beckn-onix"
+          serviceVersion: "1.0.0"
+          enableMetrics: "true"
+          environment: "development"
+    ```
+- **Modular Metrics Architecture**: Metrics are organized by module for better maintainability:
+  - OTel SDK wiring via `otelsetup` plugin
+  - Step execution metrics in `telemetry` package
+  - Handler metrics (signature, schema, routing) in `handler` module
+  - Cache metrics in `cache` plugin
+- **Runtime Instrumentation**: Go runtime + Redis client metrics baked in
 - **Health Checks**: Liveness and readiness probes for Kubernetes

+#### Monitoring Quick Start
+```bash
+./install/build-plugins.sh
+go build -o beckn-adapter ./cmd/adapter
+./beckn-adapter --config=config/local-simple.yaml
+cd monitoring && docker-compose -f docker-compose-monitoring.yml up -d
+open http://localhost:3000 # Grafana (admin/admin)
+```
+Resources:
+- `monitoring/prometheus.yml` – scrape config
+- `monitoring/prometheus-alerts.yml` – alert rules (RED, cache, step, plugin)
+- `monitoring/grafana/dashboards/beckn-onix-overview.json` – curated dashboard
+- `docs/METRICS_RUNBOOK.md` – runbook with PromQL recipes & troubleshooting
+
 ### 🌐 **Multi-Domain Support**
 - **Retail & E-commerce**: Product search, order management, fulfillment tracking
 - **Mobility Services**: Ride-hailing, public transport, vehicle rentals
@@ -356,6 +392,15 @@ modules:
 | POST | `/bpp/receiver/*` | Receives all BAP requests |
 | POST | `/bpp/caller/on_*` | Sends responses back to BAP |

+### Observability Endpoints
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/health` | Health check endpoint |
+| GET | `/metrics` | Prometheus metrics endpoint (when telemetry is enabled) |
+
+**Note**: The `/metrics` endpoint is available when `telemetry.enableMetrics: true` in the configuration file. It returns metrics in Prometheus format.
+
 ## Documentation

 - **[Setup Guide](SETUP.md)**: Complete installation, configuration, and deployment instructions
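The CONFIG.md changes in this commit state that `/metrics` is hosted on a separate HTTP server from the main application port. A minimal sketch of that split, using only the standard library, is shown below. The two routers, the placeholder response body, and the `statusFor` helper are assumptions made for illustration; the real exporter handler and server wiring live in the `otelsetup` plugin and are not part of this excerpt.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newAppMux stands in for the main application router served on http.port.
func newAppMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// newMetricsMux stands in for the router served on metricsPort.
// A real deployment would mount the Prometheus exporter handler here;
// the placeholder body only imitates a comment line in the text format.
func newMetricsMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprintln(w, "# placeholder: exporter output would appear here")
	})
	return mux
}

// statusFor sends an in-memory GET request to h and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// The application router does not know about /metrics at all.
	fmt.Println("app     /health :", statusFor(newAppMux(), "/health"))
	fmt.Println("app     /metrics:", statusFor(newAppMux(), "/metrics"))
	fmt.Println("metrics /metrics:", statusFor(newMetricsMux(), "/metrics"))
}
```

In a running process each router would be handed to its own `http.Server` with a different `Addr`, so a scrape target points at the metrics port rather than at `http.port`.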

cmd/adapter/main.go

Lines changed: 42 additions & 2 deletions
@@ -17,12 +17,19 @@ import (
 	"github.com/beckn-one/beckn-onix/core/module/handler"
 	"github.com/beckn-one/beckn-onix/pkg/log"
 	"github.com/beckn-one/beckn-onix/pkg/plugin"
+	"github.com/beckn-one/beckn-onix/pkg/telemetry"
 )

+// ApplicationPlugins holds application-level plugin configurations.
+type ApplicationPlugins struct {
+	OtelSetup *plugin.Config `yaml:"otelsetup,omitempty"`
+}
+
 // Config struct holds all configurations.
 type Config struct {
 	AppName       string                `yaml:"appName"`
 	Log           log.Config            `yaml:"log"`
+	Plugins       ApplicationPlugins    `yaml:"plugins,omitempty"`
 	PluginManager *plugin.ManagerConfig `yaml:"pluginManager"`
 	Modules       []module.Config       `yaml:"modules"`
 	HTTP          httpConfig            `yaml:"http"`
@@ -91,11 +98,39 @@ func validateConfig(cfg *Config) error {
 	return nil
 }

+// loadAppPlugin is a generic function to load and validate application-level plugins.
+func loadAppPlugin[T any](ctx context.Context, name string, cfg *plugin.Config, mgrFunc func(context.Context, *plugin.Config) (T, error)) error {
+	if cfg == nil {
+		log.Debugf(ctx, "Skipping %s plugin: not configured", name)
+		return nil
+	}
+
+	_, err := mgrFunc(ctx, cfg)
+	if err != nil {
+		return fmt.Errorf("failed to load %s plugin (%s): %w", name, cfg.ID, err)
+	}
+
+	log.Debugf(ctx, "Loaded %s plugin: %s", name, cfg.ID)
+	return nil
+}
+
+// initAppPlugins initializes application-level plugins including telemetry.
+// This function is designed to be extensible for future plugin types.
+func initAppPlugins(ctx context.Context, mgr *plugin.Manager, cfg ApplicationPlugins) error {
+	if err := loadAppPlugin(ctx, "OtelSetup", cfg.OtelSetup, func(ctx context.Context, cfg *plugin.Config) (*telemetry.Provider, error) {
+		return mgr.OtelSetup(ctx, cfg)
+	}); err != nil {
+		return fmt.Errorf("failed to initialize application plugins: %w", err)
+	}
+
+	return nil
+}
+
 // newServer creates and initializes the HTTP server.
 func newServer(ctx context.Context, mgr handler.PluginManager, cfg *Config) (http.Handler, error) {
 	mux := http.NewServeMux()
-	err := module.Register(ctx, cfg.Modules, mux, mgr)
-	if err != nil {
+
+	if err := module.Register(ctx, cfg.Modules, mux, mgr); err != nil {
 		return nil, fmt.Errorf("failed to register modules: %w", err)
 	}
 	return mux, nil
@@ -126,6 +161,11 @@ func run(ctx context.Context, configPath string) error {
 	closers = append(closers, closer)
 	log.Debug(ctx, "Plugin manager loaded.")

+	// Initialize plugins including telemetry.
+	if err := initAppPlugins(ctx, mgr, cfg.Plugins); err != nil {
+		return fmt.Errorf("failed to initialize plugins: %w", err)
+	}
+
 	// Initialize HTTP server.
 	log.Infof(ctx, "Initializing HTTP server")
 	srv, err := newServerFunc(ctx, mgr, cfg)
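The `loadAppPlugin` helper added in this file has two behaviours worth seeing in isolation: a nil configuration is treated as "not configured" and skipped, and a loader failure is wrapped with `%w` so callers can still match the underlying error. The standalone sketch below reproduces that shape with stand-in types so it can run without the repository; `pluginConfig`, `stubLoader`, `errExporter`, and `tryLoad` are hypothetical names, and logging is omitted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// pluginConfig is a stand-in for plugin.Config; only the ID is modelled.
type pluginConfig struct {
	ID string
}

// loadOptional mirrors the shape of loadAppPlugin: a nil config is skipped,
// and any loader error is wrapped with the plugin name and ID.
func loadOptional[T any](ctx context.Context, name string, cfg *pluginConfig, load func(context.Context, *pluginConfig) (T, error)) error {
	if cfg == nil {
		return nil // not configured: treated as success
	}
	if _, err := load(ctx, cfg); err != nil {
		return fmt.Errorf("failed to load %s plugin (%s): %w", name, cfg.ID, err)
	}
	return nil
}

// errExporter is a hypothetical sentinel returned by the failing stub.
var errExporter = errors.New("exporter unavailable")

// stubLoader returns a loader that either succeeds or fails with errExporter.
func stubLoader(fail bool) func(context.Context, *pluginConfig) (string, error) {
	return func(_ context.Context, cfg *pluginConfig) (string, error) {
		if fail {
			return "", errExporter
		}
		return "provider for " + cfg.ID, nil
	}
}

// tryLoad runs loadOptional with a background context and the chosen stub.
func tryLoad(cfg *pluginConfig, fail bool) error {
	return loadOptional(context.Background(), "OtelSetup", cfg, stubLoader(fail))
}

// wrapsExporterErr reports whether err has errExporter in its chain.
func wrapsExporterErr(err error) bool {
	return errors.Is(err, errExporter)
}

func main() {
	fmt.Println("nil config:   ", tryLoad(nil, true))
	fmt.Println("loader ok:    ", tryLoad(&pluginConfig{ID: "otelsetup"}, false))
	fmt.Println("loader fails: ", tryLoad(&pluginConfig{ID: "otelsetup"}, true))
}
```

The type parameter lets one helper accept loaders with different return types, which appears to be what the "extensible for future plugin types" comment in `initAppPlugins` is aiming at; the returned value itself is discarded in both the original and this sketch.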

cmd/adapter/main_test.go

Lines changed: 16 additions & 6 deletions
@@ -73,6 +73,11 @@ func (m *MockPluginManager) KeyManager(ctx context.Context, cache definition.Cac
 	return nil, nil
 }

+// TransportWrapper returns a mock implementation of the TransportWrapper interface.
+func (m *MockPluginManager) TransportWrapper(ctx context.Context, cfg *plugin.Config) (definition.TransportWrapper, error) {
+	return nil, nil
+}
+
 // SchemaValidator returns a mock implementation of the SchemaValidator interface.
 func (m *MockPluginManager) SchemaValidator(ctx context.Context, cfg *plugin.Config) (definition.SchemaValidator, error) {
 	return nil, nil
@@ -170,14 +175,19 @@ func TestRunFailure(t *testing.T) {

 			// Mock dependencies
 			originalNewManager := newManagerFunc
-			// newManagerFunc = func(ctx context.Context, cfg *plugin.ManagerConfig) (*plugin.Manager, func(), error) {
-			// 	return tt.mockMgr()
-			// }
-			newManagerFunc = nil
+			// Ensure newManagerFunc is never nil to avoid panic if invoked.
+			newManagerFunc = func(ctx context.Context, cfg *plugin.ManagerConfig) (*plugin.Manager, func(), error) {
+				_, closer, err := tt.mockMgr()
+				if err != nil {
+					return nil, closer, err
+				}
+				// Return a deterministic error so the code path exits cleanly if reached.
+				return nil, closer, errors.New("mock manager error")
+			}
 			defer func() { newManagerFunc = originalNewManager }()

-			originalNewServer := newServerFunc
-			newServerFunc = func(ctx context.Context, mgr handler.PluginManager, cfg *Config) (http.Handler, error) {
+			originalNewServer := newServerFunc
+			newServerFunc = func(ctx context.Context, mgr handler.PluginManager, cfg *Config) (http.Handler, error) {
 				return tt.mockServer(ctx, mgr, cfg)
 			}
 			defer func() { newServerFunc = originalNewServer }()
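The test change above replaces `newManagerFunc = nil` with a stub that always returns an error, then restores the original through `defer`. The pattern can be sketched outside the repository as follows. Everything here is a simplified stand-in: the seam returns a string instead of a `*plugin.Manager`, and `run`, `failingManager`, and `withManagerFunc` are hypothetical names used only to show the swap-and-restore flow.

```go
package main

import (
	"errors"
	"fmt"
)

// newManagerFunc is a package-level seam that tests can replace,
// analogous to the variable of the same name in the diff above.
var newManagerFunc = func() (string, error) { return "real manager", nil }

// run stands in for the adapter's run function: it fails if the manager
// cannot be created. The error wording is illustrative.
func run() error {
	if _, err := newManagerFunc(); err != nil {
		return fmt.Errorf("failed to create plugin manager: %w", err)
	}
	return nil
}

// failingManager is a stub that returns a deterministic error, so code that
// reaches the seam exits cleanly instead of calling a nil function.
func failingManager() (string, error) {
	return "", errors.New("mock manager error")
}

// withManagerFunc swaps the seam while fn runs and always restores it.
func withManagerFunc(stub func() (string, error), fn func()) {
	original := newManagerFunc
	defer func() { newManagerFunc = original }()
	newManagerFunc = stub
	fn()
}

func main() {
	withManagerFunc(failingManager, func() {
		fmt.Println("stubbed: ", run())
	})
	fmt.Println("restored:", run())
}
```

Calling a nil function variable panics in Go, which is the failure mode the updated test avoids by installing a stub that returns an error instead.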
