Commit 8b31cff

docs: improve trawler docs
fixes #49
Signed-off-by: Ricky Moorhouse <[email protected]>
1 parent b85cc5b commit 8b31cff

4 files changed: 146 additions and 16 deletions

docs/config.md

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
# Configuring Trawler

Trawler gets its config from a mounted configmap containing config.yaml which looks like this:

```yaml
trawler:
  frequency: 10
  use_kubeconfig: false
prometheus:
  port: 63512
  enabled: true
logging:
  level: debug
  filters: trawler:trace
  format: pretty
nets:
  datapower:
    enabled: true
    timeout: 5
    username: trawler-monitor
    namespace: apic-gateway
  product:
    enabled: true
    username: trawler-monitor
    namespace: apic-management
```

## General trawler settings:
- frequency: number of seconds to wait between trawling for metrics
- use_kubeconfig: use the current kubeconfig from the environment instead of looking at _in cluster_ config

### Logging

Customise the level of detail logged by trawler. Trawler uses [alchemy logging](https://github.com/IBM/alchemy-logging) for logging and the parameters here are passed into alog on initialisation.

- level: set the logging level (default is info)
- filters: specify an individual log level for particular logging channels / trawler nets
- format: (pretty or json) - typically json is used for parsing and pretty is used in development
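
To make the `filters` value more concrete, here is a minimal Python sketch of how a string such as `trawler:trace` could be read as per-channel levels. It assumes the value is a comma-separated list of `channel:level` pairs, which is the usual alog convention; `parse_filters` is a hypothetical helper for illustration and is not part of trawler or alog.

```python
from typing import Dict


def parse_filters(spec: str) -> Dict[str, str]:
    """Split a 'channel:level,channel:level' string into a dict.

    Assumption: filters are comma-separated channel:level pairs.
    """
    filters: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty spec or a trailing comma
        channel, sep, level = entry.partition(":")
        if not sep or not channel.strip() or not level.strip():
            raise ValueError(f"expected 'channel:level', got {entry!r}")
        filters[channel.strip()] = level.strip()
    return filters


# Values taken from the sample config.yaml at the top of this page.
logging_config = {"level": "debug", "filters": "trawler:trace", "format": "pretty"}
print(parse_filters(logging_config["filters"]))
```

With the sample values, everything logs at `debug` by default while the `trawler` channel is raised to `trace`.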

### Prometheus settings:
The port specified in the prometheus block needs to match the prometheus annotations on the deployed trawler pod for prometheus to discover the metrics exposed.

## Individual nets
Each of the different areas of metrics is handled by a separate net, which can be enabled or disabled independently. In most cases the configuration for a net is a pointer to the namespace the relevant subsystem is deployed into and the credentials to use; net-specific options are described below. Passwords are loaded separately from the following values in a kubernetes secret mounted at the default location of `/app/secrets`, which can be overridden using the SECRETS environment variable:

- datapower_password - password to use with the datapower net for accessing the [DataPower REST management](https://www.ibm.com/support/knowledgecenter/SS9H2Y_7.7.0/com.ibm.dp.doc/restmgtinterface.html) interface.
- cloudmanager_password - password to use with the manager net to retrieve API Connect usage metrics.
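
The secret lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not trawler's actual code: `read_secret` is a hypothetical helper, each key of the mounted kubernetes secret is assumed to appear as a file named after the key, and stripping surrounding whitespace is a choice made by this sketch.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_SECRETS_DIR = "/app/secrets"  # default mount location from the docs


def read_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Read one secret value, e.g. 'datapower_password'.

    The directory comes from the SECRETS environment variable when set,
    otherwise from the default mount location.
    """
    env = os.environ if env is None else env
    secrets_dir = Path(env.get("SECRETS", DEFAULT_SECRETS_DIR))
    # Assumption: a trailing newline is not part of the password.
    return (secrets_dir / name).read_text().strip()
```

For local testing, pointing `SECRETS` at a directory containing a `datapower_password` file means `read_secret("datapower_password")` returns that file's contents.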

### DataPower net

Sample configuration:

    datapower:
      enabled: true
      timeout: 5
      username: trawler-monitor
      namespace: apic-gateway
      api_tests:
        enabled: true
        apis:
        - name: echo
          path: /apic-sre/live/echo?text=trawler
          method: get
          headers: {}

- timeout: max seconds to wait for responses to DataPower REST calls
- username: user to authenticate to datapower with - needs read privileges
- namespace: (optional) namespace in which datapower is deployed - if not specified trawler will discover datapower pods across all namespaces it has permissions to.
- api_tests: enable a set of test APIs that trawler invokes directly against the datapower pods:
  - enabled: true / false (default false)
  - apis: list of APIs to test
    - name: used for the prometheus metric naming (datapower_invoke_api_{name}...)
    - path: full path for the API
    - method: HTTP Method to use
    - headers: map of key/value pairs for any headers required
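
As a rough illustration of how one `apis` entry maps to metrics, the Python sketch below times a single test call and derives the metric names from the entry's `name`. It is a sketch under assumptions: `run_api_test` and `fake_invoke` are hypothetical names, the real HTTP call is replaced by an injected callable, and the `_time` and `_status_total` suffixes follow the metric names listed in docs/metrics.md.

```python
import time
from typing import Callable, Dict, Mapping


def run_api_test(api: Mapping, invoke: Callable[[str, str, Mapping], int]) -> Dict[str, object]:
    """Invoke one configured test API and describe the resulting metrics.

    `invoke` stands in for the HTTP request and returns the status code.
    """
    prefix = f"datapower_invoke_api_{api['name']}"
    start = time.monotonic()
    status_code = invoke(api["method"], api["path"], api.get("headers", {}))
    elapsed = time.monotonic() - start
    return {
        "time_metric": f"{prefix}_time",            # response time in seconds
        "seconds": elapsed,
        "status_metric": f"{prefix}_status_total",  # counted per status code
        "code": status_code,
    }


# Entry copied from the sample configuration above.
echo_test = {
    "name": "echo",
    "path": "/apic-sre/live/echo?text=trawler",
    "method": "get",
    "headers": {},
}


def fake_invoke(method: str, path: str, headers: Mapping) -> int:
    """Placeholder for the real request; always reports success."""
    return 200


result = run_api_test(echo_test, fake_invoke)
print(result["time_metric"], result["status_metric"], result["code"])
```

Passing the request function in as a parameter keeps the naming logic testable without a running gateway.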



### Management net

Sample config:

    manager:
      enabled: true
      grant_type: client_credentials
      secret: trawler-creds
      secret_namespace: apic-monitoring
      max_frequency: 600
      process_org_metrics: false
      namespace: apic

- grant_type: Type of credentials to use for authentication to the platform API (currently supports password or client_credentials)
- secret / secret_namespace: Name and namespace of secret containing the credentials
- max_frequency: (default 600) number of seconds between queries to the manager. As the majority of these metrics change less frequently this lets you reduce the frequency of calls made to the platform APIs.
- process_org_metrics: (default true) query gateway processing event status for every provider org. In a large environment this will take a long time, so you may want to disable it.
- namespace: namespace the management subsystem is deployed in
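
The effect of `max_frequency` alongside the general `frequency` setting can be sketched in Python like this. It is an illustration only, assuming the net simply skips trawl cycles until enough time has passed; `ManagerThrottle` is a hypothetical class and not trawler's implementation.

```python
import time
from typing import Callable, Optional


class ManagerThrottle:
    """Allow a manager query at most once every `max_frequency` seconds."""

    def __init__(self, max_frequency: float = 600,
                 clock: Callable[[], float] = time.monotonic):
        self.max_frequency = max_frequency
        self._clock = clock
        self._last_run: Optional[float] = None

    def due(self) -> bool:
        """Return True (and record the time) when a query should run now."""
        now = self._clock()
        if self._last_run is None or now - self._last_run >= self.max_frequency:
            self._last_run = now
            return True
        return False


# Simulated trawl cycles at t = 0, 10, 20, 600 and 610 seconds.
ticks = iter([0, 10, 20, 600, 610])
throttle = ManagerThrottle(max_frequency=600, clock=lambda: next(ticks))
decisions = [throttle.due() for _ in range(5)]
print(decisions)  # [True, False, False, True, False]
```

With `frequency: 10` and `max_frequency: 600`, the other nets would be trawled every cycle while the manager is queried roughly once every ten minutes.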

### Analytics net

Sample config:

    analytics:
      enabled: true
      namespace: apic

- namespace: namespace the analytics subsystem is deployed in

docs/index.md

Lines changed: 2 additions & 7 deletions
@@ -45,14 +45,8 @@ nets:
 - frequency: number of seconds to wait between trawling for metrics
 - use_kubeconfig: use the current kubeconfig from the environment instead looking at _in cluster_ config
 - logging: set the default logging level, output format and filters for specific components
-**Prometheus settings:**
-The port specified in the prometheus block needs to match the prometheus annotations on the deployed trawler pod for prometheus to discover the metrics exposed.

-**Individual nets**
-Each of the different areas of metrics is handled by a separate net, which can be enabled/disabled independently. The configuration for these is currently a pointer to the namespace the relevant subsystem is deployed into and a username to use. Passwords are loaded separately from the following values in a kubernetes secret mounted at the default location of `/app/secrets` - which can be overridden using the SECRETS environment variable:
-
-- datapower_password - password to use with the datapower net for accessing the [DataPower REST management](https://www.ibm.com/support/knowledgecenter/SS9H2Y_7.7.0/com.ibm.dp.doc/restmgtinterface.html) interface.
-- cloudmanager_password - password to use with the manager net to retreive API Connect usage metrics.
+[Detailed configuration options](docs/config.md)

 ## Issues, enhancements and pull requests

@@ -62,6 +56,7 @@ Feature requests and issue reports are welcome as [github issues](https://github

 - [Metrics gathered by trawler](docs/metrics.md)
 - [Install](docs/install.md)
+- [Configuring Trawler](docs/config.md)
 - [Frequently asked questions](docs/faq.md)



docs/install.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -65,3 +65,12 @@ Alternatively you can adjust the deployment of the trawler pod to match the sear
 In this case prometheus-operator is configured to look for serviceMonitors set up with the release `prom-operator`.

 For more details on the prometheus operator model see https://coreos.com/operators/prometheus/docs/latest/user-guides/getting-started.html
+
+## Scraping Trawler metrics with Instana
+
+If you are using Instana you can configure the Instana agent to scrape metrics from Trawler using the prometheus plugin options. An example agent config would look something like this:
+
+    com.instana.plugin.prometheus:
+      customMetricSources:
+      - url: '/' # metrics endpoint, the IP and port are auto-discovered
+        metricNameIncludeRegex: '.*' # regular expression to filter metrics

docs/metrics.md

Lines changed: 29 additions & 9 deletions
@@ -3,18 +3,29 @@

 The kind of metrics that trawler collects are currently as follows and are provided in the standard prometheus scrape format on the configured port:

-### Management subsystem
+### API Connect overview

 | Description | metric name |
 | ------------- |-------------|
 | API Connect version information | apiconnect_build_info|
-| Total users | apiconnect_users_total|
-| Number of provider_orgs | apiconnect_provider_orgs_total|
-| Number of consumer orgs | apiconnect_consumer_orgs_total|
-| Number of catalogs | apiconnect_catalogs_total|
-| Number of draft products / apis | apiconnect_draft_products_total / apiconnect_draft_apis_total|
-| Number of products / apis | apiconnect_products_total / apiconnect_apis_total|
-| Number of subscriptions | apiconnect_subscriptions_total|
+| Subsystem health (1 or 0)| apiconnect_health_status (labels for subsystems) |
+| Subsystem Resource status (states as labels) | apiconnect_analyticsclusters_status, apiconnect_gatewayclusters_status, apiconnect_managementclusters_status, apiconnect_portalclusters_status |
+
+### Management subsystem
+
+| Description | metric name |
+| ------------- |-------------|
+| Total users | manager_users_total|
+| Number of provider_orgs | manager_provider_orgs_total|
+| Number of catalogs | manager_catalogs_total|
+| Number of spaces | manager_spaces_total|
+| Number of draft products / apis | manager_draft_products_total / manager_draft_apis_total|
+| Number of products / apis | manager_products_total / manager_apis_total|
+| Number of consumer orgs | manager_consumer_orgs_total|
+| Number of consumer apps | manager_consumer_apps_total|
+| Number of subscriptions | manager_subscriptions_total|
+| Outstanding Gateway sent events | manager_gateway_processing_outstanding_sent_events |
+| Outstanding Gateway queued events | manager_gateway_processing_outstanding_queued_events |


 ### DataPower subsystem
@@ -23,7 +34,15 @@ The kind of metrics that trawler collects are currently as follows and are provi
 | TCP connection stats | datapower_tcp_{state}|
 | Log target stats: events processed, dropped, pending | datapower_logtarget_{name}_{type}|
 | Object counts e.g. SSLClientProfile, APICollection, APIOperation etc. | datapower_{object}_total|
-| HTTP Stats | datapower_http_tenSeconds/oneMinute/tenMinutes/oneDay|
+| HTTP Stats | datapower_http_tenSeconds/oneMinute/tenMinutes/oneDay |
+| Gateway Peering Is primary? | datapower_gateway_peering_primary_info (peering_group=name) |
+| Gateway Peering Primary link ok? | datapower_gateway_peering_primary_link (peering_group=name) |
+| Gateway Peering Primary Offset | datapower_gateway_peering_primary_offset (peering_group=name) |
+| Invoke test API (defined in config) response time | datapower_invoke_api_{name}_time |
+| Invoke test API (defined in config) status | datapower_invoke_api_{name}_status_total (code=200, 500 etc. ) |
+| Invoke test API (defined in config) created | datapower_invoke_api_{name}_status_created |
+
+

 ### Analytics subsystem
 | Description | metric name |
@@ -32,3 +51,4 @@ The kind of metrics that trawler collects are currently as follows and are provi
 | Number of nodes in the cluster | analytics_data_nodes_total/analytics_nodes_total|
 | Number of shards in states - active, relocating, initialising, unassigned | analytics_{state}_shards_total|
 | Number of pending tasks | analytics_pending_tasks_total|
+| API Calls in last hour by status code | analytics_apicalls_lasthour_2xx, 4xx etc|
