[Opentelemetry plugin] Specify api.ft.com metrics for underneath APIs #1191

@gyss

Description

We'd like a hook in the OpenTelemetry plugin that makes the net_peer_name attribute more specific for requests to api.ft.com, so that we can select the metrics for a single API that lives behind the API Gateway.

What problem does this feature solve?

Due to the ongoing Graphite migration, we need to replace some of the system's Graphite-based alerts with OpenTelemetry ones.

Some of these alerts need to trigger when a specific downstream service is not working as it should. For example, take this alert in next-myft-api:

	{
		name: 'Insights API is serving topic recommendations',
		id: 'snr-recs-availability',
		type: 'graphiteThreshold',
		metric: 'asPercent(sumSeries(next.heroku.myft-api.*.fetch.insights-api-topic-recommendations.response.status_{5xx,4xx}.count), sumSeries(next.heroku.myft-api.*.fetch.insights-api-topic-recommendations.response.status_200.count))',
		threshold: 10,
		severity: 2,
		businessImpact: 'Topic recommendations in myft pages or in some daily digest email might be missing.',
		technicalSummary: 'Calls to insight API to get topic recommendations are returning 4xx and 5xx results',
		panicGuide: 'Check if there is any issue with Insights API with Search and Recommendation team. You can get more details on Splunk with the query `index=heroku source="next-myft-api" sourcetype="heroku:app" level=error path="/v2/recommendation/user/*/concept"`. It is possible to modify the traffic to get all recommendations from neo4j: in Doppler, set up the env PERCENT_SNR_TOPIC_RECS to 0 and make sure FORCE_SNR_TOPIC_RECS is false. That will stop the calls to Insight API.',
	}

Graphite health check

The metrics under next.heroku.myft-api.*.fetch.insights-api-topic-recommendations correspond to requests made to https://api.ft.com/snr/v1/insights/topic-recommendations/, a service that lives behind the API Gateway. But when creating the OpenTelemetry Grafana panel for this alert, we found that net_peer_name doesn't let us select topic-recommendations, only api.ft.com, because this label is domain-based.

[Screenshot, 2024-09-06 at 09:58:26]

Link to Grafana panel

This means that if next-myft-api uses several services behind the API Gateway (which it does), all their metrics are bundled together under api.ft.com and we can't tell which service the 5xx or 4xx responses are coming from.
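To make the bundling concrete, here is a minimal sketch using made-up request samples. Only the topic-recommendations URL comes from this issue; the `other-api` URL, the status counts, and the `errorPercentByKey` helper are hypothetical. It computes the same ratio as the Graphite alert (error responses as a percentage of 200 responses), once keyed by host only and once keyed by host plus first path segment.

```javascript
// Hypothetical samples of outgoing requests from one service.
// Only the topic-recommendations URL is taken from this issue.
const samples = [
	{ url: 'https://api.ft.com/snr/v1/insights/topic-recommendations/', status: 200 },
	{ url: 'https://api.ft.com/snr/v1/insights/topic-recommendations/', status: 200 },
	{ url: 'https://api.ft.com/snr/v1/insights/topic-recommendations/', status: 503 },
	{ url: 'https://api.ft.com/other-api/v1/things', status: 200 },
	{ url: 'https://api.ft.com/other-api/v1/things', status: 200 },
];

// Groups samples by keyOf(url) and returns errors as a percentage of 200s,
// mirroring asPercent(4xx + 5xx, 200) from the Graphite alert.
function errorPercentByKey(requests, keyOf) {
	const counts = new Map();
	for (const { url, status } of requests) {
		const key = keyOf(new URL(url));
		const entry = counts.get(key) ?? { ok: 0, errors: 0 };
		if (status === 200) entry.ok += 1;
		else if (status >= 400) entry.errors += 1;
		counts.set(key, entry);
	}
	const result = {};
	for (const [key, { ok, errors }] of counts) {
		// null when there are no 200s to compare against
		result[key] = ok === 0 ? null : (errors / ok) * 100;
	}
	return result;
}

// Domain-based label: everything behind the gateway shares one key.
const byHost = errorPercentByKey(samples, (u) => u.hostname);

// Host plus first path segment: one key per API behind the gateway.
const byHostAndSegment = errorPercentByKey(samples, (u) => {
	const first = u.pathname.split('/').filter(Boolean)[0];
	return first ? `${u.hostname}/${first}` : u.hostname;
});

console.log(byHost);
console.log(byHostAndSegment);
```

With these made-up numbers, the host-only key dilutes the failing API's error rate with traffic from the healthy one, so a threshold tuned for a single API may never fire.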

Ideal solution

As suggested by Sam Parkinson here, we could add a hook in the OpenTelemetry plugin that appends the first part of the path to the net_peer_name attribute. In this case it would be api.ft.com/snr, which works for us because next-myft-api doesn't use any other API from snr. We could also consider appending the last part of the path instead, e.g. api.ft.com/topic-recommendations.

In any case, this problem is shared by many repositories (and probably teams), and it would be great to have it solved in the OpenTelemetry plugin.
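As a rough sketch of the value such a hook might compute, the function below derives a more specific peer name from a request URL. The name `derivePeerName`, its `gatewayHosts` and `segment` options, and the `example.com` URL are all hypothetical; this is not part of the OpenTelemetry plugin or of any OpenTelemetry API, and how the result would be attached to net_peer_name is left open.

```javascript
// Sketch only: hypothetical helper, not an existing plugin or OpenTelemetry API.
// Returns "<host>/<path segment>" for gateway hosts and the plain host otherwise.
function derivePeerName(requestUrl, { gatewayHosts = ['api.ft.com'], segment = 'first' } = {}) {
	const { hostname, pathname } = new URL(requestUrl);

	// Leave hosts that are not behind the gateway untouched.
	if (!gatewayHosts.includes(hostname)) return hostname;

	// Drop empty segments produced by leading and trailing slashes.
	const parts = pathname.split('/').filter(Boolean);
	if (parts.length === 0) return hostname;

	const chosen = segment === 'last' ? parts[parts.length - 1] : parts[0];
	return `${hostname}/${chosen}`;
}

const url = 'https://api.ft.com/snr/v1/insights/topic-recommendations/';
console.log(derivePeerName(url)); // api.ft.com/snr
console.log(derivePeerName(url, { segment: 'last' })); // api.ft.com/topic-recommendations
console.log(derivePeerName('https://example.com/some/path')); // example.com
```

One consideration when choosing between the two options: the first segment is usually drawn from a small, fixed set of API names, whereas the last segment can be an identifier (for example a user or concept ID) on some endpoints, which would inflate the number of distinct label values.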

Alternatives

An alternative is to implement this OTel hook separately in every system that needs this granularity.


Projects

Status

🏗 In Progress
