Skip to content

Conversation

nina9753
Copy link

@nina9753 nina9753 commented Aug 5, 2025

What does this PR do?

Extends the official PubSub plugin capabilities and optimizes Datadog tracing instrumentation for Google Cloud Pub/Sub push messages and Cloud Events across all Node.js web frameworks.

  • Automatic Detection: No configuration needed
  • Large Payloads: Supports up to 10MB efficiently
  • All Frameworks: Works with Express, Fastify, Hapi, Koa, etc.
  • Distributed Tracing: Maintains trace context across services
  • Production Ready: Comprehensive error handling

GCP sends push → Express server receives POST request
server.js detects → "This is a Pub/Sub push!" (via headers)
server.js creates → PubSub span with distributed tracing
express.js creates → Express span as child of PubSub span
main.js processes → Business logic with automatic tracing

This makes the spans have this hierarchy :
pubsub.receive (created by gcp-pubsub-push.js)
└── http.request (manually created as child)
└── express.request (inherits via scope activation)
└── express.middleware (inherits via scope activation)
└── your-business-logic (main.js)

Follow-up PR will be opened to create the synthetic span in the flame graph

Motivation

An inferred span for the HTTP push post to the Cloud Run service from a pub/sub topic
Example full Push Direct Span of a cloud run service triggering another service using a direct push subscription
Example full Eventarc pubsub trigger span of a cloud run service triggering another service using an Eventarc cloud event trigger

Plugin Checklist

Additional Notes

The changes had to be made in server.js because in Push Subscription, the Cloud Run Service receives HTTP Requests.

  • Google Cloud Pub/Sub acts as the HTTP client
  • The cloud run service acts as the HTTP server
  • GCP topic pushes TO the cloud run service via HTTP POST requests
  • Then the cloud run service receives the pushed messages

Additional information can be found in this doc

image

 Please enter the commit message for your changes. Lines starting
@nina9753 nina9753 requested review from a team as code owners August 5, 2025 23:04
Copy link

github-actions bot commented Aug 5, 2025

Overall package size

Self size: 11.38 MB
Deduped: 111.03 MB
No deduping: 111.42 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB | | @datadog/native-appsec | 10.1.0 | 20.37 MB | 20.37 MB | | @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB | | @datadog/pprof | 5.9.0 | 9.77 MB | 10.14 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.5.3 | 2.95 MB | 5.6 MB | | @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB | | import-in-the-middle | 1.14.2 | 122.36 kB | 850.93 kB | | lru-cache | 10.4.3 | 804.3 kB | 804.3 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 7.0.5 | 63.38 kB | 63.38 kB | | istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.3 | 23.74 kB | 23.74 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | mutexify | 1.4.0 | 5.71 kB | 8.74 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

Copy link

codecov bot commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 14.28571% with 138 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.80%. Comparing base (72172af) to head (b056b6a).
⚠️ Report is 22 commits behind head on master.

Files with missing lines Patch % Lines
...es/datadog-instrumentations/src/gcp-pubsub-push.js 0.79% 125 Missing ⚠️
...ckages/datadog-instrumentations/src/body-parser.js 14.28% 12 Missing ⚠️
...ckages/datadog-instrumentations/src/http/server.js 94.73% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6233      +/-   ##
==========================================
- Coverage   83.28%   82.80%   -0.49%     
==========================================
  Files         478      479       +1     
  Lines       19790    19971     +181     
==========================================
+ Hits        16482    16536      +54     
- Misses       3308     3435     +127     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pr-commenter
Copy link

pr-commenter bot commented Aug 6, 2025

Benchmarks

Benchmark execution time: 2025-08-11 15:38:03

Comparing candidate commit b056b6a in PR branch nina.rei/SVLS-7168/add-gcp-push-pubsub with baseline commit 72172af in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1271 metrics, 52 unstable metrics.

@nina9753 nina9753 changed the title [SVLS-7168] Create Synthetic Span for GCP Push Subscriptions (including Cloud Events) [SVLS-7168] Create inferred Span for GCP Push Subscriptions (including Cloud Events) Aug 6, 2025
const spanTags = {
component: 'google-cloud-pubsub',
'span.kind': 'consumer',
'gcloud.project_id': projectId || 'unknown',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for my eduction: is this || unknown a standard pattern for span tags? how do we decide between setting the tag to "unknown" vs not setting the tag at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually add a fallback when a value is not available. Leaving it undefined will simply not add the tag.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed unknown from the tags

Comment on lines 22 to 27
function parseCloudEventMessage (json, req) {
// Eventarc only uses Binary Content Mode with ce-specversion header
// Payload structure: {"message": {...}, "subscription": "..."}
const message = json.message || json
const attrs = { ...message?.attributes }
const subscription = json.subscription || req.headers['ce-subscription'] || 'cloud-event-subscription'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This server.js file is a generic HTTP server instrumentation, so we shouldn't be putting Pub/Sub or Eventarc-specific logic directly in this file. Can you move that logic into a separate file/module?

Maybe you can do something like this (this code was suggested by ChatGPT):

addHook({ name: 'http' }, http => {
  shimmer.wrap(http.Server.prototype, 'emit', wrapEmit)

  if (isGCPEnvironment()) {
    require('./pubsub').patch(http)
  }

  return http
})

Copy link
Author

@nina9753 nina9753 Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do this approach because then both files are hooking the same module:

http/server.js: addHook({ name: httpNames }, http => { shimmer.wrap(http.Server.prototype, 'emit', wrapEmit) })
gcp-pubsub-push.js: addHook({ name: ['http', 'node:http'] }, (http) => { shimmer.wrap(http.Server.prototype, 'emit', wrapEmitForGcp) })

Both hooks would execute and try to wrap emit on the same prototype. The second wrap overwrites the first and replaces the method, so only the last loaded hook would work. Instead of creating another addHook({ name: 'http' }, http => { I can move the logic to another file that will still be referenced in server.js addHook this will allow the wrapEmitForGcp to extend the other shimmer wrapEmit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. We could still add the hook in http/server.js, but delegate as much GCP specific logic as we can to a helper function in a separate file. Might be worth reaching out to #guild-dd-node-js to see if someone there could help with best practices.

Copy link
Author

@nina9753 nina9753 Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reached out in slack (apm-serverless too) for a review and http/server.js has a hook already so i just added my code to extend that hook

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way things typically work is that each integration is split into 2 components:

  1. The instrumentation, which is responsible for patching and collecting data, which it can then publish on channels. This part should be generic and contain no code specific to Datadog. I like to think of this as something that should be exposed by the library itself so that we wouldn't have to patch it in the first place, and Datadog specific code would never exist in the library itself (or a built-in module in this case)
  2. Plugins, which can then subscribe to the channels exposed by the instrumentation and handle Datadog specific things.

Usually there is only a single instrumentation since its responsibility is generic, and there can be multiple plugins (for example per product, or sometimes hierarchical for more complex integrations). So here, any information not already collected by the http instrumentation should be added but only that. Any actual logic should be moved to a new plugin that handles everything else.

@@ -26,6 +26,10 @@ addHook({
versions: ['>=1.4.0 <1.20.0']
}, read => {
return shimmer.wrapFunction(read, read => function (req, res, next) {
// Skip body parsing if PubSub/Cloud Event middleware already parsed it
if (req._pubsubBodyParsed) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really need to know about PubSub specifically? Wouldn't generically checking if req.body exists be enough?

Comment on lines 22 to 27
function parseCloudEventMessage (json, req) {
// Eventarc only uses Binary Content Mode with ce-specversion header
// Payload structure: {"message": {...}, "subscription": "..."}
const message = json.message || json
const attrs = { ...message?.attributes }
const subscription = json.subscription || req.headers['ce-subscription'] || 'cloud-event-subscription'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way things typically work is that each integration is split into 2 components:

  1. The instrumentation, which is responsible for patching and collecting data, which it can then publish on channels. This part should be generic and contain no code specific to Datadog. I like to think of this as something that should be exposed by the library itself so that we wouldn't have to patch it in the first place, and Datadog specific code would never exist in the library itself (or a built-in module in this case)
  2. Plugins, which can then subscribe to the channels exposed by the instrumentation and handle Datadog specific things.

Usually there is only a single instrumentation since its responsibility is generic, and there can be multiple plugins (for example per product, or sometimes hierarchical for more complex integrations). So here, any information not already collected by the http instrumentation should be added but only that. Any actual logic should be moved to a new plugin that handles everything else.

const spanTags = {
component: 'google-cloud-pubsub',
'span.kind': 'consumer',
'gcloud.project_id': projectId || 'unknown',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually add a fallback when a value is not available. Leaving it undefined will simply not add the tag.

@nina9753 nina9753 requested a review from rochdev August 8, 2025 17:41
@rochdev
Copy link
Member

rochdev commented Aug 11, 2025

@nina9753 Should this be closed if it's superseded by #6260?

@nina9753 nina9753 closed this Aug 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants