Skip to content

OpenTelemetry exporter plugin, and built-in plugin provision#3014

Merged
vigoo merged 63 commits intomainfrom
otlp-plugin
Mar 19, 2026
Merged

OpenTelemetry exporter plugin, and built-in plugin provision#3014
vigoo merged 63 commits intomainfrom
otlp-plugin

Conversation

@vigoo
Copy link
Contributor

@vigoo vigoo commented Mar 18, 2026

Resolves #1613
Resolves #1614

Introduces a way to have built-in oplog processor plugins:

  • they are stored in the plugins dir of this repo, compiled with golem-cli
  • we store the compiled wasm in the repo too, so cargo install works for golem/golem-cli
  • they are embedded in the registry service binary
  • the built-in plugins are automatically deployed to the system account
  • every new environment automatically gets grants to use them

Adds the first such built-in plugin, the OpenTelemetry exporter:

  • supports traces, logs and metrics
  • tested with dockerized otel-collector and jaeger

Follow-up: #3015

@vigoo
Copy link
Contributor Author

vigoo commented Mar 18, 2026

One thing I forgot to check / implement: how do we provision changed / updated built-in plugins when deploying to prod (but keeping the live db)

@vigoo
Copy link
Contributor Author

vigoo commented Mar 18, 2026

One thing I forgot to check / implement: how do we provision changed / updated built-in plugins when deploying to prod (but keeping the live db)

The single line change I've pushed that makes plugins versioned by the golem version fixes this, I think

}
};

// 3. Upload "otlp-exporter" component, or update it if it already exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the component / wasm hash here to prevent updating the component on each startup of each instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that broke in the last commit, previously it was supposed to be named by the version (but that has other issues), thanks

.await
{
Ok(()) => {
// Enqueue succeeded — immediately confirm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the receiver side protects against receiving the same batch multiple times?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all the "exactly once semantics" changes were done in a separate PR before this. Here I just added this extra logging of process failures back to the caller agent's event stream.
(For double send, it's solved by having deterministic oplog processor worker name + determinstic idempotency keys)


let auth_service = Arc::new(AuthService::new(repos.account_repo.clone()));

let root_account_id = config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a dedicated account for this without special roles. I don't want to leak the admin account id to users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By leaking, you mean that they can see the ID because of the plugin grants?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, exactly

Comment on lines +262 to +300
let apps = application_service
.list_in_account(root_account_id, &auth)
.await
.map_err(|e| anyhow::anyhow!("Failed to list applications: {e}"))?;

for app_entry in &apps {
let envs = match environment_service
.list_in_application(app_entry.id, &auth)
.await
{
Ok(envs) => envs,
Err(e) => {
tracing::warn!(
"Failed to list environments for app {}: {e}",
app_entry.name
);
continue;
}
};

for env_entry in &envs {
match environment_plugin_grant_service
.create(
env_entry.id,
EnvironmentPluginGrantCreation {
plugin_registration_id: plugin.id,
},
&auth,
)
.await
{
Ok(_) => {}
Err(EnvironmentPluginGrantError::GrantForPluginAlreadyExists) => {}
Err(e) => {
tracing::warn!("Failed to grant plugin to env {}: {e}", env_entry.name);
}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block only grants access to environments in the root account's applications, which is not useful.

There should not be any applications or environments in that account (outside the one used for the plugin definition)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there supposed to be grants created for the built-in plugins whenever someone creates a new environment. i'll double check and remove this unnecessary part

let plugin_grants: Vec<EnvironmentPluginGrantRecord> = builtin_plugins
.into_iter()
.map(|p| {
EnvironmentPluginGrantRecord::creation(
Copy link
Contributor

@mschuwalow mschuwalow Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnvironmentPluginGrants can be managed by users and deleted as long as they have access to the environment. I don't think we should allow deletion of builtin plugins from environments.
Additionally, these grants will show up in the api and cli for users.

I think builtin plugins want to have special handling and not reuse exactly the same mechanism as user plugins. A couple of options that I think make sense:

  • Do not give builtin plugins an id at all, but rather have an enum for builtin plugins. Whereever we can refer to a plugin we allow doing it either by id or by selecting one of the builtin ones
  • Store them in the db with an id like now, but either have a dedicated table for them or extend the current table such that they can live outside of the normal account / application / environment hierarchy. All queries that retrieve plugins can join with the buitin plugins. In that case we do not have an automatically generated account for them at all and can use the system account id wherever we do want to display an owner (clearly marking them as system).

Alternatively, if we do keep the current system managed account + grant structure, we should protect the automatically created grants from deletion and make them more descriptive for users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather keep it simple - that's why I wanted to build it on top of the existing things. They are not special in any way other than we want them to be available for everyone without explicitly installing them.

I think if we move them to another root account as you asked, and maybe disallow deleting the grants, we have the desired UX and we have minimal complexity on the implementation side (for me the provision step and auto-granting the plugins feels very nice and scoped)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me 👍

id: AccountId(uuid!("adb2694f-cd9f-425d-905d-ca2888c9c5de")),
name: "Builtin Plugin Owner".to_string(),
email: AccountEmail("builtin-plugin-owner@golem.cloud".to_string()),
token: TokenSecret::trusted("32d6072d-64e9-4a4a-b8f9-fadf68bb446b".to_string()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer adjusting this so we don't create a token for this account, as it should never be used interactively. But that can easily be done in a followup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

name: "Builtin Plugin Owner".to_string(),
email: AccountEmail("builtin-plugin-owner@golem.cloud".to_string()),
token: TokenSecret::trusted("32d6072d-64e9-4a4a-b8f9-fadf68bb446b".to_string()),
role: AccountRole::BuiltinPluginOwner,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about having a dedicated "marker" role. So far the roles were actually used for permission checks, but I'm not strongly against it.


#[derive(Clone, Debug, Serialize, Deserialize, Default)]
pub struct BuiltinPluginsConfig {
pub enabled: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enum would be more consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do in a next pr

@vigoo vigoo merged commit 3e2695a into main Mar 19, 2026
82 of 84 checks passed
@vigoo vigoo deleted the otlp-plugin branch March 19, 2026 17:36
@github-actions github-actions bot locked and limited conversation to collaborators Mar 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test the OpenTelemetry plugin Create an OpenTelemetry oplog processor plugin for golem

2 participants