build: upgrade engine #603

Merged: 17 commits, Aug 14, 2025
14 changes: 12 additions & 2 deletions charts/cf-runtime/Chart.yaml
@@ -1,7 +1,7 @@
 apiVersion: v2
 description: A Helm chart for Codefresh Runner
 name: cf-runtime
-version: 8.1.0
+version: 8.2.0
 keywords:
 - codefresh
 - runner
@@ -17,8 +17,18 @@ annotations:
 artifacthub.io/containsSecurityUpdates: "false"
 # Supported kinds: `added`, `changed`, `deprecated`, `removed`, `fixed`, `security`:
 artifacthub.io/changes: |
+- kind: changed
+description: "Update \"engine\" to version 1.179.1."
 - kind: added
-description: "Added MAXIMUM_POST_STEPS_GRACE_PERIOD_MINUTES configuration for engine which controls maximum time for internal build chores before termination."
+description: "Add support for OpenTelemetry signals: metrics, logs, traces."
+- kind: added
+description: "Add support for Pyroscope profiles."
+- kind: changed
+description: "Redesign \"engine\" metrics to follow OpenTelemetry standards and provide more comprehensive insights about Classic Build execution. Please read upgrade notes for more details."
+- kind: deprecated
+description: "Deprecate legacy Prometheus metrics in favor of new OpenTelemetry metrics in \"engine\". Please read upgrade notes for more details."
+- kind: changed
+description: "Improve observability of build's \"Initializing Process\" step by providing more logs and more detailed status of the step."
 dependencies:
 - name: cf-common
 repository: oci://quay.io/codefresh/charts
70 changes: 58 additions & 12 deletions charts/cf-runtime/README.md

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions charts/cf-runtime/README.md.gotmpl
@@ -20,6 +20,7 @@ Helm chart for deploying [Codefresh Runner](https://codefresh.io/docs/docs/insta
 - [To 7.x](#to-7-x)
 - [To 7.9.x](#to-7-9-x)
 - [To 8.x](#to-8-x)
+- [To 8.2.x](#to-8-2-x)
 - [Architecture](#architecture)
 - [Configuration](#configuration)
 - [EBS backend volume configuration in AWS](#ebs-backend-volume-configuration)
@@ -313,6 +314,29 @@ This means that any existing images in your pipelines that were created using th

To avoid operation disruption, you have to identify and convert such deprecated images to modern formats. Tutorial: [https://codefresh.io/docs/docs/kb/articles/upgrade-deprecated-docker-images/](https://codefresh.io/docs/docs/kb/articles/upgrade-deprecated-docker-images/)

### To 8.2.x

⚠️⚠️⚠️ **BREAKING CHANGE in metrics configuration** ⚠️⚠️⚠️

In this release, the `engine` component has migrated its metrics collection to OpenTelemetry, using the *push* model by default.

You can still switch to the *pull* model by setting the `OTEL_METRICS_EXPORTER=prometheus` environment variable for the `engine`. However, we recommend using the default configuration, as it is better suited for the short-lived nature of Classic Builds and provides more precise and complete metrics.

View [default chart values](https://artifacthub.io/packages/helm/codefresh-runner/cf-runtime?modal=values&path=runtime.engine.env) for more configuration options.
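
As a rough sketch of the two delivery models described above, the values fragment below keeps the default push model but points the exporter at a collector, with the pull alternative shown commented out. The variable names are taken from the chart's rendered defaults in this PR's test snapshots. The collector address `otel-collector.monitoring.svc` is a hypothetical placeholder, and the remark about port 9464 assumes the `OTEL_EXPORTER_PROMETHEUS_*` defaults apply once the pull model is selected.

```yaml
runtime:
  engine:
    env:
      # Push model (default): export metrics over OTLP/gRPC.
      OTEL_METRICS_EXPORTER: "otlp"
      OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"
      # Hypothetical collector address; replace with your own endpoint.
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring.svc:4317"

      # Pull model (alternative): replace the "otlp" value above with "prometheus".
      # Assumption: the engine then serves a scrape endpoint on
      # OTEL_EXPORTER_PROMETHEUS_HOST / OTEL_EXPORTER_PROMETHEUS_PORT
      # (0.0.0.0:9464 in the chart defaults).
      # OTEL_METRICS_EXPORTER: "prometheus"
```

Only one value of `OTEL_METRICS_EXPORTER` should be set at a time, since YAML does not allow the same key twice in one mapping.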

The `engine` metrics have also been redesigned to follow OpenTelemetry standards and to deliver more actionable insights. Full list of metrics: https://codefresh.io/docs/docs/installation/runner/classic-runtime-monitoring/

For a smooth transition, the previous Prometheus metrics are still available but are now disabled by default. **These legacy metrics will be removed in future releases.** If you need to temporarily retain the old metrics, add the following values to your chart configuration:

```yaml
runtime:
  engine:
    env:
      CF_TELEMETRY_PROMETHEUS_ENABLE: "false" # Disable new Prometheus metrics to avoid port conflicts and data duplication
      CF_TELEMETRY_OTEL_ENABLE: "false" # Disable new OTel metrics to avoid data duplication
      METRICS_PROMETHEUS_ENABLED: "true" # Enable old Prometheus metrics
```
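
The changelog for this release also lists logs, traces, and Pyroscope profiles as supported signals. A minimal sketch of turning them on follows. It assumes the variables seen in the chart's rendered defaults behave as their names suggest (`none` disables an exporter, `otlp` sends the signal to the configured OTLP endpoint), and the Pyroscope address is a hypothetical placeholder.

```yaml
runtime:
  engine:
    env:
      OTEL_LOGS_EXPORTER: "otlp"              # chart default is "none"
      OTEL_TRACES_EXPORTER: "otlp"            # chart default is "none"
      CF_TELEMETRY_PYROSCOPE_ENABLE: "true"   # chart default is "false"
      # Hypothetical address; point this at your own Pyroscope server.
      PYROSCOPE_SERVER_ADDRESS: "http://pyroscope.monitoring.svc:4040"
```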

## Architecture

[Codefresh Runner architecture](https://codefresh.io/docs/docs/installation/codefresh-runner/#codefresh-runner-architecture)
18 changes: 16 additions & 2 deletions charts/cf-runtime/templates/runtime/runtime-env-spec-tmpl.yaml
@@ -9,6 +9,16 @@
 {{- if $runtimeImageRegistry }}
 {{- $_ := set $rootContext.Values.global "imageRegistry" $runtimeImageRegistry }}
 {{- end }}
+{{- $runtimeVersion := coalesce .Values.version .Chart.Version -}}
+{{- $runtimeName := include "runtime.runtime-environment-spec.runtime-name" . -}}
+{{- $engineVersion := coalesce $engineContext.image.tag "latest" -}}
+{{- if $engineContext.image.digest }}
+{{- $engineVersion = printf "%s@%s" $engineVersion $engineContext.image.digest -}}
+{{- end }}
+{{- $dindVersion := coalesce $dindContext.image.tag "latest" -}}
+{{- if $dindContext.image.digest }}
+{{- $dindVersion = printf "%s@%s" $dindVersion $dindContext.image.digest -}}
+{{- end }}
 metadata:
 name: {{ include "runtime.runtime-environment-spec.runtime-name" . }}
 agent: {{ .Values.runtime.agent }}
@@ -102,7 +112,10 @@ runtimeScheduler:
 {{- else }}
 COSIGN_IMAGE_SIGNER_IMAGE: {{ include (printf "%s.image.name" $cfCommonTplSemver ) (dict "image" (index $engineContext "runtimeImages" "cosign-image-signer") "context" $rootContext) | squote }}
 {{- end }}
-RUNTIME_CHART_VERSION: {{ coalesce .Values.version .Chart.Version }}
+RUNTIME_CHART_VERSION: {{ $runtimeVersion }}
+CF_SERVICE_NAME: {{ printf "cf-classic-engine" }}
+CF_SERVICE_VERSION: {{ $engineVersion }}
+OTEL_RESOURCE_ATTRIBUTES: {{ printf "service.name=cf-classic-engine,service.version=%s,service.namespace=cf-classic-runtime,cf.classic.runtime.name=%s,cf.classic.runtime.version=%s" $engineVersion $runtimeName $runtimeVersion }}
 {{- with $engineContext.userEnvVars }}
 userEnvVars: {{- toYaml . | nindent 4 }}
 {{- end }}
@@ -162,12 +175,13 @@ dockerDaemonScheduler:
 {{- with $dindContext.userAccess }}
 userAccess: {{ . }}
 {{- end }}
-{{- with $dindContext.env }}
 envVars:
+{{- with $dindContext.env }}
 {{- range $key, $val := . }}
 {{ $key }}: {{ $val | squote }}
 {{- end }}
 {{- end }}
+OTEL_RESOURCE_ATTRIBUTES: {{ printf "service.name=cf-classic-dind,service.version=%s,service.namespace=cf-classic-runtime,cf.classic.runtime.name=%s,cf.classic.runtime.version=%s" $dindVersion $runtimeName $runtimeVersion }}
 cluster:
 namespace: {{ .Release.Namespace }}
 serviceAccount: {{ $dindContext.serviceAccount }}
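
To illustrate how the version strings in this template are assembled, here is a hedged sketch. It assumes `$engineContext` resolves to `runtime.engine` in the chart values (the binding is not shown in this diff) and uses a made-up digest. The rendered lines in the comments follow from the `coalesce` and `printf "%s@%s"` calls in the template.

```yaml
# values.yaml (input); the digest is a hypothetical placeholder
runtime:
  engine:
    image:
      tag: "1.179.1"
      digest: "sha256:0123abcd"

# Expected fragment of the rendered engine envVars under these assumptions:
#   CF_SERVICE_NAME: cf-classic-engine
#   CF_SERVICE_VERSION: 1.179.1@sha256:0123abcd
# With no digest set, CF_SERVICE_VERSION is just the tag;
# with no tag either, it falls back to "latest".
```

The same tag-plus-digest rule applies to `$dindVersion`, which feeds `service.version` in the dind `OTEL_RESOURCE_ATTRIBUTES`.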
@@ -31,17 +31,39 @@ tests:
- run
- start
envVars:
CF_TELEMETRY_LOGS_LEVEL: 'debug'
CF_TELEMETRY_OTEL_ALLOW_HTTP_INSTRUMENTATION: 'false'
CF_TELEMETRY_OTEL_ENABLE: 'true'
CF_TELEMETRY_PROMETHEUS_ENABLE: 'false'
CF_TELEMETRY_PROMETHEUS_ENABLE_PROCESS_METRICS: 'false'
CF_TELEMETRY_PROMETHEUS_HOST: '0.0.0.0'
CF_TELEMETRY_PROMETHEUS_PORT: '9100'
CF_TELEMETRY_PYROSCOPE_ENABLE: 'false'
CONTAINER_LOGGER_EXEC_CHECK_INTERVAL_MS: '1000'
DOCKER_REQUEST_TIMEOUT_MS: '30000'
FORCE_COMPOSE_SERIAL_PULL: 'false'
LOGGER_LEVEL: 'debug'
LOG_OUTGOING_HTTP_REQUESTS: 'false'
METRICS_PROMETHEUS_COLLECT_PROCESS_METRICS: 'false'
-METRICS_PROMETHEUS_ENABLED: 'true'
+METRICS_PROMETHEUS_ENABLED: 'false'
METRICS_PROMETHEUS_ENABLE_LEGACY_METRICS: 'false'
METRICS_PROMETHEUS_HOST: '0.0.0.0'
METRICS_PROMETHEUS_PORT: '9100'
METRICS_PROMETHEUS_SCRAPE_TIMEOUT: '15000'
METRICS_SCRAPE_TIMEOUT_MS: '0'
OTEL_EXPORTER_OTLP_COMPRESSION: 'gzip'
OTEL_EXPORTER_OTLP_ENDPOINT: 'http://localhost:4317'
OTEL_EXPORTER_OTLP_PROTOCOL: 'grpc'
OTEL_EXPORTER_PROMETHEUS_HOST: '0.0.0.0'
OTEL_EXPORTER_PROMETHEUS_PORT: '9464'
OTEL_LOGS_EXPORTER: 'none'
OTEL_METRICS_EXPORTER: 'otlp'
OTEL_METRIC_EXPORT_INTERVAL: '10000'
OTEL_METRIC_EXPORT_TIMEOUT: '5000'
OTEL_SEMCONV_STABILITY_OPT_IN: 'http'
OTEL_TRACES_EXPORTER: 'none'
OTEL_TRACES_SAMPLER: 'parentbased_always_on'
PYROSCOPE_SERVER_ADDRESS: ''
TRUSTED_QEMU_IMAGES: 'tonistiigi/binfmt'
COMPOSE_IMAGE: 'somedomain.io/codefresh/compose:tagoverride'
CONTAINER_LOGGER_IMAGE: 'somedomain.io/codefresh/cf-container-logger:tagoverride'
@@ -59,6 +81,9 @@ tests:
GC_BUILDER_IMAGE: 'somedomain.io/codefresh/cf-gc-builder:tagoverride'
COSIGN_IMAGE_SIGNER_IMAGE: 'somedomain.io/codefresh/cf-cosign-image-signer:tagoverride'
RUNTIME_CHART_VERSION: 1.0.0
CF_SERVICE_NAME: cf-classic-engine
CF_SERVICE_VERSION: tagoverride
OTEL_RESOURCE_ATTRIBUTES: service.name=cf-classic-engine,service.version=tagoverride,service.namespace=cf-classic-runtime,cf.classic.runtime.name=my-context/codefresh,cf.classic.runtime.version=1.0.0
workflowLimits:
MAXIMUM_ALLOWED_TIME_BEFORE_PRE_STEPS_SUCCESS: 600
MAXIMUM_ALLOWED_WORKFLOW_AGE_BEFORE_TERMINATION: 86400
@@ -89,6 +114,8 @@ tests:
dindImage: 'somedomain.io/codefresh/dind:tagoverride'
imagePullPolicy: IfNotPresent
userAccess: true
envVars:
OTEL_RESOURCE_ATTRIBUTES: service.name=cf-classic-dind,service.version=tagoverride,service.namespace=cf-classic-runtime,cf.classic.runtime.name=my-context/codefresh,cf.classic.runtime.version=1.0.0
cluster:
namespace: codefresh
serviceAccount: codefresh-engine
56 changes: 54 additions & 2 deletions charts/cf-runtime/tests/runtime/runtime_onprem_test.yaml
@@ -41,6 +41,14 @@ tests:
- two
- three
envVars:
CF_TELEMETRY_LOGS_LEVEL: 'debug'
CF_TELEMETRY_OTEL_ALLOW_HTTP_INSTRUMENTATION: 'false'
CF_TELEMETRY_OTEL_ENABLE: 'true'
CF_TELEMETRY_PROMETHEUS_ENABLE: 'false'
CF_TELEMETRY_PROMETHEUS_ENABLE_PROCESS_METRICS: 'false'
CF_TELEMETRY_PROMETHEUS_HOST: '0.0.0.0'
CF_TELEMETRY_PROMETHEUS_PORT: '9100'
CF_TELEMETRY_PYROSCOPE_ENABLE: 'false'
CONTAINER_LOGGER_EXEC_CHECK_INTERVAL_MS: '1000'
DOCKER_REQUEST_TIMEOUT_MS: '30000'
FLOAT_AS_STRING: '12.34'
@@ -50,11 +58,25 @@ tests:
LOGGER_LEVEL: 'debug'
LOG_OUTGOING_HTTP_REQUESTS: 'false'
METRICS_PROMETHEUS_COLLECT_PROCESS_METRICS: 'false'
-METRICS_PROMETHEUS_ENABLED: 'true'
+METRICS_PROMETHEUS_ENABLED: 'false'
METRICS_PROMETHEUS_ENABLE_LEGACY_METRICS: 'false'
METRICS_PROMETHEUS_HOST: '0.0.0.0'
METRICS_PROMETHEUS_PORT: '9100'
METRICS_PROMETHEUS_SCRAPE_TIMEOUT: '15000'
METRICS_SCRAPE_TIMEOUT_MS: '0'
OTEL_EXPORTER_OTLP_COMPRESSION: 'gzip'
OTEL_EXPORTER_OTLP_ENDPOINT: 'http://localhost:4317'
OTEL_EXPORTER_OTLP_PROTOCOL: 'grpc'
OTEL_EXPORTER_PROMETHEUS_HOST: '0.0.0.0'
OTEL_EXPORTER_PROMETHEUS_PORT: '9464'
OTEL_LOGS_EXPORTER: 'none'
OTEL_METRICS_EXPORTER: 'otlp'
OTEL_METRIC_EXPORT_INTERVAL: '10000'
OTEL_METRIC_EXPORT_TIMEOUT: '5000'
OTEL_SEMCONV_STABILITY_OPT_IN: 'http'
OTEL_TRACES_EXPORTER: 'none'
OTEL_TRACES_SAMPLER: 'parentbased_always_on'
PYROSCOPE_SERVER_ADDRESS: ''
TRUSTED_QEMU_IMAGES: 'tonistiigi/binfmt'
COMPOSE_IMAGE: 'quay.io/codefresh/compose:tagoverride'
CONTAINER_LOGGER_IMAGE: 'quay.io/codefresh/cf-container-logger:tagoverride'
@@ -72,6 +94,9 @@ tests:
GC_BUILDER_IMAGE: 'quay.io/codefresh/cf-gc-builder:tagoverride'
COSIGN_IMAGE_SIGNER_IMAGE: 'quay.io/codefresh/cf-cosign-image-signer:tagoverride'
RUNTIME_CHART_VERSION: 1.0.0
CF_SERVICE_NAME: cf-classic-engine
CF_SERVICE_VERSION: tagoverride
OTEL_RESOURCE_ATTRIBUTES: service.name=cf-classic-engine,service.version=tagoverride,service.namespace=cf-classic-runtime,cf.classic.runtime.name=system/my-runtime,cf.classic.runtime.version=1.0.0
workflowLimits:
MAXIMUM_ALLOWED_TIME_BEFORE_PRE_STEPS_SUCCESS: 600
MAXIMUM_ALLOWED_WORKFLOW_AGE_BEFORE_TERMINATION: 86400
@@ -123,6 +148,7 @@ tests:
ALICE: 'BOB'
FLOAT_AS_STRING: '12.34'
INT: '123'
OTEL_RESOURCE_ATTRIBUTES: service.name=cf-classic-dind,service.version=tagoverride,service.namespace=cf-classic-runtime,cf.classic.runtime.name=system/my-runtime,cf.classic.runtime.version=1.0.0
cluster:
namespace: codefresh
serviceAccount: service-account-override
@@ -228,6 +254,14 @@ tests:
- two
- three
envVars:
CF_TELEMETRY_LOGS_LEVEL: 'debug'
CF_TELEMETRY_OTEL_ALLOW_HTTP_INSTRUMENTATION: 'false'
CF_TELEMETRY_OTEL_ENABLE: 'true'
CF_TELEMETRY_PROMETHEUS_ENABLE: 'false'
CF_TELEMETRY_PROMETHEUS_ENABLE_PROCESS_METRICS: 'false'
CF_TELEMETRY_PROMETHEUS_HOST: '0.0.0.0'
CF_TELEMETRY_PROMETHEUS_PORT: '9100'
CF_TELEMETRY_PYROSCOPE_ENABLE: 'false'
CONTAINER_LOGGER_EXEC_CHECK_INTERVAL_MS: '1000'
DOCKER_REQUEST_TIMEOUT_MS: '30000'
FLOAT_AS_STRING: '12.34'
@@ -237,11 +271,25 @@ tests:
LOGGER_LEVEL: 'debug'
LOG_OUTGOING_HTTP_REQUESTS: 'false'
METRICS_PROMETHEUS_COLLECT_PROCESS_METRICS: 'false'
-METRICS_PROMETHEUS_ENABLED: 'true'
+METRICS_PROMETHEUS_ENABLED: 'false'
METRICS_PROMETHEUS_ENABLE_LEGACY_METRICS: 'false'
METRICS_PROMETHEUS_HOST: '0.0.0.0'
METRICS_PROMETHEUS_PORT: '9100'
METRICS_PROMETHEUS_SCRAPE_TIMEOUT: '15000'
METRICS_SCRAPE_TIMEOUT_MS: '0'
OTEL_EXPORTER_OTLP_COMPRESSION: 'gzip'
OTEL_EXPORTER_OTLP_ENDPOINT: 'http://localhost:4317'
OTEL_EXPORTER_OTLP_PROTOCOL: 'grpc'
OTEL_EXPORTER_PROMETHEUS_HOST: '0.0.0.0'
OTEL_EXPORTER_PROMETHEUS_PORT: '9464'
OTEL_LOGS_EXPORTER: 'none'
OTEL_METRICS_EXPORTER: 'otlp'
OTEL_METRIC_EXPORT_INTERVAL: '10000'
OTEL_METRIC_EXPORT_TIMEOUT: '5000'
OTEL_SEMCONV_STABILITY_OPT_IN: 'http'
OTEL_TRACES_EXPORTER: 'none'
OTEL_TRACES_SAMPLER: 'parentbased_always_on'
PYROSCOPE_SERVER_ADDRESS: ''
TRUSTED_QEMU_IMAGES: 'tonistiigi/binfmt'
COMPOSE_IMAGE: 'quay.io/codefresh/compose:tagoverride'
CONTAINER_LOGGER_IMAGE: 'quay.io/codefresh/cf-container-logger:tagoverride'
@@ -259,6 +307,9 @@ tests:
GC_BUILDER_IMAGE: 'quay.io/codefresh/cf-gc-builder:tagoverride'
COSIGN_IMAGE_SIGNER_IMAGE: 'quay.io/codefresh/cf-cosign-image-signer:tagoverride'
RUNTIME_CHART_VERSION: 1.0.0
CF_SERVICE_NAME: cf-classic-engine
CF_SERVICE_VERSION: tagoverride
OTEL_RESOURCE_ATTRIBUTES: service.name=cf-classic-engine,service.version=tagoverride,service.namespace=cf-classic-runtime,cf.classic.runtime.name=system/default-override,cf.classic.runtime.version=1.0.0
workflowLimits:
MAXIMUM_ALLOWED_TIME_BEFORE_PRE_STEPS_SUCCESS: 600
MAXIMUM_ALLOWED_WORKFLOW_AGE_BEFORE_TERMINATION: 86400
@@ -310,6 +361,7 @@ tests:
ALICE: 'BOB'
FLOAT_AS_STRING: '12.34'
INT: '123'
OTEL_RESOURCE_ATTRIBUTES: service.name=cf-classic-dind,service.version=tagoverride,service.namespace=cf-classic-runtime,cf.classic.runtime.name=system/default-override,cf.classic.runtime.version=1.0.0
cluster:
namespace: codefresh
serviceAccount: service-account-override