PostgreSQL Connection Terminated Unexpectedly During Cache Operations #167

@TEJASWINIRAMESH

Description:
The GitHub Actions Cache Server is experiencing frequent PostgreSQL connection terminations during cache operations, resulting in 500 errors and failed cache requests. The error occurs across multiple operations including CreateCacheEntry and GetCacheEntryDownloadURL.

The deployment uses S3 for storage and Postgres for the database.

Error:
[request error] [unhandled] [POST] http:///twirp/github.actions.results.api.v1.CacheService/CreateCacheEntry
H3Error: Connection terminated unexpectedly
    at /app/server/node_modules/pg-pool/index.js:45:11
    ... 8 lines matching cause stack trace ...
    at async Object.reserveCache (file:///app/server/index.mjs:5048:13) {
  cause: Error: Connection terminated unexpectedly
      at /app/server/node_modules/pg-pool/index.js:45:11
      at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
      at async PostgresDriver.acquireConnection (file:///app/server/node_modules/kysely/dist/esm/dialect/postgres/postgres-driver.js:21:24)
      at async RuntimeDriver.acquireConnection (file:///app/server/node_modules/kysely/dist/esm/driver/runtime-driver.js:44:28)
      at async DefaultConnectionProvider.provideConnection (file:///app/server/node_modules/kysely/dist/esm/driver/default-connection-provider.js:8:28)
      at async DefaultQueryExecutor.executeQuery (file:///app/server/node_modules/kysely/dist/esm/query-executor/query-executor-base.js:34:16)
      at async SelectQueryBuilderImpl.execute (file:///app/server/node_modules/kysely/dist/esm/query-builder/select-query-builder.js:317:24)
      at async SelectQueryBuilderImpl.executeTakeFirst (file:///app/server/node_modules/kysely/dist/esm/query-builder/select-query-builder.js:321:26)
      at async getUpload (file:///app/server/index.mjs:4667:15)
      at async Object.reserveCache (file:///app/server/index.mjs:5048:13),
  statusCode: 500,
  fatal: false,
  unhandled: true,
  statusMessage: undefined,
  data: undefined
}
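
One possible client-side mitigation, sketched here in TypeScript, is to retry an operation when the pool hands back a connection that has already been closed. The helper names `isTransientConnectionError` and `withConnectionRetry` are hypothetical and are not part of the cache server, pg-pool, or Kysely. The only details taken from the trace above are the error message and the fact that it also appears on the `cause` property. This would mask the symptom rather than explain why connections are being terminated.

```typescript
// Hypothetical helpers: not part of the cache server, pg-pool, or Kysely.
// The message string is copied from the stack trace in this report.
const TRANSIENT_MESSAGES = ["Connection terminated unexpectedly"];

// Walk the error and its `cause` chain (bounded depth) looking for a
// transient connection error.
function isTransientConnectionError(err: unknown): boolean {
  let current: unknown = err;
  for (let depth = 0; depth < 5 && current instanceof Error; depth++) {
    const message = (current as Error).message;
    if (TRANSIENT_MESSAGES.some((m) => message.includes(m))) {
      return true;
    }
    current = (current as Error & { cause?: unknown }).cause;
  }
  return false;
}

// Run `operation`, retrying up to `attempts` times in total, but only when
// the failure looks like a dropped connection. Other errors are rethrown
// immediately.
async function withConnectionRetry<T>(
  operation: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (!isTransientConnectionError(err) || attempt === attempts) {
        throw err;
      }
      // Linear backoff before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
    }
  }
  // Only reached if `attempts` is less than 1.
  throw lastError;
}
```

Wrapping a read such as the getUpload lookup from the trace would be a natural first candidate, since a SELECT is safe to repeat. Writes would need more care, because a retried write is only safe if it is idempotent.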

Database Activity:
PostgreSQL audit logs show frequent DELETE operations on the uploads and upload_parts tables, which suggests the cleanup processes are working. Connections are nevertheless being terminated during regular operations.

DELETE FROM "uploads" WHERE "id" = $1
DELETE FROM ONLY "public"."upload_parts" WHERE $1 OPERATOR(pg_catalog.=) "upload_id"

Impact:
Cache operations fail with 500 errors
GitHub Actions workflows experience cache misses
Service becomes unreliable under load

Helm config:
    replicaCount: 2
    service:
      type: ClusterIP
      port: 80

    resources:
      limits:
        cpu: 1500m
        memory: 3Gi
      requests:
        cpu: 1000m
        memory: 2.5Gi
    
    livenessProbe:
      httpGet:
        path: /
        port: cache
      initialDelaySeconds: 60
      timeoutSeconds: 30
      periodSeconds: 30
      failureThreshold: 5
    readinessProbe:
      httpGet:
        path: /
        port: cache
      initialDelaySeconds: 60
      timeoutSeconds: 30
      periodSeconds: 30
      failureThreshold: 5
    
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 5
      targetCPUUtilizationPercentage: 70
      targetMemoryUtilizationPercentage: 70
      scaleDownStabilizationWindowSeconds: 600
    
    persistentVolumeClaim:
      enabled: true
      template:
        metadata:
          name: cache-data
          labels: {}
          annotations: {}
        spec:
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 20Gi
          volumeMode: Filesystem
          storageClassName: efs-sc
    
    tmpVolume:
      ephemeral:
        volumeClaimTemplate:
          metadata:
            labels:
              type: github-actions-cache-server-tmp
          spec:
            accessModes:
              - ReadWriteMany
            storageClassName: efs-sc
            resources:
              requests:
                storage: 10Gi
    env:
      - name: ENABLE_DIRECT_DOWNLOADS
        value: "true"
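
Because the autoscaling block above allows up to 5 replicas and each replica opens its own pool, the total number of Postgres connections grows with the replica count. The sketch below shows the arithmetic for dividing a server-side connection budget across replicas. The helper `sizePool` is hypothetical, and the inputs (a `max_connections` of 100, which is the stock PostgreSQL default, and 10 connections reserved for other clients) are assumptions rather than values from this deployment. The option names mirror node-postgres pool settings, but this report does not confirm whether the cache server exposes them.

```typescript
// Subset of node-postgres pool options; the field names match `pg`.
interface PoolSizing {
  max: number;                     // upper bound on connections per pool
  idleTimeoutMillis: number;       // close clients idle for this long
  connectionTimeoutMillis: number; // fail if no connection is available in time
  keepAlive: boolean;              // enable TCP keepalive on client sockets
}

// Hypothetical helper: split the server-side connection budget evenly
// across the maximum number of replicas the autoscaler may start.
function sizePool(
  maxConnections: number,
  reserved: number,
  maxReplicas: number,
): PoolSizing {
  if (maxReplicas < 1) {
    throw new RangeError("maxReplicas must be at least 1");
  }
  const perReplica = Math.floor((maxConnections - reserved) / maxReplicas);
  if (perReplica < 1) {
    throw new RangeError("connection budget too small for replica count");
  }
  return {
    max: perReplica,
    idleTimeoutMillis: 10_000,
    connectionTimeoutMillis: 5_000,
    keepAlive: true,
  };
}

// Assumed inputs: max_connections = 100, 10 reserved, maxReplicas = 5.
const sizing = sizePool(100, 10, 5);
console.log(sizing.max); // 18
```

With these assumed numbers each replica could hold at most 18 connections. If the real `max_connections` is lower, for example on a small managed instance, the per-replica budget shrinks accordingly.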
