Skip to content

Conversation

@fangpenlin
Copy link
Contributor

@fangpenlin fangpenlin commented Dec 20, 2025

Description 📣

ref: https://linear.app/infisical/issue/PLATFRM-134/make-a-smarter-docker-compose-cache-system-to-avoid-recreating-compose

based on: #91

Tests 🛠️

The way you test this PR is mostly like the #91. The major difference is that now we have a cache system for the docker-compose. If we found a compose (multiple containers with a compose label) already meets the spec, we will not create a new one but reuse it instead. The purpose is to speed up development cycle.

To ensure this system works, you can run the only relay test case we have right now with TESTCONTAINERS_RYUK_DISABLED=true. This will stop the Ryuk container from cleaning up our compose containers. After successfully running the relay test case, you can then run again to see how long it takes and if it reuses the existing container.

After that, you can also set CLI_E2E_DISABLE_COMPOSE_CACHE=1 to disable the cache and you should see it boot the container from the ground up instead of using the cache.

Type ✨

  • Bug fix
  • New feature
  • Improvement
  • Breaking change
  • Documentation

@fangpenlin fangpenlin changed the base branch from main to PLATFRM-125-e2e-test-poc December 20, 2025 05:08
@fangpenlin fangpenlin force-pushed the PLATFRM-134-reuse-docker-compose-for-e2e-tests branch from 78a567c to f508d06 Compare January 6, 2026 02:48
@gitguardian
Copy link

gitguardian bot commented Jan 6, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
9387833 Triggered Generic Password 22fcc91 e2e/packages/client/reset_db.go View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@fangpenlin fangpenlin force-pushed the PLATFRM-125-e2e-test-poc branch 3 times, most recently from d6f09f0 to 7fa0723 Compare January 6, 2026 19:56
Base automatically changed from PLATFRM-125-e2e-test-poc to main January 7, 2026 17:29
@fangpenlin fangpenlin force-pushed the PLATFRM-134-reuse-docker-compose-for-e2e-tests branch from f508d06 to 22fcc91 Compare January 7, 2026 23:49
@fangpenlin fangpenlin marked this pull request as ready for review January 10, 2026 04:04
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 10, 2026

Greptile Overview

Greptile Summary

This PR introduces a smart Docker Compose caching system to avoid recreating containers for E2E tests, along with new database and Redis reset utilities.

Key Changes

Container Reuse System (compose.go):

  • Implements SHA1-based naming to identify identical compose configurations
  • Checks for existing running containers and reuses them instead of recreating
  • Introduces RunningCompose wrapper to handle pre-existing containers
  • Refactors Stack API to directly manage compose lifecycle

Reset Utilities (reset_db.go, reset_redis.go, reset.go):

  • New PostgreSQL reset function that truncates all tables while preserving migrations
  • New Redis reset function using FlushAll
  • Integration layer to reset both services via compose stack

Test Improvements (helpers_test.go):

  • Added Reset() and ResetAndBootstrap() methods for test isolation
  • Separated Bootstrap logic from Up for better reusability
  • Cleaner API with direct Stack integration

Critical Security Issue

SQL Injection Vulnerability in reset_db.go (line 125): Table names from information_schema are directly interpolated into TRUNCATE statements without proper identifier escaping. While this is test code and the risk is lower, a malicious table name could execute arbitrary SQL. This must be fixed by using proper identifier quoting.

Other Issues Found

  1. Context misuse in compose.go (line 48): Uses context.Background() instead of passed context
  2. Panic usage: RunningCompose uses panic instead of returning errors
  3. Race condition potential: Container state checking has TOCTOU window
  4. Cleanup timing: t.Cleanup() registered after operations that could fail

Architecture Assessment

The caching approach is sound and will significantly speed up test execution by reusing containers. The reset mechanism properly isolates tests while avoiding the overhead of full container recreation. However, the SQL injection issue needs immediate attention before merging.

Confidence Score: 2/5

  • This PR contains a critical SQL injection vulnerability that must be fixed before merging
  • Score reflects the CRITICAL security issue in reset_db.go (SQL injection via unescaped table identifiers) plus the context misuse bug in compose.go. While this is test code and the architecture is sound, the SQL injection represents a significant security risk that prevents approval. The other issues (panic usage, race conditions, cleanup timing) are less critical but should also be addressed.
  • e2e/packages/client/reset_db.go requires immediate attention for SQL injection fix (line 125). e2e/packages/infisical/compose.go needs context bug fix (line 48).

Important Files Changed

File Analysis

Filename Score Overview
e2e/packages/infisical/compose.go 3/5 Major refactor adding Docker container reuse logic with SHA1-based naming. Contains context misuse bug (line 48), potential race condition in container state checking, and panic usage instead of error returns. Core caching functionality appears sound.
e2e/packages/infisical/reset.go 4/5 New helper to reset Infisical services by calling client reset functions. Clean implementation with proper error handling and context propagation.
e2e/packages/client/reset_db.go 1/5 New database reset utility with CRITICAL SQL injection vulnerability (line 125) due to unescaped table identifiers. Also has potential conflict issues with hardcoded super_admin insertion. Requires immediate security fix.
e2e/packages/client/reset_redis.go 4/5 New Redis reset utility using FlushAll to clear all databases. Clean implementation with proper connection handling. FlushAll scope (all DBs vs single DB) should be clarified in comments.
e2e/relay/helpers_test.go 4/5 Refactored to use new Stack API and added Reset/ResetAndBootstrap methods. Cleanup registration timing could be improved but overall changes are sound and improve test isolation.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, 7 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 61 to 99
if len(containers) > 0 {
services := make([]string, 0, len(s.Project.Services))
for name := range s.Project.Services {
services = append(services, name)
}

missingServices := make(map[string]int, len(services))
for _, service := range services {
missingServices[service] = 1
}
for _, c := range containers {
if c.State == container.StateRunning {
serviceName, ok := c.Labels[api.ServiceLabel]
if !ok {
continue
}
_, ok = missingServices[serviceName]
if ok {
delete(missingServices, serviceName)
}
}
}

if len(missingServices) == 0 {
provider, err := testcontainers.NewDockerProvider(testcontainers.WithLogger(log.Default()))
if err != nil {
return err
}
s.dockerCompose = &RunningCompose{
name: uniqueName,
client: dockerClient,
provider: provider,
services: services,
containers: make(map[string]*testcontainers.DockerContainer),
}
slog.Info("Found existing running containers", "name", uniqueName)
// Found existing compose, reuse instead
return s.dockerCompose.Up(ctx)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential Race Condition in Container State Check

The code checks if containers exist and are running, then decides to reuse them. However, there's a time-of-check to time-of-use (TOCTOU) race condition here. Between checking the container state (lines 61-82) and actually using it (line 98), the containers could:

  1. Stop running
  2. Be removed by another process
  3. Change state

While this is test code and the race window is small, in concurrent test scenarios this could lead to flaky tests where containers appear to exist but fail to respond.

Recommendation: Consider adding retry logic or additional state validation when attempting to reuse existing containers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache system is not meant for CI test or any form of parallel way of running yet. For now, the main focus is on speeding up the development cycle of writing the test. Because otherwise we will waste tons of time in just waiting the compose container to boot up. If we really care about it, we will either set CLI_E2E_DISABLE_COMPOSE_CACHE=1 to disable this feature on CI or add a lock to it.

Comment on lines +287 to +296
panic("Cannot modify running compose")
}

func (c *RunningCompose) WithEnv(m map[string]string) compose.ComposeStack {
panic("Cannot modify running compose")
}

func (c *RunningCompose) WithOsEnv() compose.ComposeStack {
panic("Cannot modify running compose")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Using Panic Instead of Error Return

These methods use panic() to indicate that operations are not supported on running compose instances. While this makes the intent clear, panicking in library code is generally considered poor practice as it prevents callers from gracefully handling the error.

Recommendation: Consider returning an error instead:

Suggested change
panic("Cannot modify running compose")
}
func (c *RunningCompose) WithEnv(m map[string]string) compose.ComposeStack {
panic("Cannot modify running compose")
}
func (c *RunningCompose) WithOsEnv() compose.ComposeStack {
panic("Cannot modify running compose")
}
func (c *RunningCompose) WaitForService(s string, strategy wait.Strategy) compose.ComposeStack {
// Cannot modify running compose - return self to maintain interface compatibility
return c
}
func (c *RunningCompose) WithEnv(m map[string]string) compose.ComposeStack {
// Cannot modify running compose - return self to maintain interface compatibility
return c
}
func (c *RunningCompose) WithOsEnv() compose.ComposeStack {
// Cannot modify running compose - return self to maintain interface compatibility
return c
}

Or if you want to preserve the error behavior, log a warning instead of panicking since these methods are part of the ComposeStack interface and may be called by library code that doesn't expect panics.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this library is for test. We don't want to handle the error, cashing the app is the easy way to tell the developer that you shouldn't do that. There's no need for recovery from that kind of error anyway, and this is not code for production anyway.

if _, ok := opts.SkipTables[table]; ok {
continue
}
builder.WriteString(fmt.Sprintf("TRUNCATE TABLE %s RESTART IDENTITY CASCADE;\n", table))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 CRITICAL: SQL Injection Vulnerability - Unescaped Table Identifiers

This line directly interpolates table names from the database into a SQL TRUNCATE statement without proper identifier quoting. While table names come from information_schema, a malicious actor who can create tables with special characters in their names could potentially inject SQL.

Attack Scenario:
If an attacker can create a table named something like: myTable; DROP DATABASE infisical; --, this could execute arbitrary SQL commands.

Example of vulnerability:

table := `myTable; DROP TABLE important_table; --`
// Results in: TRUNCATE TABLE myTable; DROP TABLE important_table; -- RESTART IDENTITY CASCADE;

Fix: Use PostgreSQL identifier quoting with pgx.Identifier or properly escape identifiers:

Suggested change
builder.WriteString(fmt.Sprintf("TRUNCATE TABLE %s RESTART IDENTITY CASCADE;\n", table))
builder.WriteString(fmt.Sprintf("TRUNCATE TABLE %s RESTART IDENTITY CASCADE;\n", pgx.Identifier{schema, table}.Sanitize()))

Or use double quotes for identifier escaping:

// Escape table names properly
escapedTable := strings.ReplaceAll(table, `"`, `""`)
builder.WriteString(fmt.Sprintf(`TRUNCATE TABLE "%s" RESTART IDENTITY CASCADE;`, escapedTable) + "\n")

Security Impact: HIGH - Could allow arbitrary SQL execution in test environments, potentially affecting other tests or leaking data.

Copy link
Contributor Author

@fangpenlin fangpenlin Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, hope, this is an e2e test running with a one time use database in a container. there's zero value to SQL inject it, not to mention the table name is from the db itself instead of user input. you have no idea what you're talking about, it's just the stats making you say that it looks like a SQL injection attack vector unfortunately 😅

slog.Info("Connected to Redis", "pong", pong)

// Clear all keys in the current database
err = rdb.FlushAll(ctx).Err()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ FlushAll Affects All Redis Databases

The code uses FlushAll() which clears all databases in the Redis instance (databases 0-15 by default), not just the currently selected one. This could be problematic if:

  1. Multiple test suites run in parallel and use different Redis database numbers
  2. The Redis instance is shared with other applications during development

Consider: Using FlushDB() instead to only flush the current database, or explicitly select a specific database number before flushing:

// Flush only the current database (safer for shared Redis)
err = rdb.FlushDB(ctx).Err()

Or if you intentionally want to clear all databases (current behavior is likely correct for e2e tests):

// FlushAll clears all databases - this is intentional for e2e test reset
err = rdb.FlushAll(ctx).Err()

Add a comment to clarify the intent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I don't need to add a comment. The intention is very clear: reset_redis. And this is under e2e folder / package, it's obviously for e2e test to reset the redis db.

Comment on lines +139 to +141
_, err = conn.Exec(ctx,
`INSERT INTO public.super_admin ("id", "fipsEnabled", "initialized", "allowSignUp") VALUES ($1, $2, $3, $4)`,
"00000000-0000-0000-0000-000000000000", true, false, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Hardcoded ID May Cause Conflicts

The code inserts a hardcoded all-zeros identifier for the super_admin record. While this is consistent and deterministic for testing, there are potential issues:

  1. If the truncate operation fails for the super_admin table (due to it being in SkipTables), this INSERT will fail with a unique constraint violation
  2. The hardcoded values may not match what a real initialization would create

Recommendation: Consider using ON CONFLICT DO UPDATE or ON CONFLICT DO NOTHING to make this operation idempotent and prevent failures on repeated calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, if the truncate failed for whatever reason, the whole test should just fail. And hope, the super admin is exactly like that using this uuid. the quality of review is really bad this time unfortunately

@fangpenlin fangpenlin requested a review from akhilmhdh January 10, 2026 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants