feat: add backing store for disk buffering of events #273

Draft
notque wants to merge 22 commits into master from audit_backing_store

notque commented Nov 19, 2025

an attempt to add disk buffering of events, so that the rabbitmq connection becomes non-blocking and disk-backed storage holds events while it is down. there was a single downtime this year where the network connection from scaleout had an issue, and a service was down because of it.

we cannot lose events, so we currently just stop the service. but we also have availability requirements.

thus the idea is we add pvcs to services, and hold events there with limits configured generously enough to withstand any connectivity issues. i do have the defaults set somewhat small at the moment.

you may dislike this, and have a significantly better plan. i'd love to hear it. i just gave it a shot.

This comment was marked as outdated.

@notque notque force-pushed the audit_backing_store branch from 6e065d1 to 8d68fb5 Compare November 19, 2025 06:44

@majewsky majewsky left a comment

Dealing with filesystem volumes is going to be a major complication for all of my services, where I do not have volumes at all. I have already considered writing audit events into the DB until they can be submitted, so if we introduce a caching capability at this level, I would like for it to be able to support a DB as backing store, too.

I can see that you have type BackingStore as an interface, so presumably we can provide an SQL-based implementation instead. We don't need to have this implementation as part of this PR, in order to keep it at a manageable size. But what this PR should do is build the public interface in such a way that it allows specifying backing stores other than FileBackingStore, potentially with different types of configuration parameters.

Since we need to support passing configuration via env variables, I suggest something similar to how we pass configuration for Keppel drivers. This is a full example, and this is how we parse these, but basically we could have something like this:

export ${PREFIX}_BACKING_STORE='{"type":"fs","params":{"path":"/var/cache/audit","max_total_size":1073741824}}'

The difference between opts.BackingStoreFactories vs. opts.BackingStore is similar to opts.EnvPrefix vs. opts.ConnectionURL: One allows using the default logic of collecting everything from env vars, one allows the application precise control over where to collect config from.

To allow both the application as well as this library to provide BackingStore implementations, I suggest modeling AuditorOpts like this:

type AuditorOpts struct {
  // Optional. If given, this BackingStore instance will be used directly.
  // If EnvPrefix is given, this will be initialized by reading a JSON payload in the form `{"type":"<type>","params":{...}}`
  // from the environment variable "${PREFIX}_BACKING_STORE".
  BackingStore BackingStore

  // Optional. If given, and the environment contains JSON configuration as described above,
  // a BackingStore constructor will be selected from this set based on the configured type.
  BackingStoreFactories map[string]BackingStoreFactory
}

type BackingStoreFactory func(params json.RawMessage, opts AuditorOpts) (BackingStore, error)

func NewFileBackingStore(params json.RawMessage, opts AuditorOpts) (BackingStore, error) {
  var bsOpts struct {
    Directory string `json:"path"`
    MaxFileSize int64 `json:"max_file_size"`
    MaxTotalSize int64 `json:"max_total_size"`
  }
  err := json.Unmarshal([]byte(params), &bsOpts)
  if err != nil {
    return nil, fmt.Errorf("while unmarshaling params for FileBackingStore: %w", err)
  }
  registry := opts.Registry
  //... continue with existing implementation...
}

Then this could be used as:

auditor := must.Return(audittools.NewAuditor(ctx, audittools.AuditorOpts{
  EnvPrefix: "LIMES_AUDIT_RABBITMQ",
  BackingStoreFactories: map[string]audittools.BackingStoreFactory{
    "fs": audittools.NewFileBackingStore,
    "db": func(params json.RawMessage, opts audittools.AuditorOpts) (audittools.BackingStore, error) {
      return newDBBackingStore(dbConnection, params, opts.Registry)
    },
  },
}))

What do you think?

notque commented Nov 21, 2025

> What do you think?

You bring up very reasonable points, and ones I thought you'd bring up. I'm not particularly happy with adding volumes to every service, and agree that supporting additional options like a database is reasonable.

I think you understand the situation. I don't particularly think this is a great achievement, but it's a requirement I cannot avoid, so I am trying to bring about a solution that involves as little pain as possible.

I did accidentally submit this as a PR ready for review, and then set it to work in progress. Apologies that you're receiving notifications from it. I am not at a stage yet where it is ready to review. I will end up pushing a lot of changes again tonight, and then also try to incorporate your suggestions into the state I currently have it in.

I do very much appreciate your stating clearly what you want here, as I'm happy to do whatever I can to not make this terrible for you.

This comment was marked as outdated.

notque commented Nov 21, 2025

@majewsky I've made adjustments based on your comments. Is this remotely reasonable for you? I clearly have more work to do going through the feedback from Copilot, but in broad strokes, is this okay?

Comment on lines +76 to +84
if s.TableName == "" {
  s.TableName = "audit_events"
}
if s.BatchSize == 0 {
  s.BatchSize = 100
}
if s.MaxEvents == 0 {
  s.MaxEvents = 10000
}

For optional values with defaults, consider declaring the field as Option[] and then using it as s.TableName.UnwrapOr("audit_events") etc.
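
As a rough illustration of this suggestion, the sketch below uses a minimal stand-in `Option` type. It is an assumption-laden example: the real `Option[]` type comes from a library whose full API is not shown in this thread, so `Option`, `Some` and `sqlStoreConfig` here are hypothetical and only mirror the `UnwrapOr` call mentioned above.

```go
package main

import "fmt"

// Option is a minimal stand-in for an optional-value type. It is NOT the
// library type referred to in the review; it only mirrors UnwrapOr.
type Option[T any] struct {
	value  T
	isSome bool
}

// Some wraps an explicitly configured value.
func Some[T any](v T) Option[T] { return Option[T]{value: v, isSome: true} }

// UnwrapOr returns the contained value, or def when none was set.
func (o Option[T]) UnwrapOr(def T) T {
	if o.isSome {
		return o.value
	}
	return def
}

// sqlStoreConfig is a hypothetical config struct with the three fields from
// the diff excerpt above, declared as optional instead of zero-defaulted.
type sqlStoreConfig struct {
	TableName Option[string]
	BatchSize Option[int]
	MaxEvents Option[int]
}

func main() {
	cfg := sqlStoreConfig{TableName: Some("my_events")}

	// Defaults are applied at the point of use; the config is not mutated.
	fmt.Println(cfg.TableName.UnwrapOr("audit_events"))
	fmt.Println(cfg.BatchSize.UnwrapOr(100))
	fmt.Println(cfg.MaxEvents.UnwrapOr(10000))
}
```

Unlike the `== 0` checks, this form can tell "not set" apart from "explicitly set to the zero value".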

Comment on lines +277 to +289
func getTestDSN(t *testing.T) string {
  t.Helper()

  // Check for test database environment variable
  dsn := os.Getenv("AUDITTOOLS_TEST_DB_DSN")
  if dsn == "" {
    t.Skip("AUDITTOOLS_TEST_DB_DSN not set, skipping SQL backing store tests. " +
      "Set to a PostgreSQL connection string to run these tests, e.g.: " +
      "postgres://user:password@localhost:5432/testdb?sslmode=disable")
  }

  return dsn
}

Should use WithTestDB and ConnectForTest from go-bits/easypg, to allow those tests to run in the go-makefile-maker CI pipeline.

@notque notque force-pushed the audit_backing_store branch from bdd7e07 to 10667c9 Compare November 24, 2025 22:08
@sapcc sapcc deleted a comment from Copilot AI Nov 24, 2025
@sapcc sapcc deleted a comment from Copilot AI Nov 24, 2025
@sapcc sapcc deleted a comment from Copilot AI Nov 24, 2025
@sapcc sapcc deleted a comment from Copilot AI Nov 24, 2025
@notque notque force-pushed the audit_backing_store branch from 10667c9 to 231f191 Compare November 24, 2025 22:13
@notque notque force-pushed the audit_backing_store branch from 231f191 to f6d00c3 Compare November 24, 2025 22:23
@notque notque marked this pull request as ready for review December 5, 2025 18:42

@majewsky majewsky left a comment

So I was writing tons of comments and then I had to reload the page because the GitHub UI is bugged, and I lost over a dozen comments. I need to calm down first, and then I will get back to this.... I don't know, after the Christmas holiday I guess?

  return rabbitURL, queueName, nil
}

func (opts AuditorOpts) parsePort() (int, error) {

Why does this need to be a separate function? Code should only be in its own function if it's reused, or if it's a long-winded implementation of details that distract from the main purpose of the overall function. Neither applies here as far as I can see, so this feels like a contrived edit to satisfy silly metrics like cyclomatic complexity.

Comment on lines +22 to +29
if err == nil {
  t.Fatal("expected error for invalid backing store config, got nil")
}

expectedMsg := "unknown backing store type"
if !strings.Contains(err.Error(), expectedMsg) {
  t.Fatalf("expected error containing %q, got: %v", expectedMsg, err)
}

Suggested change
- if err == nil {
-   t.Fatal("expected error for invalid backing store config, got nil")
- }
- expectedMsg := "unknown backing store type"
- if !strings.Contains(err.Error(), expectedMsg) {
-   t.Fatalf("expected error containing %q, got: %v", expectedMsg, err)
- }
+ assert.ErrEqual(t, err, regexp.MustCompile("unknown backing store type"))

Comment on lines +42 to +44
if err != nil {
  t.Fatalf("expected no error, got: %v", err)
}

Suggested change
- if err != nil {
-   t.Fatalf("expected no error, got: %v", err)
- }
+ assert.ErrEqual(t, err, nil)

Other tests might also benefit from using assert.ErrEqual, but I will not flag it again to stay concise.

Comment on lines +384 to +386
if mf.GetMetric()[0].GetCounter().GetValue() != 3 {
  t.Errorf("expected 3 writes, got %f", mf.GetMetric()[0].GetCounter().GetValue())
}

Can use assert.Equal(). Same below for the other gauge.

@majewsky majewsky left a comment

GitHub destroyed about an hour of my work by discarding my comments, and I cannot get them all back because I need to leave, but these are the big structural remarks. I did not get to reading trail.go.

"with the given prometheus registry" is vague

Co-authored-by: Stefan Majewsky <stefan.majewsky@sap.com>

notque commented Dec 17, 2025

> So I was writing tons of comments and then I had to reload the page because the GitHub UI is bugged, and I lost over a dozen comments. I need to calm down first, and then I will get back to this.... I don't know, after the Christmas holiday I guess?

I am terribly sorry. This isn't required until January, and I appreciate the effort and time.

@notque notque requested a review from a team as a code owner March 4, 2026 14:50
@notque notque marked this pull request as draft March 4, 2026 14:53
notque and others added 19 commits March 4, 2026 07:55
Co-authored-by: Stefan Majewsky <stefan.majewsky@sap.com>
Co-authored-by: Stefan Majewsky <stefan.majewsky@sap.com>
Co-authored-by: Stefan Majewsky <stefan.majewsky@sap.com>
Co-authored-by: Stefan Majewsky <stefan.majewsky@sap.com>
…t, fix connection leak and corrupted event reprocessing

github-actions bot commented Mar 5, 2026

Merging this branch will increase overall coverage

| Impacted Packages | Coverage Δ | 🤖 |
|---|---|---|
| github.com/sapcc/go-bits/audittools | 58.22% (+58.22%) | 🌟 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
|---|---|---|---|---|---|
| github.com/sapcc/go-bits/audittools/auditor.go | 48.51% (+48.51%) | 1515 (+450) | 735 (+735) | 780 (-285) | 🌟 |
| github.com/sapcc/go-bits/audittools/backing_store.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/sapcc/go-bits/audittools/backing_store_file.go | 68.05% (+68.05%) | 2535 (+2535) | 1725 (+1725) | 810 (+810) | 🌟 |
| github.com/sapcc/go-bits/audittools/backing_store_memory.go | 93.02% (+93.02%) | 645 (+645) | 600 (+600) | 45 (+45) | 🌟 |
| github.com/sapcc/go-bits/audittools/backing_store_sql.go | 75.26% (+75.26%) | 1455 (+1455) | 1095 (+1095) | 360 (+360) | 🌟 |
| github.com/sapcc/go-bits/audittools/rabbitmq.go | 13.64% (+13.64%) | 330 (+15) | 45 (+45) | 285 (-30) | 🎉 |
| github.com/sapcc/go-bits/audittools/trail.go | 20.90% (+20.90%) | 1005 (+435) | 210 (+210) | 795 (+225) | 🌟 |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/sapcc/go-bits/audittools/auditor_test.go
  • github.com/sapcc/go-bits/audittools/backing_store_sql_test.go
  • github.com/sapcc/go-bits/audittools/backing_store_test.go
