release-24.3: backupccl: remove ldr job id references on restoring tables #10

ebembi-crdb · 2024-11-09T13:40:44Z

Backport 1/1 commits from cockroachdb#134655 on behalf of @msbutler.

/cc @cockroachdb/release

This patch ensures that the user can run schema changes on a restored a table that was previously apart of an LDR stream.

Fixes cockroachdb#134424

Release note(bug fix): this patch allows a user to run schema changes on a restored table that was previously apart of an LDR stream.

Release justification:

Create test.txt

Create random.txt

Create testr.txt

Create abc.txt

Informs: cockroachdb#72452 For some time, two buffering mechanisms have existed within the log package. One is controlled by `buffered-writes`, which is only used by the file sink. `buffered-writes` buffers writes until a buffer is full, after which it flushes the buffer to an underlying file. This is not an asynchronous operation. Separately, the `bufferedSink` was introduced to provide an async buffering mechanism. This was originally intended for network log sinks such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink` was not allowed for use with the `fileSink` as we already had a buffering mechanism available. Today, we find ourselves in a situation where we aim to make CRDB even more resilient in the face of disk failures. However, some of these efforts like WAL failover are unable to achieve their goals if the logging subsystem isn't resilient to disk unavailability as well. Today, if a file sink is configured to write to an unavailable disk, any goroutine making a logging call to one of those sinks will hang as the underlying i/o operations used are unable to interact with the disk. This can have a severe negative impact on cluster performance and throughput, even when using `buffered-writes: true`. As a way to alleviate this pain, this patch experimentally enables the usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is used to "wrap" an underlying sink (such as a `fileSink`), and asynchronously outputs buffered data to said sink. It is especially good at shielding writers from any problems with the underlying sink, which makes it a great candidate to help us accomplish our resiliency goals when it comes to disk outage scenarios. Now, users can leverage the `bufferedSink` in `file-defaults` and `file-groups by using the `buffering` config option. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` Note that we don't allow both `buffered-writes` and `buffering` to be used together. This patch adds a validation step to ensure this. We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks. Release note (ops change): Users have the ability to update their log configuration to enable asynchronous buffering of `file-group` log sinks via the `buffering` [configuration options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks). The `buffering` configuration can be applied to `file-defaults` or individual `file-groups` as needed. Note however that `buffering` and `buffered-writes` are incompatible for the time being. If you'd like to try the `buffering` option on a file log sink, it's necessary to explicitly set `buffered-writes: false`. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks.

As we introduce the ability to enable `buffering` on file-based log sinks as experimental, there's a need for a new `BufferFormat`. Previously, the default format was to introduce a newline between each log message flushed to the underlying log sink as a delimiter. However, when flushing to files, this is not desireable behavior as it causes the log file to look like: ``` I240313 16:38:33.355448 113 1@gossip/gossip.go:1374 ⋮ [T1,Vsystem,n1] 80 node has connected to cluster via gossip I240313 16:38:33.360943 113 kv/kvserver/stores.go:282 ⋮ [T1,Vsystem,n1] 81 wrote 0 node addresses to persistent storage ``` Empty lines do not play well with log parsers, so we introduce a new `BufferFormat`, `BufferFmtNone`, which uses no delimiters, prefixes, or suffixes. With this new option, log files no longer see these empty lines. The validation code for file log sinks forces the `BufferFmtNone` into the buffering configuration, so this is not something that needs to explicitly be configured by the user. Release note: none

Ebembi crdb patch 4

Revert "Ebembi crdb patch 4"

Informs: cockroachdb#72452 For some time, two buffering mechanisms have existed within the log package. One is controlled by `buffered-writes`, which is only used by the file sink. `buffered-writes` buffers writes until a buffer is full, after which it flushes the buffer to an underlying file. This is not an asynchronous operation. Separately, the `bufferedSink` was introduced to provide an async buffering mechanism. This was originally intended for network log sinks such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink` was not allowed for use with the `fileSink` as we already had a buffering mechanism available. Today, we find ourselves in a situation where we aim to make CRDB even more resilient in the face of disk failures. However, some of these efforts like WAL failover are unable to achieve their goals if the logging subsystem isn't resilient to disk unavailability as well. Today, if a file sink is configured to write to an unavailable disk, any goroutine making a logging call to one of those sinks will hang as the underlying i/o operations used are unable to interact with the disk. This can have a severe negative impact on cluster performance and throughput, even when using `buffered-writes: true`. As a way to alleviate this pain, this patch experimentally enables the usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is used to "wrap" an underlying sink (such as a `fileSink`), and asynchronously outputs buffered data to said sink. It is especially good at shielding writers from any problems with the underlying sink, which makes it a great candidate to help us accomplish our resiliency goals when it comes to disk outage scenarios. Now, users can leverage the `bufferedSink` in `file-defaults` and `file-groups by using the `buffering` config option. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` Note that we don't allow both `buffered-writes` and `buffering` to be used together. This patch adds a validation step to ensure this. We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks. Release note (ops change): Users have the ability to update their log configuration to enable asynchronous buffering of `file-group` log sinks via the `buffering` [configuration options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks). The `buffering` configuration can be applied to `file-defaults` or individual `file-groups` as needed. Note however that `buffering` and `buffered-writes` are incompatible for the time being. If you'd like to try the `buffering` option on a file log sink, it's necessary to explicitly set `buffered-writes: false`. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks.

As we introduce the ability to enable `buffering` on file-based log sinks as experimental, there's a need for a new `BufferFormat`. Previously, the default format was to introduce a newline between each log message flushed to the underlying log sink as a delimiter. However, when flushing to files, this is not desireable behavior as it causes the log file to look like: ``` I240313 16:38:33.355448 113 1@gossip/gossip.go:1374 ⋮ [T1,Vsystem,n1] 80 node has connected to cluster via gossip I240313 16:38:33.360943 113 kv/kvserver/stores.go:282 ⋮ [T1,Vsystem,n1] 81 wrote 0 node addresses to persistent storage ``` Empty lines do not play well with log parsers, so we introduce a new `BufferFormat`, `BufferFmtNone`, which uses no delimiters, prefixes, or suffixes. With this new option, log files no longer see these empty lines. The validation code for file log sinks forces the `BufferFmtNone` into the buffering configuration, so this is not something that needs to explicitly be configured by the user. Release note: none

pkg/util/log: enable bufferedSink usage with fileSink as experimental

This patch ensures that the user can run schema changes on a restored a table that was previously apart of an LDR stream. Fixes cockroachdb#134424 Release note(bug fix): this patch allows a user to run schema changes on a restored table that was previously apart of an LDR stream.

ebembi-crdb added 17 commits September 16, 2024 16:46

Create test.txt

0070d61

Merge pull request #1 from ebembi-crdb/testing

b6bb15e

Create test.txt

Create random.txt

13bef39

Merge pull request #2 from ebembi-crdb/ebembi-crdb-patch-1

9eb87fd

Create random.txt

Create testr.txt

6ad5712

Merge pull request #3 from ebembi-crdb/ebembi-crdb-patch-2

4f89a5c

Create testr.txt

Create abc.txt

c1b6a89

Merge pull request #4 from ebembi-crdb/ebembi-crdb-patch-3

feb4f6a

Create abc.txt

Merge pull request #5 from ebembi-crdb/ebembi-crdb-patch-4

84814af

Ebembi crdb patch 4

Revert "Ebembi crdb patch 4"

8335b8b

Merge pull request #6 from ebembi-crdb/revert-5-ebembi-crdb-patch-4

8478ef4

Revert "Ebembi crdb patch 4"

Merge pull request #7 from ebembi-crdb/testing-rel-note

432ba6b

pkg/util/log: enable bufferedSink usage with fileSink as experimental

ebembi-crdb force-pushed the master branch from 20b3ee1 to 8ba7c5b Compare February 27, 2025 16:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release-24.3: backupccl: remove ldr job id references on restoring tables #10

release-24.3: backupccl: remove ldr job id references on restoring tables #10

Uh oh!

ebembi-crdb commented Nov 9, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

release-24.3: backupccl: remove ldr job id references on restoring tables #10

Are you sure you want to change the base?

release-24.3: backupccl: remove ldr job id references on restoring tables #10

Uh oh!

Conversation

ebembi-crdb commented Nov 9, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants