develop-docs/sdk/telemetry/spans/batch-processor.mdx
Lines changed: 42 additions & 39 deletions
@@ -105,11 +105,11 @@ Each SDK environment is unique. Therefore, SDKs have three options to choose fro
105
105
106
106
When the SDK detects a sudden process termination, it MUST put all remaining items in the BatchProcessor into one envelope and flush it. If your SDK has an offline cache, it MAY flush the envelope to disk and skip sending it to Sentry, provided it sends the envelope the next time the SDK starts. The BatchProcessor MUST keep its existing logic described in the [specification](#specification) above.
107
107
108
-
Suppose your SDK can't reliably detect sudden process terminations, or it can't reliably flush envelopes to Sentry or disk when a sudden process termination happens. In that case, it SHOULD implement the [FileStream Cache](#2-file-stream-cache) or the [FIFO Queue with Async FileStream Cache](#3-fifo-queue-with-async-file-stream-cache). It's acceptable to start with this option as a best effort interim solution before adding one of the more complex options.
108
+
Suppose your SDK can't reliably detect sudden process terminations, or it can't reliably flush envelopes to Sentry or disk when a sudden process termination happens. In that case, it SHOULD implement the [FileStream Cache](#2-file-stream-cache) or the [DoubleRotatingBuffer](#3-doublerotatingbuffer). It's acceptable to start with this option as a best-effort interim solution before adding one of the more complex options.
109
109
110
110
### 2. FileStream Cache
111
111
112
-
SDKs for which blocking the main thread is a nogo, such as Android and Apple, SDKs MUST NOT implement this option. They SHOULD implement the [FIFO Queue with Async FileStream Cache](#3-fifo-queue-with-async-file-stream-cache).
112
+
SDKs for which blocking the main thread is a no-go, such as Android and Apple, MUST NOT implement this option. They SHOULD implement the [DoubleRotatingBuffer](#3-doublerotatingbuffer).
113
113
114
114
With this option, the BatchProcessor stores the data directly to disk on the calling thread. The SDK SHOULD store the BatchProcessor files in a folder named `batch-processor` that is a sibling of the `envelopes` or `replay` folder. This folder is scoped per DSN, so that SDKs don't mix up data for different DSNs. In the `batch-processor` folder, the SDK MUST store two types of cache files:
115
115
@@ -119,68 +119,71 @@ With this option, the BatchProcessor stores the data on the calling thread direc
119
119
When the timeout expires or the cache file hits the size limit, the BatchProcessor renames the `cache` file to `flushing`, creates a new `cache` file for incoming data, converts the data in the `flushing` file to an envelope, sends it to Sentry, and then deletes the `flushing` file. When the SDK starts again, it MUST check if there are any cache files in the cache directory (both `cache` and `flushing`) and if so, it MUST load the data from the files and send it to Sentry.
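The rotation described above can be sketched as follows. This is a minimal illustration, not the SDK implementation: the function name `flush_cache` and the `send` callback are hypothetical, and the raw file bytes stand in for the conversion to a real envelope.

```python
import os


def flush_cache(folder, send):
    """Rotate `cache` to `flushing`, send its content, then delete it.

    `send` is a hypothetical transport callback that receives the raw bytes.
    Returns False when there is no `cache` file to flush.
    """
    cache = os.path.join(folder, "cache")
    flushing = os.path.join(folder, "flushing")
    if not os.path.exists(cache):
        return False

    # Renaming within one folder is atomic on common filesystems, so an
    # item is always in either `cache` or `flushing`, never in neither.
    os.replace(cache, flushing)

    # Create a fresh `cache` file so incoming data has somewhere to go.
    open(cache, "wb").close()

    with open(flushing, "rb") as f:
        send(f.read())

    # Delete only after sending. If the process dies before this line, the
    # `flushing` file is still on disk and is picked up on the next start.
    os.remove(flushing)
    return True
```

Because the `flushing` file is deleted last, a termination in the middle of a flush leaves a file behind that the startup check described above can recover.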
120
120
121
121
122
-
### 3. FIFO Queue with Async FileStream Cache
122
+
### 3. DoubleRotatingBuffer
123
123
124
124
SDKs should only consider implementing this option when options 1 or 2 are insufficient to prevent data loss within their ecosystem. We recommend this option only if SDKs are unable to reliably detect sudden process terminations or to consistently store envelopes to disk during such terminations, as can occur with Android or Apple devices.
125
125
126
-
With this option, the BatchProcessor first stores its data in a thread-safe FIFO queue, residing in an async-safe memory space, allowing the crash reporter to write to disk when a crash occurs. Furthermore, the BatchProcessor stores data asynchronously into a file, allowing it to recover after an abnormal termination, for which the crash handler can't run.
126
+
The BatchProcessor uses two buffers to minimize data loss in the event of an abnormal process termination:
127
+
* **Crash-safe list**: A list stored in a crash-safe space to prevent data loss during detectable abnormal process terminations.
128
+
* **Async IO cache**: When a process terminates abruptly, the crash-safe list loses all its elements. Therefore, the BatchProcessor uses a second buffer, the async IO cache, that stores elements to disk on a background thread to avoid blocking the calling thread.
127
129
128
-
The BatchProcessor maintains its logic of batching multiple logs and spans together into a single envelope to avoid multiple HTTP requests.
130
+
Furthermore, the BatchProcessor MUST prevent data loss when flushing. Therefore, it uses a double-buffering solution, meaning the two buffers alternate. The crash-safe list has two lists, and the async IO buffer has two files. When list1 is full, the BatchProcessor stores items in list2 until it successfully stores items in list1 to disk as an envelope. Then it can delete items in list1. The same applies to the IO buffer.
129
131
130
-
Hybrid SDKs pass every log and span down to the native SDKs, which will put every log and span in their BatchProcessor and its cache when logs and spans are ready for sending, meaning after they go through beforeLog, integrations, processors, etc.
132
+
When the SDK detects an abnormal process termination, it stores items in the crash-safe list to disk. The next time the SDK starts, it sends these items. When an undetectable process termination occurs, the SDK loses items from the crash-safe list that have not yet been stored in the async IO buffer, which we intentionally accept rather than blocking the calling thread. Furthermore, the SDK must deduplicate items in the stored crash-safe list and the IO buffer by item ID to avoid sending duplicates.
131
133
132
-
#### Receiving Data
134
+
#### BatchProcessor Files
133
135
134
-
When the BatchProcessor receives data, it performs the following steps
136
+
The SDK SHOULD store the BatchProcessor files in a folder named `batch-processor` that is a sibling of the `envelopes` or `replay` folder. This folder is scoped per DSN, so that SDKs don't mix up data for different DSNs. The `batch-processor` folder SHOULD contain the following files:
135
137
136
-
1. Put the item into the FIFO queue on the calling thread.
137
-
2. On a background thread, serialize the next item of the FIFO queue and store it in the `cache` file.
138
-
3. Remove the log from the FIFO queue.
139
-
4. If the queue isn't empty, go back to step 2.
138
+
- `file-buffer1` and `file-buffer2` - The active IO buffers for the BatchProcessor.
139
+
- `detected-termination-x` - The file containing items from a previous detected abnormal termination.
140
+
- `envelope-to-flush-x` - The envelope that the BatchProcessor is about to move to the envelopes cache folder, so the SDK can send it to Sentry.
140
141
141
-
The FIFO queue has a `max-item-count` of 64. When the FIFO queue exceeds `max-item-count`, the BatchProcessor MUST drop items and record client reports with the category `queue_overflow` for every dropped item. SDKs MAY choose a different `max-item-count` value, if needed. SDKs MUST NOT expose the `max-item-count` value to users as an option. We can make this option public in the future, if required.
142
142
143
-
#### Abnormal Process Termination
143
+
#### Receiving Items
144
144
145
-
When SDKs detect an abnormal process termination, they MUST write the items in the FIFO queue to the `abnormal-termination-x` file where `x` is the an increasing index of the file starting from 0.
145
+
The BatchProcessor has two lists, `crash-safe-list1` and `crash-safe-list2`, and two files, `file-buffer1` and `file-buffer2`. When it receives items, it performs the following steps:
146
146
147
-
When the process terminates abnormally and the SDKs can't detect it, the SDKs lose items in the FIFO queue, which we accept over blocking the calling thread that could be the main thread.
147
+
1. Put the item into `crash-safe-list1` on the calling thread.
148
+
2. On a background thread, store the item in `file-buffer1`.
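The two receiving steps can be sketched as below. This is only an illustration under stated assumptions: the class name `Receiver`, the helper `wait_until_written`, and the newline-delimited JSON file format are hypothetical, and a plain Python list stands in for memory that a crash handler could actually read.

```python
import json
import queue
import threading


class Receiver:
    """Appends to an in-memory list on the caller, writes to disk in the background."""

    def __init__(self, file_buffer_path):
        self.crash_safe_list1 = []      # stand-in for a crash-safe list
        self._pending = queue.Queue()   # hand-off to the background thread
        self._path = file_buffer_path
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def add(self, item):
        # Step 1: runs on the calling thread and performs no disk IO.
        self.crash_safe_list1.append(item)
        self._pending.put(item)

    def _drain(self):
        # Step 2: the background thread appends each item to the file buffer.
        while True:
            item = self._pending.get()
            with open(self._path, "a", encoding="utf-8") as f:
                f.write(json.dumps(item) + "\n")
            self._pending.task_done()

    def wait_until_written(self):
        # Hypothetical helper for tests: block until all items are on disk.
        self._pending.join()
```

Items that are in the list but not yet drained to the file are exactly the ones lost on an undetectable termination, which is the trade-off the specification accepts.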
148
149
149
-
No matter the abnormal process termination, SDKs MUST send items both in the `abnormal-termination-x` and `cache` files on the next SDK start. Please refer to the [SDK start](#sdk-start) section for the detailed specification.
150
+
#### Flushing
150
151
151
-
#### Cache File Location
152
+
When `crash-safe-list1` exceeds the 1 MiB size limit [described above](#specification), the BatchProcessor performs the following flushing steps:
152
153
153
-
The SDK SHOULD store the BatchProcessor cache files in a folder that is a sibling of the `envelopes` or `replay` folder, named `batch-processor`. This folder is scoped per DSN, so SDKs ensure not mixing up data for different DSNs. The `batch-processor` folder MAY contain the follow files, where `x` is the an increasing index of the file starting from 0:
154
+
1. Store new incoming items in `crash-safe-list2` and `file-buffer2`.
155
+
2. Put the items of `crash-safe-list1` into an envelope named `envelope-to-flush-x`, where `x` is an increasing index of the file starting from 0, in the same folder as the BatchProcessor files.
156
+
3. Delete the items in `crash-safe-list1` and `file-buffer1`.
157
+
4. Move the envelope to the envelopes cache folder.
154
158
155
-
-`cache` - The active writing cache file for the BatchProcessor.
156
-
-`cache-to-flush-x` - The cache file that the BatchProcessor is about to convert to an envelope.
157
-
-`abnormal-termination-fifo-queue-x` - The file containing the data from the FIFO queue when an abnormal process termination occurs.
158
-
-`abnormal-termination-cache-x` - The cache file containing items from a previous abnormal termination.
159
-
-`envelope-to-flush-x` - The envelope that the BatchProcessor is about to move to the envelopes cache folder, so the SDK can send it to Sentry.
159
+
The BatchProcessor doesn't store `envelope-to-flush-x` directly in the envelopes cache folder because, if an abnormal process termination occurred before it deleted the items in `crash-safe-list1` and `file-buffer1`, the SDK would send duplicate items.
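The flushing steps above can be sketched roughly as follows. The class `DoubleBuffer` and its members are hypothetical names, newline-delimited JSON stands in for the real envelope format, and the file buffer is written synchronously for brevity even though the specification requires a background thread.

```python
import json
import os


class DoubleBuffer:
    """Two alternating list/file pairs with a staged envelope on flush."""

    def __init__(self, folder, envelopes_folder):
        self.folder = folder
        self.envelopes_folder = envelopes_folder
        self.lists = [[], []]   # crash-safe-list1 and crash-safe-list2
        self.active = 0         # index of the pair receiving new items
        self.flush_index = 0    # the `x` in envelope-to-flush-x

    def _file(self, i):
        return os.path.join(self.folder, f"file-buffer{i + 1}")

    def add(self, item):
        self.lists[self.active].append(item)
        with open(self._file(self.active), "a", encoding="utf-8") as f:
            f.write(json.dumps(item) + "\n")

    def flush(self):
        full = self.active
        # Step 1: new items now go to the other list and file buffer.
        self.active = 1 - full

        # Step 2: stage the envelope next to the BatchProcessor files.
        name = f"envelope-to-flush-{self.flush_index}"
        self.flush_index += 1
        staged = os.path.join(self.folder, name)
        with open(staged, "w", encoding="utf-8") as f:
            for item in self.lists[full]:
                f.write(json.dumps(item) + "\n")

        # Step 3: drop the flushed items from the list and its file buffer.
        self.lists[full].clear()
        open(self._file(full), "w").close()

        # Step 4: only now move the envelope to the envelopes cache folder.
        final = os.path.join(self.envelopes_folder, name)
        os.replace(staged, final)
        return final
```

Moving the staged file last means a termination between steps 2 and 4 leaves the envelope inside the `batch-processor` folder, so the transport never sees it while the same items still exist in the buffers.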
160
160
161
-
#### Flushing
162
161
163
-
The BatchProcessor MUST keep two cache files. When the BatchProcessor sends the items from `cache`, it renames it to `cache-to-flush-x` and creates a new `cache` to avoid losing items if an abnormal termination occurs when flushing the items. To avoid sending duplicate items if an abnormal termination occurs between storing the envelope and deleting the cache file, the BatchProcessor MUST first store the envelope to the same folder as the BatchProcessor files. After deleting the `cache-to-flush-x`, SDKs MUST move the envelope to the envelope cache folder. As moving files is usually atomic, SDKs avoid sending duplicated items in the described scenario. These are the flushing steps:
162
+
#### Abnormal Process Termination
163
+
164
+
When SDKs detect an abnormal process termination, they MUST write the items in both `crash-safe-list1` and `crash-safe-list2` to the `detected-termination-x` file, where `x` is an increasing index of the file starting from 0.
164
165
165
-
1. Rename `cache` to `cache-to-flush-x` where `x` is the an increasing index of the cache file starting from 0.
166
-
2. Create a new `cache` file and store new items to this file.
167
-
3. Move the items in `cache-to-flush-x` to an envelope named `envelope-to-flush-x`. SDKs should utilize a file stream or a similar mechanism when moving the items to prevent loading all items into memory and causing a short-term memory spike. SDKs MUST not put the envelope into the envelope cache folder yet. Instead, they MUST store the envelope to the same folder as the other BatchProcessor files.
168
-
4. Delete the file `cache-to-flush-x`.
169
-
5. Move the `envelope-to-flush-x` to the envelopes cache folder.
166
+
When the process terminates abnormally and the SDKs can't detect it, the SDKs lose the items in the crash-safe lists. We accept this loss rather than blocking the calling thread, which could be the main thread.
170
167
171
-
#### SDK start
168
+
#### SDK Initialization
172
169
173
-
Whenever the SDK starts, it must check if there is any data in the batch processor folder that needs to be recovered. SDKs MUST perform the following steps when starting:
170
+
Whenever the SDK initializes, it must check whether the `batch-processor` folder contains any data that needs to be recovered. SDKs MUST perform the following steps when initializing:
174
171
175
-
1. If there are items in the `cache`file, rename the `cache`file to `abnormal-termination-cache-x`.
176
-
2. Create a new `cache`file and store new items to this file.
177
-
3.If there is an `abnormal-termination-fifo-queue-x` file, deduplicate data from both the `abnormal-termination-cache-x` and `abnormal-termination-fifo-queue-x`file based on the IDs of the items.
172
+
1. If there are items in the `file-buffer1` or `file-buffer2` file, store all items into a file named `undetected-termination-x`.
173
+
2. Create new `file-buffer1` and `file-buffer2` files and store new items to these files.
174
+
3. Load all items from the `undetected-termination-x` and `detected-termination-x` files and deduplicate them based on the IDs of the items.
178
175
4. Put the deduplicated items into the `envelope-to-flush-x` in the batch processor cache folder.
179
-
5. Delete the `abnormal-termination-cache-x` and `abnormal-termination-fifo-queue-x` files.
176
+
5. Delete the `undetected-termination-x` and `detected-termination-x` files.
180
177
6. Move the `envelope-to-flush-x` to the envelopes cache folder.
181
178
182
-
As abnormal terminations can occur at any time, there may be multiple `abnormal-termination-cache-x`, `abnormal-termination-fifo-queue-x`or `envelope-to-flush-x`files. SDKs MUST handle multiple file pairs at each of the above-described steps. For example, if there are two pairs of `abnormal-termination-cache-x` and `abnormal-termination-fifo-queue-x`, the SDKs should perform steps 3 to 6 for both pairs.
179
+
As abnormal terminations can occur at any time, there may be multiple `undetected-termination-x` and `detected-termination-x` files. SDKs MUST handle multiple file pairs at each of the above-described steps. For example, if there are two pairs of `undetected-termination-x` and `detected-termination-x`, the SDKs should perform steps 3 to 6 for both pairs.
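The initialization steps above can be sketched as a single recovery function. Everything here is an assumption made for illustration: the function names, the newline-delimited JSON file format, and an `id` field as the item ID. For brevity the sketch merges all termination files into one envelope instead of handling each pair separately.

```python
import glob
import json
import os


def read_items(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_items(path, items):
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def recover(folder, envelopes_folder, index=0):
    """Returns the path of the recovered envelope, or None if nothing was found."""
    buffers = [os.path.join(folder, n) for n in ("file-buffer1", "file-buffer2")]

    # Step 1: items left in the file buffers mean an undetected termination.
    leftovers = [i for p in buffers if os.path.exists(p) for i in read_items(p)]
    if leftovers:
        write_items(os.path.join(folder, f"undetected-termination-{index}"), leftovers)

    # Step 2: start with empty file buffers for new items.
    for p in buffers:
        open(p, "w").close()

    # Step 3: load both kinds of termination files and deduplicate by ID.
    # The pattern matches `detected-termination-x` and `undetected-termination-x`.
    sources = sorted(glob.glob(os.path.join(folder, "*detected-termination-*")))
    unique = {}
    for path in sources:
        for item in read_items(path):
            unique.setdefault(item["id"], item)   # keep the first occurrence

    staged = None
    if unique:
        # Step 4: stage the envelope inside the batch processor folder.
        staged = os.path.join(folder, f"envelope-to-flush-{index}")
        write_items(staged, list(unique.values()))

    # Step 5: delete the termination files.
    for path in sources:
        os.remove(path)

    if staged is None:
        return None

    # Step 6: move the envelope to the envelopes cache folder.
    final = os.path.join(envelopes_folder, os.path.basename(staged))
    os.replace(staged, final)
    return final
```

Keeping the first occurrence of each ID is an arbitrary choice in this sketch. Any rule works as long as each item ID ends up in the envelope exactly once.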
183
180
184
181
#### SDK Closes
185
182
186
-
Whenever the users closes the SDK, the BatchProcessor MUST perform the steps described in the [Flushing](#flushing) section.
183
+
Whenever the user closes the SDK or the application terminates normally, the BatchProcessor MUST perform the steps described in the [Flushing](#flushing) section, and the SDK MUST delete all items in the `file-buffer1` and `file-buffer2` files.
184
+
185
+
#### Miscellaneous
186
+
187
+
The BatchProcessor maintains its logic of batching multiple logs and spans together into a single envelope to avoid multiple HTTP requests.
188
+
189
+
Hybrid SDKs pass every log and span down to the native SDKs, which will put every log and span in their BatchProcessor and its cache when logs and spans are ready for sending, meaning after they go through beforeLog, integrations, processors, etc.