-
Are you using an encrypted database? Unless we are talking about the network buffers overflowing, this should work; this is the exact scenario streaming is for. I assume you ruled out an actual network hiccup? 25+ minutes is a long window for one to occur.
-
We sometimes need to process a lot of data from the database. This is usually done by streaming the index results.
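For context, the straightforward version looks roughly like this (a minimal sketch against the RavenDB Java client; the server URL, the Orders/ByCompany index, and the Order class are placeholders, not our actual code):

```java
import net.ravendb.client.documents.DocumentStore;
import net.ravendb.client.documents.IDocumentStore;
import net.ravendb.client.documents.commands.StreamResult;
import net.ravendb.client.documents.queries.Query;
import net.ravendb.client.documents.session.IDocumentQuery;
import net.ravendb.client.documents.session.IDocumentSession;
import net.ravendb.client.primitives.CloseableIterator;

public class StreamingExample {
    // Placeholder entity; our real entities are larger.
    public static class Order {
        private String company;
        public String getCompany() { return company; }
        public void setCompany(String company) { this.company = company; }
    }

    public static void main(String[] args) {
        IDocumentStore store = new DocumentStore(
                new String[] { "http://localhost:8080" }, "MyDatabase");
        store.initialize();

        try (IDocumentSession session = store.openSession()) {
            IDocumentQuery<Order> query =
                    session.query(Order.class, Query.index("Orders/ByCompany"));

            // Stream the full result set; streamed entities are not
            // tracked by the session, so client memory stays flat.
            try (CloseableIterator<StreamResult<Order>> results =
                    session.advanced().stream(query)) {
                while (results.hasNext()) {
                    process(results.next().getDocument()); // the long-running work
                }
            }
        }

        store.close();
    }

    private static void process(Order order) {
        // placeholder for the actual per-record processing
    }
}
```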
Sometimes, when the processing takes a long time, we first load the IDs of the relevant entities and then process the entities in batches (a new session for each batch). This is to prevent the networking buffers from overflowing and the connection from getting closed (https://ayende.com/blog/170401/timeouts-tcp-and-streaming-operations).
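The batched variant looks roughly like this (same store, imports, Order class, and process() helper as the sketch above, plus java.util collections; the 500-document batch size is arbitrary):

```java
// Phase 1: stream once, keeping only the document IDs in memory.
// NB: this pass should use a projection as well (see the sketch
// further down), so the server does not ship the full documents.
List<String> ids = new ArrayList<>();
try (IDocumentSession session = store.openSession()) {
    IDocumentQuery<Order> query =
            session.query(Order.class, Query.index("Orders/ByCompany"));
    try (CloseableIterator<StreamResult<Order>> results =
            session.advanced().stream(query)) {
        while (results.hasNext()) {
            ids.add(results.next().getId());
        }
    }
}

// Phase 2: load and process the entities in fixed-size batches, with
// a fresh session per batch, so no single connection stays busy long
// enough for the TCP buffers to fill up and the server to abort it
// (see the blog post linked above).
final int batchSize = 500;
for (int i = 0; i < ids.size(); i += batchSize) {
    List<String> batch = ids.subList(i, Math.min(i + batchSize, ids.size()));
    try (IDocumentSession session = store.openSession()) {
        Map<String, Order> loaded = session.load(Order.class, batch);
        loaded.values().forEach(order -> process(order));
    }
}
```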
Sometimes the processing is reasonably fast, and processing the data directly from the stream is feasible. Recently, however, we have started getting errors just as the processing finishes.
From the application logs, we can see that all the records from the database were received and processed. Then the program fails with the following:
Also, the RavenDB servers become very, very slow immediately before such a failure. After the program fails, the database servers recover on their own.
The most recent occurrence was when the program was only collecting the entity IDs, but we forgot to project the results. Because of that omission, the entire index content was sent to the client program. There was no processing at all in this case.
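For the record, the fix for that omission is a projection, so the stream carries only a small field plus the document ID (which is available from the stream metadata), rather than whole documents. Roughly like this (sketch; the CompanyOnly class and its company field are illustrative):

```java
// Tiny projection target so the server sends one small field per
// result instead of the whole document.
public static class CompanyOnly {
    private String company;
    public String getCompany() { return company; }
    public void setCompany(String company) { this.company = company; }
}

// The document ID still arrives in the stream metadata via
// StreamResult#getId(), so we never transfer full documents just
// to collect IDs.
List<String> ids = new ArrayList<>();
try (IDocumentSession session = store.openSession()) {
    IDocumentQuery<CompanyOnly> query = session
            .query(Order.class, Query.index("Orders/ByCompany"))
            .selectFields(CompanyOnly.class, "company");
    try (CloseableIterator<StreamResult<CompanyOnly>> results =
            session.advanced().stream(query)) {
        while (results.hasNext()) {
            ids.add(results.next().getId());
        }
    }
}
```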
We are talking about 2-4 million records, which I think amounts to several gigabytes of data. The processing (basically just the network transfer, in this case) took around 25 minutes.
I'm not 100% sure, but this seems to have started happening after our upgrade from 5.1.5 to 5.2.3. We have also added and changed some functionality in the relevant processing code, so the upgrade might not be the cause.
Is there some server-side cleanup task that would explain why the server gets so slow and the connection gets aborted?
Since streaming is meant for exactly this kind of use, I don't think a warning against such use could reasonably be implemented. Warnings like the ones about huge documents and long queries are not very relevant here, right?
Any thoughts about how to investigate this would be appreciated.