
Commit b2fd783

Author: Leon Eller
Added a TODO and README comment in relation to Stream[_] and context-per-jvm=true
1 parent 08596e0 commit b2fd783

File tree: 2 files changed, +13 -1 lines changed

README.md

Lines changed: 4 additions & 1 deletion
@@ -621,7 +621,10 @@ serialized properly:
   - Anything that implements Product (Option, case classes) -- they will be serialized as lists
   - Maps and Seqs may contain nested values of any of the above
   - If a job result is of scala's Stream[Byte] type it will be serialised directly as a chunk encoded stream.
-    This is useful if your job result payload is large and may cause a timeout serialising as objects.
+    This is useful if your job result payload is large and may cause a timeout serialising as objects. Beware, this
+    will not currently work as desired with the context-per-jvm=true configuration, since it would require serialising
+    a Stream[_] blob between processes. For now, use Stream[_] job results with the context-per-jvm=false configuration,
+    pending potential future enhancements to support this in context-per-jvm=true mode.
 
 If we encounter a data type that is not supported, then the entire result will be serialized to a string.
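For illustration, here is a minimal sketch (not part of this commit) of a job whose result is a Stream[Byte], which the README section above says is serialised directly as a chunk-encoded stream. It assumes the classic spark.jobserver SparkJob API; the job name LargeBinaryResultJob and the generated payload are hypothetical. Per the new note, such a job should be run with context-per-jvm=false for now.

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

// Hypothetical job: returns its result as Stream[Byte] so the job server can
// serialise it directly as a chunk-encoded stream rather than one large object.
object LargeBinaryResultJob extends SparkJob {
  // Nothing to validate for this illustration.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    // Build a potentially large byte payload on the driver and return it
    // lazily as Stream[Byte] instead of a single large serialised object.
    val payload: Array[Byte] = sc.parallelize(1 to 1024 * 1024)
      .map(i => (i % 256).toByte)
      .collect()
    payload.toStream
  }
}
```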

job-server/src/spark.jobserver/JobManagerActor.scala

Lines changed: 9 additions & 0 deletions
@@ -285,6 +285,15 @@ class JobManagerActor(contextConfig: Config) extends InstrumentedActor {
         }(executionContext).andThen {
           case Success(result: Any) =>
             statusActor ! JobFinished(jobId, DateTime.now())
+            // TODO: If the result is Stream[_] and this is running with the context-per-jvm=true configuration,
+            // serializing a Stream[_] blob across process boundaries is not desirable.
+            // In that scenario an enhancement is required here to chunk stream results back.
+            // Something like ChunkedJobResultStart, ChunkJobResultMessage, and ChunkJobResultEnd messages
+            // might be a better way to send results back, and then on the other side use chunked-encoding
+            // transfer to send the chunks to the client. Alternatively the stream could be persisted here to HDFS
+            // and then streamed out of an InputStream on the other side.
+            // Either way an enhancement would be required here to make Stream[_] responses work
+            // with the context-per-jvm=true configuration.
             resultActor ! JobResult(jobId, result)
           case Failure(error: Throwable) =>
             // If and only if job validation fails, JobErroredOut message is dropped silently in JobStatusActor.
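To make the TODO above concrete, below is a hypothetical sketch (not part of this commit) of the chunked-result protocol it suggests. The message names ChunkedJobResultStart, ChunkJobResultMessage, and ChunkJobResultEnd come from the comment; their fields, the sendChunked helper, and the 64 KB chunk size are assumptions for illustration.

```scala
import akka.actor.ActorRef

// Protocol messages suggested by the TODO for streaming a Stream[_] result
// across the process boundary in context-per-jvm=true mode (field shapes assumed).
case class ChunkedJobResultStart(jobId: String)
case class ChunkJobResultMessage(jobId: String, chunk: Array[Byte])
case class ChunkJobResultEnd(jobId: String)

object ChunkedResultSketch {
  // Rough shape of how a Stream[Byte] result could be forwarded in fixed-size
  // chunks instead of serialising the whole blob between processes at once;
  // the receiving side would reassemble the chunks or relay them with
  // chunked-encoding transfer to the HTTP client.
  def sendChunked(resultActor: ActorRef, jobId: String, result: Stream[Byte],
                  chunkSize: Int = 64 * 1024): Unit = {
    resultActor ! ChunkedJobResultStart(jobId)
    result.grouped(chunkSize).foreach { chunk =>
      resultActor ! ChunkJobResultMessage(jobId, chunk.toArray)
    }
    resultActor ! ChunkJobResultEnd(jobId)
  }
}
```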
