Commit a84c23a
committed
[IMP] snippets: do not return unneeded data from mp worker
In `convert_html_columns()`, we select 100MiB worth of DB tuples and pass them
to a ProcessPoolExecutor together with a converter callable. So far, the
converter returns all tuples, changed or unchanged together with the
information if it has changed something. All this is returned through IPC to
the parent process. In the parent process, the caller only acts on the changed
tuples, though, the rest is ignored. In any scenario I've seen, only a small
proportion of the input tuples is actually changed, meaning that a large
proportion is returned through IPC unnecessarily.
What makes it worse is that processing of the converted results in the parent
process is often slower than the conversion, leading to two effects:
1) The results of all workers sit in the parent process's memory, possibly
leading to MemoryError (upg-2021031)
2) The parallel processing is being serialized on the feedback, defeating a
large part of the intended performance gains
This commit fixes (1) by only returning `None` for unchanged tuples. An
improvement for (2) follows in the next commit.1 parent 062d11a commit a84c23a
1 file changed
+2
-2
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
267 | 267 | | |
268 | 268 | | |
269 | 269 | | |
270 | | - | |
| 270 | + | |
271 | 271 | | |
272 | 272 | | |
273 | 273 | | |
| |||
310 | 310 | | |
311 | 311 | | |
312 | 312 | | |
313 | | - | |
| 313 | + | |
314 | 314 | | |
315 | 315 | | |
316 | 316 | | |
| |||
0 commit comments