Adding customChunks to future.apply functions

# Chunking in `future.apply`

`future.apply` currently relies on the internal `makeChunks` function ([makeChunks.R](https://github.com/HenrikBengtsson/future.apply/blob/b49eeba4d56acdfa40ef84bf8f7552d2ce1317c3/R/makeChunks.R)) to partition the elements of the input into "chunks" that are sent to workers for processing. `makeChunks` returns a list of integer vectors. Each vector is one chunk, and its elements are the indices of the elements of the input object (often a list) to be processed.
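To make the shape of that output concrete, here is a minimal sketch in base R. Since `makeChunks` is internal, it uses `parallel::splitIndices()` purely as a stand-in; the real function may assign indices to chunks differently.

```r
# Stand-in for the internal makeChunks(): partition 10 element indices
# into 3 chunks. parallel::splitIndices() is used for illustration only.
n_elements <- 10L
n_workers <- 3L
chunks <- parallel::splitIndices(n_elements, n_workers)

# A list with one integer vector of element indices per chunk
str(chunks)

# Every element index appears in exactly one chunk
stopifnot(all(sort(unlist(chunks)) == seq_len(n_elements)))
```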

Users have some control over chunk generation via the `future.apply` arguments `future.chunk.size` (the average number of elements per chunk) and `future.scheduling` (the average number of chunks per worker). They can also control the order in which chunks are processed via the `ordering` attribute of `future.chunk.size` or `future.scheduling`.

Nevertheless, this control is limited, and even the sensible defaults of `makeChunks` can produce substantial load imbalance across workers, with the resulting inefficiency. Some of this inefficiency could be reduced if users had better control over chunk generation. While some of it may be averted by the dynamic balancing of `future.apply`, the costs of dynamic balancing itself can be non-trivial.
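As a rough illustration of that imbalance, the sketch below compares two ways of chunking nine elements for three workers. The per-element costs are made up, and `chunk_load()` is a hypothetical helper, not part of `future.apply`.

```r
# Hypothetical per-element run times (seconds); the first three are slow
cost <- c(10, 10, 10, 1, 1, 1, 1, 1, 1)

# Work per chunk = sum of the costs of the elements in that chunk
chunk_load <- function(chunks) {
  vapply(chunks, function(idx) sum(cost[idx]), numeric(1))
}

# Contiguous, equal-sized chunks for 3 workers
contiguous <- list(1:3, 4:6, 7:9)

# Hand-built chunks that spread the slow elements across workers
balanced <- list(c(1, 4, 5), c(2, 6, 7), c(3, 8, 9))

chunk_load(contiguous)  # 30 3 3
chunk_load(balanced)    # 12 12 12
```

With one chunk per worker, the slowest chunk bounds the wall time, so under these assumed costs the contiguous split takes about 30 seconds and the hand-built split about 12.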

# The Purpose of `makeChunks`

Currently, `makeChunks` accomplishes two tasks:

1. Generates chunks by partitioning elements to be processed.
2. Specifies the order in which chunks are processed.

Ideally, both tasks would be redundant. The first would be redundant because the elements of the object users pass to `future.apply` would already be the chunks they want processed, with `nbrOfElements == nbrOfWorkers`. The second would be redundant because the chunks would already be indexed in the order in which they should be processed. This would allow efficient static load balancing, with chunks already balanced and one chunk per worker. In practice, however, users often pass objects whose ordering is ad hoc and whose chunking is unplanned.
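For illustration, here is what that ideal "pre-chunked" input might look like in base R, with `lapply()` standing in for a parallel map and two workers assumed. Note that the input becomes one level deeper and the function has to be wrapped to iterate within each chunk.

```r
# Six elements to process and an arbitrary per-element function
X <- as.list(1:6)
FUN <- function(x) x^2

# Pre-chunked input: one outer element per worker (2 workers assumed)
X_chunked <- list(X[1:3], X[4:6])

# Each outer element is one chunk; the wrapper iterates within the chunk
res_chunked <- lapply(X_chunked, function(chunk) lapply(chunk, FUN))

# Flatten one level to recover the per-element results
res <- unlist(res_chunked, recursive = FALSE)
stopifnot(identical(res, lapply(X, FUN)))
```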

# Adding `customChunks`

I envision two approaches to improving the flexibility of chunking in `future.apply`:

1. Add a `customChunks` argument to `future.apply` functions

Users could pass a list to `customChunks`. `future.apply` would use this list instead of the list that `makeChunks` returns. If `is.null(customChunks)` is `TRUE`, the status quo internal `makeChunks` function is used. If it is `FALSE`, `makeChunks` is not executed and all other chunk-related arguments are ignored.

The primary motivation for this is that users may wish to (a) have complete control over the chunking and the ordering of chunks, (b) do so without modifying their input to `future.apply` (i.e. avoid creating deeper objects or repeatedly rearranging the elements of their object just for processing), and (c) write more interpretable code that distinguishes between the input object, the plan for processing, and the processing itself. This also helps decouple functions for working in parallel from functions for serial pre-processing.

2. Add a `customChunks` argument to `future.apply` functions and export `makeChunks` 

Users could pass their object to `makeChunks` and pass the result to the `customChunks` argument of `future.apply`. If `is.null(customChunks)`, `future.apply` would call `makeChunks` as usual. This would allow users to generate chunks with `makeChunks` either inside or outside of `future.apply`. The upside is that users can directly inspect and edit the output of `makeChunks`.
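A minimal base R sketch of the proposed semantics follows. Everything in it is hypothetical: `lapply_chunked()` and `default_chunks()` are illustrative names rather than `future.apply` functions, `parallel::splitIndices()` stands in for `makeChunks`, and `lapply()` stands in for the parallel map.

```r
# Hypothetical default chunker standing in for the internal makeChunks()
default_chunks <- function(n, n_workers = 2L) {
  parallel::splitIndices(n, n_workers)
}

# Hypothetical wrapper sketching the proposed `customChunks` argument
lapply_chunked <- function(X, FUN, ..., customChunks = NULL) {
  chunks <- if (is.null(customChunks)) default_chunks(length(X)) else customChunks

  # Custom chunks must cover every element index exactly once
  idx <- unlist(chunks, use.names = FALSE)
  stopifnot(length(idx) == length(X), setequal(idx, seq_along(X)))

  # One task per chunk, in list order; lapply() stands in for a parallel map
  res_by_chunk <- lapply(chunks, function(ii) lapply(X[ii], FUN, ...))

  # Reassemble the results in the original element order
  res <- vector("list", length(X))
  res[idx] <- unlist(res_by_chunk, recursive = FALSE)
  names(res) <- names(X)
  res
}

X <- list(a = 1, b = 2, c = 3, d = 4)
times_ten <- function(x) x * 10

# Approach 1: hand-written chunks; element 4 is processed first
my_chunks <- list(c(4L, 1L), c(3L, 2L))
out1 <- lapply_chunked(X, times_ten, customChunks = my_chunks)

# Approach 2: generate the default chunks, edit them, then pass them in
edited <- rev(default_chunks(length(X)))
out2 <- lapply_chunked(X, times_ten, customChunks = edited)

# The input object is untouched and the results keep its order
stopifnot(identical(out1, lapply(X, times_ten)))
stopifnot(identical(out2, lapply(X, times_ten)))
```

The point of the sketch is the separation of concerns: `X` is never rearranged, the chunk list alone describes the processing plan, and leaving `customChunks = NULL` falls back to the default chunker.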

I could submit a pull request implementing this, but I'm not sure when or how `makeChunks` is called internally. I don't see it in makefiles or in the definitions of the `future.apply` functions.
