Implement Resource Allocator for Operator-level Parallelism (Ramen) #3605

@yunyad

Description

Many data workflows in Texera can take hours or even days to complete. Within these long-running workflows, execution time can be significantly reduced by adjusting the degree of parallelism (i.e., the number of workers) for each operator. Manually tuning each operator’s parallelism is time-consuming and places a heavy burden on users, especially those without a programming background.

To address this challenge, we propose integrating an auto-tuning resource allocator into Texera. This feature is based on Ramen (Resource Allocator for Operator-level Parallelism), a paper currently under review at VLDB. Ramen automatically optimizes the degree of parallelism for each operator to reduce total execution time.

As an initial step, we will implement the Greedy (DFS-based) algorithm described in the Ramen paper as a baseline allocator. We also plan further extensions, such as the learning-based methods (e.g., reinforcement learning) discussed in the paper.
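The paper's Greedy (DFS-based) algorithm is not reproduced in this issue, so the following is only a minimal sketch of one plausible greedy baseline: traverse the operator DAG depth-first to collect operators, then hand out each remaining worker from the budget to the current estimated bottleneck. All names (`Operator`, `cost`, `greedy_allocate`) are hypothetical, not Texera or Ramen APIs.

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    cost: float                 # estimated processing time with one worker
    parallelizable: bool
    downstream: list = field(default_factory=list)
    workers: int = 1            # every operator gets at least one worker

def greedy_allocate(source: Operator, budget: int) -> dict:
    """Hypothetical greedy baseline: DFS to enumerate operators, then
    repeatedly give one extra worker to the parallelizable operator with
    the largest estimated per-worker runtime (cost / workers)."""
    ops, seen = [], set()

    def dfs(op: Operator) -> None:
        if op.name in seen:
            return
        seen.add(op.name)
        ops.append(op)
        for d in op.downstream:
            dfs(d)

    dfs(source)
    # Distribute the workers left over after the 1-per-operator minimum.
    for _ in range(budget - len(ops)):
        candidates = [o for o in ops if o.parallelizable]
        if not candidates:
            break
        bottleneck = max(candidates, key=lambda o: o.cost / o.workers)
        bottleneck.workers += 1
    return {o.name: o.workers for o in ops}
```

For example, in a `Scan → PythonUDF → Filter` pipeline where the UDF dominates the cost and the Filter is non-parallelizable, all extra workers from the budget go to the UDF.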


We will design and implement the Ramen allocator from the following aspects:

1. UI Updates

Current design:

  • Currently, only UDF (User Defined Function) operators allow users to specify the number of workers explicitly.
  • There is no explicit field or metadata to indicate whether an operator is parallelizable.
  • The system currently infers parallelizability based on the number of assigned workers:
    • If an operator is assigned more than one worker, it is assumed to be parallelizable.
    • If an operator is assigned exactly one worker, it is assumed to be non-parallelizable.
      Example (PythonUDF and Filter operator):
[Screenshots: current worker-count settings for PythonUDF and Filter]

New design:

  • For UDF operators, users can indicate whether the operator is parallelizable.
  • For non-UDF operators, users can see whether the operator is parallelizable.
  • In the advanced settings, users can still enter a suggested number of workers for each operator.
    Example (PythonUDF and Filter operator):
[Screenshot: proposed parallelizability and worker settings for PythonUDF and Filter]

2. Input Requirements for Ramen

  • Identify the necessary input data for running the allocator, including:
    • Operator-level runtime statistics
    • System constraints
    • User inputs or hints
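The three input categories above could be bundled into a single allocator input, sketched below. All class and field names are hypothetical placeholders for whatever the eventual design chooses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatorStats:
    """Operator-level runtime statistics (illustrative fields)."""
    operator_id: str
    input_tuple_count: int
    processing_time_ms: float

@dataclass(frozen=True)
class AllocatorInput:
    """Everything the allocator needs in one place."""
    stats: list                # per-operator runtime statistics
    total_worker_budget: int   # system constraint: cluster-wide worker cap
    user_hints: dict           # user inputs, e.g. {"op-1": 4} suggested workers

    def validate(self) -> bool:
        # User hints must fit within the system constraint.
        if sum(self.user_hints.values()) > self.total_worker_budget:
            raise ValueError("user hints exceed the worker budget")
        return True
```

Validating hints against system constraints up front keeps the allocator itself free of per-input error handling.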

3. Result Lifecycle and Storage

  • The parallelism configurations generated by Ramen will be stored as part of the runtime statistics. Specifically, the number of workers (worker_num) will be stored in Iceberg, alongside other runtime statistics.
  • The lifecycle of these parallelism configurations will follow the same lifecycle as the associated runtime statistics. That is, they will be persisted and deleted together with the corresponding execution metadata.
  • Runtime statistics are not automatically removed. Users can manually delete runtime statistics via the interface or API, and doing so will also remove the corresponding parallelism configurations.
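The shared-lifecycle rule above can be sketched with an in-memory stand-in for the Iceberg table: `worker_num` lives in the same rows as the other runtime statistics, so deleting an execution's statistics necessarily deletes its parallelism configuration. Class and column names are illustrative, not the real Iceberg schema.

```python
class RuntimeStatsStore:
    """In-memory sketch of the runtime-statistics table; the real
    storage is an Iceberg table keyed by execution and operator."""

    def __init__(self) -> None:
        self._rows: dict = {}   # (execution_id, operator_id) -> row

    def record(self, execution_id: str, operator_id: str,
               stats: dict, worker_num: int) -> None:
        # worker_num (Ramen's allocation result) is stored alongside
        # the other runtime statistics in the same row.
        self._rows[(execution_id, operator_id)] = {**stats,
                                                   "worker_num": worker_num}

    def delete_execution(self, execution_id: str) -> None:
        # Manual deletion of runtime statistics also removes the
        # parallelism configuration, since they share rows.
        for key in [k for k in self._rows if k[0] == execution_id]:
            del self._rows[key]
```

Because the configuration is just another column of the statistics row, no separate cleanup path is needed.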
