[FEATURE] PPL UDF and UDAF unification in Spark #1281

@dai-chen

Description

Is your feature request related to a problem?

Today, many PPL UDFs/UDAFs have their definitions and implementations tightly coupled to Calcite APIs. This leads to duplicated, engine-specific function code paths and to friction when porting these PPL functions to other engines such as Spark. Unifying function behavior removes a major source of drift between engines and lets the PPL frontend stay portable.

What solution would you like?

Proposed approach:

  1. The unified-query-api module provides engine-independent UnifiedFunction and UnifiedFunctionRepository abstractions.
  2. Spark integrates unified functions into the Catalyst expression system through a thin UnifiedFunctionSparkWrapper.
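
To make the two steps above concrete, here is a minimal sketch of how the pieces could fit together. Only the names UnifiedFunction and UnifiedFunctionRepository come from this issue; every method signature, the `of` factory, the `str_len` example function, and the `EngineFunctionAdapter` class are assumptions made for illustration. In particular, `EngineFunctionAdapter` is a plain-Java stand-in for the role UnifiedFunctionSparkWrapper would play, and it deliberately avoids any Spark or Catalyst types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical engine-independent function contract; the real API may differ. */
interface UnifiedFunction {
    String name();

    /** Evaluates the function on already-materialized argument values. */
    Object eval(List<Object> args);

    /** Convenience factory (an assumption, not part of the proposal). */
    static UnifiedFunction of(String name, Function<List<Object>, Object> body) {
        return new UnifiedFunction() {
            @Override public String name() { return name; }
            @Override public Object eval(List<Object> args) { return body.apply(args); }
        };
    }
}

/** Hypothetical registry that resolves functions by case-insensitive name. */
final class UnifiedFunctionRepository {
    private final Map<String, UnifiedFunction> functions = new ConcurrentHashMap<>();

    void register(UnifiedFunction fn) {
        functions.put(fn.name().toLowerCase(Locale.ROOT), fn);
    }

    Optional<UnifiedFunction> resolve(String name) {
        return Optional.ofNullable(functions.get(name.toLowerCase(Locale.ROOT)));
    }
}

/**
 * Stand-in for an engine-side wrapper. A real Spark wrapper would extend a
 * Catalyst expression and convert between Catalyst and Java values; this
 * class only shows the delegation, which is the "thin" part.
 */
final class EngineFunctionAdapter {
    private final UnifiedFunction delegate;

    EngineFunctionAdapter(UnifiedFunction delegate) {
        this.delegate = delegate;
    }

    /** Passes one row's argument values straight through to the unified function. */
    Object evaluateRow(Object... columnValues) {
        return delegate.eval(Arrays.asList(columnValues));
    }
}

final class UnifiedFunctionSketch {
    public static void main(String[] args) {
        UnifiedFunctionRepository repo = new UnifiedFunctionRepository();
        // "str_len" is an illustrative function, not an actual PPL function name.
        repo.register(UnifiedFunction.of("str_len",
            a -> a.get(0) == null ? null : (Object) ((String) a.get(0)).length()));

        EngineFunctionAdapter adapter = new EngineFunctionAdapter(repo.resolve("STR_LEN").get());
        System.out.println(adapter.evaluateRow("hello"));
    }
}
```

The point of the split is that the function body is written once against the engine-independent interface, while each engine contributes only an adapter that handles lookup and value conversion.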
What alternatives have you considered?

[Update after PoC]

Do you have any additional context?

N/A

Labels

Meta (Meta issue, not directly linked to a PR), enhancement (New feature or request)