[SPARK-53802][SDP] Support string values for user-specified schema in SDP tables #52517

sryza · 2025-10-04T14:55:39Z

What changes were proposed in this pull request?

When defining a streaming table or materialized view, allow passing a DDL-formatted string as its schema, in addition to a StructType. This mirrors the flexibility of the DataFrameReader.schema argument, which likewise accepts either form.

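For example:

```python
from pyspark import pipelines as dp
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.active()

# The schema is given as a DDL string instead of a StructType.
@dp.materialized_view(schema="id LONG, name STRING")
def table_with_string_schema():
    return spark.range(5).withColumn("name", lit("test"))
```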
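To make the equivalence concrete, here is a minimal, self-contained sketch in plain Python (no Spark required). Field and parse_simple_ddl are hypothetical names, not part of PySpark. The sketch assumes a flat schema with single-word types and only illustrates that the string "id LONG, name STRING" describes the same two columns as an explicitly constructed schema; in Spark itself the string goes through the SQL DDL parser (for example, StructType.fromDDL on the Scala side).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    """One top-level column: a name and an upper-cased type name."""
    name: str
    type_name: str


def parse_simple_ddl(ddl: str) -> Tuple[Field, ...]:
    """Parse a flat "name TYPE, name TYPE" schema string (hypothetical helper).

    Only handles top-level columns with comma-free types; Spark's own parser
    also supports nested types, DECIMAL(p, s), NOT NULL, COMMENT, and more.
    """
    fields = []
    for part in ddl.split(","):
        part = part.strip()
        # Split "id LONG" into the column name and the rest (its type).
        name, sep, type_name = part.partition(" ")
        if not sep or not type_name.strip():
            raise ValueError(f"cannot parse column definition: {part!r}")
        fields.append(Field(name, type_name.strip().upper()))
    return tuple(fields)


# The string form and an explicitly built schema describe the same columns.
explicit = (Field("id", "LONG"), Field("name", "STRING"))
assert parse_simple_ddl("id LONG, name STRING") == explicit
```

A real DDL parser cannot split naively on commas, because types such as DECIMAL(10, 2) contain them; the sketch deliberately ignores that case.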

Why are the changes needed?

For flexibility, and for consistency with similar arguments such as DataFrameReader.schema.

Does this PR introduce any user-facing change?

Makes changes to unreleased protos.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

gengliangwang · 2025-10-07T06:20:21Z

...nnect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala

+    assert(graph.tables.size == 1)
+
+    val table = graph.table(graphIdentifier("table_with_string_schema"))
+    assert(table.specifiedSchema.isDefined)


nit: shall we simply compare table.specifiedSchema with an expected schema (for example, StructType.fromDDL("id LONG, name STRING"))?
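The reviewer's point can be illustrated outside Spark: an isDefined-style check passes for any schema at all, while an equality check fails when the parsed columns are wrong. The plain-Python sketch below uses tuples of (name, type) pairs as a hypothetical stand-in for StructType, and None as a stand-in for Scala's empty Option. In the Scala suite the stronger assertion would presumably look something like assert(table.specifiedSchema.contains(StructType.fromDDL("id LONG, name STRING"))), though the exact form adopted in the PR is not shown here.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for a schema: (column name, type name) pairs.
Schema = Tuple[Tuple[str, str], ...]

EXPECTED: Schema = (("id", "LONG"), ("name", "STRING"))


def is_defined_check(specified: Optional[Schema]) -> bool:
    """Mirrors `specifiedSchema.isDefined`: true for any schema at all."""
    return specified is not None


def exact_check(specified: Optional[Schema], expected: Schema) -> bool:
    """Mirrors comparing against an expected schema: true only on an exact match."""
    return specified == expected


wrong: Optional[Schema] = (("id", "INT"),)
print(is_defined_check(wrong))          # True: the weak check misses the bug
print(exact_check(wrong, EXPECTED))     # False: the exact comparison catches it
print(exact_check(EXPECTED, EXPECTED))  # True
```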

gengliangwang · 2025-10-07T06:21:47Z

sql/connect/common/src/main/protobuf/spark/connect/pipelines.proto

-    optional spark.connect.DataType schema = 7;
+    oneof schema {
+      spark.connect.DataType schema_data_type = 7;
+      string schema_string = 10;


Since Spark 4.1 is not officially released yet, I wonder if we can change the field numbers here.
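With the oneof, a consumer of the message has to branch on which member is set. The sketch below models that dispatch in plain Python; SchemaOneof and resolve_schema are hypothetical names, and the actual Spark Connect server code is not part of this excerpt. With real generated Python protobuf classes, the branch would typically be driven by msg.WhichOneof("schema"), which returns the name of the set member or None.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for a structured schema: (column name, type name) pairs.
Schema = Tuple[Tuple[str, str], ...]


@dataclass
class SchemaOneof:
    """Hypothetical stand-in for the `oneof schema` group in the proto.

    A real generated protobuf message clears the other member when one is set;
    this plain dataclass cannot enforce that, so resolve_schema checks it.
    """
    schema_data_type: Optional[Schema] = None
    schema_string: Optional[str] = None


def resolve_schema(msg: SchemaOneof) -> Optional[Schema]:
    """Return the user-specified schema, or None if neither member is set."""
    if msg.schema_data_type is not None and msg.schema_string is not None:
        raise ValueError("at most one member of the oneof may be set")
    if msg.schema_data_type is not None:
        return msg.schema_data_type  # already structured, use as-is
    if msg.schema_string is not None:
        # Naive flat-DDL split; Spark would use its SQL DDL parser here.
        columns = []
        for part in msg.schema_string.split(","):
            name, type_name = part.split(None, 1)
            columns.append((name, type_name.strip().upper()))
        return tuple(columns)
    return None  # no user-specified schema
```

Keeping the unset case distinct from both members preserves the semantics of the old optional field, where an absent schema means no user-specified schema.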

sryza requested a review from HyukjinKwon October 4, 2025 14:55

github-actions bot added SQL PYTHON CONNECT labels Oct 4, 2025

sryza requested a review from gengliangwang October 4, 2025 14:55

table schema string

cb8ba44

sryza force-pushed the dataset-schema-string branch from 50b839e to cb8ba44 on October 6, 2025 14:02

fix mypy and make name better

1a62674

gengliangwang reviewed Oct 7, 2025
