NO SNOW: fix bug in xml read when custom schema is non structtype. #4065

Merged
sfc-gh-yuwang merged 6 commits into main from no-snow-yuwang-fix-xml-custom-schema
Feb 4, 2026

sfc-gh-yuwang (Collaborator) commented Feb 2, 2026

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-NNNNNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

When reading an XML file like this:
<test>
    <num>1</num>
    <str1>NULL</str1>
    <str2></str2>
    <str3 id="empty">xxx</str3>
</test>

with this code:
test_schema = StructType([
    StructField("num", IntegerType(), True),
    StructField("str1", StringType(), True),
    StructField("str2", StringType(), True),
    StructField("str3", StringType(), True),
])

df = (
    spark.read
    .option("rowTag", "test")
    .schema(test_schema)
    .xml("<file path>")
)
print(df.collect())

PySpark returns:
[Row(num=1, str1='NULL', str2='', str3='xxx')]
while Snowpark currently returns:
[Row(num=1, str1='NULL', str2='', str3='{"_VALUE": "xxx", "_id": "empty"}')]

This shows that the XML reader with a custom schema does not correctly handle an element that has attributes when the schema declares that column as a non-StructType type: the column receives the whole object (`_VALUE` plus attributes) instead of just the element's text. This PR fixes that bug.
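
The intended behavior can be illustrated with a small standalone sketch. This is not the Snowpark implementation: `element_to_value`, `coerce`, `read_row`, and the string type names in `SCHEMA` are hypothetical names introduced only for illustration, and the parsing uses Python's standard `xml.etree.ElementTree` rather than the Snowpark XML reader. It assumes, as in the output above, that an element with attributes is parsed into an object with a `_VALUE` key and `_<attribute>` keys.

```python
import xml.etree.ElementTree as ET

XML_DOC = """<test>
<num>1</num>
<str1>NULL</str1>
<str2></str2>
<str3 id="empty">xxx</str3>
</test>"""

# Hypothetical stand-in for a custom schema: column name -> "int", "string", or "struct".
SCHEMA = {"num": "int", "str1": "string", "str2": "string", "str3": "string"}


def element_to_value(elem):
    """Parse a leaf element. Attributes turn the result into a dict
    with a `_VALUE` key for the text and `_<attr>` keys for attributes."""
    if elem is None:
        return None
    text = elem.text or ""
    if not elem.attrib:
        return text
    record = {"_VALUE": text}
    record.update({f"_{key}": val for key, val in elem.attrib.items()})
    return record


def coerce(value, type_name):
    """Coerce a parsed value to the column type requested by the schema."""
    if value is None:
        return None
    if type_name == "struct":
        # Struct columns keep the attributes alongside the text.
        return value if isinstance(value, dict) else {"_VALUE": value}
    # Non-struct column: drop the attributes and keep only the element text.
    if isinstance(value, dict):
        value = value.get("_VALUE", "")
    return int(value) if type_name == "int" else value


def read_row(xml_text, schema):
    """Build one row from the root element according to the schema."""
    root = ET.fromstring(xml_text)
    return {
        name: coerce(element_to_value(root.find(name)), type_name)
        for name, type_name in schema.items()
    }


row = read_row(XML_DOC, SCHEMA)
print(row)
```

With the schema above, `str3` comes back as the plain string `'xxx'`, matching the PySpark result. Declaring `str3` as `"struct"` in this sketch would instead keep `{"_VALUE": "xxx", "_id": "empty"}`.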

CHANGELOG.md Outdated
#### Bug Fixes

- Fixed a bug that opentelemetry is not correctly import when using `Session.client_telemetry.enable_event_table_telemetry_collection`.
- Fixed a bug when reading xml with custom schema, result include element attributes when column is not `StructType` type.

nit: please move to 1.46.0

Removed duplicate bug fix entry for XML reading issue and updated improvements section.
@sfc-gh-yuwang merged commit 3c5cbd0 into main on Feb 4, 2026
28 checks passed
@sfc-gh-yuwang deleted the no-snow-yuwang-fix-xml-custom-schema branch on February 4, 2026 22:11
@github-actions bot locked and limited conversation to collaborators on Feb 4, 2026