[BUG] Spark Execution Engine: unexpected_index_list not returned in GX 1.x even with unexpected_index_column_names configured #11647

@raj-verma24

Description

When using the Spark execution engine in Great Expectations 1.x, the validation result does not include `unexpected_index_list`, even when `unexpected_index_column_names` is properly configured in the `result_format`.

This is a regression from GX 0.17.x, where these fields were returned correctly and contained full row data with the specified index columns.

```python
import great_expectations as gx
from great_expectations.expectations import ExpectColumnValuesToNotBeNull
from pyspark.sql import SparkSession

# Create Spark session and sample data
spark = SparkSession.builder.getOrCreate()
data = [
    ("ABC", "Broker A"),
    ("XYZ", None),      # This should fail - null broker_name
    ("MNP", None),      # This should fail - null broker_name
]
df = spark.createDataFrame(data, ["broker_code", "broker_name"])

# Setup GX 1.x context
context = gx.get_context(mode="ephemeral")
datasource = context.data_sources.add_spark(name="my_spark_datasource")
data_asset = datasource.add_dataframe_asset(name="my_data_asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe(name="my_batch")

# Create expectation with result_format including unexpected_index_column_names
suite = gx.ExpectationSuite(name="my_suite")
suite.add_expectation(
    ExpectColumnValuesToNotBeNull(
        column="broker_name",
        result_format={
            "result_format": "COMPLETE",
            "unexpected_index_column_names": ["broker_code"]  # <-- Should return broker_code in results
        }
    )
)
suite = context.suites.add(suite)

# Run validation
validation_definition = gx.ValidationDefinition(
    name="my_validation",
    data=batch_definition,
    suite=suite,
)
validation_definition = context.validation_definitions.add(validation_definition)

checkpoint = gx.Checkpoint(
    name="my_checkpoint",
    validation_definitions=[validation_definition],
)
checkpoint = context.checkpoints.add(checkpoint)

result = checkpoint.run(batch_parameters={"dataframe": df})
validation_result = list(result.run_results.values())[0]

# Print the result
import json
print(json.dumps(validation_result.to_json_dict(), indent=2, default=str))
```

Expected behavior
GX 0.17.x returned (correct):

```json
"result": {
  "element_count": 3,
  "unexpected_count": 2,
  "unexpected_percent": 66.67,
  "partial_unexpected_list": [null, null],
  "partial_unexpected_index_list": [
    {"broker_code": "XYZ", "broker_name": null},
    {"broker_code": "MNP", "broker_name": null}
  ],
  "unexpected_index_list": [
    {"broker_code": "XYZ", "broker_name": null},
    {"broker_code": "MNP", "broker_name": null}
  ],
  "unexpected_index_query": "df.filter(F.expr(NOT (broker_name IS NOT NULL)))"
}
```

The `unexpected_index_list` included the column specified in `unexpected_index_column_names`, allowing us to identify which specific rows failed validation.

Actual behavior
GX 1.11.0 returns (missing index lists):

```json
"result": {
  "element_count": 100,
  "unexpected_count": 15,
  "unexpected_percent": 15.0,
  "partial_unexpected_list": [null, null, null, ...],
  "partial_unexpected_counts": [{"value": null, "count": 15}]
}
```

The following fields are completely missing:

  • `unexpected_index_list`
  • `partial_unexpected_index_list`
  • `unexpected_list`
  • `unexpected_index_query`

This makes it impossible to identify which specific rows failed validation when using Spark DataFrames.
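As a minimal sketch of the symptom, the missing fields can be detected programmatically by diffing the expected index-related keys against the `result` dict actually returned. The dict below mirrors the GX 1.11.0 output shown above; the values are illustrative.

```python
# Result dict as returned by GX 1.11.0 with the Spark execution engine
# (illustrative values, shape taken from the "Actual behavior" output above).
result = {
    "element_count": 3,
    "unexpected_count": 2,
    "unexpected_percent": 66.67,
    "partial_unexpected_list": [None, None],
    "partial_unexpected_counts": [{"value": None, "count": 2}],
}

# Index-related fields that GX 0.17.x included with result_format COMPLETE.
index_fields = {
    "unexpected_index_list",
    "partial_unexpected_index_list",
    "unexpected_list",
    "unexpected_index_query",
}

# All four are absent from the 1.x result.
missing = sorted(index_fields - result.keys())
print(missing)
```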

Impact
This is a breaking change for users migrating from GX 0.17.x to 1.x who rely on `unexpected_index_list` to:

  • Track which specific records failed data quality checks
  • Build failure reports with row-level identifiers
  • Join failure data back to source tables for remediation
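The join-back use case above can be approximated in plain Python once an `unexpected_index_list` is available again; here the 0.17.x-style output is simulated, and all row values are the illustrative ones from the repro:

```python
# Source rows, as in the repro DataFrame (illustrative data).
source_rows = [
    {"broker_code": "ABC", "broker_name": "Broker A"},
    {"broker_code": "XYZ", "broker_name": None},
    {"broker_code": "MNP", "broker_name": None},
]

# Simulated 0.17.x-style unexpected_index_list keyed on broker_code.
unexpected_index_list = [
    {"broker_code": "XYZ", "broker_name": None},
    {"broker_code": "MNP", "broker_name": None},
]

# Join the failing index values back to the source rows to build a
# row-level failure report for remediation.
failed_keys = {row["broker_code"] for row in unexpected_index_list}
failure_report = [row for row in source_rows if row["broker_code"] in failed_keys]
print([row["broker_code"] for row in failure_report])
```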

Environment
Great Expectations Version: 1.11.0
Execution Engine: Spark (PySpark)
Previously working in: 0.17.19
Python Version: 3.10
Operating System: Linux (AWS Glue)

Additional context
The `unexpected_index_column_names` configuration is passed correctly to the expectation (visible in `expectation_config.kwargs.result_format`), but the Spark execution engine is not honoring it and does not return the index data in the results.

This may be related to changes in metric computation for Spark in GX 1.x.
