Parameterize all `@pytest.mark.pipeline` tests to run with both PANDAS and EXPERIMENTAL_ARROW output formats #2597

Copilot · 2025-08-20T08:16:49Z

This PR parameterizes the pipeline tests to run with both OutputFormat.PANDAS and OutputFormat.EXPERIMENTAL_ARROW, so that the experimental Arrow output format is exercised across the pipeline test suite.

Changes Made

Core Test Infrastructure

- Added any_output_format parameter to all test functions that perform data comparisons in pipeline-marked test files
- Added lib.set_output_format(any_output_format) calls to ensure tests run with the specified output format
- Replaced assert_frame_equal() with assert_frame_equal_with_arrow() for cross-format compatibility
- Replaced np.array_equal() and np.testing.assert_array_equal() with assert_frame_equal_with_arrow() where appropriate
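
The pattern described in this list can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `OutputFormat` is a local stand-in enum, and `FakeStore` and the `fake_store` fixture are hypothetical. Only the names `any_output_format` and `set_output_format` come from the description above.

```python
import enum

import pytest


class OutputFormat(enum.Enum):
    """Local stand-in for the real output-format enum (assumption)."""

    PANDAS = "pandas"
    EXPERIMENTAL_ARROW = "experimental_arrow"


class FakeStore:
    """Hypothetical in-memory store that only records the requested format."""

    def __init__(self):
        self.output_format = OutputFormat.PANDAS
        self._data = {}

    def set_output_format(self, output_format):
        self.output_format = output_format

    def write(self, symbol, rows):
        self._data[symbol] = list(rows)

    def head(self, symbol, n):
        return self._data[symbol][:n]


@pytest.fixture(params=list(OutputFormat), ids=lambda fmt: fmt.name)
def any_output_format(request):
    # pytest runs every test that requests this fixture once per entry in `params`.
    return request.param


@pytest.fixture
def fake_store():
    return FakeStore()


def test_head_sketch(fake_store, any_output_format):
    # Select the format first, then run the same assertions for both formats.
    fake_store.set_output_format(any_output_format)
    fake_store.write("sym", range(10))
    assert fake_store.head("sym", 3) == [0, 1, 2]
    assert fake_store.output_format is any_output_format
```

Because the format comes from a parameterized fixture, a test only needs the extra argument and one `set_output_format` call to be collected once per format.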

Test Files Updated

- Core pipeline tests (all files with pytestmark = pytest.mark.pipeline):
  - test_head.py, test_tail.py, test_aggregation.py, test_projection.py
  - test_filtering.py, test_row_range.py, test_resample.py
  - test_lazy_dataframe.py, test_symbol_concatenation.py, test_ternary.py
  - test_query_builder_sparse.py, test_query_builder_batch.py
- Hypothesis tests: test_projection_hypothesis.py, test_filtering_hypothesis.py
- Integration tests: pipeline-marked functions in test_basic_version_store.py

Helper Function Adaptations

- Created Arrow-compatible versions of generic test helpers:
  - aggregation_test_with_any_output_format() for aggregation tests
  - filter_test_with_any_output_format() and variants for filtering tests
  - row_range_test_with_any_output_format() for row range tests
  - resample_test_with_any_output_format() and generic_resample_test_with_arrow_support() for resample tests
- Preserved existing helper functions for non-parameterized tests

Special Cases Handled

- Class-based tests: Updated test methods in TestQueryBuilderSparse class to accept any_output_format
- Skipped tests: Left @pytest.mark.skip tests unchanged to avoid unnecessary modifications
- Error-only tests: Tests that only check for exceptions were not modified since they don't test output formats

Testing Impact

With these changes, each parameterized pipeline test now runs twice (skipped and error-only tests are unchanged):

- Once with OutputFormat.PANDAS (existing behavior)
- Once with OutputFormat.EXPERIMENTAL_ARROW (new experimental format)

This provides comprehensive coverage of the new Arrow output format across all pipeline operations including:

- Query operations (filtering, projection, aggregation)
- Data retrieval (head, tail, row ranges)
- Advanced features (resampling, symbol concatenation)
- Hypothesis-based property testing

Future Work

As noted in the issue, any tests that fail with internal exceptions when using OutputFormat.EXPERIMENTAL_ARROW should be marked with pytest.xfail in follow-up work once the test failures are identified through CI runs.
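
One possible way to record such a failure, assuming the format is supplied through `pytest.mark.parametrize` rather than a shared fixture, is to attach `pytest.mark.xfail` to the Arrow parameter only. In this sketch `OutputFormat` is a local stand-in enum and `read_rows` is a hypothetical function that simulates an internal error; neither is taken from the code base.

```python
import enum

import pytest


class OutputFormat(enum.Enum):
    # Local stand-in for the real enum (assumption).
    PANDAS = "pandas"
    EXPERIMENTAL_ARROW = "experimental_arrow"


def read_rows(output_format):
    # Hypothetical reader that simulates an internal error for the Arrow format.
    if output_format is OutputFormat.EXPERIMENTAL_ARROW:
        raise RuntimeError("simulated internal exception")
    return [1, 2, 3]


# Only the Arrow case is expected to fail; the pandas case must still pass.
ARROW_XFAIL = pytest.param(
    OutputFormat.EXPERIMENTAL_ARROW,
    marks=pytest.mark.xfail(reason="simulated Arrow failure", raises=RuntimeError),
)


@pytest.mark.parametrize("output_format", [OutputFormat.PANDAS, ARROW_XFAIL])
def test_read_rows_sketch(output_format):
    assert read_rows(output_format) == [1, 2, 3]
```

Scoping the mark to a single parameter keeps the pandas run as a normal pass/fail test while the Arrow run is reported as an expected failure.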


github-actions · 2025-08-20T08:17:34Z

Label error. Requires exactly 1 of: patch, minor, major. Found:


IvoDD · 2025-08-22T07:01:22Z

python/tests/integration/arcticdb/version_store/test_basic_version_store.py

    assert not object_version_store.has_symbol("sym")
    assert object_version_store.list_snapshots() == {snap: None}
-    assert_frame_equal(object_version_store.read("sym", as_of=snap).data, df)
+    assert_frame_equal_with_arrow(object_version_store.read("sym", as_of=snap).data, df)


Do not change tests such as this one, which are not marked with pytest.mark.pipeline.
Only this file has per-test marks. Other files should be fine.
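
As a rough sketch of what per-test marking means for this change (the test names and the placeholder format strings below are hypothetical), only the function carrying the `pipeline` mark takes the format fixture, while the unmarked one stays as it was:

```python
import pytest

# Placeholder strings standing in for the real output-format enum (assumption).
OUTPUT_FORMATS = ["PANDAS", "EXPERIMENTAL_ARROW"]


@pytest.fixture(params=OUTPUT_FORMATS)
def any_output_format(request):
    return request.param


@pytest.mark.pipeline
def test_filter_sketch(any_output_format):
    # Carries the per-test mark, so it is parameterized over output formats.
    assert any_output_format in OUTPUT_FORMATS


def test_snapshot_sketch():
    # Not marked as a pipeline test, so it is left untouched.
    assert True
```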

IvoDD · 2025-08-22T07:37:53Z

python/tests/integration/arcticdb/version_store/test_basic_version_store.py

 def assert_equal_value(data, expected):
    received = data.reindex(sorted(data.columns), axis=1)
    expected = expected.reindex(sorted(expected.columns), axis=1)
-    assert_frame_equal(received, expected)


The above reindex operations will not work on a pyarrow.Table, which has no reindex method.

IvoDD · 2025-08-22T07:38:30Z

python/tests/unit/arcticdb/version_store/test_aggregation.py



-def test_group_on_float_column_with_nans(lmdb_version_store_v1):
+def aggregation_test_with_any_output_format(lib, symbol, df, grouping_column, aggs_dict):


Prefer to modify the generic_aggregation_test as it is only used by pipeline tests.

IvoDD · 2025-08-22T07:39:26Z

python/tests/unit/arcticdb/version_store/test_filtering.py

+        filter_test_with_any_output_format(lib, symbol, arctic_query, expected)
+
+
+def filter_test_nans_with_any_output_format(lib, symbol, arctic_query, expected):


Also prefer to modify the generic_filter_test variants rather than creating new ones here.

IvoDD · 2025-08-22T07:40:49Z

python/tests/unit/arcticdb/version_store/test_lazy_dataframe.py

-def test_lazy_read(lmdb_library):
+def test_lazy_read(lmdb_library, any_output_format):
    lib = lmdb_library
+    lib.set_output_format(any_output_format)


For the lmdb_library fixture we should use lib._nvs.set_output_format instead. That fixture returns a Library, whereas the fixtures used by the other tests return a NativeVersionStore.
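
The difference between the two fixture types can be sketched with stand-in classes. Both classes below are simplified placeholders rather than the real implementations, and `set_output_format_for` is a hypothetical helper. Only the `_nvs` attribute and the `set_output_format` method name come from the comment above.

```python
class NativeVersionStore:
    """Stand-in for the store type; it only records the chosen format."""

    def __init__(self):
        self.output_format = "PANDAS"

    def set_output_format(self, output_format):
        self.output_format = output_format


class Library:
    """Stand-in for the wrapper type, which holds a store in `_nvs`."""

    def __init__(self, nvs):
        self._nvs = nvs


def set_output_format_for(lib, output_format):
    # Hypothetical helper: unwrap a Library to its underlying store if needed.
    store = getattr(lib, "_nvs", lib)
    store.set_output_format(output_format)
    return store
```

In a test that uses lmdb_library, the direct equivalent is simply `lib._nvs.set_output_format(any_output_format)`.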

IvoDD · 2025-08-22T07:43:06Z

python/tests/unit/arcticdb/version_store/test_resample.py

+    assert_frame_equal_with_arrow(expected, received, check_dtype=False)
+
+
+def generic_resample_test_with_arrow_support(


Again, prefer to modify the generic_resample_test_with_empty_buckets from test.py

Initial plan

90807d0

Copilot AI assigned Copilot and IvoDD Aug 20, 2025

Copilot started work on behalf of IvoDD August 20, 2025 08:16 View session

Copilot AI and others added 6 commits August 20, 2025 08:24

Updated test_head.py and test_tail.py to use any_output_format fixture

8ecc0d7

Co-authored-by: IvoDD <[email protected]>

Updated test_aggregation.py and test_projection.py to use any_output_format fixture

86eac32

Co-authored-by: IvoDD <[email protected]>

Updated test_filtering.py and test_row_range.py to use any_output_format fixture

45d305a

Co-authored-by: IvoDD <[email protected]>

Updated test_query_builder_sparse.py and test_query_builder_batch.py to use any_output_format fixture

1e17ec4

Co-authored-by: IvoDD <[email protected]>

Updated test_resample.py to use any_output_format fixture

8130b4a

Co-authored-by: IvoDD <[email protected]>

Completed parameterization of all remaining pipeline tests for any_output_format

8857184

Co-authored-by: IvoDD <[email protected]>

Copilot AI requested a review from IvoDD August 20, 2025 08:43

Copilot finished work on behalf of IvoDD August 20, 2025 08:43

IvoDD requested changes Aug 22, 2025
