Skip to content

Conversation

Copilot
Copy link

@Copilot Copilot AI commented Aug 20, 2025

This PR parameterizes all pipeline tests to run with both OutputFormat.PANDAS and OutputFormat.EXPERIMENTAL_ARROW output formats, ensuring comprehensive testing of the new experimental Arrow output format across the entire pipeline test suite.

Changes Made

Core Test Infrastructure

  • Added any_output_format parameter to all test functions that perform data comparisons in pipeline-marked test files
  • Added lib.set_output_format(any_output_format) calls to ensure tests run with the specified output format
  • Replaced assert_frame_equal() with assert_frame_equal_with_arrow() for cross-format compatibility
  • Replaced np.array_equal() and np.testing.assert_array_equal() with assert_frame_equal_with_arrow() where appropriate

Test Files Updated

  • Core pipeline tests: All files with pytestmark = pytest.mark.pipeline
    • test_head.py, test_tail.py, test_aggregation.py, test_projection.py
    • test_filtering.py, test_row_range.py, test_resample.py
    • test_lazy_dataframe.py, test_symbol_concatenation.py, test_ternary.py
    • test_query_builder_sparse.py, test_query_builder_batch.py
  • Hypothesis tests: test_projection_hypothesis.py, test_filtering_hypothesis.py
  • Integration tests: Pipeline-marked functions in test_basic_version_store.py

Helper Function Adaptations

  • Created arrow-compatible versions of generic test helpers:
    • aggregation_test_with_any_output_format() for aggregation tests
    • filter_test_with_any_output_format() and variants for filtering tests
    • row_range_test_with_any_output_format() for row range tests
    • resample_test_with_any_output_format() and generic_resample_test_with_arrow_support() for resample tests
  • Preserved existing helper functions for non-parameterized tests

Special Cases Handled

  • Class-based tests: Updated test methods in TestQueryBuilderSparse class to accept any_output_format
  • Skipped tests: Left @pytest.mark.skip tests unchanged to avoid unnecessary modifications
  • Error-only tests: Tests that only check for exceptions were not modified since they don't test output formats

Testing Impact

With these changes, all pipeline tests now run twice:

  1. Once with OutputFormat.PANDAS (existing behavior)
  2. Once with OutputFormat.EXPERIMENTAL_ARROW (new experimental format)

This provides comprehensive coverage of the new Arrow output format across all pipeline operations including:

  • Query operations (filtering, projection, aggregation)
  • Data retrieval (head, tail, row ranges)
  • Advanced features (resampling, symbol concatenation)
  • Hypothesis-based property testing

Future Work

As noted in the issue, any tests that fail with internal exceptions when using OutputFormat.EXPERIMENTAL_ARROW should be marked with pytest.xfail in follow-up work once the test failures are identified through CI runs.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copy link

Label error. Requires exactly 1 of: patch, minor, major. Found:

@Copilot Copilot AI changed the title [WIP] Parameterize all @pytest.mark.pipeline tests to run both with output_format=OutputFormat.PANDAS and output_format=OutputFormat.EXPERIMENTAL_ARROW. The tests inside test_query_builder.py already do that via the any_output_format fixture. Do si... Parameterize all @pytest.mark.pipeline tests to run with both PANDAS and EXPERIMENTAL_ARROW output formats Aug 20, 2025
@Copilot Copilot AI requested a review from IvoDD August 20, 2025 08:43
Copilot finished work on behalf of IvoDD August 20, 2025 08:43
assert not object_version_store.has_symbol("sym")
assert object_version_store.list_snapshots() == {snap: None}
assert_frame_equal(object_version_store.read("sym", as_of=snap).data, df)
assert_frame_equal_with_arrow(object_version_store.read("sym", as_of=snap).data, df)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not change tests such as this which are not marked with pytest.mark.pipeline pipeline.
Only this file has per test marks. Other files should be fine.

def assert_equal_value(data, expected):
received = data.reindex(sorted(data.columns), axis=1)
expected = expected.reindex(sorted(expected.columns), axis=1)
assert_frame_equal(received, expected)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above reindex operations will not work on a pyarrow.Table



def test_group_on_float_column_with_nans(lmdb_version_store_v1):
def aggregation_test_with_any_output_format(lib, symbol, df, grouping_column, aggs_dict):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to modify the generic_aggregation_test as it is only used by pipeline tests.

filter_test_with_any_output_format(lib, symbol, arctic_query, expected)


def filter_test_nans_with_any_output_format(lib, symbol, arctic_query, expected):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also prefer to modify the generic_filter_test variants rather than creating new ones here.

def test_lazy_read(lmdb_library):
def test_lazy_read(lmdb_library, any_output_format):
lib = lmdb_library
lib.set_output_format(any_output_format)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the lmdb_library fixture we should use lib._nvs.set_output_format instead. It has a different type Library vs for the other tests where the fixture type is NativeVersionStore

assert_frame_equal_with_arrow(expected, received, check_dtype=False)


def generic_resample_test_with_arrow_support(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, prefer to modify the generic_resample_test_with_empty_buckets from test.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants