Skip to content

Conversation

@Bellangelo
Copy link
Contributor

@Bellangelo Bellangelo commented Dec 28, 2024

Change Log

Added

  • FlowTestCase with assertions around Extractors

Fixed

Changed

Removed

Deprecated

Security


Description

Partially addresses #745.

I found some duplicated logic in the Extractor's tests and I decided to unify the logic around them.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 28, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.715mb +0.01%  | 521.109ms +0.28% | ±0.55% +24.22%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.819mb +0.01%  | 1.113s +0.65%    | ±1.23% +390.43% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 86.465mb +0.00% | 915.003ms +0.51% | ±0.62% +107.03% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.448mb +0.01%  | 34.110ms +0.71%  | ±0.46% +411.17% |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.424mb +0.01%  | 664.091ms -1.22% | ±0.38% -58.77%  |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 114.444mb +0.00% | 61.789ms +0.20% | ±0.40% -48.45% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+------------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev           |
+--------------------+----------------+------+-----+------------------+------------------+------------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 55.188mb +0.00%  | 113.640ms +1.64% | ±1.49% +145.91%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 85.655mb +0.00%  | 103.892ms +0.45% | ±1.08% -0.05%    |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 167.441mb +0.00% | 19.427s +0.40%   | ±2.51% +811.25%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.273mb +0.00%  | 44.292ms -0.03%  | ±0.99% +1434.08% |
+--------------------+----------------+------+-----+------------------+------------------+------------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev           |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 85.501mb +0.00%  | 3.228ms +3.92%   | ±1.91% +107.26%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.770mb +0.00% | 188.194ms -1.48% | ±0.73% -51.93%   |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.490mb +0.00%  | 18.518ms -0.69%  | ±0.39% -42.25%   |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 86.376mb +0.00%  | 1.530ms +9.89%   | ±2.90% +467.87%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 86.376mb +0.00%  | 1.446ms +2.55%   | ±0.78% -71.51%   |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 84.537mb +0.00%  | 2.238ms -2.01%   | ±2.68% +6.80%    |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.066mb +0.00%  | 15.003ms +1.79%  | ±0.74% +75.41%   |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.066mb +0.00%  | 15.195ms +0.05%  | ±0.43% -27.15%   |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.757mb +0.00%  | 1.800μs +5.51%   | ±0.00% -100.00%  |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.757mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%     |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 92.334mb +0.00%  | 12.821ms -1.39%  | ±0.47% -27.54%   |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 120.642mb +0.00% | 61.895ms +1.27%  | ±0.81% +469.19%  |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 85.585mb +0.00%  | 1.253ms +5.32%   | ±1.34% -27.45%   |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 88.874mb +0.00%  | 60.444ms -0.13%  | ±1.17% +181.73%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 86.638mb +0.00%  | 3.496ms +0.33%   | ±1.58% -22.13%   |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 84.032mb +0.00%  | 40.124ms -2.12%  | ±0.70% +37.37%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 84.033mb +0.00%  | 40.688ms -2.40%  | ±1.46% +473.39%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 84.032mb +0.00%  | 42.502ms +3.88%  | ±3.22% +78.28%   |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.197mb +0.00%  | 7.737ms -0.62%   | ±0.37% +34.67%   |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.947mb +0.00%  | 29.938ms +0.14%  | ±2.63% +219.18%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.757mb +0.00%  | 13.334μs +2.57%  | ±1.55% +0.00%    |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.757mb +0.00%  | 15.018μs -1.84%  | ±0.94% +75.50%   |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.775mb +0.00% | 190.546ms -0.16% | ±0.48% -33.86%   |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 50.655mb +0.00%  | 400.050ms -0.33% | ±0.22% -53.56%   |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 13.071mb +0.00%  | 82.910ms +3.13%  | ±2.53% +1958.36% |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 105.913mb +0.00% | 485.501ms +1.37% | ±0.56% +143.47%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.104mb +0.00%  | 244.681ms +0.55% | ±1.18% +15.17%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.626mb +0.00%  | 53.150ms +0.25%  | ±2.60% +137.38%  |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+

@norberttech
Copy link
Member

hey @Bellangelo it's a great start!
After looking at your proposal few things comes to my mind.

  • focus that helper class on assertions (no helper methods)
  • name it FlowTestCase
  • it should be a part of the src/core/etl
  • I would try to put it under src/core/etl/src/Flow/ETL/Test namespace to make it available with the ETL

and ideally, I would probably try to make all Flow test cases to extend from it, at least across those that depends on flow-php/etl

@norberttech
Copy link
Member

also @Bellangelo feel free to start a thread at discord if you have any questions about this task.
We can talk in more details about it there if you prefer more live time communication

@codecov
Copy link

codecov bot commented Dec 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (1.x@4c533ea). Learn more about missing BASE report.
Report is 2 commits behind head on 1.x.

Additional details and impacted files
@@          Coverage Diff           @@
##             1.x    #1291   +/-   ##
======================================
  Coverage       ?   80.81%           
======================================
  Files          ?      583           
  Lines          ?    16037           
  Branches       ?        0           
======================================
  Hits           ?    12960           
  Misses         ?     3077           
  Partials       ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added size: M and removed size: S labels Dec 28, 2024
@github-actions github-actions bot added size: S and removed size: M labels Dec 28, 2024
@Bellangelo
Copy link
Contributor Author

@norberttech I think I have finished the method changes. The only thing I haven't done is this one:

  • I would try to put it under src/core/etl/src/Flow/ETL/Test namespace to make it available with the ETL

Is the namespace correct or you maybe meant to write src/core/etl/tests/Flow/ETL/Tests? The reason I am asking is that it concerns me that we will have a class in the "main" src that depends in a dev dependency, the PHPUnit. Maybe I am missing something?

@norberttech
Copy link
Member

@Bellangelo, thank you for your work, and I'm sorry for the lack of proper guidance. Let me explain the vision once again and update the issue.

So in general my goal was to provide a FlowTestCase class that people could use to test their pipelines.
I would focus this one on:
What you started to build here is more of a candidate for the internal use cases (because regular users should not write their extractors/loaders).

We already have IntegrationTestCase which is doing something similar to what you are trying to achieve here.

We can approach it in a following way :

InternalTestCase (those should sit under src/core/etl/tests/Flow/ETL/Tests)

  • FlowTestCase - the one that you started building, should be used with all Unit Tests
  • FlowIntegrationTestCase that extends FlowTestCase - this could be the one we already have under IntegrationTestCase

ExternalTestCase

  • FlowTestCase

This one should be exposed with the ETL itself or maybe we can put it under a dedicated package like flow-php/bridge-phpunit with some dedicated helpers that TestCases for things like ScalarFunctions for example.

So what you did so far is a good start to unify InternalTestCases across the whole repository and if you don't mind I would like you to continue with this one.

I'm going to leave some comments in the code directly.

Please let me know if that makes sense and again, feel free to reach out on Discord if you prefer a more synchronous way of communication

@github-actions github-actions bot added size: M and removed size: S labels Dec 29, 2024
@Bellangelo
Copy link
Contributor Author

@Bellangelo, thank you for your work, and I'm sorry for the lack of proper guidance.

I guess we are even due to how bad I am consistently naming these assertions 😃 .

InternalTestCase (those should sit under src/core/etl/tests/Flow/ETL/Tests)

  • FlowTestCase - the one that you started building, should be used with all Unit Tests
  • FlowIntegrationTestCase that extends FlowTestCase - this could be the one we already have under IntegrationTestCase

Done in 1e1c2fe and 6b300a0. I haven't updated all the tests to extend the new FlowTestCase since it isn't required by all the classes. Plus, it is a 100+ file changes if I do this. Although if you think we should do it, I can happily open a new PR for it after we merge this one.

@norberttech norberttech merged commit 12357ba into flow-php:1.x Dec 29, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants