
@agologan (Contributor) commented Jul 27, 2025

🤔 What's changed?

⚡️ What's your motivation?

Closes #2303

🏷️ What kind of change is this?

  • 📖 Documentation (improvements without changing code)
  • ⚡ New feature (non-breaking change which adds new behaviour)

♻️ Anything particular you want feedback on?

The feature is implemented as a plugin on the pickles:filter event and runs between the filter and order plugins.
As such, the defined order is not affected by sharding, and random order only shuffles tests within a shard.
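Here is a minimal sketch of modulo-based shard selection, assuming a 1-based shard index and a shard total. The name selectShard and its parameters are hypothetical, not the plugin's actual API. Because the helper only filters, the relative (defined) order of the remaining pickles does not change.

```typescript
// Hypothetical helper: keeps the items at positions index, index + total, index + 2 * total, ...
// (1-based). It only filters, so the relative order of the kept items is unchanged.
function selectShard<T>(items: readonly T[], index: number, total: number): T[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError(`shard total must be a positive integer, got ${total}`);
  }
  if (!Number.isInteger(index) || index < 1 || index > total) {
    throw new RangeError(`shard index must be in 1..${total}, got ${index}`);
  }
  return items.filter((_, i) => i % total === index - 1);
}

const pickles = ["A", "B", "C", "D", "E", "F", "G"];
console.log(selectShard(pickles, 1, 3)); // [ 'A', 'D', 'G' ]
console.log(selectShard(pickles, 2, 3)); // [ 'B', 'E' ]
console.log(selectShard(pickles, 3, 3)); // [ 'C', 'F' ]
```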

Alternatively, a new pickles:shard event could be introduced and executed after pickles:order, which would allow a global random seed. This would require a documentation change warning users to use the same seed across shards.
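To illustrate the alternative, here is a sketch in which every machine shuffles the full pickle list with the same seed and only then takes its shard. The mulberry32 PRNG and the Fisher-Yates shuffle are illustrative choices and may not match how cucumber-js implements random order. The names shuffleWithSeed and selectShard are hypothetical.

```typescript
// mulberry32: a tiny deterministic PRNG, chosen here only for illustration.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG; returns a new array.
function shuffleWithSeed<T>(items: readonly T[], seed: number): T[] {
  const rand = mulberry32(seed);
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Hypothetical modulo selection with a 1-based shard index.
function selectShard<T>(items: readonly T[], index: number, total: number): T[] {
  return items.filter((_, i) => i % total === index - 1);
}

const all = ["A", "B", "C", "D", "E", "F", "G"];
const seed = 42; // must be identical on every machine

// Each "machine" shuffles the full list itself, then keeps only its own shard.
const shards = [1, 2, 3].map((index) => selectShard(shuffleWithSeed(all, seed), index, 3));

// With the same seed everywhere, the shards are disjoint and together cover every pickle.
console.log(([] as string[]).concat(...shards).sort().join(",")); // A,B,C,D,E,F,G
```

If one machine used a different seed, its permutation would differ from the others', so shards could overlap or miss pickles. The documentation warning would need to cover that case.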

The option was added to ISourcesCoordinates because it acts more like a filtering option for a specific instance than like --parallel, and the plugin is loaded only for runCucumber.

📋 Checklist:

  • I agree to respect and uphold the Cucumber Community Code of Conduct
  • I've changed the behaviour of the code
    • I have added/updated tests to cover my changes.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.


@coveralls commented Jul 27, 2025

coverage: 97.792% (+0.01%) from 97.78% when pulling 92ba787 on agologan:main into cf6ab33 on cucumber:main.

@davidjgoss (Contributor) left a comment

Many thanks for contributing this, it's a great quality PR.

> Feature is implemented as plugin on event pickles:filter and will run between filter and order plugins.
> As such defined order is not affected by sharding and random order will only shuffle tests in a shard.
>
> Alternatively, a new event pickles:shard could be introduced to be executed after pickles:order which would allow a global random seed. This will require a documentation change to warn users they should use the same seed across shards.

I think what you've done is the best balance - no sense in adding another event just for this.

@davidjgoss merged commit c79fe2d into cucumber:main on Aug 16, 2025 (8 checks passed)
@Tallyb commented Aug 22, 2025

From looking at the code, I could not tell how the pickles are sorted. If the order is effectively random, depending on how files are read from the file system, this will lead to errors. All pickles must be sorted in a unique, deterministic way. As a side note, since you use modulo, sorting by file size and then alphabetically would give better performance than sorting alphabetically only.

@davidjgoss (Contributor)
@Tallyb can you elaborate on what errors you are foreseeing? Are you able to reproduce them in a sample project?

This PR doesn't deal with any sorting - that is handled in a later phase, after filtering.

@Tallyb commented Aug 23, 2025

Let's take a simple example.
Assume we have tests A, B, C, D, E, F, G and we run them across 3 shards.
Each shard runs on a different machine, so it is completely unaware of what is happening on the other machines.

Machine 1: tests are read in the order A, B, C, D, E, F, G. Tests taken: A, D, G (positions 1, 4, 7).
Machine 2: tests are read in the order B, A, C, D, E, F, G. Tests taken: A, E (positions 2, 5).
Machine 3: tests are read in the order B, C, A, D, E, F, G. Tests taken: A, F (positions 3, 6).
As you can see, A is tested 3 times, while B and C are not tested at all. To make sure everything is tested, you must guarantee that tests are read in the same order on every machine.
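The scenario above can be reproduced with a short sketch. It assumes the same modulo selection and uses the hypothetical per-machine read orders from the example. It only shows what would happen if the read order differed between machines; the later replies explain why that does not happen in cucumber-js.

```typescript
// Hypothetical modulo selection: keep positions index, index + total, ... (1-based).
function selectShard<T>(items: readonly T[], index: number, total: number): T[] {
  return items.filter((_, i) => i % total === index - 1);
}

// Hypothetical read orders: each machine sees the tests in a different order.
const readOrders: string[][] = [
  ["A", "B", "C", "D", "E", "F", "G"], // machine 1
  ["B", "A", "C", "D", "E", "F", "G"], // machine 2
  ["B", "C", "A", "D", "E", "F", "G"], // machine 3
];

// Machine n (1-based) takes shard n of its own, differently ordered list.
const taken = readOrders.map((order, i) => selectShard(order, i + 1, 3));
console.log(taken.map((shard) => shard.join(","))); // [ 'A,D,G', 'A,E', 'A,F' ]

// Count how often each test runs across all machines.
const runs = new Map<string, number>();
for (const test of readOrders[0]) runs.set(test, 0);
for (const shard of taken) {
  for (const test of shard) runs.set(test, (runs.get(test) ?? 0) + 1);
}
console.log(runs.get("A"), runs.get("B"), runs.get("C")); // 3 0 0
```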

@agologan (Contributor, Author)
The code in all instances does 3 things in this order:

  • discovers paths
  • filters the paths
  • applies ordering (unless it is the default, "defined")

You can see this in load_sources.ts or run_cucumber.ts.

Sharding is applied after discovery and filtering, but before ordering.

So discovery is responsible for the defined order, as can be seen in paths.ts.

Getting back to your example, all 3 machines run the exact same code to discover the paths, which, as the docs put it, "roughly means alphabetical order of file path followed by sequential order within each file".

So all 3 machines will discover the tests in the same order, which roughly resembles A, B, C, D, E, F, G.

Then all 3 machines apply the same filtering logic, which leaves them with the same sequence, and sharding is applied to that sequence.

Picking every nth test on each shard then distributes all tests uniformly across the shards.

As a last step, if random order is used, the tests left on each shard are reshuffled.
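The sequence described above can be modelled roughly as follows. This is a simplified sketch, not the code in load_sources.ts, run_cucumber.ts or paths.ts. Discovery is modelled as a stable sort by path, although the real order is only roughly alphabetical. Filtering is an arbitrary predicate, and Pickle, runPipeline and the order callback are hypothetical.

```typescript
interface Pickle {
  uri: string; // feature file path
  name: string;
}

function byUri(a: Pickle, b: Pickle): number {
  return a.uri < b.uri ? -1 : a.uri > b.uri ? 1 : 0;
}

// Hypothetical model of the pipeline: discover -> filter -> shard -> order.
function runPipeline(
  rawListing: readonly Pickle[], // whatever file order the file system happens to return
  keep: (p: Pickle) => boolean, // stand-in for tag/name filtering
  shardIndex: number, // 1-based
  shardTotal: number,
  order: (ps: Pickle[]) => Pickle[] = (ps) => ps, // identity models "defined" order
): Pickle[] {
  // 1. Discovery: sort by path. Array.prototype.sort is stable, so pickles
  //    within one file keep their sequential order.
  const discovered = [...rawListing].sort(byUri);
  // 2. Filtering: identical logic on every machine, so the sequence stays identical.
  const filtered = discovered.filter(keep);
  // 3. Sharding: keep every shardTotal-th pickle.
  const shard = filtered.filter((_, i) => i % shardTotal === shardIndex - 1);
  // 4. Ordering: only reorders what is already in this shard.
  return order(shard);
}

// Two machines whose file systems list the files in different orders.
const listingA: Pickle[] = [
  { uri: "b.feature", name: "b1" },
  { uri: "a.feature", name: "a1" },
  { uri: "a.feature", name: "a2" },
  { uri: "c.feature", name: "c1" },
  { uri: "b.feature", name: "b2" },
];
const listingB: Pickle[] = [
  { uri: "c.feature", name: "c1" },
  { uri: "b.feature", name: "b1" },
  { uri: "b.feature", name: "b2" },
  { uri: "a.feature", name: "a1" },
  { uri: "a.feature", name: "a2" },
];

for (const listing of [listingA, listingB]) {
  // No filtering in this demo; 2 shards.
  const names = [1, 2].map((i) => runPipeline(listing, () => true, i, 2).map((p) => p.name).join(","));
  console.log(names); // [ 'a1,b1,c1', 'a2,b2' ] for both listings
}
```

Both hypothetical file-system listings produce the same shards. Passing a shuffle as order would only permute the pickles already assigned to a shard.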

@davidjgoss (Contributor)
Thanks for laying that out @agologan.

As you say, the initial order of sources and pickles is deterministic given the same configuration, so I don't believe we have a problem here.

@Tallyb commented Aug 25, 2025

OK, as long as it is deterministic, there is no issue.
You can still use my tip to improve performance by sorting by size (assuming size ≈ execution time).
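Here is a sketch of the size-based tip, assuming file size is a reasonable proxy for execution time, as the comment suggests. Files are sorted largest first, with ties broken by path so the order stays deterministic, and then dealt round-robin with the same modulo rule. The sizes are made-up numbers, and FeatureFile and shardLoads are hypothetical. The merged PR does not do this.

```typescript
interface FeatureFile {
  path: string;
  size: number; // bytes; assumed to correlate with execution time
}

function byPath(a: FeatureFile, b: FeatureFile): number {
  return a.path < b.path ? -1 : a.path > b.path ? 1 : 0;
}

// Largest first; ties broken by path so every machine gets the same order.
function bySizeThenPath(a: FeatureFile, b: FeatureFile): number {
  return b.size - a.size || byPath(a, b);
}

// Total size assigned to each shard when dealing round-robin (the modulo rule).
function shardLoads(files: readonly FeatureFile[], total: number): number[] {
  const loads: number[] = new Array(total).fill(0);
  files.forEach((f, i) => {
    loads[i % total] += f.size;
  });
  return loads;
}

// Made-up sizes for illustration.
const files: FeatureFile[] = [
  { path: "a.feature", size: 10 },
  { path: "b.feature", size: 1 },
  { path: "c.feature", size: 10 },
  { path: "d.feature", size: 1 },
];

console.log(shardLoads([...files].sort(byPath), 2)); // [ 20, 2 ]
console.log(shardLoads([...files].sort(bySizeThenPath), 2)); // [ 11, 11 ]
```

Dealing round-robin after a descending sort is a simple greedy heuristic. It balances this example perfectly but is not optimal in general.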

Development: successfully merging this pull request may close the issue "Improve execution time by executing on multiple machines".
4 participants