fix: parallel execution dependency graph for exec, start_service, stop_service #2930

Open
mattevans wants to merge 1 commit into kurtosis-tech:main from mattevans:fix/parallel-dependency-graph
@mattevans
Description

  • exec, start_service, and stop_service now call ProducesService() in the dependency graph
  • Previously these instructions mutated service state without declaring it, so store_service_files on the same service could race with a preceding exec in --parallel mode
  • Updates dependency graph test expectations to match
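
The ordering problem can be illustrated with a small toy model. This is only a sketch and not the Kurtosis implementation: the `DependencyGraph` class, `consumes_service`, `build_plan`, and the service name `keystore-0` are hypothetical, and `produces_service` merely mirrors the role that `ProducesService()` plays in this PR. The assumption is that a consumer of a service depends on the most recent instruction that declared itself a producer of that service.

```python
from dataclasses import dataclass, field


@dataclass
class DependencyGraph:
    """Toy model: maps each instruction to the instructions it must wait for."""
    deps: dict = field(default_factory=dict)
    last_producer: dict = field(default_factory=dict)  # service name -> instruction

    def add_instruction(self, instr):
        self.deps[instr] = set()

    def consumes_service(self, instr, service):
        # A consumer waits for the latest declared producer of the service.
        producer = self.last_producer.get(service)
        if producer is not None and producer != instr:
            self.deps[instr].add(producer)

    def produces_service(self, instr, service):
        # Hypothetical stand-in for ProducesService(): later consumers
        # of this service will now depend on this instruction.
        self.last_producer[service] = instr


def build_plan(exec_produces_service):
    """Builds add_service -> exec -> store_service_files on a single service."""
    graph = DependencyGraph()
    service = "keystore-0"  # hypothetical service name

    graph.add_instruction("add_service")
    graph.produces_service("add_service", service)

    graph.add_instruction("exec")
    graph.consumes_service("exec", service)
    if exec_produces_service:
        graph.produces_service("exec", service)

    graph.add_instruction("store_service_files")
    graph.consumes_service("store_service_files", service)
    return graph.deps


before = build_plan(exec_produces_service=False)
after = build_plan(exec_produces_service=True)

# Before: both instructions wait only for add_service, so a parallel
# scheduler is free to run them concurrently.
print(before["exec"], before["store_service_files"])
# After: store_service_files waits for exec.
print(after["exec"], after["store_service_files"])
```

In this model, without the producer declaration `exec` and `store_service_files` share the single dependency `add_service`, so nothing prevents the file copy from starting before the `exec` that creates the file has finished. With the declaration, `store_service_files` depends on `exec` and the two are serialized.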

Error:

There was an error executing Starlark code
One or more instructions failed to execute in parallel. This if the first error that was found.
  Caused by: An error occurred executing instruction (number 41) at github.com/ethpandaops/ethereum-package/src/prelaunch_data_generator/validator_keystores/validator_keystore_generator.star[353:60]:
  store_service_files(service_name="validator-key-generation-cl-validator-keystore-0", src="/tmp/prysm-password.txt", name="prysm-password")
  Caused by: Failed to copy file '/tmp/prysm-password.txt' from service 'validator-key-generation-cl-validator-keystore-0
  Caused by: There was an error in copying files over to disk
  Caused by: An error occurred gzip'ing and pushing tar'd file bytes to the pipe
  Caused by: An error occurred copying source '/tmp/prysm-password.txt' from user service with UUID 'f8039609f0f04032ac42dc83ed8a480e' in enclave with UUID '4e7524b6fa1a4cf390164f290754165b'
  Caused by: An error occurred copying files from sourcepath '/tmp/prysm-password.txt' in user service with UUID 'f8039609f0f04032ac42dc83ed8a480e' in enclave with UUID '4e7524b6fa1a4cf390164f290754165b'
  Caused by: An error occurred copying content from sourcepath '/tmp/prysm-password.txt' in container '/validator-key-generation-cl-validator-keystore-0--f8039609f0f04032ac42dc83ed8a480e' for user service 'f8039609f0f04032ac42dc83ed8a480e' in enclave '4e7524b6fa1a4cf390164f290754165b'
  Caused by: an error occurred while verifying whether the file was a folder
  Caused by: request returned Not Found for API route and version http://%2Fvar%2Frun%2Fdocker.sock/v1.44/containers/4ea0e427bff63eec49b8590af571ce5c147d6ea09b9788255dcf8f07da6e1e95/archive?path=%2Ftmp%2Fprysm-password.txt, check if the server supports the requested API version
Error encountered running Starlark code.

ethereum-package config for reproducing:

participants:
  - el_type: geth
    cl_type: prysm
  - el_type: reth
    cl_type: teku
  - el_type: nethermind
    cl_type: nimbus
  - el_type: geth
    cl_type: teku
  - el_type: reth
    cl_type: lighthouse
  - el_type: nethermind
    cl_type: nimbus
  - el_type: geth
    cl_type: nimbus
  - el_type: reth
    cl_type: teku
  - el_type: nethermind
    cl_type: prysm
network_params:
  preset: minimal
  genesis_delay: 5
  electra_fork_epoch: 0
  fulu_fork_epoch: 18446744073709551615
parallel_keystore_generation: true
additional_services:
  - dora
  - spamoor
dora_params:
  image: "ethpandaops/dora:latest"
spamoor_params:
  image: "ethpandaops/spamoor:latest"
  max_mem: 4000
  spammers:
    - scenario: eoatx
      config:
        throughput: 200
    - scenario: blobs
      config:
        throughput: 20
kurtosis run --enclave your-enclave-name --image-download missing --args-file multi-client.yaml --parallel --parallelism 10 .

Is this change user facing?

YES - fixes race conditions when using --parallel

…el execution

exec, start_service, and stop_service instructions mutate service state but
weren't calling ProducesService() in the dependency graph. This meant
store_service_files on the same service could race with a preceding exec,
causing failures.

Now exec, start_service, and stop_service all produce their service in the
dependency graph, ensuring proper ordering with consumers.