Conversation

@QuentinI
Collaborator

@QuentinI QuentinI commented Aug 25, 2025

Closes #<ISSUE_NUMBER>

This PR:

  • Adds support for running the batcher in a TEE to the devnet test helpers
  • Uses it to test restarting the TEE batcher in the batcher restart tests

This PR does not:

Key places to review:


Base automatically changed from sishan/devnet-batcher-tee to celo-integration-rebase-13.2 August 29, 2025 22:19
@QuentinI QuentinI force-pushed the ag/enclave-restart-test branch from 1ce717f to 9051371 Compare September 3, 2025 09:10
@QuentinI QuentinI changed the title from "[WIP] enclave restart test" to "TA4: Enclave restart test" Sep 3, 2025
@QuentinI QuentinI marked this pull request as ready for review September 4, 2025 13:32
Collaborator

@dailinsubjam dailinsubjam left a comment


Looks good.

@philippecamacho
Collaborator

Why is it possible to run the test locally even without a TEE?

@dailinsubjam
Collaborator

@philippecamacho If you're not on an enclave-enabled instance, you'll hit this line and it will automatically skip enclave-related operations.
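
For readers without the linked line at hand, a guard of this kind could look roughly like the sketch below. It is only an illustration: it assumes the check is for the Nitro Enclaves device node at /dev/nitro_enclaves, and the helper name enclaveAvailable is hypothetical. The condition actually used in this PR may differ.

```go
package main

import (
	"fmt"
	"os"
)

// nitroDevicePath is the device node exposed by the Nitro Enclaves driver on
// enclave-enabled AWS instances. Assumption: the PR's real check may differ.
const nitroDevicePath = "/dev/nitro_enclaves"

// enclaveAvailable reports whether the given path exists on this machine.
// Hypothetical helper, not taken from the PR.
func enclaveAvailable(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	if !enclaveAvailable(nitroDevicePath) {
		fmt.Println("no enclave device found, skipping enclave-related operations")
		return
	}
	fmt.Println("enclave device found, running enclave-related operations")
}
```

Inside a Go test, the same check would typically call t.Skip with a short reason instead of returning from main.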

Comment on lines +14 to +15
testRestart(t, false)
}
Collaborator


I just realized that to restart op-batcher-tee, we need not only the profile to be tee but also to restart the op-batcher-tee service specifically.

Collaborator Author


Oh, that's a big catch for sure, thanks. No idea how it passed for me in the first place.

@philippecamacho
Collaborator

philippecamacho commented Sep 10, 2025

@philippecamacho If you're not on an enclave-enabled instance, you'll hit this line and it will automatically skip enclave-related operations.

I see, but when this test runs in CI, is the batcher executed inside the TEE?

@QuentinI
Collaborator Author

I see, but when this test runs in CI, is the batcher executed inside the TEE?

No, it isn't. Only the non-TEE test runs in CI.

@QuentinI QuentinI force-pushed the ag/enclave-restart-test branch 2 times, most recently from 4bf678e to 3a01ee2 Compare September 29, 2025 19:29
@QuentinI QuentinI force-pushed the ag/enclave-restart-test branch 3 times, most recently from 74b5abf to 5475f6d Compare October 15, 2025 17:15
@QuentinI
Collaborator Author

@dailinsubjam @philippecamacho this works now. I've added a devnet-enclave-tests target to the justfile, which you can run on an AWS Nitro instance to verify (don't forget you need to be in the nix shell).

It's not set up to run in CI, unfortunately: building the Docker images on the AWS machine takes too long and GitHub kills the action. To run this in CI, we need to build the images in one action and re-use them in subsequent ones, including uploading them to the AWS Nitro instance, which I'd argue is out of scope for this PR. LMK if you disagree.

dailinsubjam previously approved these changes Oct 15, 2025
Collaborator

@dailinsubjam dailinsubjam left a comment


TestEnclaveRestart passes for me.
But when I run docker ps after the test, I can see

docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED          STATUS          PORTS     NAMES
224fe2454142   op-proposer:espresso     "/bin/entrypoint.sh …"   13 minutes ago   Up 10 minutes             espresso-op-proposer-1
78fefd376a3b   op-challenger:espresso   "/bin/entrypoint.sh …"   13 minutes ago   Up 10 minutes             espresso-op-challenger-1
099c18b6b26a   op-batcher:espresso      "op-batcher --espres…"   13 minutes ago   Up 10 minutes             espresso-op-batcher-1

remain running.
We can open a separate ticket for cleaning them up.

Collaborator


I see we’ve added a backup of the backup of the original… bold move 😎

Collaborator Author

@QuentinI QuentinI Oct 15, 2025


Oh heck, this is from mergiraf, I'll remove it stat 🤣

Comment on lines +291 to 313
 func (d *Devnet) ServiceDown(service Service) error {
 	serviceName := d.getServiceName(service)
 	log.Info("shutting down service", "service", serviceName)
 	cmd := exec.CommandContext(
 		d.ctx,
-		"docker", "compose", "down", service,
+		"docker", "compose", "--profile", d.getProfile(), "down", serviceName,
 	)
 	return cmd.Run()
 }
Collaborator


This reminds me of one thing: op-batcher-tee container spawns an enclave container (batcher-enclaver-xxx) that doesn’t stop automatically when op-batcher-tee stops. In Docker Compose I’ve been cleaning it up manually via espresso/scripts/shutdown.sh, but that’s probably not the best approach.
I'm thinking we may see the same issue here: even if op-batcher-tee exits, batcher-enclaver-xxx may keep running.
One possible way is to add a cleanup hook when shutting down op-batcher-tee (not sure whether it's supported) or also add a manual shutdown here. WDYT?

Collaborator Author


Oh interesting. I'll see what we can do here

Collaborator

@dailinsubjam dailinsubjam left a comment


Oops, I just found that tee is not triggered automatically even though I'm on an AWS Nitro instance. I'll take a further look.

@dailinsubjam dailinsubjam dismissed their stale review October 15, 2025 21:23

tee is not triggered in my run for the test

@QuentinI
Collaborator Author

@dailinsubjam I mixed up the order of operations during the rebase: my tee flag was set after the devnet was spun up 🤦
Fixed it. I'm going to re-run the test now and see if it also fixes the leftover containers (it should 🤞)

@dailinsubjam dailinsubjam self-requested a review October 16, 2025 13:09
@dailinsubjam
Collaborator

I got this

op-geth-sequencer-1  | INFO [10-16|15:47:27.716] Persisted trie from memory database      nodes=1542 size=170.63KiB time=3.279752ms    gcnodes=5765 gcsize=1.63MiB gctime=11.571326ms livenodes=3207 livesize=978.01KiB
op-geth-sequencer-1  | INFO [10-16|15:47:27.717] Writing cached state to disk             block=366 hash=aaa89c..41bd2d root=5e5706..bcf590
op-geth-sequencer-1  | INFO [10-16|15:47:27.717] Persisted trie from memory database      nodes=25   size=7.76KiB   time="116.902µs"   gcnodes=0    gcsize=0.00B   gctime=0s          livenodes=3182 livesize=970.25KiB
op-geth-sequencer-1  | INFO [10-16|15:47:27.717] Writing cached state to disk             block=240 hash=b04582..df2713 root=67d352..32720e
op-geth-sequencer-1  | INFO [10-16|15:47:27.717] Persisted trie from memory database      nodes=269  size=42.88KiB  time="734.222µs"   gcnodes=0    gcsize=0.00B   gctime=0s          livenodes=2913 livesize=927.37KiB
op-geth-sequencer-1  | INFO [10-16|15:47:27.717] Writing snapshot state to disk           root=1751bc..130f87
op-geth-sequencer-1  | INFO [10-16|15:47:27.717] Persisted trie from memory database      nodes=0    size=0.00B     time="2.68µs"      gcnodes=0    gcsize=0.00B   gctime=0s          livenodes=2913 livesize=927.37KiB
op-geth-sequencer-1  | INFO [10-16|15:47:27.721] Blockchain stopped
l1-geth-1            | INFO [10-16|15:47:28.004] Starting work on payload                 id=0x03b6dac60f0c1813
l1-geth-1            | INFO [10-16|15:47:28.004] Updated payload                          id=0x03b6dac60f0c1813 number=246 hash=e17a7f..15b198 txs=0 withdrawals=0 gas=0          fees=0           root=8c906e..654623 elapsed="132.132µs"
espresso-dev-node-1 exited with code 143
l1-validator-1 exited with code 0
l1-beacon-1          | Oct 16 15:47:28.614 INFO  Shutting down..                               reason: Success("Received SIGTERM") 
l1-beacon-1          | Oct 16 15:47:28.614 INFO  Saved DHT state                              service: "network"
l1-beacon-1          | Oct 16 15:47:28.614 INFO  Network service shutdown                     service: "network"
l1-beacon-1          | Oct 16 15:47:28.618 INFO  Saved beacon chain to disk                   service: "network"
op-geth-sequencer-1 exited with code 0 
l1-beacon-1 exited with code 0
l1-geth-1 exited with code 137
--- FAIL: TestEnclaveRestart (778.84s) 
FAIL
FAIL    github.com/ethereum-optimism/optimism/espresso/devnet-tests     788.666s
FAIL
error: Recipe `devnet-enclave-tests` failed on line 47 with exit code 1

I started it in tmux, so I didn't capture the full log TAT. I can fetch the complete logs with a new run later.

@QuentinI QuentinI force-pushed the ag/enclave-restart-test branch from e76deaa to f961472 Compare October 24, 2025 17:10