[Test][HistoryServer] E2E test for dead cluster actor endpoint #4461
base: master
Changes from 3 commits
4e509bc
ca8e37a
4f3229b
b74fcee
2a57fce
73ecf85
cf5b4c9
3300afd
2d873d4

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -19,8 +19,9 @@ import (` |
| 19 | 19 | `)` |
| 20 | 20 | |
| 21 | 21 | `const (` |
| 22 | | - `LiveSessionName = "live"` |
| 23 | | - `EndpointLogFile = "/api/v0/logs/file"` |
| | 22 | + `LiveSessionName       = "live"` |
| | 23 | + `EndpointLogFile       = "/api/v0/logs/file"` |
| | 24 | + `EndpointLogicalActors = "/logical/actors"` |
| 24 | 25 | `)` |
| 25 | 26 | |
| 26 | 27 | `func TestHistoryServer(t *testing.T) {` |
| | | `@@ -43,6 +44,10 @@ func TestHistoryServer(t *testing.T) {` |
| 43 | 44 | `name: "/v0/logs/file endpoint (dead cluster)",` |
| 44 | 45 | `testFunc: testLogFileEndpointDeadCluster,` |
| 45 | 46 | `},` |
| | 47 | + `{` |
| | 48 | + `name: "/logical/actors endpoint (dead cluster)",` |
| | 49 | + `testFunc: testLogicalActorsEndpointDeadCluster,` |
| | 50 | + `},` |
| 46 | 51 | `}` |
| 47 | 52 | |
| 48 | 53 | `for _, tt := range tests {` |
| | | `@@ -282,3 +287,157 @@ func testLogFileEndpointDeadCluster(test Test, g *WithT, namespace *corev1.Names` |
| 282 | 287 | `DeleteS3Bucket(test, g, s3Client)` |
| 283 | 288 | `LogWithTimestamp(test.T(), "Dead cluster log file endpoint tests completed")` |
| 284 | 289 | `}` |
| | 290 | + |
| | 291 | + `// testLogicalActorsEndpointDeadCluster verifies that the history server can return actors from the` |
| | 292 | + `// in-memory ClusterActorMap after a cluster is deleted.` |
| | 293 | + `//` |
| | 294 | + `// Data flow explanation:` |
| | 295 | + `// The history server does not fetch actors directly from S3. Instead:` |
| | 296 | + `// 1. Collector pushes events to S3 on cluster deletion` |
| | 297 | + `// 2. Storage Reader reads event files from S3` |
| | 298 | + `// 3. Event Handler processes events and populates ClusterActorMap` |
| | 299 | + `// 4. The /logical/actors endpoint returns actors from the in-memory ClusterActorMap` |
| | 300 | + `//` |
| | 301 | + `// The test case follows these steps:` |
| | 302 | + `// 1. Prepare test environment by applying a Ray cluster` |
| | 303 | + `// 2. Submit a Ray job to the existing cluster (generates actor events)` |
| | 304 | + `// 3. Delete RayCluster to trigger log upload to S3 (and event processing)` |
| | 305 | + `// 4. Apply History Server and get its URL` |
| | 306 | + `// 5. Verify that the history server returns actors via /logical/actors endpoint` |
| | 307 | + `// 6. Verify that the history server returns a single actor via /logical/actors/{actor_id} endpoint` |

Suggested change: replace `// 6. Verify that the history server returns a single actor via /logical/actors/{actor_id} endpoint` with `// 6. Verify that the history server returns actors via /logical/actors endpoint, returns a single actor via /logical/actors/{actor_id} endpoint, and handles non-existent actor queries appropriately`.
Inlined deletion duplicates existing DeleteRayClusterAndWait helper
Low Severity
The new testLogicalActorsEndpointDeadCluster function manually inlines the RayCluster deletion and wait logic (delete, expect no error, log, Eventually wait for IsNotFound), which is exactly what the existing DeleteRayClusterAndWait helper in historyserver.go already does. The adjacent testLogFileEndpointDeadCluster test uses DeleteRayClusterAndWait for the same purpose, making this inconsistency more noticeable. Duplicating this logic increases maintenance burden and risks divergence if the deletion flow is updated.
cc @fangyinc
Copilot AI · Feb 14, 2026
The cluster deletion logic is duplicated here instead of using the existing DeleteRayClusterAndWait helper function. This is inconsistent with testLogFileEndpointDeadCluster (line 171) and testDeadClusterTasks, which use the helper. Using the helper function improves code maintainability and ensures consistent deletion behavior across tests.

| Suggested change |
|---|
| - `// Delete RayCluster to trigger log upload to S3` |
| - `err := test.Client().Ray().RayV1().RayClusters(namespace.Name).Delete(test.Ctx(), rayCluster.Name, metav1.DeleteOptions{})` |
| - `g.Expect(err).NotTo(HaveOccurred())` |
| - `LogWithTimestamp(test.T(), "Deleted RayCluster %s/%s", namespace.Name, rayCluster.Name)` |
| - `// Wait for cluster to be fully deleted (ensures logs are uploaded to S3 and events are processed)` |
| - `g.Eventually(func() error {` |
| - `_, err := GetRayCluster(test, namespace.Name, rayCluster.Name)` |
| - `return err` |
| - `}, TestTimeoutMedium).Should(WithTransform(k8serrors.IsNotFound, BeTrue()))` |
| + `// Delete RayCluster to trigger log upload to S3 and wait for full deletion` |
| + `DeleteRayClusterAndWait(test, g, namespace, rayCluster)` |

