Fixes #38709 - Fix hosts job status stuck in "Pending" by pavanshekar · Pull Request #999 · theforeman/foreman_remote_execution

pavanshekar · 2025-09-02T17:08:19Z

Fix hosts job status remaining "Pending" after completion

Testing steps:

Navigate to Monitor -> Jobs
Click on "Run Job" button
Job category -> Katello
Job template -> Install errata by search query - Katello Script Default
Filter the hosts and enter search query
Submit
Once the job completes successfully, the status should be updated accordingly (e.g., to "Success" or "Completed") for all associated hosts mentioned in the table.

ekohl

Is there a way to write a test for this?

adamruzicka

disclaimer:
I don't recall encountering the problem this is meant to fix and I have not tested this, all I wrote here is just me trying to reason about it.

How it is supposed to work:

When a job is created it spawns a "parent" task backing the whole job, that's @job_invocation.task
This parent task spawns child sub-tasks, one for every single host in the job
The parent task suspends itself and periodically (every 15 seconds or so) checks its child tasks
When all the child tasks are done, the parent marks itself as done

Assuming the code was like this for 5+ years and we're seeing it just now as we switched to the new job invocation page, intuitively I'd look for problems there.

If any of the assumptions I just mentioned don't hold (anymore), then this change feels like putting a band aid on a much deeper problem.

adamruzicka · 2025-09-02T19:12:57Z

app/controllers/job_invocations_controller.rb

    @job_organization = Taxonomy.find_by(id: @job_invocation.task.input[:current_organization_id])
    @job_location = Taxonomy.find_by(id: @job_invocation.task.input[:current_location_id])
-    @auto_refresh = @job_invocation.task.try(:pending?)
+    @auto_refresh = @job_invocation.task.try(:pending?) || has_pending_hosts?


The has_pending_hosts? check only comes into play if @job_invocation.task.try(:pending?) returns a falsy value. This can happen either if @job_invocation.task does not exist yet, or if the task exists but is already done, which would imply all its children have already finished too.

If the @job_invocation.task does not exist yet, the child tasks cannot exist so has_pending_hosts? will be false as well.

If @job_invocation.task.pending? returns false, then all its children should already be done too, so has_pending_hosts? should always be false too.
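To make the reasoning above concrete, here is a minimal sketch in JavaScript that models the parent/child relationship with plain objects. It is not the plugin's code: the object shape (`pending`, `children`) and the helper names are assumptions made up for illustration. It compares the old and new `@auto_refresh` expressions across the states the comment walks through, plus one state that breaks the stated invariant.

```javascript
// Simplified model, not Foreman's implementation. Field and function names are hypothetical.
// Mirrors `@job_invocation.task.try(:pending?)`: a missing task yields a falsy result.
const taskPending = (task) => Boolean(task && task.pending);

// Mirrors the proposed `has_pending_hosts?`: without a parent task there are no children.
const hasPendingHosts = (task) =>
  Boolean(task) && task.children.some((child) => child.pending);

const oldAutoRefresh = (task) => taskPending(task);
const newAutoRefresh = (task) => taskPending(task) || hasPendingHosts(task);

const cases = {
  noTaskYet: null,
  running: { pending: true, children: [{ pending: true }, { pending: false }] },
  finished: { pending: false, children: [{ pending: false }, { pending: false }] },
  // Violates the invariant: the parent reports done while a child is still pending.
  inconsistent: { pending: false, children: [{ pending: true }] },
};

for (const [name, task] of Object.entries(cases)) {
  console.log(name, oldAutoRefresh(task), newAutoRefresh(task));
}
```

In this model the two expressions only disagree for the `inconsistent` case, which is the point of the review comment: if the extra check changes anything, the parent task is finishing before its children.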

adamruzicka · 2025-09-03T14:57:16Z

Needs a rebase to make the conflict go away

Copilot

Pull Request Overview

This PR fixes an issue where host job statuses remain stuck in "Pending" state after job completion by implementing a manual refresh mechanism that triggers when auto-refresh is disabled.

Adds a refresh trigger mechanism to force table data updates when jobs finish
Implements useEffect hook to detect job completion and trigger refresh
Passes refresh trigger to the host table component to update API options

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
webpack/JobInvocationDetail/index.js	Adds refresh trigger state and effect to detect job completion
webpack/JobInvocationDetail/JobInvocationHostTable.js	Accepts refresh trigger prop and updates API options to force data refresh

Copilot · 2025-09-03T15:06:55Z

webpack/JobInvocationDetail/JobInvocationHostTable.js

+        params: { ...prev.params }
+      }));
+      setAllAPIOptions(prev => ({ 
+        ...prev, 
+        params: { ...prev.params }


The useEffect creates new objects with identical content ({ ...prev.params }), which may not trigger a re-fetch since the params remain the same. Consider adding a timestamp or cache-busting parameter to force the API refresh, such as params: { ...prev.params, _refresh: Date.now() }.

Suggested change

-        params: { ...prev.params }
-      }));
-      setAllAPIOptions(prev => ({
-        ...prev,
-        params: { ...prev.params }
+        params: { ...prev.params, _refresh: Date.now() }
+      }));
+      setAllAPIOptions(prev => ({
+        ...prev,
+        params: { ...prev.params, _refresh: Date.now() }
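
As a rough illustration of the point in this review comment: spreading `prev.params` produces a new object with the same contents, so whether a re-fetch happens depends on how the consuming code compares options. The sketch below assumes a hypothetical fetch layer that compares params by value; the `shallowEqual` helper, the `/hosts` URL, and the option shape are made up for the example.

```javascript
// Shallow value comparison, as a hypothetical fetch layer might perform it.
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length && keysA.every((k) => Object.is(a[k], b[k]));
}

// Hypothetical API options; only the `params` key matters here.
const prev = { url: '/hosts', params: { page: 1, per_page: 20 } };

// What the original diff did: same contents, new object identity.
const copied = { ...prev, params: { ...prev.params } };

// What the suggestion does: contents change on every call.
const busted = { ...prev, params: { ...prev.params, _refresh: Date.now() } };

console.log(copied.params === prev.params);            // identity comparison
console.log(shallowEqual(copied.params, prev.params)); // value comparison, copy
console.log(shallowEqual(busted.params, prev.params)); // value comparison, cache-busted
```

A consumer that compares by identity would see the plain copy as a change, while one that compares by value would not. The extra `_refresh` key makes the params differ under either comparison.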

kmalyjur · 2025-09-09T08:23:32Z

@pavanshekar I couldn't reproduce the exact job from your testing steps, but when I test this on a job that starts in a pending state, I see the update happen in two steps after it finishes:

The chart updates correctly
The status in the table takes a few extra seconds to catch up

Is that the same behavior you were seeing? I'm looking into the code more.

pavanshekar · 2025-09-09T12:34:46Z

Yes, that's the behavior I am seeing! The fix adds a refreshTrigger mechanism that forces a table refresh when the job finishes. Previously the table would remain stuck in "Pending" status even after successful completion because it wasn't refreshing its data. Now it triggers a fresh API call with a timestamp parameter so the table gets the updated status immediately.

Lukshio · 2025-09-09T14:27:08Z

A little bit late to the party, but I tried to reproduce the issue and got the same behavior as @kmalyjur. The table is a little slower: it takes roughly 10s longer than the chart, but it works. It's not ideal, but does it make sense to fix it by adding another watcher just to bring that down to roughly 5s?

Anyway, I would look for a solution other than adding a new prop plus a watcher. The code below should be the solution.
I hope this comment helps. Thank you!

// add prop autoRefresh to component, it is already used in parent
  useEffect(() => {
    setAPIOptions(prevOptions => ({ ...prevOptions }));
  }, [autoRefresh, setAPIOptions]);

Lukshio · 2025-09-09T14:29:07Z

webpack/JobInvocationDetail/JobInvocationHostTable.js

+    if (refreshTrigger > 0) {
+      setAPIOptions(prev => ({
+        ...prev,
+        params: { ...prev.params, _refresh: Date.now() },
+      }));
+      setAllAPIOptions(prev => ({
+        ...prev,
+        params: { ...prev.params, _refresh: Date.now() },
+      }));


Suggested change

-    if (refreshTrigger > 0) {
-      setAPIOptions(prev => ({
-        ...prev,
-        params: { ...prev.params, _refresh: Date.now() },
-      }));
-      setAllAPIOptions(prev => ({
-        ...prev,
-        params: { ...prev.params, _refresh: Date.now() },
-      }));
+  // add prop autoRefresh to component, it is already used in parent
+  useEffect(() => {
+    setAPIOptions(prevOptions => ({ ...prevOptions }));
+  }, [autoRefresh, setAPIOptions]);

pavanshekar

Thank you for the suggestion! This approach is much better and cleaner. I've implemented the suggested changes by replacing the complex refreshTrigger mechanism with the simple autoRefresh useEffect that responds to the autoRefresh prop already being passed from the parent component.

adamruzicka · 2025-09-16T15:01:02Z

Without this patch (as of v16.2.1), there is a bit of a delay, but then things switch to the proper state.

Screen.Recording.2025-09-16.at.16.58.12.mov

With this patch applied, I'm getting this

Screen.Recording.2025-09-16.at.16.56.13.mov

This doesn't really seem to improve things for me

qcjames53 · 2025-09-17T13:07:39Z

@adamruzicka Thank you for the videos. I've been seeing the same thing. Thought I was going crazy! I'm going to set up a meeting with @pavanshekar to see if I'm triggering something incorrectly.

qcjames53

@pavanshekar and I just spoke about this issue. Thanks for meeting up! Here are the minimal testing steps to verify the bug:

Create several hosts on your system. 20+ is a good number. Pavan shared the following snippet to autogenerate many fake hosts in the rails console:

100.times do |i|
  host = Host.first.dup
  host.name = "fake-host-#{Time.now.to_i}-#{i.to_s.rjust(3, '0')}"
  host.mac = nil
  host.ip = nil
  host.save!
  puts "Created host #{i+1}/100" if (i + 1) % 10 == 0
end

On the all hosts page, select all hosts and click 'Schedule a job'.
Any job will work for this. An easy one is 'Commands' -> 'Run Command - Script Default' and set the command to anything.
Click through all pages and run on the selected hosts. The UI will take you to the job's webpage.
The table will show all hosts as "In progress". When refreshing the page, some of the hosts will switch to "Failed". Any fix needs to automatically update the table without page refresh.

pavanshekar · 2025-09-17T21:03:05Z

Thanks @adamruzicka for sharing the video! autoRefresh was checking task?.state === 'pending', but the actual task state during execution is 'running', not 'pending'. I updated the code so that host statuses now update both as individual hosts complete during execution and when the job finishes, by monitoring the host status changes.
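
A minimal sketch of the state check described in this comment, for illustration only. The two state names come from the comment itself; the helper name and the decision to treat every other state as inactive are assumptions, and the real task object may expose more states than shown here.

```javascript
// States mentioned in the comment above. Any other state name is treated
// as "not active" in this sketch, which is an assumption.
const ACTIVE_STATES = new Set(['pending', 'running']);

// Hypothetical helper: true while the job's task should keep the page refreshing.
function isJobActive(task) {
  return ACTIVE_STATES.has(task?.state);
}

console.log(isJobActive({ state: 'running' }));
console.log(isJobActive({ state: 'stopped' }));
console.log(isJobActive(undefined));
```

Checking only for `'pending'` would return false for a task whose state is `'running'`, which matches the symptom described above.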

qcjames53

Thank you for the help reviewing this, Pavan. The autorefresh works well on my end and seems to be reasonably efficient with the network calls. Nice job! I can confirm the host list refreshes appropriately once the job is "warmed up" but unfortunately there are some issues when the server is slow:

With enough hosts, the calls to refresh the table contents start taking longer than the job refresh clock (seems to be 1 second). I know this adds extra work, but do you think it would be possible to have the job invocation element hold off on refreshes while it is still waiting for the server to answer the prior request? Implementing this might even give you the next issue for free.

There are also issues with page load. The table begins updating once a second before the initial contents even have the chance to load, causing the element to flip between page load and content load animations. I would disable this autorefresh behavior entirely until the table contents are populated for the first time.
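
The two requests in this review could be sketched roughly as follows. This is not the code from the PR: `createGuardedRefresher` and `fetchHosts` are hypothetical names, and the sketch only shows the general idea of skipping a tick while a request is in flight and not polling until the first load has completed.

```javascript
// Hypothetical helper: wraps an async fetch so that timer ticks are skipped
// while a request is in flight, and polling only begins after the first
// load has completed successfully.
function createGuardedRefresher(fetchHosts) {
  let inFlight = false;
  let loadedOnce = false;

  async function run() {
    inFlight = true;
    try {
      const result = await fetchHosts();
      loadedOnce = true;
      return result;
    } finally {
      inFlight = false;
    }
  }

  return {
    // Called once when the table mounts; returns null if a request is already running.
    initialLoad: () => (inFlight ? null : run()),
    // Called on every timer tick; returns null when the tick is skipped.
    tick: () => (!loadedOnce || inFlight ? null : run()),
  };
}
```

With this shape a slow server sees at most one outstanding table request per page, and the table is not refreshed over its own initial loading state.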

pavanshekar · 2025-09-18T13:26:52Z

Thank you for the detailed feedback! I've already addressed both of these issues in the current implementation. The code now prevents request overlapping when the server is slow and also disables autorefresh until the initial table contents are loaded.

qcjames53

I feel bad to keep requesting changes but there are still scenarios where this autoupdate slams the server with dozens of requests at once. I'm seeing this when I am waiting on one job to complete and start another, making all of the hosts stay in "pending" for a while. The requests named "7" are the ones driving the autoupdate and they pretty much DOS attack a single-threaded server.

qcjames53 · 2025-09-19T13:58:58Z

Oh, but other than that everything works really well! It's just that I can't ACK with this page DOSing slow single-threaded servers.

pavanshekar · 2025-09-19T21:02:25Z

I've addressed the DOS issue by implementing a throttled refresh mechanism that only calls the hosts API when host status actually changes (when individual hosts complete) and limits these calls to a maximum of once every 3 seconds.
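
For readers following along, the throttling described here might look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: the helper name, the shape of the `statuses` argument, and the injectable `now` clock are all made up for the example. Only the 3-second limit and the "refresh only when host status changes" rule come from the comment.

```javascript
// Hypothetical helper. `refresh` would call the hosts API; `now` is injectable
// so the behaviour can be exercised without real timers.
function createStatusRefresher(refresh, minIntervalMs = 3000, now = Date.now) {
  let lastSignature = null;
  let lastRefreshAt = -Infinity;

  // `statuses` is any JSON-serialisable summary of host states,
  // for example per-status counts taken from the job's own polling response.
  return function onStatuses(statuses) {
    const signature = JSON.stringify(statuses);
    if (signature === lastSignature) return false; // nothing changed
    if (now() - lastRefreshAt < minIntervalMs) return false; // too soon, retry on a later poll
    lastSignature = signature;
    lastRefreshAt = now();
    refresh();
    return true;
  };
}
```

Because the stored signature is only updated when a refresh actually happens, a change that arrives inside the 3-second window is not lost; it is picked up by the first poll after the window has passed.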

ekohl · 2025-09-20T19:25:38Z

AFAIK today we don't use it, but we may want to consider https://guides.rubyonrails.org/action_cable_overview.html to get updates. Rather than polling, you can subscribe to changes. Perhaps an improvement we can make? For example, the page could get all job statuses once and then subscribe to all jobs for updates. No DDoS in terms of requests.

I don't want to push it, but in the past we first didn't use it because it didn't exist. Then we used mod_passenger which didn't support it. Now we can use it but don't. I'm not sure if there's a good reason or not.
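
To give a sense of what the subscription approach could look like on the client, here is a small sketch. The `consumer` argument is assumed to be shaped like the object returned by `createConsumer()` from the `@rails/actioncable` package, whose `subscriptions.create` takes channel params and an object with a `received` callback. The channel name `JobInvocationChannel`, the `job_invocation_id` param, and the payload shape are hypothetical; nothing in this thread indicates such a channel exists in the plugin.

```javascript
// Hypothetical wiring: subscribe once and let the server push host status changes.
// `consumer` is passed in so the function can be exercised with a stand-in object.
function subscribeToJobUpdates(consumer, jobInvocationId, onHostStatus) {
  return consumer.subscriptions.create(
    // Channel name and params are assumptions for this sketch.
    { channel: 'JobInvocationChannel', job_invocation_id: jobInvocationId },
    // Action Cable invokes `received` for each message broadcast to the subscription.
    { received: (data) => onHostStatus(data) }
  );
}
```

With something like this, the page would fetch the host table once and then apply pushed updates, so the number of requests no longer grows with the polling rate.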

qcjames53

Thanks for these changes Pavan; the page itself runs great now no matter what I do! Huge improvement! It looks like there are some minor changes requested from the JS test cases but after you patch those up I'm okay to ACK.

Ewoud's suggestion is interesting. Let's talk about it in standup.

qcjames53

Thanks for talking this over. Everything is looking good to me! ACKed

adamruzicka · 2025-09-23T07:52:08Z

We still need those tests to pass though

Lukshio · 2025-09-23T08:12:22Z

Hi everyone, meanwhile my PR about API calls refactor got merged. @pavanshekar Can we retest it with latest version? In my case it looks like the issue disappeared. Also if the issue still occurs, it will definitely affect this code, because the API call logic has changed.

pavanshekar · 2025-09-24T18:04:08Z

Thanks for confirming! I retested with the latest changes and the issue seems to be resolved. Since the original issue was about hosts remaining in "Pending" status even after a job completed successfully, and the expected behavior was that hosts should update to "Success" or "Completed", it looks like @Lukshio's refactor addresses this. I'll go ahead and close this PR.

pr-processor bot added Not yet reviewed Waiting on contributor labels Sep 2, 2025

ekohl reviewed Sep 2, 2025

pr-processor bot removed the Not yet reviewed label Sep 2, 2025

adamruzicka reviewed Sep 2, 2025

pavanshekar force-pushed the issue-38709 branch from 7a60951 to 9b7a848 Compare September 3, 2025 14:52

pr-processor bot added Needs re-review and removed Waiting on contributor Needs re-review labels Sep 3, 2025

adamruzicka requested a review from Copilot September 3, 2025 14:58

Copilot AI reviewed Sep 3, 2025

pavanshekar force-pushed the issue-38709 branch 3 times, most recently from 5afa8c0 to c9f5add Compare September 4, 2025 12:17

Lukshio reviewed Sep 9, 2025

pavanshekar force-pushed the issue-38709 branch 2 times, most recently from 447c0f1 to 02b18e8 Compare September 9, 2025 17:48

qcjames53 reviewed Sep 17, 2025

pavanshekar force-pushed the issue-38709 branch from 02b18e8 to b3024a6 Compare September 17, 2025 20:56

qcjames53 suggested changes Sep 17, 2025

pr-processor bot added the Waiting on contributor label Sep 17, 2025

pavanshekar force-pushed the issue-38709 branch from b3024a6 to 27b2e61 Compare September 18, 2025 13:26

pr-processor bot removed the Waiting on contributor label Sep 18, 2025

pr-processor bot added the Needs re-review label Sep 18, 2025

qcjames53 suggested changes Sep 19, 2025

pr-processor bot added Waiting on contributor and removed Needs re-review labels Sep 19, 2025

Fixes #38709 - Fix hosts job status stuck in "Pending"

7d036d6

pavanshekar force-pushed the issue-38709 branch from 27b2e61 to 7d036d6 Compare September 19, 2025 21:01

pr-processor bot added Needs re-review and removed Waiting on contributor labels Sep 19, 2025

qcjames53 suggested changes Sep 22, 2025

pr-processor bot added Waiting on contributor and removed Needs re-review labels Sep 22, 2025

qcjames53 approved these changes Sep 22, 2025

pavanshekar closed this Sep 24, 2025
