Fixes #38709 - Fix hosts job status stuck in "Pending" #999
pavanshekar wants to merge 1 commit into theforeman:master
Conversation
ekohl
left a comment
Is there a way to write a test for this?
adamruzicka
left a comment
disclaimer:
I don't recall encountering the problem this is meant to fix and I have not tested this, all I wrote here is just me trying to reason about it.
How it is supposed to work:
- When a job is created it spawns a "parent" task backing the whole job, that's `@job_invocation.task`
- This parent task spawns child sub-tasks, one for every single host in the job
- The parent task suspends itself and periodically (every 15 seconds or so) checks its child tasks
- When all the child tasks are done, the parent marks itself as done
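For illustration, the polling model described in these bullet points can be sketched as follows. This is not the actual Dynflow implementation (which is Ruby); all names here are invented for the sketch:

```javascript
// Illustrative sketch of the parent-task poll loop (not the real Dynflow code).
// A parent is "done" only once every child sub-task has finished.
function allChildrenDone(childTasks) {
  return childTasks.every(t => t.state === 'stopped');
}

// Called roughly every 15 seconds while the parent task is suspended.
// Returns true once the parent can mark itself as done.
function pollOnce(parentTask) {
  if (allChildrenDone(parentTask.children)) {
    parentTask.state = 'stopped'; // parent marks itself as done
    return true;
  }
  return false; // stay suspended, poll again later
}
```

Under this model, the parent's state is always at least as "live" as any child's, which is the property the rest of the review leans on.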
Assuming the code has been like this for 5+ years and we're only seeing this now, after the switch to the new job invocation page, intuitively I'd look for problems there.
If any of the assumptions I just mentioned don't hold (anymore), then this change feels like putting a band-aid on a much deeper problem.
```diff
  @job_organization = Taxonomy.find_by(id: @job_invocation.task.input[:current_organization_id])
  @job_location = Taxonomy.find_by(id: @job_invocation.task.input[:current_location_id])
- @auto_refresh = @job_invocation.task.try(:pending?)
+ @auto_refresh = @job_invocation.task.try(:pending?) || has_pending_hosts?
```
The `has_pending_hosts?` only comes into play if `@job_invocation.task.try(:pending?)` returns a false-y value. This can happen if either `@job_invocation.task` does not exist yet, or if the task exists but is already done, which would imply all its children have already finished too.
If `@job_invocation.task` does not exist yet, the child tasks cannot exist, so `has_pending_hosts?` will be false as well.
If `@job_invocation.task.pending?` returns false, then all its children should already be done too, so `has_pending_hosts?` should always be false too.
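The reviewer's argument can be checked mechanically: under the claimed invariant (no pending parent implies no pending hosts), the `||` adds nothing. A small sketch with invented names, enumerating only the states the invariant allows:

```javascript
// Old expression: auto-refresh iff the parent task is pending.
function autoRefreshOld(taskPending) {
  return taskPending;
}

// Proposed expression from the PR: also refresh while any host is pending.
function autoRefreshNew(taskPending, hasPendingHosts) {
  return taskPending || hasPendingHosts;
}

// The review's invariant rules out (taskPending: false, hasPendingHosts: true),
// since a finished parent means all child tasks are finished too.
const allowedStates = [
  { taskPending: true, hasPendingHosts: true },
  { taskPending: true, hasPendingHosts: false },
  { taskPending: false, hasPendingHosts: false },
];
```

Over every allowed state the two expressions agree, which is exactly why the change should be a no-op if the assumptions hold.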
Force-pushed 7a60951 to 9b7a848
Needs a rebase to make the conflict go away
Pull Request Overview
This PR fixes an issue where host job statuses remain stuck in "Pending" state after job completion by implementing a manual refresh mechanism that triggers when auto-refresh is disabled.
- Adds a refresh trigger mechanism to force table data updates when jobs finish
- Implements useEffect hook to detect job completion and trigger refresh
- Passes refresh trigger to the host table component to update API options
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| webpack/JobInvocationDetail/index.js | Adds refresh trigger state and effect to detect job completion |
| webpack/JobInvocationDetail/JobInvocationHostTable.js | Accepts refresh trigger prop and updates API options to force data refresh |
```js
  params: { ...prev.params }
}));
setAllAPIOptions(prev => ({
  ...prev,
  params: { ...prev.params }
```
The useEffect creates new objects with identical content ({ ...prev.params }), which may not trigger a re-fetch since the params remain the same. Consider adding a timestamp or cache-busting parameter to force the API refresh, such as params: { ...prev.params, _refresh: Date.now() }.
```diff
-  params: { ...prev.params }
+  params: { ...prev.params, _refresh: Date.now() }
 }));
 setAllAPIOptions(prev => ({
   ...prev,
-  params: { ...prev.params }
+  params: { ...prev.params, _refresh: Date.now() }
 }));
```
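The distinction Copilot points at - a new object reference with identical contents versus an actually changed value - can be demonstrated in plain JavaScript, no React required. Whether React actually refetches depends on how the API options are compared downstream; `_refresh` is the cache-busting key from the suggestion:

```javascript
const prev = { params: { page: 1 } };

// Spreading produces a new reference but deep-equal contents...
const same = { ...prev, params: { ...prev.params } };
// ...so any consumer comparing serialized params sees no change.
const sameContents = JSON.stringify(same.params) === JSON.stringify(prev.params);

// Adding a timestamp makes the params differ by value, not just by identity,
// so even content-based comparisons will treat this as a new request.
const busted = { ...prev, params: { ...prev.params, _refresh: Date.now() } };
const bustedContents = JSON.stringify(busted.params) === JSON.stringify(prev.params);
```

If the fetch layer only compares object identity, the plain spread may already be enough; the timestamp guards against content-based caching as well.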
Force-pushed 5afa8c0 to c9f5add
@pavanshekar I couldn't reproduce the exact job from your testing steps, but when I test this on a job that starts in a pending state, I see the update happen in two steps after it finishes.
Is that the same behavior you were seeing? I'm looking into the code more.
Yes, that's the behavior I am seeing! The fix addresses the issue by implementing a refreshTrigger mechanism that forces a table refresh when the job finishes. Previously the table would remain stuck in "Pending" status even after successful completion because it wasn't properly refreshing the data; now it triggers a fresh API call with a timestamp parameter so the table gets the updated status immediately.
A little bit late to the party, but I tried to reproduce the issue and got the same behavior as @kmalyjur. It is a little slower - it takes roughly ±10s longer than the chart - but it is working. Yes, it's not ideal, but does it make sense to fix it by adding another watcher just to speed it up to ±5s? Anyway, I would look for another solution rather than adding a new prop + watcher. The code below should be the solution.
```js
if (refreshTrigger > 0) {
  setAPIOptions(prev => ({
    ...prev,
    params: { ...prev.params, _refresh: Date.now() },
  }));
  setAllAPIOptions(prev => ({
    ...prev,
    params: { ...prev.params, _refresh: Date.now() },
  }));
```
```diff
-if (refreshTrigger > 0) {
-  setAPIOptions(prev => ({
-    ...prev,
-    params: { ...prev.params, _refresh: Date.now() },
-  }));
-  setAllAPIOptions(prev => ({
-    ...prev,
-    params: { ...prev.params, _refresh: Date.now() },
-  }));
+// add prop autoRefresh to component, it is already used in parent
+useEffect(() => {
+  setAPIOptions(prevOptions => ({ ...prevOptions }));
+}, [autoRefresh, setAPIOptions]);
```
Thank you for the suggestion! This approach is much better and cleaner. I've implemented the suggested changes by replacing the complex refreshTrigger mechanism with the simple autoRefresh useEffect that responds to the autoRefresh prop already being passed from the parent component.
Force-pushed 447c0f1 to 02b18e8
Without this patch (as of v16.2.1), there is a bit of a delay, but then things switch to the proper state (Screen.Recording.2025-09-16.at.16.58.12.mov). With this patch applied, I'm getting this (Screen.Recording.2025-09-16.at.16.56.13.mov). This doesn't really seem to improve things for me.
@adamruzicka Thank you for the videos. I've been seeing the same thing. Thought I was going crazy! I'm going to set up a meeting with @pavanshekar to see if I'm triggering something incorrectly.
qcjames53
left a comment
@pavanshekar and I just spoke about this issue. Thanks for meeting up! Here are the minimal testing steps to verify the bug:
- Create several hosts on your system. 20+ is a good number. Pavan shared the following snippet to autogenerate many fake hosts in the rails console:
```ruby
100.times do |i|
  host = Host.first.dup
  host.name = "fake-host-#{Time.now.to_i}-#{i.to_s.rjust(3, '0')}"
  host.mac = nil
  host.ip = nil
  host.save!
  puts "Created host #{i+1}/100" if (i + 1) % 10 == 0
end
```
- On the all hosts page, select all hosts and click 'Schedule a job'.
- Any job will work for this. An easy one is 'Commands' -> 'Run Command - Script Default' and set the command to anything.
- Click through all pages and run on the selected hosts. The UI will take you to the job's webpage.
- The table will show all hosts as "In progress". When refreshing the page, some of the hosts will switch to "Failed". Any fix needs to automatically update the table without page refresh.
Force-pushed 02b18e8 to b3024a6
Thanks @adamruzicka for sharing the video! I updated the code: autoRefresh was checking task?.state === 'pending', but the actual task state during execution is 'running', not 'pending'. I updated it so that host statuses now update both as individual hosts complete during execution and when the job finishes, by monitoring the host status changes.
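A minimal sketch of that kind of fix (the shape is illustrative, not the actual component code): treat both 'pending' and 'running' as live states instead of comparing against 'pending' alone.

```javascript
// Task states as described in the comment above: 'pending' before execution
// starts, 'running' during execution; anything else is treated as finished.
const LIVE_STATES = ['pending', 'running'];

// Auto-refresh while the task is in a live state; tolerate a missing task.
function shouldAutoRefresh(task) {
  return LIVE_STATES.includes(task?.state);
}
```

Comparing against a list of live states also avoids silently breaking again if another in-flight state is introduced later, since there is one place to extend.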
qcjames53
left a comment
Thank you for the help reviewing this, Pavan. The autorefresh works well on my end and seems to be reasonably efficient with the network calls. Nice job! I can confirm the host list refreshes appropriately once the job is "warmed up", but unfortunately there are some issues when the server is slow:
With enough hosts, the calls to refresh the table contents start taking longer than the job refresh interval (seems to be 1 second). I know this adds extra work, but do you think it would be possible to make the job invocation element stop refreshing while waiting for the server to send back the prior request? Implementing this might even give you the next issue for free.
There are also issues with page load. The table begins updating once a second before the initial contents even have a chance to load, causing the element to flip between page-load and content-load animations. I would disable this autorefresh behavior entirely until the table contents are populated for the first time.
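Both concerns could be handled with a small guard around the polling logic: skip a tick while a request is still in flight, and hold off auto-refresh until the first load completes. A synchronous sketch of that guard, with names invented for illustration:

```javascript
// Guard for a polling loop: at most one request in flight at a time, and
// auto-refresh stays off until the table contents have loaded once.
function createPollGate() {
  let inFlight = false;
  let loadedOnce = false;
  return {
    // Returns true if a new request may start now; callers skip the tick otherwise.
    tryStart() {
      if (inFlight) return false; // previous request still pending
      inFlight = true;
      return true;
    },
    // Call when a request settles (success or failure).
    finish() {
      inFlight = false;
      loadedOnce = true;
    },
    // Gate the once-a-second refresh behind the initial population.
    shouldAutoRefresh() {
      return loadedOnce;
    },
  };
}
```

Each poll tick would then call `tryStart()` first and bail out if it returns false, which naturally throttles the loop to the server's actual response rate on slow servers.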
Force-pushed b3024a6 to 27b2e61
Thank you for the detailed feedback! I've already addressed both of these issues in the current implementation. The code now prevents overlapping requests when the server is slow and also disables autorefresh until the initial table contents are loaded.
qcjames53
left a comment
I feel bad to keep requesting changes, but there are still scenarios where this autoupdate slams the server with dozens of requests at once. I'm seeing this when I am waiting on one job to complete and start another, making all of the hosts stay in "pending" for a while. The requests named "7" are the ones driving the autoupdate, and they pretty much DoS a single-threaded server.

Oh, but other than that everything works really well! It's just that I can't ACK with this page DoSing slow single-threaded servers.
Force-pushed 27b2e61 to 7d036d6
AFAIK today we don't use it, but we may want to consider https://guides.rubyonrails.org/action_cable_overview.html to get updates. Rather than polling, you can subscribe to changes. Perhaps an improvement we can make? For example, it could get all job statuses and then subscribe to all jobs for updates. No DoS in terms of requests. I don't want to push it, but in the past we first didn't use it because it didn't exist. Then we used mod_passenger, which didn't support it. Now we can use it but don't. I'm not sure if there's a good reason or not.
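For reference, the push-based alternative would look roughly like this on the client. `@rails/actioncable` is the real client package for Action Cable, but the channel name and payload shape below are invented, and Foreman would need matching server-side channels that do not exist today:

```javascript
// Hypothetical Action Cable wiring (channel and payload names are made up):
//
//   import { createConsumer } from '@rails/actioncable';
//   const consumer = createConsumer();
//   consumer.subscriptions.create(
//     { channel: 'JobInvocationChannel', job_id: jobId },
//     { received: data => setStatuses(prev => applyStatusUpdate(prev, data)) }
//   );

// Pure handler: merge a pushed per-host status update into local state,
// instead of re-polling the whole host table on a timer.
function applyStatusUpdate(statuses, data) {
  return { ...statuses, [data.hostId]: data.status };
}
```

With updates pushed per host, the table only re-renders the rows that changed, and the request storm described above goes away entirely.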
qcjames53
left a comment
Thanks for these changes, Pavan; the page itself runs great now no matter what I do! Huge improvement! It looks like there are some minor changes requested for the JS test cases, but after you patch those up I'm okay to ACK.
Ewoud's suggestion is interesting. Let's talk about it in standup.
qcjames53
left a comment
Thanks for talking this over. Everything is looking good to me! ACKed
We still need those tests to pass though
Hi everyone, meanwhile my PR refactoring the API calls got merged. @pavanshekar Can we retest it with the latest version? In my case it looks like the issue disappeared. Also, if the issue still occurs, it will definitely affect this code, because the API call logic has changed.
Thanks for confirming! I retested with the latest changes and the issue seems to be resolved. Since the original issue was about hosts remaining in "Pending" status even after a job completed successfully, and the expected behavior was that hosts should update to "Success" or "Completed", it looks like @Lukshio's refactor addresses this. I'll go ahead and close this PR.

Fix hosts job status remaining "Pending" after completion
Testing steps: