fix: handle stuck Job and AutoUpdateJob records in fix_statuses script #182 #183

Merged
Fedir-Yatsenko merged 1 commit into development from fix/182-improve-fix_statuses-script on Mar 5, 2026

Conversation

Fedir-Yatsenko (Collaborator) commented Mar 5, 2026

Applicable issues

Description of changes

Extract generic _set_failed_status method and add support for resetting stuck Job and AutoUpdateJob records alongside ChannelDatasetVersion. Each model is fixed in a separate try-except block and database session to ensure one failure doesn't block the others.
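
The approach described above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code from this PR: it uses the standard-library sqlite3 module in place of the project's database layer, the names set_failed_status and fix_statuses are invented for the example, and the TERMINAL_STATUSES values are assumptions, since the PR does not list which statuses count as completed. Only the table names and log message wording are taken from the script logs in this conversation.

```python
import logging
import sqlite3
from typing import Dict, Optional

logger = logging.getLogger("fix_statuses_sketch")

# Assumption: rows in these statuses are left untouched. The real set may differ.
TERMINAL_STATUSES = ("COMPLETED", "FAILED")

# Table names as they appear in the script logs.
TABLES = ("channel_dataset_versions", "jobs", "auto_update_jobs")


def set_failed_status(conn: sqlite3.Connection, table: str) -> int:
    """Mark every non-terminal row in `table` as FAILED; return the number of rows updated."""
    # Table names cannot be bound as SQL parameters, so only known names are allowed.
    if table not in TABLES:
        raise ValueError(f"unexpected table: {table}")
    placeholders = ", ".join("?" for _ in TERMINAL_STATUSES)
    with conn:  # commits on success, rolls back if the statement raises
        cursor = conn.execute(
            f"UPDATE {table} SET status = 'FAILED' "
            f"WHERE status NOT IN ({placeholders})",
            TERMINAL_STATUSES,
        )
    return cursor.rowcount


def fix_statuses(db_path: str) -> Dict[str, Optional[int]]:
    """Fix each table with its own connection and its own try-except block."""
    results: Dict[str, Optional[int]] = {}
    for table in TABLES:
        logger.info("Setting FAILED status for all non-completed %s...", table)
        try:
            conn = sqlite3.connect(db_path)
            try:
                updated = set_failed_status(conn, table)
            finally:
                conn.close()
            logger.info("Updated %d %s record(s) to FAILED status", updated, table)
            results[table] = updated
        except sqlite3.Error:
            # A failure here is logged and recorded, then the loop moves on.
            logger.exception("Failed to fix statuses for %s", table)
            results[table] = None
    return results
```

Opening a fresh connection per table keeps the three updates in independent transactions, so a missing table or a failed commit for one model leaves the other two unaffected.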

Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

…#182

Extract generic _set_failed_status method and add support for resetting
stuck Job and AutoUpdateJob records alongside ChannelDatasetVersion.
Each model is fixed in a separate try-except block and database session
to ensure one failure doesn't block the others.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fedir-Yatsenko requested a review from kryachkow March 5, 2026 14:20
Fedir-Yatsenko self-assigned this Mar 5, 2026
Fedir-Yatsenko requested a review from ypldan as a code owner March 5, 2026 14:20
Fedir-Yatsenko added the "bug" label Mar 5, 2026
Fedir-Yatsenko linked an issue Mar 5, 2026 that may be closed by this pull request
Fedir-Yatsenko (Collaborator, Author) commented Mar 5, 2026

/deploy-review

GitHub actions run: 22722272243
Environment URL: review-environment | pipeline

Fedir-Yatsenko (Collaborator, Author) commented Mar 5, 2026

Examples of script logs:

NOTE: Tested locally with docker-compose.

ADMIN_MODE = 'FIX_STATUSES'
2026-03-05T14:17:31.811414686Z INFO: 2026-03-05 14:17:31 Starting fix_statuses script...
2026-03-05T14:17:31.813623760Z INFO: 2026-03-05 14:17:31 Attempting to create default engine (attempt 1/5)
2026-03-05T14:17:31.921273969Z INFO: 2026-03-05 14:17:31 default engine created and connection verified
2026-03-05T14:17:31.921567380Z INFO: 2026-03-05 14:17:31 Setting FAILED status for all non-completed channel_dataset_versions...
2026-03-05T14:17:32.003269249Z INFO: 2026-03-05 14:17:32 Updated 0 channel_dataset_versions record(s) to FAILED status
2026-03-05T14:17:32.004692861Z INFO: 2026-03-05 14:17:32 Setting FAILED status for all non-completed jobs...
2026-03-05T14:17:32.014431495Z INFO: 2026-03-05 14:17:32 Updated 2 jobs record(s) to FAILED status
2026-03-05T14:17:32.019212734Z INFO: 2026-03-05 14:17:32 Setting FAILED status for all non-completed auto_update_jobs...
2026-03-05T14:17:32.029518721Z INFO: 2026-03-05 14:17:32 Updated 2 auto_update_jobs record(s) to FAILED status
2026-03-05T14:17:32.034903302Z INFO: 2026-03-05 14:17:32 fix_statuses script completed successfully

Fedir-Yatsenko (Collaborator, Author) commented

Examples of script logs:

NOTE: Tested in the Review environment.

INFO: | 2026-03-05 14:44:28 | 19 | main | fix_statuses script completed successfully
2026-03-05 14:44:28.556
INFO: | 2026-03-05 14:44:28 | 19 | statgpt.admin.services.dataset | Updated 0 auto_update_jobs record(s) to FAILED status
2026-03-05 14:44:28.551
INFO: | 2026-03-05 14:44:28 | 19 | statgpt.admin.services.dataset | Setting FAILED status for all non-completed auto_update_jobs...
2026-03-05 14:44:28.550
INFO: | 2026-03-05 14:44:28 | 19 | statgpt.admin.services.dataset | Updated 0 jobs record(s) to FAILED status
2026-03-05 14:44:28.543
INFO: | 2026-03-05 14:44:28 | 19 | statgpt.admin.services.dataset | Setting FAILED status for all non-completed jobs...
2026-03-05 14:44:28.542
INFO: | 2026-03-05 14:44:28 | 19 | statgpt.admin.services.dataset | Updated 0 channel_dataset_versions record(s) to FAILED status
2026-03-05 14:44:28.469
INFO: | 2026-03-05 14:44:28 | 19 | statgpt.admin.services.dataset | Setting FAILED status for all non-completed channel_dataset_versions...
2026-03-05 14:44:28.469
INFO: | 2026-03-05 14:44:28 | 19 | statgpt.common.models.database | default engine created and connection verified
2026-03-05 14:44:28.338
INFO: | 2026-03-05 14:44:28 | 19 | statgpt.common.models.database | Attempting to create default engine (attempt 1/5)
2026-03-05 14:44:28.336
INFO: | 2026-03-05 14:44:28 | 19 | main | Starting fix_statuses script...
2026-03-05 14:44:18.048
INFO [alembic.runtime.migration] Will assume transactional DDL.
2026-03-05 14:44:18.048
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
2026-03-05 14:44:10.702
ADMIN_MODE = 'INIT'

Fedir-Yatsenko merged commit 4458c1f into development Mar 5, 2026
11 checks passed
Fedir-Yatsenko deleted the fix/182-improve-fix_statuses-script branch March 5, 2026 14:48

Labels

bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

fix_statuses script does not handle AutoUpdateJob and Job records

2 participants