Testing: Management command for e2e test of ML jobs #1143

Open
carlosgjs wants to merge 14 commits into RolnickLab:main from uw-ssec:carlos/testmljob

Conversation

@carlosgjs
Collaborator

@carlosgjs carlosgjs commented Feb 20, 2026

This pull request introduces a new management command for end-to-end testing of ML jobs in the Django project. The command allows users to create and monitor ML jobs using specified collections and pipelines, providing real-time progress updates and final results in the console.

New ML job testing command:

  • Added test_ml_job_e2e.py management command for end-to-end testing of ML job processing, including job creation, progress monitoring, and final results display.
  • Supports selection of SourceImageCollection, Pipeline, and JobDispatchMode via command-line arguments, with informative error handling for missing resources.
  • Provides real-time progress updates and displays final job results, including stage progress and parameters, in a user-friendly format.

Summary by CodeRabbit

  • New Features
    • Added an end-to-end ML job test command that runs and enqueues jobs by collection/pipeline and dispatch mode.
    • Provides real-time, single-line progress updates with per-stage progress and elapsed time.
    • Handles interruptions gracefully (preserves job execution and surfaces the Job ID).
    • Prints a final summary with status, total runtime, and per-stage results.
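The real-time, single-line progress display mentioned above is usually implemented with a carriage return; a minimal standalone sketch (function names and the exact line format are illustrative, not code from this PR):

```python
import sys

def render_progress(elapsed: float, status: str, stages: dict[str, int]) -> str:
    """Build a compact one-line status string from elapsed seconds,
    overall job status, and per-stage completion percentages."""
    parts = [f"[{elapsed:.1f}s] {status}"]
    parts += [f"{name} {pct}%" for name, pct in stages.items()]
    return " | ".join(parts)

def show_progress(line: str) -> None:
    # '\r' moves the cursor back to column 0 so each update
    # overwrites the previous one instead of scrolling the console.
    sys.stdout.write("\r" + line.ljust(79))
    sys.stdout.flush()
```

Padding with `ljust` ensures a shorter update fully overwrites a longer previous line.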

@netlify

netlify bot commented Feb 20, 2026

👷 Deploy request for antenna-ssec pending review.

Visit the deploys page to approve it

🔨 Latest commit: f72a552

@netlify

netlify bot commented Feb 20, 2026

Deploy Preview for antenna-preview canceled.

🔨 Latest commit: f72a552
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-preview/deploys/699deff64dc69d00085e9cd4

@coderabbitai
Contributor

coderabbitai bot commented Feb 20, 2026

📝 Walkthrough

Adds a new Django management command to create, enqueue, and monitor an ML Job end-to-end by locating a SourceImageCollection and Pipeline, creating a Job with a dispatch mode, polling status every 2 seconds, and printing live progress and final results until completion or interruption.

Changes

Cohort / File(s) Summary
New ML Job E2E Command
ami/jobs/management/commands/test_ml_job_e2e.py
Adds a new management command class Command (subclassing BaseCommand) that: validates SourceImageCollection and Pipeline existence, creates and saves a Job with a dispatch mode, enqueues the job, monitors progress in a loop (2s poll), updates a compact single-line progress display, handles KeyboardInterrupt to stop monitoring while leaving the job running, and prints final results and Job ID.

Sequence Diagram

sequenceDiagram
    actor User
    participant CMD as test_ml_job_e2e Command
    participant DB as Database
    participant JobSvc as Job Creation/Model
    participant Queue as Dispatcher/Queue
    participant Monitor as Monitor Loop
    participant Out as Console Output

    User->>CMD: run with collection, pipeline, dispatch args
    CMD->>DB: query SourceImageCollection
    alt not found
        CMD->>Out: print error & exit
    end
    CMD->>DB: query Pipeline
    alt not found
        CMD->>Out: print error & exit
    end
    CMD->>JobSvc: create/save Job (with dispatch mode)
    JobSvc->>DB: persist Job
    CMD->>Queue: enqueue Job
    CMD->>Monitor: start _monitor_job(job, start_time)

    loop every 2s until final state or interrupt
        Monitor->>DB: poll job status & stages
        Monitor->>Out: update single-line progress (elapsed, status, per-stage)
        alt final state reached
            Monitor->>Out: call _show_final_results
            Monitor->>Monitor: exit loop
        end
    end

    alt KeyboardInterrupt
        Monitor->>Out: print Job ID and stop monitoring
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • mihow

Poem

🐰 I hopped in with code to test the run,

Queued a job beneath the sun.
Polling beats like little drum,
Stages tick until it's done.
Hop, report — the job's begun!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check (⚠️ Warning): the description covers the summary and key changes but is missing several required template sections: Related Issues, Detailed Description, How to Test, Screenshots, Deployment Notes, and Checklist. Resolution: add the missing sections from the template, reference any related issues, provide a detailed description with potential side effects, include testing instructions, and complete the checklist.
  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): the title clearly and specifically describes the main change: a new management command for end-to-end testing of ML jobs.



@carlosgjs carlosgjs requested a review from mihow February 20, 2026 19:57
@carlosgjs carlosgjs marked this pull request as ready for review February 20, 2026 19:57
Copilot AI review requested due to automatic review settings February 20, 2026 19:57
Contributor

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

🧹 Nitpick comments (2)
ami/jobs/management/commands/test_ml_job_e2e.py (2)

40-44: Same from None chaining needed here.

Per Ruff B904, chain the re-raised CommandError to suppress the noisy inner traceback.

Proposed fix
         except Pipeline.DoesNotExist:
-            raise CommandError(f"Pipeline '{pipeline_slug}' not found")
+            raise CommandError(f"Pipeline '{pipeline_slug}' not found") from None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/jobs/management/commands/test_ml_job_e2e.py` around lines 40 - 44, In the
try/except that looks up Pipeline via Pipeline.objects.get(slug=pipeline_slug),
update the exception re-raise to suppress the inner traceback by chaining to
None: replace the current "raise CommandError(f\"Pipeline '{pipeline_slug}' not
found\")" with raising CommandError from None (i.e., raise CommandError(...)
from None) so the caught Pipeline.DoesNotExist does not emit a noisy inner
traceback when CommandError is thrown.

71-85: Consider adding an optional timeout to prevent indefinite monitoring.

For an E2E test utility, an unbounded polling loop is a minor concern (Ctrl+C works), but a --timeout argument would make this more robust for CI/scripted usage.
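The suggested bounded loop can be sketched in isolation (a minimal standalone sketch, not the PR's implementation; `get_status`, `monitor_job`, and the terminal-state names are stand-ins):

```python
import time

def monitor_job(get_status, poll_interval=2.0, timeout=None):
    """Poll get_status() until it returns a terminal state, or raise
    TimeoutError once `timeout` seconds elapse (unbounded when None)."""
    terminal = {"SUCCESS", "FAILURE", "CANCELED"}  # illustrative state names
    start = time.monotonic()
    while True:
        status = get_status()
        if status in terminal:
            return status
        if timeout is not None and time.monotonic() - start >= timeout:
            raise TimeoutError(f"job still {status} after {timeout}s")
        time.sleep(poll_interval)
```

In the management command, `timeout` would come from a `--timeout` CLI option (seconds, default None) so CI runs fail fast instead of hanging.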

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/jobs/management/commands/test_ml_job_e2e.py` around lines 71 - 85, Add an
optional timeout to the E2E job monitor so it cannot poll forever: update the
_monitor_job method to accept a timeout (e.g., timeout=None) and while looping
check if elapsed >= timeout (when timeout is set) and abort/raise a clear
TimeoutError or mark the job as failed and call _show_final_results; wire this
timeout into the management command by adding a --timeout CLI option in the
command class (parse as seconds, default None) and pass it into _monitor_job
when invoked so CI or scripts can limit wait time.

Comment on lines +32 to +37

        try:
            collection = SourceImageCollection.objects.get(name=collection_name)
            self.stdout.write(self.style.SUCCESS(f"✓ Found collection: {collection.name}"))
            self.stdout.write(f"  Project: {collection.project.name}")
        except SourceImageCollection.DoesNotExist:
            raise CommandError(f"SourceImageCollection '{collection_name}' not found")
Contributor
⚠️ Potential issue | 🟡 Minor

SourceImageCollection.objects.get(name=...) may raise MultipleObjectsReturned.

The name field on SourceImageCollection is not unique. If multiple collections share the same name, this will throw an unhandled MultipleObjectsReturned exception with a raw traceback. Either catch MultipleObjectsReturned or use .filter(...).first() with an appropriate message.

Also, per static analysis (B904), chain the original exception for better diagnostics.

Proposed fix
         try:
             collection = SourceImageCollection.objects.get(name=collection_name)
             self.stdout.write(self.style.SUCCESS(f"✓ Found collection: {collection.name}"))
             self.stdout.write(f"  Project: {collection.project.name}")
         except SourceImageCollection.DoesNotExist:
-            raise CommandError(f"SourceImageCollection '{collection_name}' not found")
+            raise CommandError(f"SourceImageCollection '{collection_name}' not found") from None
+        except SourceImageCollection.MultipleObjectsReturned:
+            raise CommandError(
+                f"Multiple SourceImageCollections found with name '{collection_name}'. Use a unique name."
+            ) from None
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 37-37: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 37-37: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/jobs/management/commands/test_ml_job_e2e.py` around lines 32 - 37, The
current use of SourceImageCollection.objects.get(name=collection_name) can raise
MultipleObjectsReturned; modify the lookup to handle duplicates by either
catching SourceImageCollection.MultipleObjectsReturned (or
MultipleObjectsReturned from django.core.exceptions) and raising a CommandError
chaining the original exception (raise CommandError(...) from e) with a clear
message, or replace .get(...) with
SourceImageCollection.objects.filter(name=collection_name).first() and then
raise CommandError if result is None or if you detect multiple matches and want
to surface that as an error; update the block around
SourceImageCollection.objects.get, the except path, and ensure you chain the
original exception when re-raising CommandError.

Comment on lines 123 to 125

                for param in stage.params:
                    if param.value:
                        self.stdout.write(f"    {param.name}: {param.value}")
Contributor
⚠️ Potential issue | 🟡 Minor

if param.value: silently hides falsy values like 0, False, or "".

If a stage parameter can legitimately hold 0 or another falsy value, it will be skipped. Use if param.value is not None: to only skip truly absent values.
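The difference between the two checks is easy to demonstrate in isolation (parameter names and values are illustrative):

```python
params = {"threshold": 0.0, "batch_size": 64, "enabled": False, "note": "", "missing": None}

# Truthiness check: drops 0.0, False, and "" along with None.
truthy_kept = [name for name, value in params.items() if value]

# Explicit None check: only truly absent values are skipped.
not_none_kept = [name for name, value in params.items() if value is not None]
```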

Proposed fix
                 for param in stage.params:
-                    if param.value:
+                    if param.value is not None:
                         self.stdout.write(f"    {param.name}: {param.value}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/jobs/management/commands/test_ml_job_e2e.py` around lines 123 - 125, The
loop that writes stage parameters skips falsy but valid values because it tests
"if param.value:"; change that check to explicitly test for None so values like
0, False or empty string are preserved — update the conditional in the loop
iterating over stage.params (the block that calls self.stdout.write(f"   
{param.name}: {param.value}")) to use "if param.value is not None:" so only
absent values are skipped.

Contributor

Copilot AI left a comment
Pull request overview

This pull request adds a new Django management command test_ml_job_e2e for end-to-end testing of ML job processing. The command creates ML jobs with specified collections and pipelines, monitors their progress in real-time, and displays final results in the console. This is a useful debugging and testing tool for developers working with the ML job pipeline.

Changes:

  • Added test_ml_job_e2e.py management command with support for creating and monitoring ML jobs through different dispatch modes (internal, sync_api, async_api)
  • Implements real-time progress monitoring with console updates every 2 seconds showing job status and per-stage progress
  • Provides final results summary including stage completion status and parameter values


@carlosgjs carlosgjs changed the title from Carlos/testmljob to Testing: Management command for e2e test of ML jobs Feb 20, 2026
mihow and others added 6 commits February 24, 2026 09:34
* fix: prevent NATS connection flooding and stale job task fetching

- Add connect_timeout=5, allow_reconnect=False to NATS connections to
  prevent leaked reconnection loops from blocking Django's event loop
- Guard /tasks endpoint against terminal-status jobs (return empty tasks
  instead of attempting NATS reserve)
- IncompleteJobFilter now excludes jobs by top-level status in addition
  to progress JSON stages
- Add stale worker cleanup to integration test script

Found during PSv2 integration testing where stale ADC workers with
default DataLoader parallelism overwhelmed the single uvicorn worker
thread by flooding /tasks with concurrent NATS reserve requests.

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: PSv2 integration test session notes and NATS flooding findings

Session notes from 2026-02-16 integration test including root cause
analysis of stale worker task competition and NATS connection issues.
Findings doc tracks applied fixes and remaining TODOs with priorities.

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: update session notes with successful test run #3

PSv2 integration test passed end-to-end (job 1380, 20/20 images).
Identified ack_wait=300s as cause of ~5min idle time when GPU
processes race for NATS tasks.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: batch NATS task fetch to prevent HTTP timeouts

Replace N×1 reserve_task() calls with single reserve_tasks() batch
fetch. The previous implementation created a new pull subscription per
message (320 NATS round trips for batch=64), causing the /tasks endpoint
to exceed HTTP client timeouts. The new approach uses one psub.fetch()
call for the entire batch.

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: add next session prompt

* feat: add pipeline__slug__in filter for multi-pipeline job queries

Workers that handle multiple pipelines can now fetch jobs for all of them
in a single request: ?pipeline__slug__in=slug1,slug2

Co-Authored-By: Claude <noreply@anthropic.com>
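A comma-separated `__in` value like `slug1,slug2` is typically split into a clean list before filtering; a minimal standalone sketch (`parse_slug_in` is an illustrative name, not code from this PR):

```python
def parse_slug_in(raw: str) -> list[str]:
    """Split a comma-separated query-param value like 'slug1,slug2'
    into clean slugs, dropping whitespace and empty segments."""
    return [part.strip() for part in raw.split(",") if part.strip()]
```

On the Django side, the resulting list would then feed a queryset lookup along the lines of `Job.objects.filter(pipeline__slug__in=slugs)`.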

* chore: remove local-only docs and scripts from branch

These files are session notes, planning docs, and test scripts that
should stay local rather than be part of the PR.

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: set job dispatch_mode at creation time based on project feature flags

ML jobs with a pipeline now get dispatch_mode set during setup() instead
of waiting until run() is called by the Celery worker. This lets the UI
show the correct mode immediately after job creation.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: add timeouts to all JetStream operations and restore reconnect policy

Add NATS_JETSTREAM_TIMEOUT (10s) to all JetStream metadata operations
via asyncio.wait_for() so a hung NATS connection fails fast instead of
blocking the caller's thread indefinitely. Also restore the intended
reconnect policy (2 attempts, 1s wait) that was lost in a prior force push.

Co-Authored-By: Claude <noreply@anthropic.com>
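The asyncio.wait_for pattern this commit describes can be sketched in isolation (the helper name `with_timeout` and the constant are illustrative; the commit's actual setting is NATS_JETSTREAM_TIMEOUT at 10s):

```python
import asyncio

JETSTREAM_TIMEOUT = 10.0  # mirrors the NATS_JETSTREAM_TIMEOUT idea; value illustrative

async def with_timeout(awaitable, timeout: float = JETSTREAM_TIMEOUT):
    """Await an operation but fail fast with asyncio.TimeoutError instead of
    blocking the caller's thread indefinitely when the connection hangs."""
    return await asyncio.wait_for(awaitable, timeout=timeout)
```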

* fix: propagate NATS timeouts as 503 instead of swallowing them

asyncio.TimeoutError from _ensure_stream() and _ensure_consumer() was
caught by the broad `except Exception` in reserve_tasks(), silently
returning [] and making NATS outages indistinguishable from empty queues.
Workers would then poll immediately, recreating the flooding problem.

- Add explicit `except asyncio.TimeoutError: raise` in reserve_tasks()
- Catch TimeoutError and OSError in the /tasks view, return 503
- Restore allow_reconnect=False (fail-fast on connection issues)
- Add return type annotation to get_connection()

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: address review comments (log level, fetch timeout, docstring)

- Downgrade reserve_tasks log to DEBUG when zero tasks reserved (avoid
  log spam from frequent polling)
- Pass timeout=0.5 from /tasks endpoint to avoid blocking the worker
  for 5s on empty queues
- Fix docstring examples using string 'job123' for int-typed job_id

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: catch nats.errors.Error in /tasks endpoint for proper 503 responses

NoServersError, ConnectionClosedError, and other NATS exceptions inherit
from nats.errors.Error (not OSError), so they escaped the handler and
returned 500 instead of 503.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
…olnickLab#1142)

* feat: configurable NATS tuning and gunicorn worker management

Rebase onto main after RolnickLab#1135 merge. Keep only the additions unique to
this branch:

- Make TASK_TTR configurable via NATS_TASK_TTR Django setting (default 30s)
- Make max_ack_pending configurable via NATS_MAX_ACK_PENDING setting (default 100)
- Local dev: switch to gunicorn+UvicornWorker by default for production
  parity, with USE_UVICORN=1 escape hatch for raw uvicorn
- Production: auto-detect WEB_CONCURRENCY from CPU cores (capped at 8)
  when not explicitly set in the environment

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: address PR review comments

- Fix max_ack_pending falsy-zero guard (use `is not None` instead of `or`)
- Update TaskQueueManager docstring with Args section
- Simplify production WEB_CONCURRENCY fallback (just use nproc)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Michael Bunsen <notbot@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
* fix: include pipeline_slug in MinimalJobSerializer (ids_only response)

The ADC worker fetches jobs with ids_only=1 and expects pipeline_slug in
the response to know which pipeline to run. Without it, Pydantic
validation fails and the worker skips the job.

Co-Authored-By: Claude <noreply@anthropic.com>

* Update ami/jobs/serializers.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
ami/jobs/management/commands/test_ml_job_e2e.py (2)

32-37: ⚠️ Potential issue | 🟡 Minor

Handle duplicate collection names and chain CommandError.

SourceImageCollection.objects.get(name=...) can raise MultipleObjectsReturned when names aren’t unique; also, raising CommandError without chaining triggers B904. This was already flagged before.

✅ Proposed fix
-        except SourceImageCollection.DoesNotExist:
-            raise CommandError(f"SourceImageCollection '{collection_name}' not found")
+        except SourceImageCollection.DoesNotExist:
+            raise CommandError(f"SourceImageCollection '{collection_name}' not found") from None
+        except SourceImageCollection.MultipleObjectsReturned:
+            raise CommandError(
+                f"Multiple SourceImageCollections found with name '{collection_name}'. Use a unique name."
+            ) from None
#!/bin/bash
# Inspect SourceImageCollection.name uniqueness to confirm duplicate risk
rg -n "class SourceImageCollection" -C4
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/jobs/management/commands/test_ml_job_e2e.py` around lines 32 - 37, The
current lookup using SourceImageCollection.objects.get(name=collection_name) can
raise MultipleObjectsReturned and the CommandError is raised without exception
chaining; update the try/except to catch both SourceImageCollection.DoesNotExist
and SourceImageCollection.MultipleObjectsReturned (or Exception as e) and when
raising CommandError use "raise CommandError(... ) from e" so the original
exception is chained; reference the lookup call
(SourceImageCollection.objects.get) and the raised CommandError to locate and
modify the handling.

40-44: ⚠️ Potential issue | 🟡 Minor

Chain CommandError exceptions to avoid B904 violations.

Lines 37 and 44 should use raise ... from None to explicitly handle exception chaining per Ruff B904. When intentionally replacing a caught exception with a more informative one, explicit chaining syntax distinguishes intentional replacement from accidental context loss.

Affected locations

Line 37 (SourceImageCollection):

         except SourceImageCollection.DoesNotExist:
-            raise CommandError(f"SourceImageCollection '{collection_name}' not found")
+            raise CommandError(f"SourceImageCollection '{collection_name}' not found") from None

Line 44 (Pipeline):

         except Pipeline.DoesNotExist:
-            raise CommandError(f"Pipeline '{pipeline_slug}' not found")
+            raise CommandError(f"Pipeline '{pipeline_slug}' not found") from None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/jobs/management/commands/test_ml_job_e2e.py` around lines 40 - 44, The
except handlers that replace underlying exceptions with CommandError
(specifically the block calling Pipeline.objects.get(slug=pipeline_slug) and the
similar SourceImageCollection lookup) must explicitly suppress implicit
exception chaining; change the raise statements to use "raise CommandError(...)
from None" so the new CommandError replaces the original exception intentionally
(use the exact symbols Pipeline.objects.get and SourceImageCollection
lookup/exception handlers to find the two places to update).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11f9fd7 and f72a552.

📒 Files selected for processing (1)
  • ami/jobs/management/commands/test_ml_job_e2e.py

        help="Job dispatch mode",
    )

    def handle(self, *args, **options):
Contributor
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the file exists and view content around line 26
if [ -f "ami/jobs/management/commands/test_ml_job_e2e.py" ]; then
  echo "=== File exists, showing lines 20-35 ==="
  sed -n '20,35p' ami/jobs/management/commands/test_ml_job_e2e.py | cat -n
else
  echo "File not found"
fi

Repository: RolnickLab/antenna

Length of output: 849


🏁 Script executed:

#!/bin/bash
# Find Ruff configuration files
echo "=== Looking for Ruff configuration files ==="
fd -t f '(pyproject\.toml|ruff\.toml|\.ruff\.toml|setup\.cfg|tox\.ini)' -x ls -lh {}

echo -e "\n=== Searching for Ruff/ARG configuration ==="
fd -t f '(pyproject\.toml|ruff\.toml|\.ruff\.toml)' -x rg -A 20 "^\[tool\.ruff\]|extend-ignore|ignore.*ARG|ARG002"

Repository: RolnickLab/antenna

Length of output: 330


🏁 Script executed:

#!/bin/bash
echo "=== Content of ruff.toml ==="
cat ruff.toml

echo -e "\n=== Content of pyproject.toml ==="
cat pyproject.toml

echo -e "\n=== Content of setup.cfg ==="
cat setup.cfg

Repository: RolnickLab/antenna

Length of output: 3366


Rename unused *args to *_args to satisfy Ruff ARG002 enforcement.

Line 26 contains an unused *args parameter that violates the ARG002 rule, which is actively enforced in this repository's Ruff configuration. Rename to *_args to indicate the parameter is intentionally unused.

Suggested fix
-    def handle(self, *args, **options):
+    def handle(self, *_args, **options):
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 26-26: Unused method argument: args

(ARG002)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/jobs/management/commands/test_ml_job_e2e.py` at line 26, The handle
method currently defines an unused variadic positional parameter as *args which
violates Ruff ARG002; rename it to *_args in the function signature for handle
to mark it intentionally unused and satisfy the linter (ensure any internal
references to the old name are updated or removed). Keep the rest of the
signature and behavior unchanged.

Comment on lines +78 to +81

            # Check if finished
            if job.status in JobState.final_states():
                self._show_final_results(job, start_time)
                break
Contributor
⚠️ Potential issue | 🟠 Major

Terminate monitoring on UNKNOWN to avoid infinite loop.

JobState.final_states() excludes UNKNOWN, yet JobState.failed_states() includes it. If the job ever becomes UNKNOWN, this loop never ends. Treat UNKNOWN as terminal (or use failed_states() in addition to final_states()).

🔧 Suggested fix
-            if job.status in JobState.final_states():
+            if job.status in JobState.final_states() or job.status == JobState.UNKNOWN:
                 self._show_final_results(job, start_time)
                 break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/jobs/management/commands/test_ml_job_e2e.py` around lines 78 - 81, The
loop termination misses the UNKNOWN state because JobState.final_states()
doesn't include it; update the check inside the monitoring loop to treat UNKNOWN
as terminal by changing the condition to check job.status against both
JobState.final_states() and JobState.failed_states() (e.g., if job.status in
JobState.final_states() or job.status in JobState.failed_states(): call
self._show_final_results(job, start_time) and break) so the loop cannot spin
forever when status becomes UNKNOWN.
