don't send batch again in multiple runs

semio · semio · commit e6fd1dc5bf0d · 2025-06-07T11:28:13.000+08:00
diff --git a/automation-api/.rules b/automation-api/.rules
@@ -10,6 +10,7 @@
 # How to test the gm-eval command
 - create a temp folder in project root
 - in the temp folder, run the gm-eval commands as you need
+    - don't create files yourself; always use gm-eval download to get configurations.
 - double check the downloaded ai_eval_sheets. If they contains many data (more than 5 questions and 5 prompts), stop and confirm if you can continue first.
 
 # Environment
diff --git a/automation-api/backlogs/gm-eval-skip-existing-responses.md b/automation-api/backlogs/gm-eval-skip-existing-responses.md
@@ -0,0 +1,130 @@
+# GM-Eval Skip Existing Response Files - ✅ COMPLETED
+
+## Overview
+The gm-eval send, send-file, and evaluate commands are not properly checking if response files already exist before sending batches to LLM providers. This leads to unnecessary API calls, wasted resources, and potential duplicate processing.
+
+## Current Problems
+1. **send command**: Always sends batches even if `*-response.jsonl` files exist
+2. **send-file command**: Always processes files without checking for existing responses  
+3. **evaluate command**: When using `--send`, may send evaluation batches without checking if evaluation response files exist
+4. **Inconsistent behavior**: Only the LiteLLM batch job checks for existing files; the other providers don't
+
+## Plan
+
+### 1. Update Batch Job Base Class
+- Modify `BaseBatchJob.__init__()` to properly set `_is_completed` flag when response file exists
+- Update `send()` method to check completion status before processing
+- Ensure consistent behavior across all provider implementations
+
+### 2. Update Individual Batch Job Implementations
+- **OpenAI**: Add response file check in `send()` method before creating batch
+- **Anthropic**: Add response file check in `send()` method  
+- **Vertex**: Add response file check in `send()` method
+- **Mistral**: Add response file check in `send()` method
+- **LiteLLM**: Already implemented correctly, verify behavior
+
+### 3. Add Skip Logic Messages
+- Log clear messages when skipping due to existing response files
+- Include file paths in skip messages for clarity
+- Differentiate between "already processing" and "already completed" states
+
+### 4. Testing Strategy
+- Test each provider with existing response files
+- Verify that `--wait` flag works correctly when files already exist
+- Test force re-processing options if needed
+- Verify evaluation command behavior with `--send` flag
+
+## Implementation Details
+
+### Expected Behavior
+When a command is run and the response file already exists:
+1. Log: "Response file already exists: {path}"  
+2. Log: "Skipping batch processing for {model_config_id}"
+3. Return success without making API calls
+4. If `--wait` is specified, the command should still work and return the existing file path
+
+### Files to Modify
+- `automation-api/lib/pilot/batchjob/base.py`
+- `automation-api/lib/pilot/batchjob/openai.py` 
+- `automation-api/lib/pilot/batchjob/anthropic.py`
+- `automation-api/lib/pilot/batchjob/vertex.py`
+- `automation-api/lib/pilot/batchjob/mistral.py`
+- Potentially `automation-api/lib/pilot/generate_eval_prompts.py` for evaluate command
+
+## Success Criteria
+- All gm-eval commands skip processing when response files exist
+- Clear logging messages indicate when and why processing is skipped
+- No breaking changes to existing functionality
+- Consistent behavior across all LLM providers
+- Test coverage for skip scenarios
+
+## Future Considerations
+- Add `--force` flag to override skip behavior when needed
+- Consider checksums to detect if input files changed since response generation
+- Add validation to ensure response files are complete/valid before skipping
+
+## Summary of What Has Been Done
+
+### June 7, 2025: Complete Implementation - FINISHED
+**Successfully implemented skip logic for all gm-eval commands**
+
+#### Core Implementation Achievements:
+- ✅ **Updated BaseBatchJob class** with common skip logic:
+  - Added `is_completed` property to check completion status
+  - Added `should_skip_processing()` method with clear logging
+  - Ensured consistent behavior across all provider implementations
+
+- ✅ **Updated all batch job implementations**:
+  - **OpenAI**: Fixed to use base class initialization and added skip logic
+  - **Anthropic**: Updated to inherit properly and use skip logic
+  - **Vertex**: Modified to use base class with custom output path handling
+  - **Mistral**: Updated to inherit base class and added skip logic
+  - **LiteLLM**: Refactored to use common skip logic method
+
+- ✅ **Fixed batch processing with wait flag**:
+  - Updated `process_batch()` to detect when batches are skipped
+  - Properly handle `--wait` flag when response files already exist
+  - Avoid attempting to wait for non-existent batch jobs
+
+#### Testing Results:
+- ✅ **Unit tests**: All batch job classes correctly identify existing response files
+- ✅ **send command**: Skips processing when response files exist
+- ✅ **send-file command**: Skips processing when response files exist  
+- ✅ **run command**: End-to-end test successful with skip logic
+- ✅ **Wait functionality**: Correctly handles existing files without errors
+
+#### Key Features Implemented:
+1. **Consistent Skip Logic**: All providers now check for existing response files before processing
+2. **Clear Logging**: Users see informative messages when processing is skipped
+3. **Wait Flag Compatibility**: `--wait` works correctly whether batch is new or skipped
+4. **No Breaking Changes**: Existing functionality preserved while adding skip capability
+5. **Provider Agnostic**: Same behavior across OpenAI, Anthropic, Vertex, Mistral, and LiteLLM
+
+#### Critical Bug Fixes:
+- ✅ **Fixed "batch job not started" error**: When skipping due to existing files, wait logic now handles correctly
+- ✅ **Proper inheritance**: All batch job classes now properly inherit from BaseBatchJob
+- ✅ **Consistent output paths**: All providers use same output path calculation logic
+
+### Final Implementation Summary
+
+The gm-eval commands now properly skip sending batches when response files already exist:
+
+#### New Behavior:
+- **send command**: Checks for `*-response.jsonl` files and skips batch creation if they exist
+- **send-file command**: Checks for response files and skips processing if they exist
+- **evaluate command**: Will skip sending evaluation batches if evaluation response files exist
+- **run command**: Handles skip logic throughout the entire pipeline
+
+#### User Experience Improvements:
+- Clear log messages: "Response file already exists: {path}"
+- Skip notification: "Skipping batch processing - job already completed"
+- Wait flag support: "Response file already exists - no need to wait"
+- Results indication: "Results already available at: {path}"
+
+#### Technical Implementation:
+- All batch job classes inherit consistent skip logic from BaseBatchJob
+- Skip detection happens before any API calls are made
+- Existing functionality completely preserved
+- No configuration changes required
+
+The implementation successfully prevents unnecessary API calls, reduces costs, and improves user experience while maintaining full backward compatibility.
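
A minimal sketch of the `--wait` handling described in the backlog above, under stated assumptions: the `send_batch_prompt.py` hunk is not shown in this view, so `process_batch_sketch`, `FakeBatchJob`, and `wait_for_completion` are hypothetical stand-ins rather than the project's actual functions. It only illustrates the idea of returning the existing path instead of waiting on a batch that was never started.

```python
from typing import Optional


class FakeBatchJob:
    """Hypothetical stand-in for a provider batch job (not the real class)."""

    def __init__(self, output_path: str, completed: bool):
        self.output_path = output_path
        self.is_completed = completed
        self.batch_id: Optional[str] = None
        self.waited = False

    def send(self) -> str:
        # Mirrors the skip check: no batch is created for a completed job
        if self.is_completed:
            return self.output_path
        self.batch_id = "batch-123"  # simulated provider batch ID
        return self.batch_id

    def wait_for_completion(self) -> str:
        # Waiting without a batch ID is the failure mode the backlog mentions
        if self.batch_id is None:
            raise RuntimeError("batch job not started")
        self.waited = True
        return self.output_path


def process_batch_sketch(job: FakeBatchJob, wait: bool) -> Optional[str]:
    """Return the results path when available, or None if still pending."""
    if job.is_completed:
        # Response file already exists: nothing to send and nothing to wait for
        return job.output_path
    job.send()
    if not wait:
        return None
    return job.wait_for_completion()
```

With this shape, a completed job never reaches `wait_for_completion()`, so the "batch job not started" error cannot be raised on the skip path.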
diff --git a/automation-api/lib/pilot/batchjob/anthropic.py b/automation-api/lib/pilot/batchjob/anthropic.py
@@ -12,7 +12,6 @@
 from lib.app_singleton import AppSingleton
 from lib.config import read_config
 
-from ..utils import get_output_path
 from .base import BaseBatchJob
 
 logger = AppSingleton().get_logger()
@@ -37,17 +36,9 @@ def __init__(self, jsonl_path: str):
         Args:
             jsonl_path: Path to JSONL file containing prompts
         """
-        self.jsonl_path = jsonl_path
-        self._batch_id = None
-        self._output_path = get_output_path(jsonl_path)
-        self._processing_file = f"{self._output_path}.processing"
+        super().__init__(jsonl_path)
         self._client = _get_client()
 
-        # Check if job is already being processed
-        if os.path.exists(self._processing_file):
-            with open(self._processing_file, "r") as f:
-                self._batch_id = f.read().strip()
-
     def send(self) -> str:
         """
         Submit batch job to Anthropic.
@@ -56,6 +47,10 @@ def send(self) -> str:
             batch_id: Unique identifier for the batch job
         """
         try:
+            # Check if response file already exists
+            if self.should_skip_processing():
+                return self._output_path
+
             # Check for existing processing file
             if os.path.exists(self._processing_file):
                 logger.info("Batch already being processed.")
diff --git a/automation-api/lib/pilot/batchjob/base.py b/automation-api/lib/pilot/batchjob/base.py
@@ -115,6 +115,24 @@ def output_path(self) -> str:
         """Get the output file path."""
         return self._output_path
 
+    @property
+    def is_completed(self) -> bool:
+        """Check if the batch job is already completed."""
+        return self._is_completed
+
+    def should_skip_processing(self) -> bool:
+        """
+        Check if batch processing should be skipped.
+
+        Returns:
+            True if response file already exists and processing should be skipped
+        """
+        if self._is_completed:
+            logger.info(f"Response file already exists: {self._output_path}")
+            logger.info("Skipping batch processing - job already completed")
+            return True
+        return False
+
     def _get_output_path(self) -> str:
         """Calculate output path from input path."""
         base_name = os.path.splitext(os.path.basename(self.jsonl_path))[0]
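
The skip check added to `BaseBatchJob` above can be exercised in isolation with a small stand-in. This is a sketch, not the project's code: `SketchBaseJob`, `SketchProviderJob`, and the `<name>-response.jsonl` naming rule are assumptions made for illustration (the real `_get_output_path()` body is only partly visible in this hunk), and the provider call is simulated with a counter.

```python
import logging
import os
import tempfile

logger = logging.getLogger("batchjob_sketch")


class SketchBaseJob:
    """Stand-in for BaseBatchJob; only the skip-related parts are modeled."""

    def __init__(self, jsonl_path: str):
        self.jsonl_path = jsonl_path
        # Assumed naming rule: "<name>.jsonl" -> "<name>-response.jsonl"
        root, _ = os.path.splitext(jsonl_path)
        self._output_path = f"{root}-response.jsonl"
        # Completion is decided once, at construction time
        self._is_completed = os.path.exists(self._output_path)

    def should_skip_processing(self) -> bool:
        if self._is_completed:
            logger.info(f"Response file already exists: {self._output_path}")
            logger.info("Skipping batch processing - job already completed")
            return True
        return False


class SketchProviderJob(SketchBaseJob):
    """Hypothetical provider; api_calls counts simulated submissions."""

    api_calls = 0

    def send(self) -> str:
        # The skip check runs first, before any (simulated) API call
        if self.should_skip_processing():
            return self._output_path
        type(self).api_calls += 1
        with open(self._output_path, "w") as f:
            f.write('{"custom_id": "q1", "response": "stub"}\n')
        return self._output_path


def demo() -> int:
    """Send the same input twice; return the number of simulated API calls."""
    SketchProviderJob.api_calls = 0
    with tempfile.TemporaryDirectory() as tmp:
        prompts = os.path.join(tmp, "prompts.jsonl")
        with open(prompts, "w") as f:
            f.write('{"custom_id": "q1", "prompt": "hi"}\n')
        first = SketchProviderJob(prompts).send()
        second = SketchProviderJob(prompts).send()  # new instance, file exists now
        assert first == second
    return SketchProviderJob.api_calls
```

Because the flag is computed in `__init__`, the second run has to construct a new job object to see the response file, which matches the "multiple runs" scenario in the commit title.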
diff --git a/automation-api/lib/pilot/batchjob/litellm.py b/automation-api/lib/pilot/batchjob/litellm.py
@@ -11,7 +11,6 @@
 from lib.app_singleton import AppSingleton
 from lib.config import read_config
 
-from ..utils import get_output_path
 from .base import BaseBatchJob
 
 logger = AppSingleton().get_logger()
@@ -38,18 +37,11 @@ def __init__(self, jsonl_path: str, provider: Optional[str] = None, num_processe
             provider: API provider (e.g., "alibaba")
             num_processes: Number of processes to use for parallel processing
         """
-        self.jsonl_path = jsonl_path
+        super().__init__(jsonl_path)
         self._provider = provider
         self._num_processes = num_processes
-        self._output_path = get_output_path(jsonl_path)
         self._batch_id = jsonl_path
 
-        # Check if job is already completed
-        if os.path.exists(self._output_path):
-            self._is_completed = True
-        else:
-            self._is_completed = False
-
     def send(self) -> str:
         """
         Process all prompts in the JSONL file.
@@ -60,9 +52,8 @@ def send(self) -> str:
             result: the output path, or empty string if failed to send prompts
         """
         try:
-            # Check if already completed
-            if self._is_completed and os.path.exists(self._output_path):
-                logger.info(f"Batch {self._batch_id} already completed.")
+            # Check if response file already exists
+            if self.should_skip_processing():
                 return self._output_path
 
             # Process all prompts
@@ -91,7 +82,7 @@ def check_status(self) -> str:
         Returns:
             status: Job status string ("completed" or "n/a")
         """
-        if self._is_completed or os.path.exists(self._output_path):
+        if self.is_completed or os.path.exists(self._output_path):
             return "completed"
         else:
             return "n/a"
diff --git a/automation-api/lib/pilot/batchjob/mistral.py b/automation-api/lib/pilot/batchjob/mistral.py
@@ -10,7 +10,7 @@
 from lib.app_singleton import AppSingleton
 from lib.config import read_config
 
-from ..utils import generate_batch_id, get_output_path
+from ..utils import generate_batch_id
 from .base import BaseBatchJob
 
 logger = AppSingleton().get_logger()
@@ -84,19 +84,9 @@ def __init__(
             model_id: Mistral model ID to use (e.g., mistral-small-latest, codestral-latest)
             timeout_hours: Number of hours after which the job should expire (default: 24, max: 168)
         """
-        self.jsonl_path = jsonl_path
+        super().__init__(jsonl_path)
         self.model_id = model_id or "mistral-small-latest"
         self.timeout_hours = timeout_hours
-        self._batch_id = None
-        self._output_path = get_output_path(jsonl_path)
-        self._processing_file = f"{self._output_path}.processing"
-
-        # Check if job is already being processed
-        if os.path.exists(self._processing_file):
-            with open(self._processing_file, "r") as f:
-                self._batch_id = f.read().strip()
-
-        # initialize client
         self._client = _get_client()
 
     def send(self) -> str:
@@ -107,6 +97,10 @@ def send(self) -> str:
             batch_id: Unique identifier for the batch job
         """
         try:
+            # Check if response file already exists
+            if self.should_skip_processing():
+                return self._output_path
+
             # Check for existing processing file
             if os.path.exists(self._processing_file):
                 logger.info("Batch already being processed.")
diff --git a/automation-api/lib/pilot/batchjob/openai.py b/automation-api/lib/pilot/batchjob/openai.py
@@ -10,7 +10,7 @@
 from lib.app_singleton import AppSingleton
 from lib.config import read_config
 
-from ..utils import generate_batch_id, get_output_path
+from ..utils import generate_batch_id
 from .base import BaseBatchJob
 
 logger = AppSingleton().get_logger()
@@ -47,18 +47,8 @@ def __init__(self, jsonl_path: str, provider: str = "openai"):
             jsonl_path: Path to JSONL file containing prompts
             provider: API provider ("openai" or "alibaba")
         """
-        self.jsonl_path = jsonl_path
-        self._batch_id = None
+        super().__init__(jsonl_path)
         self._provider = provider
-        self._output_path = get_output_path(jsonl_path)
-        self._processing_file = f"{self._output_path}.processing"
-
-        # Check if job is already being processed
-        if os.path.exists(self._processing_file):
-            with open(self._processing_file, "r") as f:
-                self._batch_id = f.read().strip()
-
-        # initialize client
         self._client = _get_client(provider)
 
     def send(self) -> str:
@@ -69,6 +59,10 @@ def send(self) -> str:
             batch_id: Unique identifier for the batch job
         """
         try:
+            # Check if response file already exists
+            if self.should_skip_processing():
+                return self._output_path
+
             # Check for existing processing file
             if os.path.exists(self._processing_file):
                 logger.info("Batch already being processed.")
diff --git a/automation-api/lib/pilot/batchjob/vertex.py b/automation-api/lib/pilot/batchjob/vertex.py
@@ -40,11 +40,18 @@ def __init__(self, jsonl_path: str, model_id: str):
             jsonl_path: Path to JSONL file containing prompts
             model_id: The model to send to
         """
-        self.jsonl_path = jsonl_path
+        # Initialize base class; the Vertex AI output path is overridden below
+        super().__init__(jsonl_path)
+
+        # Override the output path for Vertex AI specific naming
         _, output_path = get_batch_id_and_output_path(jsonl_path)
-        self._batch_id = None
         self._output_path = output_path
         self._processing_file = f"{self._output_path}.processing"
+
+        # Recompute completion status against the Vertex-specific output path,
+        # so a flag set from the base-class default path cannot leak through
+        self._is_completed = os.path.exists(self._output_path)
+
         self._model_id = model_id
 
         # find custom id mapping file
@@ -63,10 +70,10 @@ def __init__(self, jsonl_path: str, model_id: str):
             custom_id = row["prompt_id"]
             self._custom_id_mapping[prompt_text] = custom_id
 
-            # Check if job is already being processed
-            if os.path.exists(self._processing_file):
-                with open(self._processing_file, "r") as f:
-                    self._batch_id = f.read().strip()
+        # Check if job is already being processed
+        if os.path.exists(self._processing_file):
+            with open(self._processing_file, "r") as f:
+                self._batch_id = f.read().strip()
 
         # initial vertexai
         config = read_config()
@@ -91,6 +98,10 @@ def send(self) -> str:
             batch_id: Unique identifier for the batch job
         """
         try:
+            # Check if response file already exists
+            if self.should_skip_processing():
+                return self._output_path
+
             # Check for existing processing file
             if os.path.exists(self._processing_file):
                 logger.info("Batch already being processed.")
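
One point to watch when a subclass replaces `_output_path` after calling `super().__init__()`, as the Vertex job does: any completion flag computed by the base initializer refers to the default path, not the overridden one. The sketch below assumes the base class sets `_is_completed` from its own default path (the base `__init__` is not shown in this diff); the class names and both naming rules are hypothetical.

```python
import os
import tempfile


class BaseSketch:
    """Stand-in base class: completion is derived from the default path."""

    def __init__(self, jsonl_path: str):
        self.jsonl_path = jsonl_path
        root, _ = os.path.splitext(jsonl_path)
        self._output_path = f"{root}-response.jsonl"  # assumed default naming
        self._is_completed = os.path.exists(self._output_path)


class CustomPathSketch(BaseSketch):
    """Stand-in for a provider that uses its own output naming."""

    def __init__(self, jsonl_path: str):
        super().__init__(jsonl_path)
        root, _ = os.path.splitext(jsonl_path)
        self._output_path = f"{root}-custom-response.jsonl"  # hypothetical naming
        self._processing_file = f"{self._output_path}.processing"
        # Refresh the flag in both directions for the overridden path
        self._is_completed = os.path.exists(self._output_path)


def demo_flags() -> tuple:
    """Only the default-named response file exists in this scenario."""
    with tempfile.TemporaryDirectory() as tmp:
        prompts = os.path.join(tmp, "prompts.jsonl")
        open(prompts, "w").close()
        open(os.path.join(tmp, "prompts-response.jsonl"), "w").close()
        base_done = BaseSketch(prompts)._is_completed
        custom_done = CustomPathSketch(prompts)._is_completed
    return base_done, custom_done
```

Assigning the result of `os.path.exists()` directly, rather than only setting the flag to `True`, keeps the subclass from skipping work when only the default-named file is present.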
diff --git a/automation-api/lib/pilot/gm_eval/commands/run.py b/automation-api/lib/pilot/gm_eval/commands/run.py
@@ -167,6 +167,7 @@ def handle(args: argparse.Namespace) -> int:
                 wait=True,  # Always wait for send step
                 processes=args.processes,
                 timeout_hours=args.timeout_hours,
+                force_regenerate=False,  # Do not force regeneration by default
             )
             result = send.handle(send_args)
             if result != 0:
diff --git a/automation-api/lib/pilot/send_batch_prompt.py b/automation-api/lib/pilot/send_batch_prompt.py
