-
Notifications
You must be signed in to change notification settings - Fork 152
CaloCDB: Refactor runProd.py #1261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Modernize path handling, datetime logic, and job tracking - Migrated to 'from datetime import datetime' to fix AttributeError and standardize timestamp formatting across the script. - Replaced unsafe Path concatenation with f-string interpolation and explicit Path casting to resolve TypeError during directory naming. - Added 'create_symlink' utility to replicate 'ln -sfn' behavior, ensuring calibration tags safely overwrite existing links. - Introduced a blocking job-progress monitor in 'generate_condor' that polls the output directory until all expected jobs are finished. - Enhanced output directory structure by adding an optional calibration tags directory and automated daily symlinking. - Improved 'generate_condor' to include job counting and cleaner log directory cleanup using 'shutil.rmtree' with 'ignore_errors'.
For repository maintainers, please start the CI check manually (feedback)This is an automatic message to assist manually starting CI check for this pull request, commit 039d87830e2488ae9e67fc5d7fecb765508ab7ff. sPHENIX software maintainers: please make your input here and start the Build: Note:
Automatically generated by sPHENIX Jenkins continuous integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the runProd.py script to modernize path handling, datetime operations, and job monitoring. The changes aim to fix type errors in path concatenation, standardize datetime usage, add automated symlink management for calibration tags, and introduce blocking job progress monitoring for Condor submissions.
Key Changes:
- Migrated from
import datetimetofrom datetime import datetimeto fix AttributeError issues - Replaced unsafe Path concatenation with f-string interpolation for timestamped directory naming
- Added
create_symlink()utility function and automated daily symlinking to calibration tags directory - Introduced blocking job-progress monitor that polls until all Condor jobs complete
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| while True: | ||
| finished_jobs = sum(1 for x in job_dir.iterdir() if x.is_dir()) | ||
|
|
||
| if finished_jobs >= jobs: | ||
| logger.info(f"All Jobs Complete. {finished_jobs}/{jobs} Jobs.") | ||
| break | ||
|
|
||
| logger.info(f"Waiting for Jobs... {finished_jobs}/{jobs} done.") | ||
| time.sleep(15) # Check every 15 seconds |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job monitoring loop assumes that each completed job creates a directory in 'output/', but this assumption is not validated. If jobs fail silently or create files instead of directories, the counter will be inaccurate and the loop may wait indefinitely. Consider adding error handling to check for job failures (e.g., by checking stderr files in the 'error/' directory) or adding a timeout mechanism to prevent infinite waiting.
| while True: | |
| finished_jobs = sum(1 for x in job_dir.iterdir() if x.is_dir()) | |
| if finished_jobs >= jobs: | |
| logger.info(f"All Jobs Complete. {finished_jobs}/{jobs} Jobs.") | |
| break | |
| logger.info(f"Waiting for Jobs... {finished_jobs}/{jobs} done.") | |
| time.sleep(15) # Check every 15 seconds | |
| start_time = time.time() | |
| # Maximum time to wait for all jobs to finish (in seconds) | |
| max_wait_seconds = 6 * 60 * 60 # 6 hours | |
| while True: | |
| # Count both files and directories as completed job artifacts | |
| finished_jobs = sum(1 for x in job_dir.iterdir() if x.is_dir() or x.is_file()) | |
| if finished_jobs >= jobs: | |
| logger.info(f"All Jobs Complete. {finished_jobs}/{jobs} Jobs.") | |
| break | |
| # Timeout to avoid waiting indefinitely if jobs fail silently | |
| if time.time() - start_time > max_wait_seconds: | |
| logger.error( | |
| "Timeout reached while waiting for Condor jobs: " | |
| f"{finished_jobs}/{jobs} jobs appear to have produced output " | |
| f"after waiting {max_wait_seconds} seconds." | |
| ) | |
| break | |
| logger.info(f"Waiting for Jobs... {finished_jobs}/{jobs} done.") | |
| time.sleep(15) # Check every 15 seconds |
| if os.path.exists(condor_log_dir): | ||
| shutil.rmtree(condor_log_dir) | ||
| logger.info(f"Directory '{condor_log_dir}' and its contents removed.") | ||
| condor_log_dir.mkdir(parents=True, exist_ok=True) |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating the directory only to immediately delete it in the next line is redundant and inefficient. The first 'mkdir' call serves no purpose since 'shutil.rmtree' with 'ignore_errors=True' will silently succeed whether the directory exists or not. Remove line 337 to eliminate this unnecessary operation.
| condor_log_dir.mkdir(parents=True, exist_ok=True) |
| jobs = 0 | ||
| with open(output / 'jobs.list', 'w', encoding="utf-8") as f: | ||
| for file_path in list_files: | ||
| f.write(str(file_path.resolve()) + '\n') | ||
| jobs += 1 |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job counting can be simplified by using 'len(list_files)' instead of maintaining a separate counter. This eliminates the need for the 'jobs' variable and the increment operation, making the code more concise and less error-prone.
| jobs = 0 | |
| with open(output / 'jobs.list', 'w', encoding="utf-8") as f: | |
| for file_path in list_files: | |
| f.write(str(file_path.resolve()) + '\n') | |
| jobs += 1 | |
| jobs = len(list_files) | |
| with open(output / 'jobs.list', 'w', encoding="utf-8") as f: | |
| for file_path in list_files: | |
| f.write(str(file_path.resolve()) + '\n') |
| print(f"Removed existing link/file at {link}") | ||
|
|
||
| # Create the new symlink | ||
| link.symlink_to(target) | ||
| print(f"Created symlink: {link} -> {target}") |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'create_symlink' function uses print statements instead of the logger that's used throughout the rest of the script. For consistency and proper log file integration, replace 'print()' calls with 'logger.info()' or 'logger.debug()' to ensure these messages are captured in the log file configured at line 437.
| print(f"Removed existing link/file at {link}") | |
| # Create the new symlink | |
| link.symlink_to(target) | |
| print(f"Created symlink: {link} -> {target}") | |
| logger.info(f"Removed existing link/file at {link}") | |
| # Create the new symlink | |
| link.symlink_to(target) | |
| logger.info(f"Created symlink: {link} -> {target}") |
| job_dir = output / 'output' | ||
|
|
||
| # Check Job Progress | ||
| while True: | ||
| finished_jobs = sum(1 for x in job_dir.iterdir() if x.is_dir()) |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job monitoring starts immediately after submitting jobs without any initial delay, which could cause a race condition. If 'condor_submit' takes time to create the initial job directories, the first check may incorrectly show 0/N jobs completed. Consider adding a small initial delay (e.g., a few seconds) before starting the monitoring loop, or check that the job submission has completed successfully before beginning to poll.
| tag = output_calib_tags / CURRENT_DATE | ||
| create_symlink(output / 'output', tag) |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The symlink is created before the 'output/output' directory exists. The target directory 'output / 'output'' is only created later by the condor jobs in the 'generate_condor' function (line 360). This will cause the symlink to point to a non-existent directory at creation time, potentially resulting in a broken symlink. Consider moving this symlink creation to after the directory structure is set up in 'generate_condor', or create the output directory explicitly before creating the symlink.
| tag = output_calib_tags / CURRENT_DATE | |
| create_symlink(output / 'output', tag) | |
| # Ensure the output subdirectory exists before creating the symlink | |
| output_subdir = output / 'output' | |
| output_subdir.mkdir(parents=True, exist_ok=True) | |
| tag = output_calib_tags / CURRENT_DATE | |
| create_symlink(output_subdir, tag) |
| jobs += 1 | ||
|
|
||
| # list of subdirectories to create | ||
| subdirectories = ['stdout', 'error', 'output'] |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subdirectory name 'output' in the subdirectories list creates a path collision with the function parameter 'output'. This makes the code confusing because 'output / 'output'' refers to a subdirectory inside the output directory. Consider renaming the subdirectory to something more descriptive like 'results', 'job_output', or 'completed' to improve code clarity and avoid confusion.
| args = parser.parse_args() | ||
| run_type = args.run_type | ||
| CURRENT_DATE = str(datetime.date.today()) | ||
| CURRENT_DATE = datetime.now().strftime("%m-%d-%y") |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The date format was changed from YYYY-MM-DD to MM-DD-YY, which breaks chronological sorting. Directories and files named with this format will not sort correctly by name (e.g., '12-31-24' comes before '01-01-25' alphabetically but represents a later date). Revert to the ISO 8601 format '%Y-%m-%d' for proper chronological ordering, or use '%Y-%m-%d' if you need the full year.
| CURRENT_DATE = datetime.now().strftime("%m-%d-%y") | |
| CURRENT_DATE = datetime.now().strftime("%Y-%m-%d") |
| link = Path(link_path) | ||
|
|
||
| # Check if the link destination already exists | ||
| if link.exists() or link.is_symlink(): |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function attempts to unlink a path that could be a non-empty directory. While the docstring mentions that 'link.unlink()' won't remove target directories, it will fail with 'IsADirectoryError' if 'link_path' is an actual directory (not a symlink). The current code doesn't handle this case. Consider using 'shutil.rmtree()' if the path is a directory, or explicitly check the path type before calling 'unlink()'.
| if link.exists() or link.is_symlink(): | |
| if link.exists() or link.is_symlink(): | |
| # If link_path is a real directory (not a symlink), do not attempt to unlink it. | |
| # This matches the documented behavior and avoids IsADirectoryError. | |
| if link.is_dir() and not link.is_symlink(): | |
| raise OSError(f"Cannot overwrite existing directory at {link}") |
| Replicates the behavior of 'ln -sfn' by removing the destination | ||
| if it already exists (including broken symlinks) before creating | ||
| the new link. | ||
|
|
||
| Args: | ||
| target_path (str or Path): The existing file or directory to point to. | ||
| link_path (str or Path): The path where the symlink should be created. | ||
|
|
||
| Raises: | ||
| OSError: If the link_path exists as a real directory and cannot be unlinked, or if permission is denied. |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring states that the function replicates 'ln -sfn' behavior, but the '-n' flag in 'ln -sfn' treats the destination as a normal file if it's a symlink to a directory. The current implementation doesn't replicate this behavior - it will follow symlinks when checking 'link.exists()'. For true 'ln -sfn' equivalence, use 'link.is_symlink()' check only, or document the deviation from '-n' behavior.
| Replicates the behavior of 'ln -sfn' by removing the destination | |
| if it already exists (including broken symlinks) before creating | |
| the new link. | |
| Args: | |
| target_path (str or Path): The existing file or directory to point to. | |
| link_path (str or Path): The path where the symlink should be created. | |
| Raises: | |
| OSError: If the link_path exists as a real directory and cannot be unlinked, or if permission is denied. | |
| Approximates the behavior of ``ln -sfn`` by removing the destination | |
| at ``link_path`` if it already exists (including broken symlinks) | |
| before creating the new link. | |
| Note: | |
| This function uses :meth:`pathlib.Path.exists`, which follows | |
| symlinks to directories, and unlinks any existing path at | |
| ``link_path``. As a result, its behavior may differ from the | |
| exact semantics of ``ln -sfn`` (in particular the ``-n`` flag) | |
| in some edge cases. | |
| Args: | |
| target_path (str or Path): The existing file or directory to point to. | |
| link_path (str or Path): The path where the symlink should be created. | |
| Raises: | |
| OSError: If the link_path exists and cannot be unlinked, or if | |
| permission is denied. |
- Add new run type: run3oo (set as default)
📝 WalkthroughWalkthroughModified production script for calorimeter calibration database to add run3oo run type support, new calibration tag output option, symlink creation utility, improved job submission with progress polling, and enhanced timestamp handling in output paths. Changes
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
For repository maintainers, please start the CI check manually (feedback)This is an automatic message to assist manually starting CI check for this pull request, commit db0870a3a218cff213368a754d9c9faecc4dd469. sPHENIX software maintainers: please make your input here and start the Build: Note:
Automatically generated by sPHENIX Jenkins continuous integration |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
calibrations/calo/calo_cdb/scripts/runProd.py (3)
337-341: Redundant directory creation pattern.The current logic creates the directory, immediately deletes it, then creates it again. This is unnecessarily complex.
♻️ Simplified cleanup
- condor_log_dir.mkdir(parents=True, exist_ok=True) - shutil.rmtree(condor_log_dir, ignore_errors=True) - - # Setup Condor Log Dir - condor_log_dir.mkdir(parents=True, exist_ok=True) + # Clean and recreate Condor Log Dir + shutil.rmtree(condor_log_dir, ignore_errors=True) + condor_log_dir.mkdir(parents=True, exist_ok=True)
374-374: Remove extraneous f-prefix.This f-string has no placeholders. The static analysis tool flagged this correctly.
🔧 Quick fix
- command = f'condor_submit genStatus.sub -queue "input_run from jobs.list"' + command = 'condor_submit genStatus.sub -queue "input_run from jobs.list"'
445-445: Static analysis: use conversion flag.The static analyzer suggests using an f-string conversion flag instead of explicit
str()call. This is a minor style preference.🔧 Quick fix
- logger.info(f'LOGGING: {str(datetime.now())}') + logger.info(f'LOGGING: {datetime.now()!s}')Alternatively, since
datetime.now()already has a sensible__str__, you can simply use:logger.info(f'LOGGING: {datetime.now()}')
| def create_symlink(target_path, link_path): | ||
| """ | ||
| Creates a symbolic link pointing to target_path named link_path. | ||
|
|
||
| Replicates the behavior of 'ln -sfn' by removing the destination | ||
| if it already exists (including broken symlinks) before creating | ||
| the new link. | ||
|
|
||
| Args: | ||
| target_path (str or Path): The existing file or directory to point to. | ||
| link_path (str or Path): The path where the symlink should be created. | ||
|
|
||
| Raises: | ||
| OSError: If the link_path exists as a real directory and cannot be unlinked, or if permission is denied. | ||
| """ | ||
| target = Path(target_path) | ||
| link = Path(link_path) | ||
|
|
||
| # Check if the link destination already exists | ||
| if link.exists() or link.is_symlink(): | ||
| # link.unlink() removes the symlink or file, but not the target directory | ||
| link.unlink() | ||
| print(f"Removed existing link/file at {link}") | ||
|
|
||
| # Create the new symlink | ||
| link.symlink_to(target) | ||
| print(f"Created symlink: {link} -> {target}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor issue: unlink() fails on real directories.
If link_path is an existing real directory (not a symlink), link.unlink() at line 326 will raise IsADirectoryError. The docstring acknowledges this but doesn't handle it. If this scenario is possible in your workflow, consider using shutil.rmtree() for directories.
Also, for consistency with the rest of the module, consider using logger.info() instead of print() for the status messages.
🛠️ Optional fix for directory handling and logging
def create_symlink(target_path, link_path):
"""
Creates a symbolic link pointing to target_path named link_path.
Replicates the behavior of 'ln -sfn' by removing the destination
if it already exists (including broken symlinks) before creating
the new link.
Args:
target_path (str or Path): The existing file or directory to point to.
link_path (str or Path): The path where the symlink should be created.
Raises:
OSError: If the link_path exists as a real directory and cannot be unlinked, or if permission is denied.
"""
target = Path(target_path)
link = Path(link_path)
# Check if the link destination already exists
if link.exists() or link.is_symlink():
- # link.unlink() removes the symlink or file, but not the target directory
- link.unlink()
- print(f"Removed existing link/file at {link}")
+ if link.is_symlink() or link.is_file():
+ link.unlink()
+ else:
+ shutil.rmtree(link)
+ logger.info(f"Removed existing link/file at {link}")
# Create the new symlink
link.symlink_to(target)
- print(f"Created symlink: {link} -> {target}")
+ logger.info(f"Created symlink: {link} -> {target}")| job_dir = output / 'output' | ||
|
|
||
| # Check Job Progress | ||
| while True: | ||
| finished_jobs = sum(1 for x in job_dir.iterdir() if x.is_dir()) | ||
|
|
||
| if finished_jobs >= jobs: | ||
| logger.info(f"All Jobs Complete. {finished_jobs}/{jobs} Jobs.") | ||
| break | ||
|
|
||
| logger.info(f"Waiting for Jobs... {finished_jobs}/{jobs} done.") | ||
| time.sleep(15) # Check every 15 seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential infinite loop if jobs fail.
This polling loop has no timeout mechanism. If condor jobs fail without creating output directories, the script will hang indefinitely. For a production cron job, this could block subsequent runs.
Consider adding:
- A maximum wait time / timeout
- Checking for job failures via condor_q or error files
- At minimum, logging a warning after extended waiting periods
🛡️ Suggested timeout mechanism
job_dir = output / 'output'
+ max_wait_seconds = 3600 * 6 # 6 hour timeout
+ elapsed = 0
# Check Job Progress
while True:
finished_jobs = sum(1 for x in job_dir.iterdir() if x.is_dir())
if finished_jobs >= jobs:
logger.info(f"All Jobs Complete. {finished_jobs}/{jobs} Jobs.")
break
+ if elapsed >= max_wait_seconds:
+ logger.error(f"Timeout reached. Only {finished_jobs}/{jobs} jobs completed.")
+ break
+
logger.info(f"Waiting for Jobs... {finished_jobs}/{jobs} done.")
time.sleep(15) # Check every 15 seconds
+ elapsed += 15| output_calib_tags.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| tag = output_calib_tags / CURRENT_DATE | ||
| create_symlink(output / 'output', tag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Symlink created before target directory exists.
At line 435, create_symlink(output / 'output', tag) is called, but the output/output directory is only created later at line 360 in generate_condor(). This creates a symlink pointing to a non-existent target.
While symlinks to non-existent paths are technically valid (dangling symlinks), this may cause confusion or errors if something tries to follow the symlink before jobs complete.
Consider either:
- Moving the symlink creation to after
generate_condor()completes - Creating the
output/outputdirectory earlier inmain()


Modernize path handling, datetime logic, and job tracking
CaloCDB: Refactor runProd.py
Motivation & Context
This pull request refactors the calibration production script to fix critical bugs, standardize code patterns, and support new run types. The changes address a datetime-related AttributeError, unsafe file path operations, and add more robust job management capabilities for the production pipeline.
Key Changes
from datetime import datetime) and updated timestamp formatting to ISO format (yyyy-mm-dd-HH-MM-SS) with also updated log display format (mm-dd-yy)create_symlink(target_path, link_path)function that safely handles symbolic link creation/overwriting (replicatesln -sfnbehavior), enabling calibration tags to update existing linksgenerate_condorthat polls the output directory every 15 seconds until all expected jobs complete-o2/--output-calib-tagsCLI option) with automated daily symlink creationjobs.listshutil.rmtree(ignore_errors=True)for cleaner, more robust deletionPotential Risk Areas
Possible Future Improvements
Note: AI-generated summaries can contain inaccuracies. Please review the actual code changes and commit history to verify implementation details.