Skip to content

Conversation

@misrasaurabh1
Copy link
Contributor

@misrasaurabh1 misrasaurabh1 commented Apr 30, 2025

User description

Restart a failed --all optimization run from a checkpoint
The checkpoint is only valid for 7 days


PR Type

Enhancement


Description

  • Implement checkpoint file writing and reading

  • Resume incomplete --all optimization runs

  • Skip already-processed functions automatically

  • Remove checkpoints older than seven days


Changes walkthrough 📝

Relevant files
Enhancement
checkpoint.py
Implement run checkpoint functionality                                     

codeflash/code_utils/checkpoint.py

  • Introduce CodeflashRunCheckpoint class and methods
  • Initialize and update checkpoint JSONL metadata
  • Append processed functions with status entries
  • Provide get_all_historical_functions cleanup logic
  • Add CLI prompt in ask_should_use_checkpoint_get_functions
  • +128/-0 
    functions_to_optimize.py
    Skip previously checkpointed functions                                     

    codeflash/discovery/functions_to_optimize.py

  • Add previous_checkpoint_functions parameter to method signatures
  • Filter out functions already in checkpoint during filtering
  • Track count of skipped functions from checkpoint
  • Minor typing import adjustment for Any
  • +35/-15 
    optimizer.py
    Integrate checkpoint resume into optimizer run                     

    codeflash/optimization/optimizer.py

  • Import and integrate checkpoint utilities in run()
  • Prompt resume and pass previous_checkpoint_functions
  • Instantiate CodeflashRunCheckpoint for --all runs
  • Add each optimized function to checkpoint after success
  • Pass checkpoint data into discovery filtering
  • +44/-16 
    Formatting
    function_optimizer.py
    Refactor formatting and ratio precision                                   

    codeflash/optimization/function_optimizer.py

  • Reformat long loops and zip calls for readability
  • Refactor telemetry event calls with proper indentation
  • Adjust speedup ratio precision to three decimal places
  • Minor signature formatting for log_successful_optimization
  • +27/-8   

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • @misrasaurabh1 misrasaurabh1 requested review from KRRT7, aseembits93 and dasarchan and removed request for aseembits93 April 30, 2025 00:29
    @github-actions
    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Filter Logic

    The blocklist filter condition in filter_functions appears inverted, potentially allowing blocked functions to slip through or excluding valid ones.

        if not (
            function.file_path.name in blocklist_funcs
            and function.qualified_name in blocklist_funcs[function.file_path.name]
        ):
            blocklist_funcs_removed_count += 1
            continue
        functions_tmp.append(function)
    _functions = functions_tmp
    Performance Concern

    _update_metadata_timestamp rewrites the entire checkpoint file on each addition, which could degrade performance for large checkpoint files.

    with self.checkpoint_path.open("w") as f:
        f.write(json.dumps(metadata) + "\n")
        f.write(rest_content)

    @github-actions
    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    General
    Create checkpoint directory if missing

    Ensure the checkpoint directory exists before creating the file. This prevents
    errors if a custom directory does not exist.

    codeflash/code_utils/checkpoint.py [16-18]

     self.checkpoint_dir = Path(checkpoint_dir)
    +self.checkpoint_dir.mkdir(parents=True, exist_ok=True)
     # Create a unique checkpoint file name
     unique_id = str(uuid.uuid4())[:8]
    Suggestion importance[1-10]: 7

    __

    Why: Ensuring the checkpoint directory exists with mkdir prevents file creation errors when a custom checkpoint_dir is missing and improves robustness.

    Medium
    Count all ignored functions

    Increment the removal count by the number of functions filtered out rather than by
    one, for accurate logging.

    codeflash/discovery/functions_to_optimize.py [434-438]

     if file_path in ignore_paths or any(
         file_path.startswith(str(ignore_path) + os.sep) for ignore_path in ignore_paths
     ):
    -    ignore_paths_removed_count += 1
    +    ignore_paths_removed_count += len(_functions)
         continue
    Suggestion importance[1-10]: 6

    __

    Why: Changing ignore_paths_removed_count to increment by len(_functions) aligns its behavior with other removal counters and yields accurate logging of filtered functions.

    Low
    Fix blocklist logging label

    Correct the label for blocklisted functions to avoid confusing them with previously
    optimized ones.

    codeflash/discovery/functions_to_optimize.py [500-501]

     log_info = {
         f"{test_functions_removed_count} test function{'s' if test_functions_removed_count != 1 else ''}": test_functions_removed_count,
         f"{site_packages_removed_count} site-package function{'s' if site_packages_removed_count != 1 else ''}": site_packages_removed_count,
         ...
    -    f"{blocklist_funcs_removed_count} function{'s' if blocklist_funcs_removed_count != 1 else ''} as previously optimized": blocklist_funcs_removed_count,
    +    f"{blocklist_funcs_removed_count} blocklisted function{'s' if blocklist_funcs_removed_count != 1 else ''}": blocklist_funcs_removed_count,
         f"{previous_checkpoint_functions_removed_count} function{'s' if previous_checkpoint_functions_removed_count != 1 else ''} skipped from checkpoint": previous_checkpoint_functions_removed_count,
     }
    Suggestion importance[1-10]: 4

    __

    Why: Renaming the blocklist entry to “blocklisted functions” clarifies log messages without altering functionality.

    Low

    @misrasaurabh1 misrasaurabh1 merged commit 1d2bbf7 into main Apr 30, 2025
    15 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants