Conversation

@KRRT7 (Contributor) commented Jun 6, 2025

PR Type

Enhancement, Tests


Description

  • Parallelize unit test discovery using processes (see the sketch after this list)

  • Introduce caching to skip processed files

  • Return total discovered count from discovery

  • Update call sites and tests for new API
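
To make the approach above concrete, here is a minimal, self-contained sketch of per-file discovery fanned out across a ProcessPoolExecutor with a hash-based cache skip. The helper names (_file_hash, _discover_in_file), the in-memory _CACHE dict, and the naive scan for def test_ functions are illustrative placeholders rather than codeflash's actual implementation, which caches in SQLite and performs real test collection:

    import hashlib
    import multiprocessing
    from concurrent.futures import ProcessPoolExecutor, as_completed
    from pathlib import Path

    # Illustrative in-memory cache keyed by (path, content hash); the real cache uses SQLite.
    _CACHE: dict[tuple[str, str], list[str]] = {}


    def _file_hash(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()


    def _discover_in_file(path_str: str) -> tuple[str, list[str]]:
        """Stand-in per-file worker: naively collect 'def test_*' names from one file."""
        names = []
        for line in Path(path_str).read_text().splitlines():
            stripped = line.strip()
            if stripped.startswith("def test_"):
                names.append(stripped.split("(")[0][len("def "):])
        return path_str, names


    def discover_unit_tests(test_files: list[Path]) -> tuple[dict[str, list[str]], int]:
        """Return (mapping of file -> discovered tests, total discovered count)."""
        results: dict[str, list[str]] = {}
        uncached: list[Path] = []
        for f in test_files:
            hit = _CACHE.get((str(f), _file_hash(f)))
            if hit is not None:
                results[str(f)] = hit  # cache hit: skip reprocessing this file
            else:
                uncached.append(f)
        max_workers = min(len(uncached) or 1, multiprocessing.cpu_count())
        with ProcessPoolExecutor(max_workers=max_workers) as executor:
            futures = [executor.submit(_discover_in_file, str(f)) for f in uncached]
            for future in as_completed(futures):
                path_str, names = future.result()
                results[path_str] = names
                _CACHE[(path_str, _file_hash(Path(path_str)))] = names
        return results, sum(len(v) for v in results.values())

Note that the sketch returns a (results, total count) tuple, matching the new public shape that the call sites listed in the walkthrough below now unpack.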


Changes walkthrough 📝

Relevant files

Enhancement

  discover_unit_tests.py (codeflash/discovery/discover_unit_tests.py): Add parallel test discovery and caching (+226/-173)
  • Imported multiprocessing and ProcessPoolExecutor
  • Added a _process_single_test_file helper function
  • Parallelized processing with a futures executor
  • Implemented cache-skip logic and result counting

Bug fix

  functions_to_optimize.py (codeflash/discovery/functions_to_optimize.py): Unpack the discover_unit_tests return tuple (+1/-1)
  • Unpacked the discover_unit_tests tuple return (a call-site sketch follows this walkthrough)
  • Adjusted the function_tests assignment

  optimizer.py (codeflash/optimization/optimizer.py): Unpack discovery results in the optimizer (+1/-2)
  • Unpacked the discover_unit_tests tuple return
  • Updated the num_discovered_tests assignment

  concolic_testing.py (codeflash/verification/concolic_testing.py): Support the new discovery return in concolic testing (+1/-2)
  • Unpacked the discover_unit_tests return tuple
  • Updated logging of the discovered test count

Tests

  test_unit_test_discovery.py (tests/test_unit_test_discovery.py): Update tests for the new discover_unit_tests signature (+17/-17)
  • Updated tests to unpack the tuple return
  • Ensured tests consume the new signature
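
Since the return type changed from a bare mapping to a (mapping, count) tuple, each caller now unpacks two values. A minimal sketch of the new call-site shape, using a stub with the new signature (the stub body and names such as function_to_tests are illustrative, not the actual optimizer code):

    def discover_unit_tests(test_root: str) -> tuple[dict[str, list[str]], int]:
        """Stub with the new signature: (function -> tests mapping, total discovered count)."""
        mapping = {"my_module.my_func": ["tests/test_my_module.py::test_my_func"]}
        return mapping, sum(len(tests) for tests in mapping.values())


    # Callers unpack both values instead of taking the mapping alone.
    function_to_tests, num_discovered_tests = discover_unit_tests("tests/")
    print(f"Discovered {num_discovered_tests} existing unit tests")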

Need help?

  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.

    @github-actions bot commented Jun 6, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Missing Commit

    The SQLite connection does not call commit() after inserting cache entries, so cached test data may never be persisted to disk.

    for test_file, functions in uncached_files.items():
        _, results, count, cache_entries = _process_single_test_file(
            test_file, functions, project_root_path, test_framework
        )
        total_count += count
    
        file_hash = TestsCache.compute_file_hash(str(test_file))
        for cache_entry in cache_entries:
            tests_cache.insert_test(file_path=str(test_file), file_hash=file_hash, **cache_entry)
    
        for qualified_name, function_called in results:
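
    A related option to the explicit commit() suggested further down is to use the sqlite3 connection as a context manager, which opens a transaction and commits it when the block exits without an exception. A minimal sketch, with an illustrative database path and a deliberately reduced column list (the real discovered_tests table has more columns):

        import sqlite3

        connection = sqlite3.connect("codeflash_tests_cache.db")  # illustrative path
        connection.execute(
            "CREATE TABLE IF NOT EXISTS discovered_tests (file_path TEXT, file_hash TEXT, test_function TEXT)"
        )

        # `with connection:` commits on clean exit and rolls back if the block raises.
        with connection:
            connection.execute(
                "INSERT INTO discovered_tests VALUES (?, ?, ?)",
                ("tests/test_example.py", "deadbeef", "test_example"),
            )
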
    Cache Staleness

    The insert_test method no longer deletes old entries for a file, which can lead to duplicate or stale records when tests are re-discovered. Consider clearing or replacing entries by file_path and file_hash.

    def insert_test(
        self,
        file_path: str,
        file_hash: str,
        qualified_name_with_modules_from_root: str,
        function_name: str,
        test_class: str,
        test_function: str,
        test_type: TestType,
        line_number: int,
        col_number: int,
    ) -> None:
        test_type_value = test_type.value if hasattr(test_type, "value") else test_type
        self.cur.execute(
            "INSERT INTO discovered_tests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (

    @github-actions bot commented Jun 6, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category | Suggestion | Impact
    Possible issue
    Persist cache updates

    Add a DELETE statement before inserting to remove stale cache entries for the same
    file and hash, and call self.connection.commit() after the insert so the database
    updates actually persist.

    codeflash/discovery/discover_unit_tests.py [80-96]

     def insert_test(
         self,
         file_path: str,
         file_hash: str,
         qualified_name_with_modules_from_root: str,
         function_name: str,
         test_class: str,
         test_function: str,
         test_type: TestType,
         line_number: int,
         col_number: int,
     ) -> None:
    +    # purge old entries for this file and hash
    +    self.cur.execute(
    +        "DELETE FROM discovered_tests WHERE file_path = ? AND file_hash = ?",
    +        (file_path, file_hash),
    +    )
         test_type_value = test_type.value if hasattr(test_type, "value") else test_type
         self.cur.execute(
             "INSERT INTO discovered_tests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
             (
                 file_path,
                 file_hash,
                 qualified_name_with_modules_from_root,
                 function_name,
                 test_class,
                 test_function,
                 test_type_value,
                 line_number,
                 col_number,
             ),
         )
    +    self.connection.commit()
    Suggestion importance[1-10]: 7


    Why: Adding a DELETE before the insert prevents stale rows accumulating, and calling self.connection.commit() ensures the new entries are persisted immediately.

    Medium
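
    If the cache table can gain a uniqueness constraint, another way to get the same effect is to let SQLite replace stale rows directly with INSERT OR REPLACE instead of a separate DELETE. A sketch under that assumption (the UNIQUE column set and the in-memory database are illustrative; the column order mirrors the insert above):

        import sqlite3

        connection = sqlite3.connect(":memory:")  # illustrative; the real cache lives on disk
        connection.execute(
            """
            CREATE TABLE IF NOT EXISTS discovered_tests (
                file_path TEXT, file_hash TEXT, qualified_name_with_modules_from_root TEXT,
                function_name TEXT, test_class TEXT, test_function TEXT,
                test_type INTEGER, line_number INTEGER, col_number INTEGER,
                UNIQUE (file_path, qualified_name_with_modules_from_root, test_class, test_function)
            )
            """
        )

        with connection:  # commits on clean exit
            connection.execute(
                "INSERT OR REPLACE INTO discovered_tests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
                ("tests/test_example.py", "deadbeef", "pkg.mod.func", "func",
                 "TestExample", "test_func_calls_func", 1, 10, 4),
            )

    Whether this is preferable to the explicit DELETE depends on whether a schema change to the cache table is acceptable.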
    General
    Limit worker pool correctly

    Bound the worker pool size to the number of uncached files rather than the total
    file map so you don’t spawn idle processes when most files are already cached.

    codeflash/discovery/discover_unit_tests.py [458]

    -max_workers = min(len(file_to_test_map) or 1, multiprocessing.cpu_count())
    +# after segregating uncached_files
    +max_workers = min(len(uncached_files) or 1, multiprocessing.cpu_count())
    Suggestion importance[1-10]: 6


    Why: Bounding max_workers to len(uncached_files) avoids spawning idle processes when many files are already cached, improving resource usage.

    Low
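
    Taken a step further, when the cache already covers every file the pool can be skipped entirely. A small, self-contained sketch of that shape (process_files and _discover are illustrative stand-ins for the discovery code paths):

        import multiprocessing
        from concurrent.futures import ProcessPoolExecutor


        def _discover(path: str) -> str:
            """Stand-in for the real per-file discovery worker."""
            return path.upper()


        def process_files(uncached_files: list[str]) -> list[str]:
            # No pool at all when the cache covered everything; otherwise never size
            # the pool beyond the number of files that actually need work.
            if not uncached_files:
                return []
            max_workers = min(len(uncached_files), multiprocessing.cpu_count())
            with ProcessPoolExecutor(max_workers=max_workers) as executor:
                return list(executor.map(_discover, uncached_files))


        if __name__ == "__main__":
            print(process_files([]))                # [] -- no pool created
            print(process_files(["a.py", "b.py"]))  # ['A.PY', 'B.PY']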
    Use serializable worker results

    Serializing custom FunctionCalledInTest objects across processes can fail; convert
    each result into plain dictionaries or tuples before returning and rebuild the
    objects in the main process.

    codeflash/discovery/discover_unit_tests.py [445]

    -return str(test_file), results, len(results), cache_entries
    +# at end of _process_single_test_file
    +serial_results = [
    +    (
    +        qualified_name,
    +        {
    +            "test_file": str(fc.tests_in_file.test_file),
    +            "test_class": fc.tests_in_file.test_class,
    +            "test_function": fc.tests_in_file.test_function,
    +            "test_type": fc.tests_in_file.test_type.value if hasattr(fc.tests_in_file.test_type, "value") else fc.tests_in_file.test_type,
    +            "line": fc.position.line_no,
    +            "col": fc.position.col_no,
    +        },
    +    )
    +    for qualified_name, fc in results
    +]
    +return str(test_file), serial_results, len(results), cache_entries
    Suggestion importance[1-10]: 5


    Why: Converting FunctionCalledInTest objects to primitive types avoids pickling errors in ProcessPoolExecutor, enhancing robustness when deserializing results.

    Low
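
    The pattern the suggestion describes (plain tuples out of the worker, richer objects rebuilt in the parent) looks roughly like the sketch below. DiscoveredTest is a hypothetical stand-in, not codeflash's FunctionCalledInTest, and simple dataclasses usually pickle fine on their own; the conversion mainly pays off when the nested objects resist pickling:

        from concurrent.futures import ProcessPoolExecutor
        from dataclasses import dataclass


        @dataclass(frozen=True)
        class DiscoveredTest:
            """Hypothetical stand-in for the richer result object."""
            test_file: str
            test_function: str
            line: int


        def _worker(test_file: str) -> list[tuple[str, str, int]]:
            # Return plain tuples so results pickle cheaply and reliably across processes.
            return [(test_file, "test_example", 12)]


        if __name__ == "__main__":
            with ProcessPoolExecutor(max_workers=1) as executor:
                rows = executor.submit(_worker, "tests/test_example.py").result()
            # Rebuild the richer objects in the parent process.
            discovered = [DiscoveredTest(*row) for row in rows]
            print(discovered)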

    KRRT7 closed this Jun 6, 2025
    KRRT7 force-pushed the parallelize-test-discovery branch from 75acf5e to 9be93b6 on June 6, 2025 at 19:51
    KRRT7 deleted the parallelize-test-discovery branch on June 6, 2025 at 19:51