
perf: reduce GitHub API calls for GHES deployments #2298

Draft
ebellefontaine wants to merge 1 commit into qodo-ai:main from ebellefontaine:optimize/reduce-ghes-api-calls

Conversation

@ebellefontaine

Problem

When running pr-agent against a self-hosted GitHub Enterprise Server (GHES) appliance, every API call translates directly into HTTP handler work on the appliance's CPU and memory. A single /review invocation currently makes 14 + 2N API calls (where N is the number of changed files, capped at 50). For a 20-file PR that's 54 calls; for a 50-file PR it's 114. At scale across a team, this adds meaningful load to the appliance -- especially during busy review periods.

GHES rate limiting is typically disabled by default, so the problem doesn't surface as 429 errors. It shows up as degraded appliance performance under concurrent pr-agent activity.

Approach

Two categories of changes:

Unconditional caching (always active):

  • Cache the commits list fetched during __init__ so get_commit_messages() reuses it instead of fetching again
  • Cache get_languages() results so repeated calls within a review cycle don't re-fetch
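A minimal sketch of the two unconditional caches, assuming a simplified provider. `CachedRepoData` and the two fetch callables are hypothetical stand-ins for the real provider and its API calls; only `pr_commits`, `_languages`, `get_commit_messages()` and `get_languages()` are names taken from this PR.

```python
from typing import Callable, Optional


class CachedRepoData:
    """Hypothetical stand-in for a provider that memoizes two API lookups."""

    def __init__(self, fetch_commits: Callable[[], list], fetch_languages: Callable[[], dict]):
        self._fetch_languages = fetch_languages
        # Commits are fetched once up front, mirroring the fetch during __init__.
        self.pr_commits: list = list(fetch_commits())
        # Filled on the first get_languages() call.
        self._languages: Optional[dict] = None

    def get_commit_messages(self) -> list:
        # Reuse the list cached in __init__ instead of calling the API again.
        return [commit["message"] for commit in self.pr_commits]

    def get_languages(self) -> dict:
        # Fetch on first use only; later calls return the cached result.
        if self._languages is None:
            self._languages = self._fetch_languages()
        return self._languages
```

Comparing against `None` rather than testing truthiness means an empty language map is also cached and not re-fetched.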

Conditional optimizations (behind github.minimize_api_calls = true, default off):

  • Skip per-file content fetches during /review when the PR files endpoint already provides the patch. This is the big win -- eliminates up to 2N calls
  • Suppress the temporary "Preparing review..." comment (saves a POST + DELETE)
  • Use labels already on the PR object for deduplication instead of a separate GET call

The conditional changes only affect /review. The /improve and /describe tools always fetch full file content because they need it for indentation analysis and deduplication guards.

Results

| Scenario | Before | After (flag on) |
|---|---|---|
| 20-file PR | 54 calls | 9 calls |
| 50-file PR | 114 calls | 9 calls |

The remaining 9 calls are the irreducible minimum: repo, PR, commits, config, languages, files, compare, labels (conditional), and the review comment itself.
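The figures above follow directly from the 14 + 2N formula in the Problem section. A small sketch of the arithmetic, with hypothetical helper names (`calls_before`, `calls_after`) that are not part of the codebase:

```python
MAX_FILES = 50  # cap on changed files stated in the Problem section


def calls_before(n_files: int) -> int:
    """API calls per /review without the flag: 14 fixed plus 2 per changed file."""
    n = min(n_files, MAX_FILES)
    return 14 + 2 * n


def calls_after() -> int:
    """API calls per /review with the flag on: constant, independent of file count."""
    return 9


for n in (20, 50):
    print(f"{n}-file PR: {calls_before(n)} -> {calls_after()} calls")
```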

What's guarded and why

  • handle_patch_deletions() uses edit_type == DELETED instead of inferring deletion from empty content when the flag is on, since empty content now means "intentionally skipped" rather than "file was deleted"
  • Incremental review mode always fetches content (needs it for load_large_diff)
  • Files with missing patches (GitHub truncates large diffs) still get content fetched
  • disable_extra_lines is passed through to get_pr_diff() when minimizing, using the existing parameter rather than mutating the settings singleton
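The first three guards can be sketched as two small predicates. This is an illustration under stated assumptions, not the code in this PR: `EditType`, `needs_content_fetch` and `is_deleted` are hypothetical names, and the real `handle_patch_deletions()` takes more arguments than shown here.

```python
from enum import Enum
from typing import Optional


class EditType(Enum):
    """Hypothetical stand-in for the project's edit-type enum."""
    ADDED = "added"
    DELETED = "deleted"
    MODIFIED = "modified"


def needs_content_fetch(minimize: bool, is_incremental: bool, patch: Optional[str]) -> bool:
    """Decide whether file content must still be fetched for one file."""
    if not minimize:
        return True   # flag off: existing behaviour, always fetch
    if is_incremental:
        return True   # incremental review needs content
    return not patch  # missing or truncated patch: fall back to fetching


def is_deleted(new_content: str, edit_type: EditType, minimize: bool) -> bool:
    """Decide whether a file should be treated as deleted."""
    if edit_type == EditType.DELETED:
        return True
    # Empty content implies deletion only when the flag is off; with the flag
    # on, empty content may simply mean the fetch was skipped.
    return (not minimize) and new_content == ""
```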

Config

[github]
minimize_api_calls = true  # default: false

Test plan

  • 9 new unit tests covering all phases (commit caching, language caching, deletion guard, temp comment suppression, config default)
  • Full test suite passes (296 tests)
  • Manual testing: run /review on a test PR with flag on/off and compare output
  • Manual testing: run /improve and /describe with flag on to confirm they still fetch content
  • Edge cases: PR with 0 files, PR with >50 files, incremental review mode

Cache commits list and language stats to avoid duplicate fetches (Phases 1-2).
Behind new `github.minimize_api_calls` flag (default off):
- Skip file content fetches for /review when patches exist (Phase 4)
- Suppress temporary "Preparing review..." comment (Phase 5)
- Use PR object labels instead of extra GET call (Phase 6)

Reduces API calls from 14+2N to 9 (constant) for /review.
No impact on /improve or /describe — they always fetch full content.
@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Reduce GitHub API calls for GHES deployments via caching and conditional skipping

✨ Enhancement


Walkthroughs

Description
• Cache commits list and language stats to eliminate duplicate API fetches
• Skip file content fetches for /review when patches exist (behind flag)
• Suppress temporary "Preparing review..." comment when minimizing API calls
• Use PR object labels instead of extra GET call for deduplication
• Add comprehensive unit tests covering all optimization phases
Diagram
flowchart LR
  A["PR Review Request"] --> B["Check minimize_api_calls flag"]
  B -->|False| C["Standard flow: fetch all content"]
  B -->|True| D["Optimized flow"]
  D --> E["Use cached commits"]
  D --> F["Use cached languages"]
  D --> G["Skip file content fetches"]
  D --> H["Skip temp comment"]
  D --> I["Use PR object labels"]
  C --> J["14 + 2N API calls"]
  E --> K["9 constant API calls"]
  F --> K
  G --> K
  H --> K
  I --> K

File Changes

1. pr_agent/git_providers/github_provider.py ✨ Enhancement +280/-218

Implement caching and conditional API call skipping

• Added _languages instance variable to cache language stats after first fetch
• Modified get_languages() to return cached result on subsequent calls
• Modified get_commit_messages() to use self.pr_commits cache instead of fetching again
• Added minimize_api_calls flag check in get_diff_files() to skip file content fetches when patches exist
• Updated handle_patch_deletions() call to use edit_type for deletion detection when minimizing
• Modified set_review_labels() to use cached PR labels when minimize_api_calls is enabled
• Extensive code formatting improvements (import reorganization, line wrapping, quote normalization)

pr_agent/git_providers/github_provider.py


2. pr_agent/algo/git_patch_processing.py ✨ Enhancement +145/-105

Guard deletion detection for minimize_api_calls mode

• Updated handle_patch_deletions() to check edit_type == DELETED explicitly when minimize_api_calls is active
• Added guard logic: empty content only means deleted when flag is off or edit_type explicitly indicates deletion
• Extensive code formatting improvements (function signature wrapping, string quote normalization)

pr_agent/algo/git_patch_processing.py


3. pr_agent/tools/pr_reviewer.py ✨ Enhancement +148/-95

Suppress temp comment and pass minimize flag to diff processing

• Added minimize_api_calls flag check to suppress temporary "Preparing review..." comment
• Modified _prepare_prediction() to pass disable_extra_lines=minimize to get_pr_diff()
• Updated set_review_labels() to use update=False when minimizing to avoid extra GET call
• Extensive code formatting improvements (import reorganization, line wrapping, conditional restructuring)

pr_agent/tools/pr_reviewer.py


4. tests/unittest/test_minimize_api_calls.py 🧪 Tests +251/-0

Add comprehensive unit tests for all optimization phases

• Added 9 new unit tests covering Phase 1 (commit caching)
• Added tests for Phase 2 (language caching)
• Added tests for Phase 4 (handle_patch_deletions guard with minimize flag)
• Added tests for Phase 5 (temporary comment suppression)
• Added test for config default value verification
• Tests verify both enabled and disabled states of the optimization flag

tests/unittest/test_minimize_api_calls.py


5. pr_agent/settings/configuration.toml ⚙️ Configuration changes +4/-0

Add minimize_api_calls configuration option with default false

• Added new minimize_api_calls configuration option under [github] section
• Set default value to false to maintain backward compatibility
• Added documentation comment explaining the flag's purpose and scope

pr_agent/settings/configuration.toml



@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects bot commented Mar 31, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (2) 📎 Requirement gaps (0)



Action required

1. Unused pytest import 📘 Rule violation ⚙ Maintainability
Description
pytest is imported but never used in the new unit test file, which is expected to fail Ruff/CI
lint checks. This introduces a tooling compliance issue in newly added code.
Code

tests/unittest/test_minimize_api_calls.py[15]

+import pytest
Evidence
The compliance checklist requires changed code to comply with linting tooling; the added test file
imports pytest but does not reference it anywhere, making it an unused import.

tests/unittest/test_minimize_api_calls.py[15-15]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`pytest` is imported but not used, which should fail Ruff (unused import).

## Issue Context
This import was introduced in the new test module `tests/unittest/test_minimize_api_calls.py`.

## Fix Focus Areas
- tests/unittest/test_minimize_api_calls.py[15-15]



2. Pointless f-string in log 📘 Rule violation ⚙ Maintainability
Description
A log message is written as an f-string without any interpolated values, which is a Ruff violation
(F541) and can fail linting. This was introduced/modified in the PR and should be converted to a
normal string.
Code

pr_agent/git_providers/github_provider.py[R273-276]

+                    get_logger().info(
+                        f"Filtered out [ignore] files for pull request:",
+                        extra={"files": names_original, "filtered_files": names_new},
+                    )
Evidence
The checklist requires Ruff compliance; the changed logging call uses f"..." without placeholders,
which Ruff flags as an f-string without interpolation.

pr_agent/git_providers/github_provider.py[273-276]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A logging call uses an f-string with no `{}` placeholders, which triggers Ruff `F541`.

## Issue Context
This appears in the `get_diff_files()` logging path where an `extra=` dict is provided.

## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[273-276]
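
One way the fix could look, sketched with the standard-library `logging` module as a stand-in for the project's `get_logger()`. The sample values of `names_original` and `names_new` are made up for illustration.

```python
import logging

# Stand-in logger; the project itself calls get_logger() instead.
logger = logging.getLogger("f541_example")

# Illustrative values only.
names_original = ["a.py", "b.lock"]
names_new = ["a.py"]

# Plain string literal: there are no placeholders, so no f-prefix is needed.
# Dropping the prefix is what clears Ruff F541.
message = "Filtered out [ignore] files for pull request:"
logger.info(message, extra={"files": names_original, "filtered_files": names_new})
```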



3. Minimize breaks non-review tools 🐞 Bug ≡ Correctness
Description
GithubProvider.get_diff_files() now skips fetching base/head file content whenever
github.minimize_api_calls is enabled and a patch exists, regardless of which tool is running.
Tools like /improve (PRCodeSuggestions) and /add_docs rely on head_file/base_file for
indentation and deduplication, so with the flag enabled they can produce mis-indented suggestions
and bypass duplicate-detection guards.
Code

pr_agent/git_providers/github_provider.py[R299-306]

+            # When minimize_api_calls is enabled and not in incremental mode,
+            # skip file content fetches for files that already have a patch from
+            # the PR files endpoint. This eliminates up to 2N API calls (N = files).
+            minimize = get_settings().get("github.minimize_api_calls", False)
+            skip_content = minimize and not self.incremental.is_incremental

            counter_valid = 0
            for file in files:
Evidence
The new skip_content branch unconditionally sets new_file_content_str and
original_file_content_str to empty strings when github.minimize_api_calls is true and
file.patch is present. Downstream tools depend on FilePatchInfo.head_file/base_file being
populated: PRCodeSuggestions uses head_file to dedent suggestions and uses both base_file and
head_file to detect repeated/duplicate suggestions; PRAddDocs indexes into
head_file.splitlines() to determine indentation/placement. PRCodeSuggestions explicitly calls
get_pr_diff(..., disable_extra_lines=False), so it still expects real file content (not empty
strings).

pr_agent/git_providers/github_provider.py[299-321]
pr_agent/tools/pr_code_suggestions.py[367-373]
pr_agent/tools/pr_code_suggestions.py[581-639]
pr_agent/tools/pr_add_docs.py[136-150]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`github.minimize_api_calls` currently changes `GithubProvider.get_diff_files()` behavior globally (for all tools), causing `FilePatchInfo.head_file`/`base_file` to be empty for many files. This breaks or degrades tools that require full file content (e.g., `/improve` dedenting + duplication guards, `/add_docs` indentation/placement).

### Issue Context
The optimization should only apply to `/review` (where `get_pr_diff(..., disable_extra_lines=True)` is used), but `get_diff_files()` is also used by other tools that pass `disable_extra_lines=False` and expect populated file contents.

### Fix Focus Areas
- pr_agent/git_providers/github_provider.py[299-353]
- pr_agent/tools/pr_reviewer.py[165-222]
- pr_agent/tools/pr_code_suggestions.py[367-373]

### What to change (one acceptable approach)
- Remove the global `skip_content = minimize ...` behavior from `GithubProvider.get_diff_files()`.
- Instead, make skipping content an explicit, review-only mode, e.g.:
 - Add a provider attribute (e.g., `self._skip_content_fetch = False`) and only set it to `True` inside `PRReviewer` when running `/review`.
 - Or set a request-scoped flag in `starlette_context.context` in `PRReviewer` before calling `get_pr_diff()`, and have `get_diff_files()` consult *that* (not the global config) to decide.
- Ensure tools that need content (`/improve`, `/add_docs`, etc.) never run with content fetching disabled even if the global config flag is true.
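
A rough sketch of the provider-attribute approach, under simplifying assumptions. `ProviderSketch`, `run_review`, `run_improve` and the `fetch_content` callable are hypothetical; only the attribute name `_skip_content_fetch` comes from the suggestion above. Files are modelled as `(filename, patch)` pairs, not the real PR file objects.

```python
from typing import Callable, List, Tuple


class ProviderSketch:
    """Hypothetical provider; only the opt-in mechanics are illustrated."""

    def __init__(self, fetch_content: Callable[[str], str]):
        self._fetch_content = fetch_content
        # Default: every tool gets full file content.
        self._skip_content_fetch = False

    def get_diff_files(self, files: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
        """Return (filename, head_content) pairs for (filename, patch) inputs."""
        result = []
        for filename, patch in files:
            if self._skip_content_fetch and patch:
                head = ""  # content intentionally skipped; the patch is enough
            else:
                head = self._fetch_content(filename)
            result.append((filename, head))
        return result


def run_review(provider: ProviderSketch, minimize_api_calls: bool, files):
    # Only the review path opts in, and only when the config flag is set.
    provider._skip_content_fetch = minimize_api_calls
    try:
        return provider.get_diff_files(files)
    finally:
        # Reset so the setting never leaks into other tools.
        provider._skip_content_fetch = False


def run_improve(provider: ProviderSketch, files):
    # Never touches the attribute, so content is always fetched.
    return provider.get_diff_files(files)
```

Resetting the attribute in a `finally` block keeps the skip scoped to a single review run, even if the diff processing raises.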





from unittest.mock import MagicMock, patch

import pytest
Contributor


Action required

1. Unused pytest import 📘 Rule violation ⚙ Maintainability

pytest is imported but never used in the new unit test file, which is expected to fail Ruff/CI
lint checks. This introduces a tooling compliance issue in newly added code.
Agent Prompt
## Issue description
`pytest` is imported but not used, which should fail Ruff (unused import).

## Issue Context
This import was introduced in the new test module `tests/unittest/test_minimize_api_calls.py`.

## Fix Focus Areas
- tests/unittest/test_minimize_api_calls.py[15-15]


Comment on lines +273 to +276
get_logger().info(
    f"Filtered out [ignore] files for pull request:",
    extra={"files": names_original, "filtered_files": names_new},
)
Contributor


Action required

2. Pointless f-string in log 📘 Rule violation ⚙ Maintainability

A log message is written as an f-string without any interpolated values, which is a Ruff violation
(F541) and can fail linting. This was introduced/modified in the PR and should be converted to a
normal string.
Agent Prompt
## Issue description
A logging call uses an f-string with no `{}` placeholders, which triggers Ruff `F541`.

## Issue Context
This appears in the `get_diff_files()` logging path where an `extra=` dict is provided.

## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[273-276]


Comment on lines +299 to 306
# When minimize_api_calls is enabled and not in incremental mode,
# skip file content fetches for files that already have a patch from
# the PR files endpoint. This eliminates up to 2N API calls (N = files).
minimize = get_settings().get("github.minimize_api_calls", False)
skip_content = minimize and not self.incremental.is_incremental

counter_valid = 0
for file in files:
Contributor


Action required

3. Minimize breaks non-review tools 🐞 Bug ≡ Correctness

GithubProvider.get_diff_files() now skips fetching base/head file content whenever
github.minimize_api_calls is enabled and a patch exists, regardless of which tool is running.
Tools like /improve (PRCodeSuggestions) and /add_docs rely on head_file/base_file for
indentation and deduplication, so with the flag enabled they can produce mis-indented suggestions
and bypass duplicate-detection guards.
Agent Prompt
### Issue description
`github.minimize_api_calls` currently changes `GithubProvider.get_diff_files()` behavior globally (for all tools), causing `FilePatchInfo.head_file`/`base_file` to be empty for many files. This breaks or degrades tools that require full file content (e.g., `/improve` dedenting + duplication guards, `/add_docs` indentation/placement).

### Issue Context
The optimization should only apply to `/review` (where `get_pr_diff(..., disable_extra_lines=True)` is used), but `get_diff_files()` is also used by other tools that pass `disable_extra_lines=False` and expect populated file contents.

### Fix Focus Areas
- pr_agent/git_providers/github_provider.py[299-353]
- pr_agent/tools/pr_reviewer.py[165-222]
- pr_agent/tools/pr_code_suggestions.py[367-373]

### What to change (one acceptable approach)
- Remove the global `skip_content = minimize ...` behavior from `GithubProvider.get_diff_files()`.
- Instead, make skipping content an explicit, review-only mode, e.g.:
  - Add a provider attribute (e.g., `self._skip_content_fetch = False`) and only set it to `True` inside `PRReviewer` when running `/review`.
  - Or set a request-scoped flag in `starlette_context.context` in `PRReviewer` before calling `get_pr_diff()`, and have `get_diff_files()` consult *that* (not the global config) to decide.
- Ensure tools that need content (`/improve`, `/add_docs`, etc.) never run with content fetching disabled even if the global config flag is true.


@ebellefontaine ebellefontaine marked this pull request as draft March 31, 2026 17:37
@ebellefontaine
Author

Moving this to draft until I resolve the issues reported above.
