
Commit 125749d

Document pagination performance issues from PR #33 (#36)

Investigation reveals three issues causing slowdown:

1. Cache invalidation (one-time): URL changes invalidated ETags
2. Extra API calls for exactly 100 items: main issue, causing a 23-50% slowdown for affected users
3. No multi-page caching: minor cumulative impact

The main culprit is issue #2: the current logic makes an extra API call when a result set has exactly 100 items, because it can't distinguish between 'exactly 100 total' and '100 with more pages'.

Solution: use GitHub's Link header to detect pagination instead of guessing based on result count.

Files:

- PAGINATION_ANALYSIS.md: complete technical analysis with impact tables
- PAGINATION_FIX_POC.md: implementation guide showing how to fix issue #2

For users following 100 people where 30% have exactly 100 items:

- Current: ~391 API calls
- After fix: ~301 API calls (23% reduction)
1 parent c69dc82 commit 125749d

2 files changed: +486 -0 lines changed

PAGINATION_ANALYSIS.md

Lines changed: 222 additions & 0 deletions

# Pagination Performance Analysis for PR #33

## Summary

After investigating the pagination changes in PR #33, I've identified three performance issues that make syncs slower, most noticeably in the scenarios described below.

## Issues Identified

### 1. Cache Invalidation (First Sync After Update) ⚠️

**Impact**: Severe slowdown on the first sync after updating

**Root Cause**: The pagination implementation changed all API endpoint URLs by adding query parameters:

- **Before**: `/user/following`, `/users/bob/starred`
- **After**: `/user/following?page=1&per_page=100`, `/users/bob/starred?page=1&per_page=100`

**Problem**: The GitHub client uses path-based ETag caching (github/client.go:168-173). When paths change, all cached ETags become useless. This means:

- The first sync after updating to PR #33 hits GitHub's API for **every endpoint** instead of returning 304 Not Modified
- Conditional requests provide no benefit on that first sync
- For users following 100 people, that is 1 + (3 × 100) = **301 full API calls** instead of potentially just a few

**Evidence**: See the cache lookup in github/client.go:168-173
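
For context, this is the conditional-request mechanism the cache is built on: the client stores each response's `ETag` and replays it in `If-None-Match`; when nothing changed, GitHub answers 304 Not Modified, which does not count against the rate limit. The snippet below is a generic, standalone illustration of why the new URLs miss the old cache entries, not the client's actual code (the `cachedETags` map is a hypothetical stand-in):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// ETags cached before PR #33, keyed by the request path as it was then.
	cachedETags := map[string]string{
		"/user/following": `W/"abc123"`,
	}

	req, err := http.NewRequest("GET",
		"https://api.github.com/user/following?page=1&per_page=100", nil)
	if err != nil {
		panic(err)
	}

	// After PR #33 the lookup key includes the query string, so the old
	// entry is never found and no If-None-Match header is sent.
	if etag, ok := cachedETags[req.URL.RequestURI()]; ok {
		req.Header.Set("If-None-Match", etag)
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotModified {
		fmt.Println("304: reuse cached body (no rate-limit cost)")
	} else {
		// This branch is what every endpoint hits on the first post-update sync.
		fmt.Println("full response:", resp.Status)
	}
}
```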

### 2. Extra API Call for Exactly 100 Items ⚠️

**Impact**: Moderate slowdown for users with exactly 100 items in any category

**Root Cause**: The pagination logic in `getPaginated()` (github/client.go:301-333) keeps fetching whenever it gets exactly `perPage` items:

```go
// If we got fewer results than per_page, this is the last page
if pageSlice.Len() < perPage {
    break
}
```

**Problem**: When a result set has exactly 100 items (or any multiple of 100), the code:

1. Fetches page 1 → gets 100 items
2. Sees 100 == 100 (perPage), so continues
3. Fetches page 2 → gets 0 items
4. Only then stops

**Example Scenario**: A user who follows exactly 100 people:

- Expected: 1 API call
- Actual: 2 API calls (100% increase)

**Per-User Impact**: For each followed user, we call 3 methods:

- `GetStarredReposByUsername`
- `GetOwnedReposByUsername`
- `GetRecentEvents`

If any of these returns exactly 100 items, that's an extra API call. For 100 followed users, this could add up to 300 extra API calls in the worst case.

**Evidence**: The test in github/client_test.go:637-690 explicitly expects 2 requests for exactly 100 items (see the comment on line 687)

### 3. No Incremental Caching for Multi-Page Results

**Impact**: Minor but cumulative

**Root Cause**: GitHub only returns Link headers for pagination; it does not provide ETags that are stable across page boundaries.

**Problem**: Users with more than 100 items in any category can't benefit from conditional requests for pages beyond the first. Each page is a new URL that has never been cached.

**Example**: A user with 250 followed users:

- Page 1: `/user/following?page=1&per_page=100` (can be cached)
- Page 2: `/user/following?page=2&per_page=100` (new path, no cache benefit)
- Page 3: `/user/following?page=3&per_page=100` (new path, no cache benefit)

## Real-World Impact

### Single User Scenario (following 5 people)

**Before PR #33**:

- Fetch followed users: 1 API call
- For each of 5 users: 3 API calls
- Total: 16 API calls
- With ETag caching on subsequent syncs: potentially all 16 are cheap 304 responses

**After PR #33** (first sync):

- Fetch followed users: 1 API call (cache miss due to path change)
- For each of 5 users: 3 API calls (all cache misses)
- Total: 16 API calls **with no cache hits** → full data transfers

**After PR #33** (subsequent syncs):

- Same as before (16 calls), but the new paths are now cached again
- Performance returns to normal after the first sync

### User with Exactly 100 Followed Users

**Before PR #33**:

- Total API calls: ~301 (1 + 3 × 100)

**After PR #33**:

- Followed-users endpoint: 2 calls (100 users on page 1, then an empty page 2)
- Per user: potentially 2 calls each if starred repos = 100, owned repos = 100, or events = 100
- Worst case: 2 + (6 × 100) = **602 API calls** (100% increase!)
- Typical case: 2 + (4 × 100) = **402 API calls** (33% increase)

## Proposed Fixes

### Fix #1: Preserve Cache Keys (Migration Strategy)

**Option A - Maintain Backward Compatibility**:

- Store cache entries for both old and new path formats during the transition
- Check both the `/user/following` and `/user/following?page=1&per_page=100` caches (see the sketch below)
- Gradually migrate over time

**Option B - Accept One-Time Cache Miss**:

- Document that the first sync after updating will be slower
- This is a one-time cost and resolves itself automatically

**Recommendation**: Option B with documentation. The cache miss is a one-time event, and fixing it adds complexity.
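
For concreteness, here is a minimal sketch of what Option A's dual lookup could look like, assuming a simple string-keyed ETag map. The `client` struct, `etags` field, and `legacyCacheKey` helper are hypothetical stand-ins, not the real client's API:

```go
package main

import (
	"fmt"
	"strings"
)

// client is a stand-in for the real GitHub client; only the ETag map matters here.
type client struct {
	etags map[string]string // hypothetical path -> ETag cache
}

// lookupETag checks the new-style key first, then falls back to the
// pre-PR-#33 key (the bare path) during the migration window.
func (c *client) lookupETag(path string) (string, bool) {
	if etag, ok := c.etags[path]; ok {
		return etag, true
	}
	if legacy := legacyCacheKey(path); legacy != path {
		if etag, ok := c.etags[legacy]; ok {
			return etag, true
		}
	}
	return "", false
}

// legacyCacheKey strips the query string that PR #33 added to every URL.
func legacyCacheKey(path string) string {
	if i := strings.IndexByte(path, '?'); i >= 0 {
		return path[:i]
	}
	return path
}

func main() {
	c := &client{etags: map[string]string{"/user/following": `W/"abc123"`}}
	etag, ok := c.lookupETag("/user/following?page=1&per_page=100")
	fmt.Println(etag, ok) // W/"abc123" true
}
```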

### Fix #2: Optimize Exactly-100 Detection ⭐ PRIORITY

**Current Logic**:

```go
if pageSlice.Len() < perPage {
    break
}
```

**Proposed Fix**: Use GitHub's `Link` header to detect whether there's a next page, instead of assuming there is one whenever we get exactly `perPage` items.

**Alternative Fix** (simpler): Special-case handling for exactly `perPage`:

```go
// If we got fewer than per_page items, this is the last page
if pageSlice.Len() < perPage {
    break
}

// If we got exactly per_page items, peek at the next page
// but only if we haven't made too many requests.
if pageSlice.Len() == perPage {
    // Could check the Link header here instead of making another request.
    // GitHub provides: Link: <url?page=2>; rel="next"
}
```

**Impact**: Reduces API calls by up to 50% for users with exactly 100 items in any category.

### Fix #3: Implement Link Header Pagination ⭐ BEST SOLUTION

**Proper Approach**: Parse GitHub's `Link` header to determine whether there are more pages:

```go
// Check for a Link header with rel="next"
linkHeader := resp.Header.Get("Link")
hasNext := strings.Contains(linkHeader, `rel="next"`)

if !hasNext || pageSlice.Len() == 0 {
    break
}
```

**Benefits**:

- Eliminates the extra API call for exactly 100 items
- More reliable than guessing based on result count
- Standard GitHub API practice
- Makes no assumptions about page size

**Impact**: Fixes issue #2 completely, reducing unnecessary API calls by up to 300 for users following many people.
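
The `strings.Contains` check above is enough for GitHub's responses, but a slightly more careful parse avoids false positives should `rel="next"` ever appear inside a URL. A minimal sketch; the `hasNextPage` helper is hypothetical, not an existing function in the client:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// hasNextPage reports whether a response's Link header advertises another
// page. GitHub sends, for example:
//
//	Link: <https://api.github.com/user/following?page=2&per_page=100>; rel="next",
//	      <https://api.github.com/user/following?page=3&per_page=100>; rel="last"
func hasNextPage(resp *http.Response) bool {
	for _, entry := range strings.Split(resp.Header.Get("Link"), ",") {
		// Each entry is `<url>; rel="..."`; inspect only the parameter
		// part so a URL containing the literal text can't fool us.
		parts := strings.SplitN(entry, ";", 2)
		if len(parts) == 2 && strings.TrimSpace(parts[1]) == `rel="next"` {
			return true
		}
	}
	return false
}

func main() {
	resp := &http.Response{Header: http.Header{}}
	resp.Header.Set("Link",
		`<https://api.github.com/user/following?page=2&per_page=100>; rel="next", `+
			`<https://api.github.com/user/following?page=3&per_page=100>; rel="last"`)
	fmt.Println(hasNextPage(resp)) // true
}
```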

## Recommendation

1. **Immediate**: Implement Link header pagination (Fix #3); this is the proper solution and fixes issue #2 completely
2. **Document**: Note that the first sync after updating will be slower due to cache invalidation (issue #1)
3. **Consider**: Optimize multi-page caching (issue #3) in the future, but only if it becomes a bottleneck

## Test Case to Add

```go
func TestGetPaginatedNoExtraCallForExactly100Items(t *testing.T) {
	// Test that we don't make an extra API call when exactly 100 items exist.
	// Should use the Link header to detect the end of pagination.
	repos := make([]Repository, 100)
	for i := 0; i < 100; i++ {
		repos[i] = Repository{ID: int64(i), Name: fmt.Sprintf("repo%d", i)}
	}

	requestCount := 0
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		requestCount++
		page := r.URL.Query().Get("page")

		if page == "1" {
			w.Header().Set("Content-Type", "application/json")
			// No Link header with rel="next" means this is the last page
			json.NewEncoder(w).Encode(repos)
		} else {
			t.Errorf("should not request page %s when page 1 has no next link", page)
		}
	}))
	defer server.Close()

	c := NewClient("token", WithBaseURL(server.URL))
	result, err := c.GetStarredRepos(context.Background())

	require.NoError(t, err)
	assert.Equal(t, 100, len(result))
	assert.Equal(t, 1, requestCount, "should only make 1 request when Link header indicates no next page")
}
```
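
A companion case is also worth covering: when page 1 does advertise `rel="next"`, the client should fetch page 2 and stop there. Assuming the fixed `getPaginated()` advances page by page but consults the Link header to decide whether to continue, and using the same test harness as above, a sketch could look like:

```go
func TestGetPaginatedFollowsLinkHeader(t *testing.T) {
	// Page 1 advertises rel="next", so the client should fetch page 2
	// and stop there, because page 2 carries no Link header.
	requestCount := 0
	var server *httptest.Server
	server = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		requestCount++
		w.Header().Set("Content-Type", "application/json")
		switch page := r.URL.Query().Get("page"); page {
		case "1":
			w.Header().Set("Link", fmt.Sprintf(`<%s%s?page=2&per_page=100>; rel="next"`, server.URL, r.URL.Path))
			json.NewEncoder(w).Encode(make([]Repository, 100))
		case "2":
			// No Link header: this is the last page.
			json.NewEncoder(w).Encode(make([]Repository, 50))
		default:
			t.Errorf("unexpected request for page %q", page)
		}
	}))
	defer server.Close()

	c := NewClient("token", WithBaseURL(server.URL))
	result, err := c.GetStarredRepos(context.Background())

	require.NoError(t, err)
	assert.Equal(t, 150, len(result))
	assert.Equal(t, 2, requestCount, "should fetch exactly the two pages advertised by the Link headers")
}
```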

## Files to Modify

1. **github/client.go**: Update `getPaginated()` to check the Link header (lines 286-338); a sketch of the revised loop follows this list
2. **github/client.go**: Update the `get()` method to return the Link header, or parse it
3. **github/client_test.go**: Update test expectations (lines 687-689)
4. **github/client_test.go**: Add the new test for Link header pagination
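
For orientation, here is a rough sketch of how the revised loop in `getPaginated()` could look. It assumes `get()` is changed to return the `*http.Response` (its real signature may differ), uses a concrete slice where the real code works through reflection, and calls the hypothetical `hasNextPage` helper sketched under Fix #3:

```go
// Sketch only: names and signatures here are assumptions, not the current API.
for page := 1; ; page++ {
	var pageSlice []Repository // real code fills a reflect-backed slice
	path := fmt.Sprintf("%s?page=%d&per_page=%d", basePath, page, perPage)

	resp, err := c.get(ctx, path, &pageSlice)
	if err != nil {
		return nil, err
	}
	results = append(results, pageSlice...)

	// Stop when GitHub says there is no next page, instead of guessing
	// from the result count. No extra request for exactly-100 results.
	if len(pageSlice) == 0 || !hasNextPage(resp) {
		break
	}
}
return results, nil
```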

## API Call Comparison

| Scenario | Before PR #33 | After PR #33 (current) | After Fix |
|----------|---------------|------------------------|-----------|
| 5 followed users (first sync) | 16 | 16 | 16 |
| 5 followed users (cached) | 16 (mostly 304s) | 16 (mostly 304s after first) | 16 (mostly 304s) |
| 100 followed users, all with ~50 items each | 301 | 301 | 301 |
| 100 followed users, all with exactly 100 items | 301 | 602 | 301 |
| **100 followed users, 50 with exactly 100 items** | **301** | **451** | **301** |

**Worst-case impact**: the current code adds **150 extra API calls** (a 50% increase) for users in the bolded exactly-100-items scenario.
