| 1 | +# [Problem 3321: Find X-Sum of All K-Long Subarrays II](https://leetcode.com/problems/find-x-sum-of-all-k-long-subarrays-ii/description/?envType=daily-question) |
| 2 | + |
| 3 | +## Initial thoughts (stream-of-consciousness) |
| 4 | +I need the x-sum of every k-length sliding window. The x-sum keeps the occurrences of the top x most frequent distinct elements (tie-break by larger value) and sums the resulting array (i.e., sum value * frequency for each chosen distinct value). A sliding window suggests maintaining counts incrementally, and deciding which distinct values belong to the top-x set as the window moves. |
| 5 | + |
| 6 | +We must repeatedly compare elements by (frequency, value) and maintain the top-x set. This resembles the two-heap approach for sliding-window median: keep a "top" and a "bottom" collection and move elements between them. Python's standard library has no balanced tree (ordered multiset), but two heaps with lazy deletion plus a freq map and an "in_top" marker will do. Maintain:
| 7 | +- top heap: the chosen top-x distinct values (we must be able to access the worst among them quickly) |
| 8 | +- bottom heap: the remaining distinct values (we must be able to access the best among them quickly) |
| 9 | +Also keep sum_top = sum(freq[val] * val for val in top-set). Rebalance after each add/remove to ensure top contains exactly t = min(x, distinct_count) best elements. |
| 10 | + |
| 11 | +Lazy deletion: when an element's frequency changes we push a new tuple to its current heap and mark its current in_top. When popping, we skip stale entries by checking the current freq and in_top status. |
| 12 | + |
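| 13 | +This should give O(n log n) time in the worst case: each window step pushes O(1) heap entries and every entry is popped at most once. With lazy deletion the heaps can hold stale entries, so the log factor is bounded by n rather than by the number of distinct values m (<= n).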
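Before building the heap version, it helps to have a slow reference to compare against. The following is a minimal brute-force sketch, assuming only the problem statement above; the names `x_sum` and `brute_force_x_sums` are hypothetical helpers, not part of the submitted solution. It recounts every window from scratch, so it is only suitable for small inputs.

```python
from collections import Counter
from typing import List


def x_sum(window: List[int], x: int) -> int:
    """x-sum of a single window: keep the x best values by (freq, value)."""
    counts = Counter(window)
    # Sort distinct values by frequency, breaking ties by the larger value.
    ranked = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return sum(value * freq for value, freq in ranked[:x])


def brute_force_x_sums(nums: List[int], k: int, x: int) -> List[int]:
    """Recompute the x-sum for every k-long window (O(n * k log k))."""
    return [x_sum(nums[i:i + k], x) for i in range(len(nums) - k + 1)]


if __name__ == "__main__":
    print(brute_force_x_sums([1, 1, 2, 2, 3, 4, 2, 3], 6, 2))
    print(brute_force_x_sums([3, 8, 7, 8, 7, 5], 2, 2))
```

Comparing the sliding-window answer with this reference on random small arrays is a cheap way to catch bookkeeping mistakes in `sum_top` or `top_size`.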
| 14 | + |
| 15 | +## Refining the problem, round 2 thoughts |
| 16 | +Edge cases: |
| 17 | +- k == x: a window holds at most k distinct values, so every distinct value lands in the top set and the x-sum equals the full window sum. No special case is needed.
| 18 | +- frequencies hitting zero: the value is no longer distinct, so D drops (which can change the target t). If the element was in top when its freq reached 0, sum_top and top_size must be updated immediately; this can't wait for a lazy pop.
| 19 | +- tie-breaker: when freq equal, larger value is considered more frequent. So ordering is by (freq, value) descending. For comparisons, I'll treat pair (freq, value) and compare lexicographically. |
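The tie-break rule maps directly onto Python's lexicographic tuple comparison. Here is a small sketch of that idea, with a made-up frequency table (`candidates` is illustrative data, not taken from the solution): sorting by `(freq, value)` descending gives the ranking, and negating both fields turns `heapq`'s min-heap into a "best first" heap.

```python
import heapq

# Hypothetical window statistics: value -> frequency.
candidates = {1: 2, 2: 2, 3: 1, 4: 1}

# Rank distinct values by (freq, value), best first.
ranked = sorted(candidates, key=lambda v: (candidates[v], v), reverse=True)

# The same ordering via a min-heap: negate both fields so the best is the root.
# The third field carries the un-negated value for convenience.
best_first = [(-f, -v, v) for v, f in candidates.items()]
heapq.heapify(best_first)
best_value = best_first[0][2]

print(ranked)      # values 1 and 2 tie on frequency; 2 wins on value
print(best_value)
```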
| 20 | + |
| 21 | +Implementation details: |
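| 22 | +- top heap should let us access worst among top (min by (freq, value)), so top_heap is a min-heap of (freq, value, val). Here value and val are the same number; the duplicate keeps the tuple shape identical to the bottom heap's.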
| 23 | +- bottom heap should let us access best among bottom (max by (freq, value)), so bottom_heap stores (-freq, -value, val). |
| 24 | +- Maintain freq dict and in_top dict (bool), top_size int, D distinct count, sum_top integer. |
| 25 | +- Provide helper functions to get/pop valid entries from heaps and to rebalance. |
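The lazy-deletion idea in isolation can be sketched as follows. This is a simplified illustration, assuming a single heap and a frequency map as the source of truth; the class name `LazyMinHeap` and its methods are hypothetical and do not appear in the solution, which additionally checks the `in_top` marker.

```python
import heapq


class LazyMinHeap:
    """Min-heap over (freq, value) keys with lazy deletion (illustrative only)."""

    def __init__(self):
        self.heap = []   # may contain stale (freq, value) tuples
        self.freq = {}   # value -> current frequency (source of truth)

    def update(self, value, new_freq):
        # Record the new frequency and push a fresh tuple. The old tuple
        # stays in the heap and is discarded later, when it reaches the root.
        self.freq[value] = new_freq
        if new_freq > 0:
            heapq.heappush(self.heap, (new_freq, value))

    def peek(self):
        # Drop root entries whose stored frequency no longer matches the map.
        while self.heap:
            f, v = self.heap[0]
            if self.freq.get(v, 0) == f:
                return f, v
            heapq.heappop(self.heap)
        return None


h = LazyMinHeap()
h.update(5, 1)
h.update(7, 2)
h.update(5, 3)          # (1, 5) is now stale
first = h.peek()        # skips (1, 5), finds (2, 7)
h.update(7, 0)          # 7 leaves the window entirely
second = h.peek()       # skips (2, 7), finds (3, 5)
print(first, second)
```

The trade-off is that stale tuples occupy memory until they surface, in exchange for never needing an arbitrary-position delete.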
| 26 | + |
| 27 | +Complexities: |
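| 28 | +- Each add/remove pushes at most one new entry, and each rebalance needs only a constant number of moves because just two values change frequency per step. Every pushed entry is popped at most once, so there are O(n) heap operations in total, each costing O(log n): O(n log n) time. The maps take O(m) space, but the heaps can grow to O(n) entries because stale tuples stay until they reach the root.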
| 29 | + |
| 30 | +## Attempted solution(s) |
| 31 | +```python |
| 32 | +import heapq
| 33 | +from typing import List
| 34 | +
| 35 | +class Solution:
| 36 | +    def xSum(self, nums: List[int], k: int, x: int) -> List[int]:
| 37 | +        n = len(nums)
| 38 | +        # Heaps and data structures:
| 39 | +        # top_heap: min-heap of (freq, value, val) -> worst among top is top_heap[0]
| 40 | +        # bottom_heap: min-heap used as max-heap storing (-freq, -value, val) -> best among bottom is bottom_heap[0]
| 41 | +        top_heap = []
| 42 | +        bottom_heap = []
| 43 | +        freq = {}     # current freq of value in window
| 44 | +        in_top = {}   # whether value is considered in top set
| 45 | +        D = 0         # number of distinct elements with freq > 0
| 46 | +        top_size = 0  # number of distinct elements currently in top set
| 47 | +        sum_top = 0   # sum of freq[val] * val over elements in top set
| 48 | +
| 49 | +        def push_top(v):
| 50 | +            # push current tuple for v into top heap.
| 51 | +            f = freq.get(v, 0)
| 52 | +            if f <= 0:
| 53 | +                return
| 54 | +            heapq.heappush(top_heap, (f, v, v))
| 55 | +            in_top[v] = True
| 56 | +
| 57 | +        def push_bottom(v):
| 58 | +            # push current tuple for v into bottom heap.
| 59 | +            f = freq.get(v, 0)
| 60 | +            if f <= 0:
| 61 | +                return
| 62 | +            heapq.heappush(bottom_heap, (-f, -v, v))
| 63 | +            in_top[v] = False
| 64 | +
| 65 | +        def pop_valid_top():
| 66 | +            # pop until a valid top entry is found and return (f, v)
| 67 | +            while top_heap:
| 68 | +                f, val_key, v = heapq.heappop(top_heap)
| 69 | +                # valid if current freq matches and in_top is True
| 70 | +                if freq.get(v, 0) == f and in_top.get(v, False):
| 71 | +                    return f, v
| 72 | +                # else stale, continue popping
| 73 | +            return None
| 74 | +
| 75 | +        def peek_valid_top():
| 76 | +            while top_heap:
| 77 | +                f, val_key, v = top_heap[0]
| 78 | +                if freq.get(v, 0) == f and in_top.get(v, False):
| 79 | +                    return f, v
| 80 | +                heapq.heappop(top_heap)  # drop stale
| 81 | +            return None
| 82 | +
| 83 | +        def pop_valid_bottom():
| 84 | +            while bottom_heap:
| 85 | +                nf, nval, v = heapq.heappop(bottom_heap)
| 86 | +                f = -nf
| 87 | +                # valid if current freq matches and in_top is False
| 88 | +                if freq.get(v, 0) == f and not in_top.get(v, True):
| 89 | +                    return f, v
| 90 | +                # else stale, continue popping
| 91 | +            return None
| 92 | +
| 93 | +        def peek_valid_bottom():
| 94 | +            while bottom_heap:
| 95 | +                nf, nval, v = bottom_heap[0]
| 96 | +                f = -nf
| 97 | +                if freq.get(v, 0) == f and not in_top.get(v, True):
| 98 | +                    return f, v
| 99 | +                heapq.heappop(bottom_heap)  # drop stale
| 100 | +            return None
| 101 | +
| 102 | +        def rebalance():
| 103 | +            nonlocal top_size, sum_top
| 104 | +            tgt = min(x, D)
| 105 | +            # 1) Grow top if needed
| 106 | +            while top_size < tgt:
| 107 | +                # move best from bottom to top
| 108 | +                item = pop_valid_bottom()
| 109 | +                if not item:
| 110 | +                    break
| 111 | +                f, v = item
| 112 | +                # move v to top
| 113 | +                in_top[v] = True
| 114 | +                heapq.heappush(top_heap, (f, v, v))
| 115 | +                top_size += 1
| 116 | +                sum_top += f * v
| 117 | +            # 2) Shrink top if needed (some elements died or D decreased)
| 118 | +            while top_size > tgt:
| 119 | +                item = pop_valid_top()
| 120 | +                if not item:
| 121 | +                    break
| 122 | +                f, v = item
| 123 | +                # move v to bottom (or remove if freq==0)
| 124 | +                in_top[v] = False
| 125 | +                top_size -= 1
| 126 | +                sum_top -= f * v
| 127 | +                if freq.get(v, 0) > 0:
| 128 | +                    heapq.heappush(bottom_heap, (-freq[v], -v, v))
| 129 | +            # 3) Ensure ordering invariant: every top element >= every bottom element
| 130 | +            while True:
| 131 | +                top_peek = peek_valid_top()
| 132 | +                bottom_peek = peek_valid_bottom()
| 133 | +                if not top_peek or not bottom_peek:
| 134 | +                    break
| 135 | +                ft, vt = top_peek
| 136 | +                fb, vb = bottom_peek
| 137 | +                # if bottom's best is better than top's worst -> swap
| 138 | +                # compare (fb, vb) > (ft, vt)
| 139 | +                if fb > ft or (fb == ft and vb > vt):
| 140 | +                    # pop both and swap membership
| 141 | +                    pop_valid_bottom()  # removes bottom best
| 142 | +                    pop_valid_top()     # removes top worst
| 143 | +                    # move bottom best to top
| 144 | +                    in_top[vb] = True
| 145 | +                    heapq.heappush(top_heap, (fb, vb, vb))
| 146 | +                    sum_top += fb * vb
| 147 | +                    # move top worst to bottom (if still >0 freq)
| 148 | +                    in_top[vt] = False
| 149 | +                    sum_top -= ft * vt
| 150 | +                    if freq.get(vt, 0) > 0:
| 151 | +                        heapq.heappush(bottom_heap, (-freq[vt], -vt, vt))
| 152 | +                    # top_size unchanged
| 153 | +                    continue
| 154 | +                else:
| 155 | +                    break
| 156 | +
| 157 | +        # add one value to the window
| 158 | +        def add_val(v):
| 159 | +            nonlocal D, sum_top
| 160 | +            prev = freq.get(v, 0)
| 161 | +            freq[v] = prev + 1
| 162 | +            if prev == 0:
| 163 | +                # new distinct
| 164 | +                D += 1
| 165 | +                # start in bottom
| 166 | +                in_top[v] = False
| 167 | +                heapq.heappush(bottom_heap, (-freq[v], -v, v))
| 168 | +            else:
| 169 | +                # existing
| 170 | +                if in_top.get(v, False):
| 171 | +                    # its contribution in top increases by v
| 172 | +                    sum_top += v
| 173 | +                    heapq.heappush(top_heap, (freq[v], v, v))
| 174 | +                else:
| 175 | +                    heapq.heappush(bottom_heap, (-freq[v], -v, v))
| 176 | +
| 177 | +        # remove one value from the window
| 178 | +        def remove_val(v):
| 179 | +            nonlocal D, top_size, sum_top
| 180 | +            prev = freq.get(v, 0)
| 181 | +            if prev == 0:
| 182 | +                return
| 183 | +            curr = prev - 1
| 184 | +            freq[v] = curr
| 185 | +            if in_top.get(v, False):
| 186 | +                # element was in top: reduce contribution by v
| 187 | +                sum_top -= v
| 188 | +                if curr == 0:
| 189 | +                    # element removed entirely
| 190 | +                    in_top[v] = False
| 191 | +                    top_size -= 1
| 192 | +                    D -= 1
| 193 | +                    # do not push any tuple
| 194 | +                else:
| 195 | +                    # still present and still considered in top (we'll push updated tuple)
| 196 | +                    heapq.heappush(top_heap, (curr, v, v))
| 197 | +            else:
| 198 | +                # element was in bottom
| 199 | +                if curr == 0:
| 200 | +                    D -= 1
| 201 | +                    # nothing to push
| 202 | +                else:
| 203 | +                    heapq.heappush(bottom_heap, (-curr, -v, v))
| 204 | +
| 205 | +        # initialize first window
| 206 | +        for i in range(k):
| 207 | +            add_val(nums[i])
| 208 | +        rebalance()
| 209 | +        ans = [sum_top]
| 210 | +
| 211 | +        # slide
| 212 | +        for i in range(k, n):
| 213 | +            add_val(nums[i])
| 214 | +            remove_val(nums[i - k])
| 215 | +            rebalance()
| 216 | +            ans.append(sum_top)
| 217 | +
| 218 | +        return ans
| 219 | +
| 220 | +# Wrapper exposing the LeetCode-style method name (findXSum):
| 221 | +class SolutionWrapper:
| 222 | +    def findXSum(self, nums: List[int], k: int, x: int) -> List[int]:
| 223 | +        return Solution().xSum(nums, k, x)
| 224 | +
| 225 | +# If you want, you can test with the examples:
| 226 | +if __name__ == "__main__":
| 227 | +    s = Solution()
| 228 | +    print(s.xSum([1,1,2,2,3,4,2,3], 6, 2))  # [6,10,12]
| 229 | +    print(s.xSum([3,8,7,8,7,5], 2, 2))      # [11,15,15,15,12]
| 230 | +``` |
| 231 | + |
| 232 | +- Notes about the solution: |
| 233 | + - We maintain two heaps (top and bottom) and a frequency map with lazy deletion to avoid expensive arbitrary deletions from heaps. |
| 234 | + - top_heap stores tuples (freq, value, value) so the root is the worst among top (lowest freq, and lower value in tie) — we can pop the worst to move it down quickly. |
| 235 | + - bottom_heap stores (-freq, -value, value) so the root is the best among bottom (highest freq, and larger value in tie) — we can pop the best to move it up quickly. |
| 236 | + - in_top marks current membership; lazy entries are ignored when they become stale because their stored freq or in_top status does not match the current state. |
| 237 | + - sum_top is maintained incrementally on adds/removes and during moves between heaps so we can output the x-sum in O(1) per window after rebalancing. |
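| 238 | + - Time complexity: O(n log n) in the worst case. Each add/remove causes a few heap operations, and lazy deletion ensures every stale entry is popped at most once. The heaps can hold stale tuples, so their size is bounded by n rather than by the number of distinct elements m (<= n).
| 239 | + - Space complexity: O(m) for the maps, plus up to O(n) heap entries, since stale tuples are only discarded when they reach the root.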