Adds interval sweep technique

jsgoller1 · jsgoller1 · commit 55189b3fe31f · 2021-01-25T22:23:47.000-08:00
diff --git a/implementations/Technique - Interval Sweep .ipynb b/implementations/Technique - Interval Sweep .ipynb
@@ -0,0 +1,223 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Technique - Interval Sweep\n",
+    "\n",
+    "This is a cool trick I learned from [@lee215](https://leetcode.com/problems/maximum-sum-obtained-of-any-permutation/discuss/854206/JavaC++Python-Sweep-Line) on LeetCode for efficiently handling some types of overlapping interval problems. \n",
+    "\n",
+    "### Problem\n",
+    "Suppose we're given an array `ranges` where each element represents the start and end value of a range of sequential integers; `[1, 5]` stands for `[1,2,3,4,5]`. For each integer in each range, return a count of the number of ranges it is in.\n",
+    "\n",
+    "**Example 1:**\n",
+    "```\n",
+    "Input: ranges = [[1,3], [2,4]]\n",
+    "Output: {1:1, 2:2, 3:2, 4:1}\n",
+    "Explanation: [1,3] means [1,2,3], and [2,4] means [2,3,4]. 1 is in one range, 2 is in two ranges, \n",
+    "3 is in two ranges, and 4 is in one range. \n",
+    "```\n",
+    "**Example 2:**\n",
+    "```\n",
+    "Input: [[1,3], [1,4], [2,4], [2,5]]\n",
+    "Output: {1:2, 2:4, 3:4, 4:3, 5:1}\n",
+    "Explanation: Our ranges are [1,2,3], [1,2,3,4], [2,3,4], [2,3,4,5]. 1 is in one range, 2 is in four ranges, etc. \n",
+    "```\n",
+    "\n",
+    "Some constraints:\n",
+    "- `0 <= len(ranges) <= 10**5`\n",
+    "- For every `ranges[i]`:\n",
+    "    - `0 <= ranges[i][0] <= ranges[i][1] <= 10**5`\n",
+    "    - `len(ranges[i]) == 2`\n",
+    "\n",
+    "### Brute force approach\n",
+    "Using a counter, we could iterate through every value in every range and increment its frequency. This approach is `O(n^2)`; our worst case is `0 <= n <= 10**5` ranges each covering `0 <= n <= 10**5` values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1: 2\n",
+      "2: 4\n",
+      "3: 4\n",
+      "4: 3\n",
+      "5: 1\n"
+     ]
+    }
+   ],
+   "source": [
+    "from collections import Counter\n",
+    "\n",
+    "def count_frequencies(ranges):\n",
+    "    frequencies = Counter()\n",
+    "    for start, end in ranges:\n",
+    "        for i in range(start, end+1):\n",
+    "            frequencies[i]+=1\n",
+    "    return frequencies\n",
+    "        \n",
+    "for val, count in count_frequencies([[1,3], [1,4], [2,4], [2,5]]).items():\n",
+    "    print(f\"{val}: {count}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Interval sweep technique\n",
+    "\n",
+    "Consider that each time we see a range, all values between the start and the end have are now present in one range (plus any others). We can only get around the quadratic complexity by skipping these middle values. Let's keep a Counter `bounds_counts`; for each range given, `bounds_count[range[0]] += 1` and `bounds_count[range[0]+1] -= 1` (this may sound weird, but hang on). \n",
+    "\n",
+    "If our ranges are `[[1,3],[2,4]]`, then the correct result would be `{1:1, 2:2, 3:2, 4:1}`. Using this strategy, `bounds_counts` looks like the following:\n",
+    "```\n",
+    "[1,3] -> {1:1, 4:-1}\n",
+    "[2,4] -> {1:1, 2:1, 4: -1, 5:-1}\n",
+    "```\n",
+    "Then, if we initialize `frequency = 0` and `frequencies = {}`, and loop over `range(1,5)`, adding `bounds_counts[i]` to `frequency` each time a value is in `bounds_counts` and storing the result in `frequencies`, the following occurs:\n",
+    "```\n",
+    "val   frequency   frequencies\n",
+    "-        0            {}\n",
+    "1        1 (+1)       {1:1}\n",
+    "2        2 (+1)       {1:1, 2:2}\n",
+    "3        2 (+0)       {1:1, 2:2, 3:2}\n",
+    "4        1 (-1)       {1:1, 2:2, 3:2, 4:1}\n",
+    "5        0 (-1)       same as above; don't include values with frequency 0. \n",
+    "```\n",
+    "so frequencies now corretly reflects the count of intervals covering each value, and calculates it in linear time. In code:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1: 2\n",
+      "2: 4\n",
+      "3: 4\n",
+      "4: 3\n",
+      "5: 1\n"
+     ]
+    }
+   ],
+   "source": [
+    "def count_frequencies(ranges):\n",
+    "    bounds_counts = Counter()\n",
+    "    for start, end in ranges:\n",
+    "        bounds_counts[start] += 1\n",
+    "        bounds_counts[end+1] -= 1\n",
+    "    bounds = bounds_counts.keys()\n",
+    "    first, last = min(bounds), max(bounds)\n",
+    "    \n",
+    "    count = 0\n",
+    "    frequencies = {}\n",
+    "    for val in range(first, last):\n",
+    "        count += bounds_counts[val] if val in bounds_counts else 0\n",
+    "        frequencies[val] = count\n",
+    "    return frequencies\n",
+    "        \n",
+    "for val, count in count_frequencies([[1,3], [1,4], [2,4], [2,5]]).items():\n",
+    "    print(f\"{val}: {count}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Example: [Maximum Sum Obtained of Any Permutation](https://leetcode.com/problems/maximum-sum-obtained-of-any-permutation/)\n",
+    "\n",
+    "There are two tricks for this problem:\n",
+    "1. Whatever \"permutation\" we pick should have the largest numbers at the most frequently covered index. \n",
+    "2. We should get the index frequencies using the interval sweep technique. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from collections import Counter\n",
+    "from typing import List\n",
+    "\n",
+    "def interval_sweep(requests):\n",
+    "    \"\"\"\n",
+    "    Given a list of ranges, return an array\n",
+    "    where arr[i] equals the number of ranges\n",
+    "    including i.\n",
+    "    \"\"\"\n",
+    "    bounds_counts = Counter()\n",
+    "    first, last = float('inf'), -float('inf')\n",
+    "    for start,end in requests:\n",
+    "        bounds_counts[start] += 1\n",
+    "        bounds_counts[end+1] -= 1\n",
+    "        first = min(start,end,first)\n",
+    "        last = max(start,end,last)\n",
+    "\n",
+    "    frequency = 0\n",
+    "    overlap_count = []\n",
+    "    for val in range(first, last+1):\n",
+    "        frequency += bounds_counts[val]\n",
+    "        overlap_count.append(frequency)\n",
+    "    return overlap_count\n",
+    "\n",
+    "\n",
+    "class Solution:\n",
+    "    def maxSumRangeQuery(self, nums: List[int], requests: List[List[int]]) -> int:\n",
+    "        nums.sort(reverse=True)\n",
+    "        overlap_count = interval_sweep(requests)        \n",
+    "        max_sum = 0\n",
+    "        for i, frequency in enumerate(sorted(overlap_count, reverse=True)):\n",
+    "            max_sum += frequency*nums[i]\n",
+    "        return max_sum % (10**9 + 7)\n",
+    "            \n",
+    "s = Solution()\n",
+    "cases = [\n",
+    "    ([1,2,3,4,5], [[1,3],[0,1]], 19),\n",
+    "    ([1,2,3,4,5,6], [[0,1]], 11),\n",
+    "    ([1,2,3,4,5,10], [[0,2],[1,3],[1,1]], 47)\n",
+    "]\n",
+    "for nums, requests, expected in cases:\n",
+    "    actual = s.maxSumRangeQuery(nums, requests)\n",
+    "    assert actual == expected, f\"{nums, requests}: {expected} != {actual}\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}