
Commit b3db38b

Merge pull request #137 from BrianLusina/feat/algorithms-intervals-data-stream
feat(algorithms, intervals, data-stream): data stream disjoint intervals
2 parents 5a1e8e8 + 92bd459 commit b3db38b

21 files changed, 287 lines added, 0 lines removed

DIRECTORY.md

Lines changed: 2 additions & 0 deletions
@@ -133,6 +133,8 @@
 * [Test Car Pooling](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/intervals/car_pooling/test_car_pooling.py)
 * Count Days
 * [Test Count Days Without Meetings](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/intervals/count_days/test_count_days_without_meetings.py)
+* Data Stream
+* [Test Data Stream As Disjoint Intervals](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/intervals/data_stream/test_data_stream_as_disjoint_intervals.py)
 * Employee Free Time
 * [Interval](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/intervals/employee_free_time/interval.py)
 * [Test Employee Free Time](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/intervals/employee_free_time/test_employee_free_time.py)
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# Data Stream as Disjoint Intervals

You are given a stream of non-negative integers a1, a2, ..., an. At any point, you need to summarize all numbers seen so
far as a list of disjoint intervals.

Your task is to implement the `SummaryRanges` class, where:

1. Constructor: Initializes the `SummaryRanges` object with an empty stream.
2. `addNum(int value)`: Adds the integer `value` to the stream.
3. `getIntervals()`: Returns the current summary of numbers as a list of disjoint intervals `[start_i, end_i]`, sorted by `start_i`.

> Note: Each number belongs to exactly one interval. Intervals must merge whenever new numbers connect or extend existing
> ones, and duplicate insertions should not affect the summary.
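
As a concrete reference for this contract, the sketch below is a deliberately naive version, not the solution described later: it keeps every distinct value in a set and rebuilds the intervals from scratch on each query. The class name `NaiveSummaryRanges` is hypothetical; its `add_num`/`get_intervals` method names follow the snake_case used by the accompanying Python implementation.

```python
from typing import List, Set


class NaiveSummaryRanges:
    """Hypothetical reference oracle: stores every distinct value, rebuilds intervals on demand."""

    def __init__(self) -> None:
        self.values: Set[int] = set()

    def add_num(self, value: int) -> None:
        # A set silently ignores duplicates, matching the "no effect" rule.
        self.values.add(value)

    def get_intervals(self) -> List[List[int]]:
        result: List[List[int]] = []
        for v in sorted(self.values):
            if result and result[-1][1] == v - 1:
                # v continues the current run of consecutive values
                result[-1][1] = v
            else:
                # there is a gap before v, so it starts a new interval
                result.append([v, v])
        return result


summary = NaiveSummaryRanges()
for value in [1, 3, 7, 2, 6, 3]:
    summary.add_num(value)
print(summary.get_intervals())  # [[1, 3], [6, 7]]
```

Each `get_intervals()` call here sorts all m distinct values, costing O(m log m); avoiding that repeated work is the point of the interval-based approach. A slow oracle like this can still be useful for checking a faster implementation on random inputs.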

## Constraints

- 0 <= `value` <= 10^4
- At most 3 * 10^4 calls will be made to `addNum` and `getIntervals`.
- At most 10^2 calls will be made to `getIntervals`.

## Examples

![Example 1](./images/examples/data_stream_as_disjoint_intervals_example_1.png)
![Example 2](./images/examples/data_stream_as_disjoint_intervals_example_2.png)
![Example 3](./images/examples/data_stream_as_disjoint_intervals_example_3.png)
![Example 4](./images/examples/data_stream_as_disjoint_intervals_example_4.png)

## Solution

When numbers arrive one after another in a stream, it’s easy to imagine them as scattered pebbles landing on a number
line. If we keep them as-is, the picture quickly becomes messy. Instead, we want to summarize what we’ve seen into clean
stretches of consecutive values, i.e., intervals. The challenge is that every new number can behave differently:

- It may fall inside an existing interval.
- It may extend an interval by one.
- It may even act like a missing puzzle piece that connects two intervals into one larger block.

This constant merging and organizing is why the “intervals” pattern is the right fit. Rather than storing every number,
we maintain only the boundaries of disjoint intervals and carefully update them when new values arrive. This way, our
summary stays compact, sorted, and easy to return.

We start by keeping a sorted collection of intervals instead of recording every number one by one. Each new value starts
as a single-point interval, and then we look at the ranges around it to decide how it fits.

- If an existing range already covers the value, we simply ignore it.
- If the value is exactly one more than the end of a range, we extend that range to include it.
- If the value is exactly one less than the start of a range, we extend that range backward to include it.
- If the value fills the single-number gap between two ranges, we merge them into one larger range.

![Solution 1](./images/solutions/data_stream_as_disjoint_intervals_solution_1.png)
![Solution 2](./images/solutions/data_stream_as_disjoint_intervals_solution_2.png)
![Solution 3](./images/solutions/data_stream_as_disjoint_intervals_solution_3.png)
![Solution 4](./images/solutions/data_stream_as_disjoint_intervals_solution_4.png)

If none of the above cases applies, the value becomes a new single-point interval. The stored intervals are always sorted
and disjoint, so generating the summary is as simple as listing them in order.
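
The case analysis can be condensed into a small decision function. This is a hypothetical helper (`classify` and its string labels are illustrative and not part of the solution) that assumes the caller has already located the neighboring intervals, passing `None` for a neighbor that does not exist.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive


def classify(value: int, prev: Optional[Interval], nxt: Optional[Interval]) -> str:
    """Report which case applies when `value` arrives.

    `prev` is the interval with the largest start <= value (or None);
    `nxt` is the interval with the smallest start > value (or None).
    """
    if prev is not None and prev[1] >= value:
        return "covered"  # already inside prev, nothing to do
    touches_prev = prev is not None and prev[1] == value - 1
    touches_next = nxt is not None and nxt[0] == value + 1
    if touches_prev and touches_next:
        return "bridge"  # fills the one-number gap between prev and nxt
    if touches_prev:
        return "extend-prev"  # becomes prev's new end
    if touches_next:
        return "extend-next"  # becomes nxt's new start
    return "new"  # isolated single-point interval


print(classify(2, (1, 3), (6, 7)))  # covered
print(classify(4, (1, 3), (6, 7)))  # extend-prev
print(classify(5, (1, 3), (6, 7)))  # extend-next
print(classify(5, (1, 4), (6, 7)))  # bridge
print(classify(9, (6, 7), None))    # new
```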

The following steps implement the algorithm above:

1. We keep the data in a sorted map called `intervals`, where the key is the start of an interval and the
   value is the end of that interval. The map keeps intervals ordered by their start, and the merging rules below keep them disjoint.
2. Constructor: We initialize `intervals` as an empty map.
3. `addNum(int value)`: Adding a number to the stream.
   - We treat the new number `value` as a single-point interval by setting `newStart = value` and `newEnd = value`. This is
     our candidate interval, which may expand or merge.
   - Next, we find the two neighbors around this number:
     - `nextInterval`, the first interval whose start is greater than `value` (the map's upper bound of `value`).
     - `prevInterval`, the interval immediately before `nextInterval`, if one exists.
   - Check the previous interval:
     - If `prevInterval->second` (the end of the previous interval) is greater than or equal to `value`, the number
       is already covered by that range. In this case, we return without any changes.
     - If `prevInterval->second` equals `value - 1`, the new number touches the end of the previous interval.
       - In this case, we extend the candidate’s start (`newStart`) to `prevInterval->first` so that the candidate also
         includes the previous interval.
   - Check the next interval:
     - If `nextInterval->first` (the start of the next interval) equals `value + 1`, the new number touches the start
       of the next interval.
       - We extend the candidate’s end (`newEnd`) to `nextInterval->second` and remove `nextInterval` from the map, since it
         will be merged.
     - If both the previous and the next conditions apply, the candidate bridges them into one larger interval.
   - Finally, we insert the merged interval into the map as `intervals[newStart] = newEnd`.
     - This overwrites the previous interval if it was extended.
     - It replaces the next interval (already removed above) if it was merged.
     - Otherwise, it creates a new single-point interval.
4. `getIntervals()`: Getting all intervals.
   - We create an empty result list.
   - Then we iterate through all the entries in `intervals`; for each entry, `interval.first` is the start and
     `interval.second` is the end.
   - For each entry, we push `[interval.first, interval.second]` into the result list.
   - Finally, we return the result list. Because `intervals` is always kept sorted and disjoint, the result
     requires no further processing.
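
Step 3's neighbor lookup is phrased in terms of a C++-style sorted map (`->first`, `->second`, an upper-bound search). Python has no built-in sorted map; one way to get the same lookup, assuming the interval starts are kept in a sorted list next to a `dict` of ends (as the accompanying Python implementation does), is `bisect.bisect_right`, which returns the index of the first element strictly greater than the key. The helper `find_neighbors` below is hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive


def find_neighbors(
    starts: List[int], ends: Dict[int, int], value: int
) -> Tuple[Optional[Interval], Optional[Interval]]:
    """Return (prevInterval, nextInterval) around `value`, or None where absent.

    `starts` must be sorted; `ends` maps each start to its interval's end.
    """
    # Index of the first start strictly greater than value (upper bound)
    idx = bisect_right(starts, value)
    nxt = (starts[idx], ends[starts[idx]]) if idx < len(starts) else None
    # The interval just before the upper bound, i.e. the last start <= value
    prev = (starts[idx - 1], ends[starts[idx - 1]]) if idx > 0 else None
    return prev, nxt


starts = [1, 6, 10]
ends = {1: 3, 6: 7, 10: 12}
print(find_neighbors(starts, ends, 5))   # ((1, 3), (6, 7))
print(find_neighbors(starts, ends, 0))   # (None, (1, 3))
print(find_neighbors(starts, ends, 11))  # ((10, 12), None)
```

In the accompanying `SummaryRanges.add_num`, the same `bisect_right` call produces `idx`, and `self.starts[idx - 1]` and `self.starts[idx]` play the roles of `prevInterval` and `nextInterval`.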

Let’s look at the following illustration to get a better understanding of the solution:

![Solution 5](./images/solutions/data_stream_as_disjoint_intervals_solution_5.png)
![Solution 6](./images/solutions/data_stream_as_disjoint_intervals_solution_6.png)
![Solution 7](./images/solutions/data_stream_as_disjoint_intervals_solution_7.png)
![Solution 8](./images/solutions/data_stream_as_disjoint_intervals_solution_8.png)
![Solution 9](./images/solutions/data_stream_as_disjoint_intervals_solution_9.png)
![Solution 10](./images/solutions/data_stream_as_disjoint_intervals_solution_10.png)
![Solution 11](./images/solutions/data_stream_as_disjoint_intervals_solution_11.png)
![Solution 12](./images/solutions/data_stream_as_disjoint_intervals_solution_12.png)

### Time Complexity

Let k be the current number of disjoint intervals stored in the `intervals` map.

- `addNum(int value)`: O(log k)
  - One upper-bound search (O(log k)), at most one predecessor check (O(1)), and at most one erase (of the next interval)
    plus one insert/update (each O(log k)).

- `getIntervals()`: O(k)
  - We iterate over every stored interval once to build the output.

- Worst case relative to n: if there are n `addNum(int value)` calls and nothing ever merges, then k = O(n), giving
  O(log n) per `addNum(int value)` call and O(n) per `getIntervals()` call.

These bounds assume a balanced-tree sorted map such as C++ `std::map`. The accompanying Python implementation instead pairs
a `dict` with a sorted list of starts: the `bisect_right` search is O(log k), but `insort` and `list.pop` shift list
elements, so each insertion or removal there is O(k) in the worst case.

### Space Complexity

As we store only interval boundaries (start → end) rather than every number seen, the space complexity is O(k). In the
worst case with no merges, k = O(n), so space becomes O(n).
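
To make the relation between k and n concrete, the hypothetical helper `count_intervals` below computes how many disjoint intervals a collection of values collapses into, using the observation that a value starts an interval exactly when its predecessor (value - 1) is absent.

```python
from typing import Iterable


def count_intervals(values: Iterable[int]) -> int:
    """Number of disjoint intervals (k in the analysis) needed to summarize `values`."""
    seen = set(values)
    # v begins an interval exactly when v - 1 was never seen
    return sum(1 for v in seen if v - 1 not in seen)


print(count_intervals(range(10_000)))        # 1: everything merges into one interval
print(count_intervals(range(0, 10_000, 2)))  # 5000: no merges, so k equals n
print(count_intervals([5, 5, 5]))            # 1: duplicates add nothing
```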
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
from typing import List, Dict
from bisect import bisect_right, insort


class SummaryRanges:
    def __init__(self):
        # Map to store intervals: key = start of interval, value = end of interval.
        # Python dicts keep insertion order, not key order, so the sorted order of
        # the intervals is tracked separately in self.starts.
        self.intervals: Dict[int, int] = {}
        # Sorted list of interval starts (the keys of self.intervals), searched with bisect
        self.starts: List[int] = []

    def add_num(self, value: int) -> None:
        """
        Adds the integer value to the stream.

        Args:
            value (int): The integer value to add to the stream.
        """
        new_start = value
        new_end = value

        # Find the first interval with a start greater than 'value' (an upper-bound search)
        idx = bisect_right(self.starts, value)

        # Check the interval immediately before 'value' (if it exists)
        if idx > 0:
            previous_start = self.starts[idx - 1]
            previous_end = self.intervals[previous_start]

            # Case 1: 'value' already inside an existing interval -> do nothing
            if previous_end >= value:
                return

            # Case 2: 'value' extends the previous interval by exactly 1
            if previous_end == value - 1:
                # merge with previous
                new_start = previous_start

        # Case 3: 'value' connects directly to the start of the next interval
        next_start = self.starts[idx] if idx < len(self.starts) else None
        if next_start is not None and next_start == value + 1:
            # merge with next
            new_end = self.intervals[next_start]
            # remove the old interval [next_start, ...]
            del self.intervals[next_start]
            self.starts.pop(idx)

        # Insert the merged or new interval. If new_start is already a key, this was a
        # merge with the previous interval and only its end changes; otherwise the new
        # start must also be recorded in sorted position.
        if new_start not in self.intervals:
            insort(self.starts, new_start)
        self.intervals[new_start] = new_end

    def get_intervals(self) -> List[List[int]]:
        """
        Returns the current summary of numbers as a list of disjoint intervals [start_i, end_i], sorted by start_i.

        Returns:
            List[List[int]]: List of disjoint intervals
        """
        result: List[List[int]] = []
        # Walk the starts in sorted order and pair each with its end from the map
        for s in self.starts:
            result.append([s, self.intervals[s]])
        return result


class SummaryRangesV2:
    def __init__(self):
        # Sorted list of disjoint intervals, each stored as [start, end]
        self.intervals: List[List[int]] = []
        # Starts of the intervals, kept parallel to self.intervals so bisect can search them
        self.starts: List[int] = []

    def add_num(self, value: int) -> None:
        # An existing interval [start, end] absorbs the new value if start - 1 <= value <= end + 1,
        # and it already contains the value if start <= value <= end.

        # 1. Find the index of the first interval whose start is greater than value
        idx = bisect_right(self.starts, value)

        # 2. Case 1 (duplicate): the left neighbor, at idx - 1, already contains value
        if (
            idx > 0
            and self.intervals[idx - 1][0] <= value <= self.intervals[idx - 1][1]
        ):
            return

        # 3. Identify possible neighbors for merging
        left_merge = idx > 0 and self.intervals[idx - 1][1] + 1 == value
        right_merge = idx < len(self.intervals) and self.intervals[idx][0] - 1 == value

        if left_merge and right_merge:
            # Case 2 (double merge): value bridges the left and right intervals
            self.intervals[idx - 1][1] = self.intervals[idx][1]
            self.starts.pop(idx)
            self.intervals.pop(idx)
        elif left_merge:
            # Case 3 (left merge): extend the end of the left interval
            self.intervals[idx - 1][1] = value
        elif right_merge:
            # Case 4 (right merge): extend the start of the right interval
            self.intervals[idx][0] = value
            # keep the starts list in sync
            self.starts[idx] = value
        else:
            # Case 5 (new island): value is not a start, so idx is already its sorted position
            self.starts.insert(idx, value)
            self.intervals.insert(idx, [value, value])

    def get_intervals(self) -> List[List[int]]:
        """
        Returns the current summary of numbers as a list of disjoint intervals [start_i, end_i], sorted by start_i.
        This runs in O(1) time because it returns the internal list directly; treat the result as read-only,
        since later add_num calls mutate it in place.

        Returns:
            List[List[int]]: List of disjoint intervals
        """
        return self.intervals