@BrianLusina
Owner

@BrianLusina BrianLusina commented Jan 2, 2026

Describe your change:

Data stream disjoint intervals

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms have a URL in their comments that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

Summary by CodeRabbit

  • New Features

    • Added Data Stream as Disjoint Intervals problem with two implementation approaches for interval tracking and merging.
  • Documentation

    • Added comprehensive README documenting the Data Stream problem, including algorithm details, complexity analysis, and edge cases.
    • Updated directory documentation to include the new Data Stream resource.

@BrianLusina BrianLusina self-assigned this Jan 2, 2026
@BrianLusina BrianLusina added the enhancement, Algorithm, Datastructures, Array, and Intervals labels Jan 2, 2026
@coderabbitai
Contributor

coderabbitai bot commented Jan 2, 2026

📝 Walkthrough

A new Data Stream feature was introduced to manage disjoint intervals, including two implementation classes (SummaryRanges and SummaryRangesV2), comprehensive documentation, and test coverage. The feature supports adding numbers to a stream and retrieving merged interval ranges with automatic merging logic for connected or overlapping intervals.
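
As a rough illustration of the add-then-merge behaviour described above, here is a minimal list-based sketch. The class name `IntervalSummarySketch` is hypothetical; this is not the PR's `SummaryRanges` or `SummaryRangesV2` code, only one way the described behaviour could be realised, assuming integer inputs.

```python
from bisect import bisect_left
from typing import List


class IntervalSummarySketch:
    """Hypothetical stand-in; not the PR's actual implementation."""

    def __init__(self) -> None:
        # Sorted, disjoint, non-adjacent [start, end] pairs.
        self.intervals: List[List[int]] = []

    def add_num(self, value: int) -> None:
        # Index of the first interval whose start is >= value.
        i = bisect_left(self.intervals, [value])
        left = self.intervals[i - 1] if i > 0 else None
        right = self.intervals[i] if i < len(self.intervals) else None

        # Already covered by a neighbouring interval: nothing to do.
        if left is not None and left[1] >= value:
            return
        if right is not None and right[0] == value:
            return

        touches_left = left is not None and left[1] + 1 == value
        touches_right = right is not None and right[0] - 1 == value

        if touches_left and touches_right:
            # The value bridges two intervals: fuse them into one.
            left[1] = right[1]
            del self.intervals[i]
        elif touches_left:
            left[1] = value
        elif touches_right:
            right[0] = value
        else:
            self.intervals.insert(i, [value, value])

    def get_intervals(self) -> List[List[int]]:
        # Return copies so callers cannot mutate internal state.
        return [interval[:] for interval in self.intervals]


summary = IntervalSummarySketch()
for number in [1, 3, 7, 2, 6]:
    summary.add_num(number)
print(summary.get_intervals())  # [[1, 3], [6, 7]]
```

Adding 2 bridges `[1, 1]` and `[3, 3]` into `[1, 3]`, and adding 6 extends `[7, 7]` leftwards, which is the "connected intervals" case the walkthrough mentions.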

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: DIRECTORY.md, algorithms/intervals/data_stream/README.md | Added new documentation node under Intervals section; introduced README documenting the Data Stream as Disjoint Intervals problem, class interface (Constructor, Add Num, Get Intervals), merging logic, and complexity analysis |
| Core Implementation: algorithms/intervals/data_stream/__init__.py | Introduced two public classes: SummaryRanges (map-based interval storage with start→end mapping) and SummaryRangesV2 (list-based interval storage); both implement add_num(value) with merging logic and get_intervals() returning disjoint interval ranges |
| Test Suite: algorithms/intervals/data_stream/test_data_stream_as_disjoint_intervals.py | Added parameterized test module validating both implementations across multiple scenarios including single elements, duplicates, overlapping sequences, and non-contiguous values |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Stream, Documentation

Poem

🐰 A stream flows forth with numbers bright,
Disjoint intervals merge just right,
No gaps or overlaps remain in sight,
Two implementations dance in delight,
Data organized, clean and tight! 📊

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: adding a data stream disjoint intervals algorithm to the intervals module. It is specific, concise, and accurately reflects the primary contribution of this PR. |
| Description check | ✅ Passed | The description includes all required sections from the template: a 'Describe your change' section and a completed checklist. However, the description lacks detail about what the algorithm does and how it works beyond the bare title. |

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
algorithms/intervals/data_stream/README.md (1)

114-114: Minor: Consider hyphenating compound adjective.

The phrase "Worst case" should be hyphenated when used as a compound adjective: "Worst-case relation to n".

Suggested fix
-- Worst case relation to n: If there are n Add Num(int value) calls and nothing ever merges, then k=O(n), giving 
+- Worst-case relation to n: If there are n Add Num(int value) calls and nothing ever merges, then k=O(n), giving 
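
To make the quoted worst-case claim concrete, the following sketch counts how many disjoint intervals a set of inputs produces. It assumes k denotes the number of stored intervals (the README excerpt does not define it here), and `count_intervals` is a hypothetical helper, not part of the PR.

```python
from typing import Iterable


def count_intervals(values: Iterable[int]) -> int:
    """Count the maximal runs of consecutive integers in `values`."""
    count = 0
    previous = None
    for value in sorted(set(values)):
        # A new interval starts whenever the value does not extend the previous run.
        if previous is None or value != previous + 1:
            count += 1
        previous = value
    return count


n = 1000
# Every other integer: no two values are adjacent, so nothing merges and k == n.
print(count_intervals(range(0, 2 * n, 2)))  # 1000
# Consecutive integers: everything merges into a single interval, so k == 1.
print(count_intervals(range(n)))  # 1
```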
algorithms/intervals/data_stream/__init__.py (2)

50-56: Optional: Simplify duplicated assignment.

Both branches of the conditional assign self.intervals[new_start] = new_end. While the current form clearly documents the intent (update vs insert), you could simplify it:

Proposed refactoring
-        # Insert the merged or new interval into the map
-        if new_start in self.intervals:
-            # update existing interval(merged with previous)
-            self.intervals[new_start] = new_end
-        else:
-            # insert new
-            self.intervals[new_start] = new_end
-            insort(self.starts, new_start)
+        # Insert the merged or new interval into the map
+        self.intervals[new_start] = new_end
+        if new_start not in self.starts:
+            insort(self.starts, new_start)
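
One possible variation on this refactoring, sketched under the assumption that `self.intervals` is a dict and `self.starts` is a sorted list mirroring its keys (as the diff suggests): record whether the key is new before assigning, so the membership test runs against the dict rather than scanning the list. The free function `upsert_interval` is a hypothetical stand-in for the method body.

```python
from bisect import insort
from typing import Dict, List


def upsert_interval(
    intervals: Dict[int, int], starts: List[int], new_start: int, new_end: int
) -> None:
    """Hypothetical helper: store start -> end and keep `starts` sorted and in sync."""
    # Check the dict *before* assigning; dict membership is O(1) on average,
    # whereas `new_start not in starts` walks the list.
    is_new_start = new_start not in intervals
    intervals[new_start] = new_end
    if is_new_start:
        insort(starts, new_start)


intervals = {1: 1, 5: 6}
starts = [1, 5]
upsert_interval(intervals, starts, 1, 3)  # existing start: only the end changes
upsert_interval(intervals, starts, 9, 9)  # new start: also inserted into starts
print(intervals)  # {1: 3, 5: 6, 9: 9}
print(starts)     # [1, 5, 9]
```

Since `insort` itself shifts list elements, this does not change the overall asymptotic cost of an insertion; it only avoids an extra linear scan.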

119-126: Consider returning a copy to protect internal state.

The method returns self.intervals directly, which exposes the internal mutable list to external modification. While the current tests don't exploit this, it's a design issue that could lead to bugs if callers modify the returned list.

Note that SummaryRanges.get_intervals() (lines 58-68) correctly returns a new list, which is safer.

Proposed fix
 def get_intervals(self) -> List[List[int]]:
     """
     Returns the current summary of numbers as a list of disjoint intervals [start_i, end_i], sorted by start_i.
     This is run in O(1) time as it simply returns the list of intervals.
     Returns:
         List[List[int]]: List of disjoint intervals
     """
-    return self.intervals
+    return [interval[:] for interval in self.intervals]  # Return a deep copy

Note: This changes the time complexity from O(1) to O(k), but ensures encapsulation. Alternatively, document that callers should not modify the returned list.
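
The aliasing risk can be shown with a small, self-contained sketch. `LeakyStore` and `CopyingStore` are hypothetical classes written only for this illustration; they are not the PR's code.

```python
from typing import List


class LeakyStore:
    """Hypothetical class that hands out its internal list directly."""

    def __init__(self) -> None:
        self.intervals: List[List[int]] = [[1, 3], [6, 7]]

    def get_intervals(self) -> List[List[int]]:
        return self.intervals  # caller receives the very same list object


class CopyingStore(LeakyStore):
    """Same data, but get_intervals copies the outer list and every pair."""

    def get_intervals(self) -> List[List[int]]:
        return [interval[:] for interval in self.intervals]


leaky = LeakyStore()
leaky.get_intervals()[0][1] = 99        # silently edits internal state
leaky.get_intervals().append([10, 10])  # so does appending

safe = CopyingStore()
safe.get_intervals()[0][1] = 99         # only the returned copy changes
safe.get_intervals().append([10, 10])

print(leaky.intervals)  # [[1, 99], [6, 7], [10, 10]]
print(safe.intervals)   # [[1, 3], [6, 7]]
```

Copying each inner pair matters here: a shallow `list(self.intervals)` would protect against `append` but still let callers rewrite the endpoints.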

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a1e8e8 and 92bd459.

⛔ Files ignored due to path filters (17)
  • algorithms/intervals/data_stream/images/examples/data_stream_as_disjoint_intervals_example_1.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/examples/data_stream_as_disjoint_intervals_example_2.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/examples/data_stream_as_disjoint_intervals_example_3.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/examples/data_stream_as_disjoint_intervals_example_4.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_1.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_10.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_11.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_12.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_13.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_2.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_3.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_4.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_5.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_6.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_7.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_8.png is excluded by !**/*.png
  • algorithms/intervals/data_stream/images/solutions/data_stream_as_disjoint_intervals_solution_9.png is excluded by !**/*.png
📒 Files selected for processing (4)
  • DIRECTORY.md
  • algorithms/intervals/data_stream/README.md
  • algorithms/intervals/data_stream/__init__.py
  • algorithms/intervals/data_stream/test_data_stream_as_disjoint_intervals.py
🧰 Additional context used
🧬 Code graph analysis (1)
algorithms/intervals/data_stream/test_data_stream_as_disjoint_intervals.py (1)
algorithms/intervals/data_stream/__init__.py (6)
  • SummaryRanges (5-68)
  • SummaryRangesV2 (71-126)
  • add_num (13-56)
  • add_num (78-117)
  • get_intervals (58-68)
  • get_intervals (119-126)
🪛 LanguageTool
algorithms/intervals/data_stream/README.md

[style] ~42-~42: Consider using “the surrounding ranges”.
Context: ...gle-point interval, and then we look at the ranges around it to decide how it fits. - If an existin...

(NOUN_AROUND_IT)


[style] ~46-~46: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ..., we extend that range to include it. - If the value lies just before the start of...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~47-~47: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...a range, we merge it with that range. - If the value sits exactly between two rang...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~114-~114: Use a hyphen to join words.
Context: ...erval once to build the output. - Worst case relation to n: If there are n Add N...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
DIRECTORY.md

136-136: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


137-137: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

🔇 Additional comments (2)
DIRECTORY.md (1)

136-137: LGTM! Documentation update follows existing pattern.

The new Data Stream documentation entries are correctly added and follow the existing indentation pattern in the file. The markdownlint indentation warnings can be safely ignored as they are false positives.

algorithms/intervals/data_stream/test_data_stream_as_disjoint_intervals.py (1)

1-39: LGTM! Comprehensive test coverage.

The test cases effectively validate both implementations across multiple scenarios:

  • Single element and duplicates
  • Consecutive number merging
  • Non-contiguous intervals
  • Out-of-order insertions

The parameterized approach ensures both SummaryRanges and SummaryRangesV2 are tested identically, which is excellent for maintaining consistency.
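
For readers unfamiliar with the pattern, here is a minimal sketch of running one case table against several implementations. It uses the standard library's `unittest.subTest`, which is an assumption; the review does not say which test framework the PR uses. `summarize_brute_force` is a hypothetical reference function standing in for the two classes.

```python
import unittest
from typing import List


def summarize_brute_force(values: List[int]) -> List[List[int]]:
    """Hypothetical reference: sort, deduplicate, and fold consecutive runs."""
    intervals: List[List[int]] = []
    for value in sorted(set(values)):
        if intervals and intervals[-1][1] + 1 == value:
            intervals[-1][1] = value
        else:
            intervals.append([value, value])
    return intervals


# In a real module this list would hold adapters for each class under test.
IMPLEMENTATIONS = [summarize_brute_force]

CASES = [
    ("single element", [5], [[5, 5]]),
    ("duplicates", [2, 2, 2], [[2, 2]]),
    ("consecutive", [1, 2, 3], [[1, 3]]),
    ("non-contiguous", [1, 4, 9], [[1, 1], [4, 4], [9, 9]]),
    ("out of order", [1, 3, 7, 2, 6], [[1, 3], [6, 7]]),
]


class SummaryCases(unittest.TestCase):
    def test_all_cases(self) -> None:
        for implementation in IMPLEMENTATIONS:
            for name, values, expected in CASES:
                # subTest reports each (implementation, case) pair separately.
                with self.subTest(impl=implementation.__name__, case=name):
                    self.assertEqual(implementation(values), expected)
```

Running `python -m unittest` on a file containing this class would exercise every case against every listed implementation.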

@BrianLusina BrianLusina merged commit b3db38b into main Jan 2, 2026
5 of 6 checks passed
@BrianLusina BrianLusina deleted the feat/algorithms-intervals-data-stream branch January 2, 2026 14:46

Labels

Algorithm, Array, Datastructures, enhancement, Intervals

2 participants