Commit de3efbc

timvermeulenTim Vermeulen
andauthored
Evenly divide a collection into chunks (#96)
Adds `evenlyChunked(in:)` as a `Collection` method that divides a collection into a specific number of chunks as evenly as possible. Co-authored-by: Tim Vermeulen <[email protected]>
1 parent df51f17 commit de3efbc

3 files changed

+325
-41
lines changed

Guides/Chunked.md

Lines changed: 52 additions & 37 deletions
@@ -3,16 +3,16 @@
33
[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/Chunked.swift) |
44
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/ChunkedTests.swift)]
55

6-
Break a collection into subsequences where consecutive elements pass a binary
7-
predicate, or where all elements in each chunk project to the same value.
6+
Break a collection into nonoverlapping subsequences:
87

9-
Also includes a `chunks(ofCount:)` that breaks a collection into subsequences
10-
of a given `count`.
8+
* `chunked(by:)` forms chunks of consecutive elements that pass a binary predicate,
9+
* `chunked(on:)` forms chunks of consecutive elements that project to equal values,
10+
* `chunks(ofCount:)` forms chunks of a given size, and
11+
* `evenlyChunked(in:)` forms a given number of equally-sized chunks.
1112

12-
There are two variations of the `chunked` method: `chunked(by:)` and
13-
`chunked(on:)`. `chunked(by:)` uses a binary predicate to test consecutive
14-
elements, separating chunks where the predicate returns `false`. For example,
15-
you can chunk a collection into ascending sequences using this method:
13+
`chunked(by:)` uses a binary predicate to test consecutive elements, separating
14+
chunks where the predicate returns `false`. For example, you can chunk a
15+
collection into ascending sequences using this method:
1616

1717
```swift
1818
let numbers = [10, 20, 30, 10, 40, 40, 10, 20]
@@ -31,11 +31,10 @@ let chunks = names.chunked(on: \.first!)
3131
// [("D", ["David"]), ("K", ["Kyle", "Karoy"]), ("N", ["Nate"])]
3232
```
3333

34-
The `chunks(ofCount:)` method takes a `count` parameter (greater than zero)
35-
and separates the collection into chunks of this given count. If the `count`
36-
parameter is evenly divided by the count of the base `Collection`, all the
37-
chunks will have a count equal to the parameter. Otherwise, the last chunk will
38-
contain the remaining elements.
34+
The `chunks(ofCount:)` method takes a `count` parameter (required to be > 0) and
35+
separates the collection into chunks of this given count. If the length of the
36+
collection is a multiple of the `count` parameter, all chunks will have
37+
a count equal to the parameter. Otherwise, the last chunk will contain the remaining elements.
3938

4039
```swift
4140
let names = ["David", "Kyle", "Karoy", "Nate"]
@@ -46,7 +45,21 @@ let remaining = names.chunks(ofCount: 3)
4645
// equivalent to [["David", "Kyle", "Karoy"], ["Nate"]]
4746
```
4847

49-
The `chunks(ofCount:)` is the subject of an [existing SE proposal][proposal].
48+
The `chunks(ofCount:)` method was previously [proposed][proposal] for inclusion
49+
in the standard library.
50+
51+
The `evenlyChunked(in:)` method takes a `count` parameter and divides the
52+
collection into `count` chunks that are as equal in size as possible. If the length of the
53+
collection is not a multiple of the `count` parameter, the chunks at the start
54+
will be longer than the chunks at the end.
55+
56+
```swift
57+
let evenChunks = (0..<15).evenlyChunked(in: 3)
58+
// equivalent to [0..<5, 5..<10, 10..<15]
59+
60+
let nearlyEvenChunks = (0..<15).evenlyChunked(in: 4)
61+
// equivalent to [0..<4, 4..<8, 8..<12, 12..<15]
62+
```
5063

5164
When "chunking" a collection, the entire collection is included in the result,
5265
unlike the `split` family of methods, where separators are dropped.
@@ -61,38 +74,40 @@ c.elementsEqual(c.chunked(...).joined())
6174

6275
## Detailed Design
6376

64-
The three methods are added as extension to `Collection`. `chunked(by:)` and
65-
`chunked(on:)` are eager by default, both with a matching version that return a
66-
lazy wrapper added to `LazySequenceProtocol`.
77+
The four methods are added to `Collection`, with matching versions of
78+
`chunked(by:)` and `chunked(on:)` that return a lazy wrapper added to
79+
`LazyCollectionProtocol`.
6780

6881
```swift
6982
extension Collection {
70-
public func chunked(
71-
by belongInSameGroup: (Element, Element) -> Bool
72-
) -> [SubSequence]
83+
public func chunked(
84+
by belongInSameGroup: (Element, Element) -> Bool
85+
) -> [SubSequence]
7386

74-
public func chunked<Subject: Equatable>(
87+
public func chunked<Subject: Equatable>(
7588
on projection: (Element) -> Subject
76-
) -> [(Subject, SubSequence)]
77-
78-
public func chunks(ofCount count: Int) -> ChunksOfCountCollection<Self>
89+
) -> [SubSequence]
90+
91+
public func chunks(ofCount count: Int) -> ChunksOfCountCollection<Self>
92+
93+
public func evenlyChunked(in count: Int) -> EvenChunksCollection<Self>
7994
}
8095

81-
extension LazySequenceProtocol where Self: Collection, Elements: Collection {
82-
public func chunked(
83-
by belongInSameGroup: @escaping (Element, Element) -> Bool
84-
) -> ChunkedByCollection<Elements, Element>
96+
extension LazyCollectionProtocol {
97+
public func chunked(
98+
by belongInSameGroup: @escaping (Element, Element) -> Bool
99+
) -> ChunkedByCollection<Elements>
85100

86-
public func chunked<Subject: Equatable>(
87-
on projection: @escaping (Element) -> Subject
88-
) -> ChunkedOnCollection<Elements, Subject>
101+
public func chunked<Subject: Equatable>(
102+
on projection: @escaping (Element) -> Subject
103+
) -> ChunkedOnCollection<Elements, Subject>
89104
}
90105
```
91106

92-
The `ChunkedByCollection`, `ChunkedOnCollection`, and `ChunksOfCountCollection`
93-
types are bidirectional when the wrapped collection is bidirectional.
94-
`ChunksOfCountCollection` also conforms to `LazySequenceProtocol` when the base
95-
collection conforms.
107+
Each of the "chunked" collection types is bidirectional when the wrapped
108+
collection is bidirectional. `ChunksOfCountCollection` and
109+
`EvenChunksCollection` also conform to `RandomAccessCollection` and
110+
`LazySequenceProtocol` when their base collections conform.
96111

97112
### Complexity
98113

@@ -120,5 +135,5 @@ into potentially overlapping subsequences.
120135
**Ruby:** Ruby’s `Enumerable` class defines `chunk_while` and `chunk`, which map
121136
to the proposed `chunked(by:)` and `chunked(on:)` methods.
122137

123-
**Rust:** Rust defines a variety of size-based `chunks` methods, but doesn’t
124-
include any with the functionality described here.
138+
**Rust:** Rust defines a variety of size-based `chunks` methods, of which the
139+
standard version corresponds to the `chunks(ofCount:)` method defined here.
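
The rule in the guide above (earlier chunks are longer when the length is not a multiple of `count`) can be sketched with plain integer arithmetic. The helper name `evenChunkSizes(count:numberOfChunks:)` is hypothetical and is not part of the library. It only mirrors the quotient/remainder idea used by `sizeOfChunk(offset:)` in this commit.

```swift
// Hypothetical helper, not part of swift-algorithms: computes the size of each
// chunk when `count` elements are divided into `numberOfChunks` chunks.
func evenChunkSizes(count: Int, numberOfChunks: Int) -> [Int] {
  precondition(count >= 0 && numberOfChunks >= 0, "Arguments must be non-negative")
  precondition(numberOfChunks > 0 || count == 0,
               "Can't divide a non-empty collection into 0 chunks")
  guard numberOfChunks > 0 else { return [] }

  // Every chunk gets at least `small` elements.
  let small = count / numberOfChunks
  // The first `large` chunks each get one extra element.
  let large = count % numberOfChunks

  return (0..<numberOfChunks).map { offset in
    offset < large ? small + 1 : small
  }
}

let evenSizes = evenChunkSizes(count: 15, numberOfChunks: 3)
// evenSizes == [5, 5, 5]

let nearlyEvenSizes = evenChunkSizes(count: 15, numberOfChunks: 4)
// nearlyEvenSizes == [4, 4, 4, 3]
```

These sizes match the ranges in the guide's example: `0..<4`, `4..<8`, `8..<12`, and `12..<15` have lengths 4, 4, 4, and 3.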

Sources/Algorithms/Chunked.swift

Lines changed: 240 additions & 1 deletion
@@ -198,6 +198,210 @@ extension ChunkedOnCollection: BidirectionalCollection
198198

199199
extension ChunkedOnCollection: LazyCollectionProtocol {}
200200

201+
/// A collection wrapper that evenly breaks a collection into a given number of
202+
/// chunks.
203+
public struct EvenChunksCollection<Base: Collection> {
204+
/// The base collection.
205+
@usableFromInline
206+
internal let base: Base
207+
208+
/// The number of equal chunks the base collection is divided into.
209+
@usableFromInline
210+
internal let numberOfChunks: Int
211+
212+
/// The count of the base collection.
213+
@usableFromInline
214+
internal let baseCount: Int
215+
216+
/// The upper bound of the first chunk.
217+
@usableFromInline
218+
internal var firstUpperBound: Base.Index
219+
220+
@inlinable
221+
internal init(base: Base, numberOfChunks: Int) {
222+
self.base = base
223+
self.numberOfChunks = numberOfChunks
224+
self.baseCount = base.count
225+
self.firstUpperBound = base.startIndex
226+
227+
if numberOfChunks > 0 {
228+
firstUpperBound = endOfChunk(startingAt: base.startIndex, offset: 0)
229+
}
230+
}
231+
}
232+
233+
extension EvenChunksCollection {
234+
/// Returns the number of chunks with size `smallChunkSize + 1` at the start
235+
/// of this collection.
236+
@inlinable
237+
internal var numberOfLargeChunks: Int {
238+
baseCount % numberOfChunks
239+
}
240+
241+
/// Returns the size of a chunk at a given offset.
242+
@inlinable
243+
internal func sizeOfChunk(offset: Int) -> Int {
244+
let isLargeChunk = offset < numberOfLargeChunks
245+
return baseCount / numberOfChunks + (isLargeChunk ? 1 : 0)
246+
}
247+
248+
/// Returns the index in the base collection of the end of the chunk starting
249+
/// at the given index.
250+
@inlinable
251+
internal func endOfChunk(startingAt start: Base.Index, offset: Int) -> Base.Index {
252+
base.index(start, offsetBy: sizeOfChunk(offset: offset))
253+
}
254+
255+
/// Returns the index in the base collection of the start of the chunk ending
256+
/// at the given index.
257+
@inlinable
258+
internal func startOfChunk(endingAt end: Base.Index, offset: Int) -> Base.Index {
259+
base.index(end, offsetBy: -sizeOfChunk(offset: offset))
260+
}
261+
262+
/// Returns the index that corresponds to the chunk that starts at the given
263+
/// base index.
264+
@inlinable
265+
internal func indexOfChunk(startingAt start: Base.Index, offset: Int) -> Index {
266+
guard offset != numberOfChunks else { return endIndex }
267+
let end = endOfChunk(startingAt: start, offset: offset)
268+
return Index(start..<end, offset: offset)
269+
}
270+
271+
/// Returns the index that corresponds to the chunk that ends at the given
272+
/// base index.
273+
@inlinable
274+
internal func indexOfChunk(endingAt end: Base.Index, offset: Int) -> Index {
275+
let start = startOfChunk(endingAt: end, offset: offset)
276+
return Index(start..<end, offset: offset)
277+
}
278+
}
279+
280+
extension EvenChunksCollection: Collection {
281+
public struct Index: Comparable {
282+
/// The range corresponding to the chunk at this position.
283+
@usableFromInline
284+
internal var baseRange: Range<Base.Index>
285+
286+
/// The offset corresponding to the chunk at this position. The first chunk
287+
/// has offset `0` and all other chunks have an offset `1` greater than the
288+
/// previous.
289+
@usableFromInline
290+
internal var offset: Int
291+
292+
@inlinable
293+
internal init(_ baseRange: Range<Base.Index>, offset: Int) {
294+
self.baseRange = baseRange
295+
self.offset = offset
296+
}
297+
298+
@inlinable
299+
public static func == (lhs: Self, rhs: Self) -> Bool {
300+
lhs.offset == rhs.offset
301+
}
302+
303+
@inlinable
304+
public static func < (lhs: Self, rhs: Self) -> Bool {
305+
lhs.offset < rhs.offset
306+
}
307+
}
308+
309+
public typealias Element = Base.SubSequence
310+
311+
@inlinable
312+
public var startIndex: Index {
313+
Index(base.startIndex..<firstUpperBound, offset: 0)
314+
}
315+
316+
@inlinable
317+
public var endIndex: Index {
318+
Index(base.endIndex..<base.endIndex, offset: numberOfChunks)
319+
}
320+
321+
@inlinable
322+
public func index(after i: Index) -> Index {
323+
precondition(i != endIndex, "Can't advance past endIndex")
324+
let start = i.baseRange.upperBound
325+
return indexOfChunk(startingAt: start, offset: i.offset + 1)
326+
}
327+
328+
@inlinable
329+
public subscript(position: Index) -> Element {
330+
precondition(position != endIndex)
331+
return base[position.baseRange]
332+
}
333+
334+
@inlinable
335+
public func index(_ i: Index, offsetBy distance: Int) -> Index {
336+
/// Returns the base distance between two `EvenChunksCollection` indices
337+
/// from the end of one to the start of the other, when given their offsets.
338+
func baseDistance(from offsetA: Int, to offsetB: Int) -> Int {
339+
let smallChunkSize = baseCount / numberOfChunks
340+
let numberOfChunks = (offsetB - offsetA) - 1
341+
342+
let largeChunksEnd = Swift.min(self.numberOfLargeChunks, offsetB)
343+
let largeChunksStart = Swift.min(self.numberOfLargeChunks, offsetA + 1)
344+
let numberOfLargeChunks = largeChunksEnd - largeChunksStart
345+
346+
return smallChunkSize * numberOfChunks + numberOfLargeChunks
347+
}
348+
349+
if distance == 0 {
350+
return i
351+
} else if distance > 0 {
352+
let offset = i.offset + distance
353+
let baseOffset = baseDistance(from: i.offset, to: offset)
354+
let start = base.index(i.baseRange.upperBound, offsetBy: baseOffset)
355+
return indexOfChunk(startingAt: start, offset: offset)
356+
} else {
357+
let offset = i.offset + distance
358+
let baseOffset = baseDistance(from: offset, to: i.offset)
359+
let end = base.index(i.baseRange.lowerBound, offsetBy: -baseOffset)
360+
return indexOfChunk(endingAt: end, offset: offset)
361+
}
362+
}
363+
364+
@inlinable
365+
public func index(_ i: Index, offsetBy distance: Int, limitedBy limit: Index) -> Index? {
366+
if distance >= 0 {
367+
if (0..<distance).contains(self.distance(from: i, to: limit)) {
368+
return nil
369+
}
370+
} else {
371+
if (0..<(-distance)).contains(self.distance(from: limit, to: i)) {
372+
return nil
373+
}
374+
}
375+
return index(i, offsetBy: distance)
376+
}
377+
378+
@inlinable
379+
public func distance(from start: Index, to end: Index) -> Int {
380+
end.offset - start.offset
381+
}
382+
}
383+
384+
extension EvenChunksCollection.Index: Hashable where Base.Index: Hashable {}
385+
386+
extension EvenChunksCollection: BidirectionalCollection
387+
where Base: BidirectionalCollection
388+
{
389+
@inlinable
390+
public func index(before i: Index) -> Index {
391+
precondition(i != startIndex, "Can't advance before startIndex")
392+
return indexOfChunk(endingAt: i.baseRange.lowerBound, offset: i.offset - 1)
393+
}
394+
}
395+
396+
extension EvenChunksCollection: RandomAccessCollection
397+
where Base: RandomAccessCollection {}
398+
399+
extension EvenChunksCollection: LazySequenceProtocol
400+
where Base: LazySequenceProtocol {}
401+
402+
extension EvenChunksCollection: LazyCollectionProtocol
403+
where Base: LazyCollectionProtocol {}
404+
201405
//===----------------------------------------------------------------------===//
202406
// lazy.chunked(by:) / lazy.chunked(on:)
203407
//===----------------------------------------------------------------------===//
@@ -583,5 +787,40 @@ extension Collection {
583787

584788
extension ChunksOfCountCollection.Index: Hashable where Base.Index: Hashable {}
585789

586-
extension ChunksOfCountCollection: LazySequenceProtocol, LazyCollectionProtocol
790+
extension ChunksOfCountCollection: LazySequenceProtocol
587791
where Base: LazySequenceProtocol {}
792+
793+
extension ChunksOfCountCollection: LazyCollectionProtocol
794+
where Base: LazyCollectionProtocol {}
795+
796+
//===----------------------------------------------------------------------===//
797+
// evenlyChunked(in:)
798+
//===----------------------------------------------------------------------===//
799+
800+
extension Collection {
801+
/// Returns a collection of `count` evenly divided subsequences of this
802+
/// collection.
803+
///
804+
/// This method divides the collection into a given number of equally sized
805+
/// chunks. If the length of the collection is not divisible by `count`, the
806+
/// chunks at the start will be longer than the chunks at the end, like in
807+
/// this example:
808+
///
809+
/// for chunk in "Hello, world!".evenlyChunked(in: 5) {
810+
/// print(chunk)
811+
/// }
812+
/// // "Hel"
813+
/// // "lo,"
814+
/// // " wo"
815+
/// // "rl"
816+
/// // "d!"
817+
///
818+
/// - Complexity: O(1) if the collection conforms to `RandomAccessCollection`,
819+
/// otherwise O(*n*), where *n* is the length of the collection.
820+
@inlinable
821+
public func evenlyChunked(in count: Int) -> EvenChunksCollection<Self> {
822+
precondition(count >= 0, "Can't divide into a negative number of chunks")
823+
precondition(count > 0 || isEmpty, "Can't divide a non-empty collection into 0 chunks")
824+
return EvenChunksCollection(base: self, numberOfChunks: count)
825+
}
826+
}
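
For readers without the package at hand, the documented behavior of `evenlyChunked(in:)` can be approximated with an eager sketch. The method name `eagerEvenChunks(in:)` is hypothetical. Unlike `EvenChunksCollection`, it builds an array up front instead of computing chunk boundaries on demand, so it illustrates the resulting chunks rather than the commit's implementation.

```swift
extension Collection {
  /// Hypothetical eager counterpart to `evenlyChunked(in:)`, for illustration
  /// only. Returns all chunks as an array of subsequences.
  func eagerEvenChunks(in count: Int) -> [SubSequence] {
    precondition(count >= 0, "Can't divide into a negative number of chunks")
    precondition(count > 0 || isEmpty,
                 "Can't divide a non-empty collection into 0 chunks")
    guard count > 0 else { return [] }

    let total = self.count       // length of the base collection
    let small = total / count    // size of the shorter chunks
    let large = total % count    // number of chunks with one extra element

    var result: [SubSequence] = []
    result.reserveCapacity(count)

    var start = startIndex
    for offset in 0..<count {
      // Chunks at the start are one element longer than chunks at the end.
      let size = small + (offset < large ? 1 : 0)
      let end = index(start, offsetBy: size)
      result.append(self[start..<end])
      start = end
    }
    return result
  }
}

let pieces = "Hello, world!".eagerEvenChunks(in: 5).map { String($0) }
// pieces == ["Hel", "lo,", " wo", "rl", "d!"]
```

The string has 13 characters, so dividing it into 5 chunks yields three chunks of 3 followed by two chunks of 2, which agrees with the output listed in the documentation comment above.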
