add groupByOrdered #346
@@ -57,7 +57,24 @@ class GenericSteps[A](iterator: Iterator[A]) extends AnyVal {
    counts.to(Map)
  }

  def groupBy[K](f: A => K): Map[K, List[A]] = l.groupBy(f)
  /** Execute the traversal and group elements by a given transformation function, ignoring the iterator order. Use is discouraged. */
  @Doc(info =
    "Execute the traversal and group elements by a given transformation function, ignoring the iterator order. Use is discouraged."

Suggested change:
-    "Execute the traversal and group elements by a given transformation function, ignoring the iterator order. Use is discouraged."
+    "Execute the traversal and group elements by a given transformation function, ignoring the iterator order. If you need reproducible results, please use `groupByOrdered` instead."
I added an explanation of the issue. But use is really discouraged: This is a giant footgun that has blown us up multiple times. (I still shudder at the bug with iteration order in the legacy occurenceHash...)
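For readers who have not hit this footgun, a minimal sketch of the problem follows. The object name and the word list are made up for illustration and are not taken from this PR. The point is only that the standard library's `groupBy` returns an immutable `Map` whose key iteration order is not guaranteed to follow the order in which keys were first seen.

```scala
object GroupByOrderDemo {
  // Hypothetical input; the first letters appear in the order p, f, b, k, a, c, d.
  val words: List[String] =
    List("pear", "fig", "banana", "plum", "kiwi", "apple", "cherry", "date", "fennel")

  // Standard-library groupBy: elements inside each group keep their relative order,
  // but the iteration order of the keys is an implementation detail of the Map.
  val grouped: Map[Char, List[String]] = words.groupBy(_.head)

  // The order a caller would intuitively expect: keys in first-seen order.
  val firstSeenKeys: List[Char] = words.map(_.head).distinct

  def main(args: Array[String]): Unit = {
    println(s"first-seen key order: $firstSeenKeys")
    println(s"Map iteration order:  ${grouped.keys.toList}")
  }
}
```

Any code that iterates over `grouped` and feeds the result into something order-sensitive (a hash, a report, a test expectation) inherits that unspecified order.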
If we're going to have our own variant, we might as well change it to not use the slowest of all the data structures (linked lists). I.e. return a Vector or an ArraySeq instead.
That makes it slightly less of a drop-in replacement (pattern matching on the result requires different operators), but IMO that's an acceptable cost.
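To make the pattern-matching cost concrete, here is a small sketch with hypothetical helper names. Code that matches on `::` and `Nil` only compiles against `List`; the `+:` extractor and the `Seq()` pattern work for any `Seq`, including `Vector` and `ArraySeq`.

```scala
import scala.collection.immutable.ArraySeq

object PatternMatchDemo {
  // List-specific patterns: `::` and `Nil` do not apply to Vector or ArraySeq.
  def describeList(xs: List[Int]): String = xs match {
    case Nil       => "empty"
    case head :: _ => s"starts with $head"
  }

  // Seq-generic patterns: `Seq()` and `+:` work for List, Vector and ArraySeq alike.
  def describeSeq(xs: Seq[Int]): String = xs match {
    case Seq()     => "empty"
    case head +: _ => s"starts with $head"
  }

  def main(args: Array[String]): Unit = {
    println(describeList(List(1, 2, 3)))
    println(describeSeq(Vector(1, 2, 3)))
    println(describeSeq(ArraySeq(1, 2, 3)))
  }
}
```

Callers that currently destructure the `List` result with `::` would need the second style if the return type changed.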
👍🏻 for Vector or ArraySeq
Vector or ArraySeq requires us to copy the data once more. I prepared a version with LinkedHashMap[K, ArrayBuffer[A]]; the java.util.LinkedHashMap implementation is basically the same as scala.collection.mutable.LinkedHashMap, and writing our own to support even faster groupBy is way overkill.
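A rough sketch of what a `LinkedHashMap[K, ArrayBuffer[A]]` version could look like is below. This is not the code from the PR: the standalone signature taking an `Iterator[A]`, the object name, and the choice of `scala.collection.mutable.LinkedHashMap` over `java.util.LinkedHashMap` are assumptions made to keep the example self-contained.

```scala
import scala.collection.mutable

object GroupByOrderedSketch {
  // Hypothetical standalone variant; the real method lives on GenericSteps and may differ.
  def groupByOrdered[A, K](iterator: Iterator[A])(f: A => K): mutable.LinkedHashMap[K, mutable.ArrayBuffer[A]] = {
    // LinkedHashMap iterates keys in insertion order, i.e. first-seen order here.
    val groups = mutable.LinkedHashMap.empty[K, mutable.ArrayBuffer[A]]
    iterator.foreach { item =>
      // Append in place; no extra copy into an immutable collection at the end.
      groups.getOrElseUpdate(f(item), mutable.ArrayBuffer.empty[A]) += item
    }
    groups
  }

  def main(args: Array[String]): Unit = {
    val grouped = groupByOrdered(Iterator("pear", "fig", "plum", "apple", "fennel"))(_.head)
    grouped.foreach { case (key, values) => println(s"$key -> ${values.mkString(", ")}") }
  }
}
```

The result is mutable, so this relies on callers following the "just don't mutate it" convention argued for in this comment.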
Immutable data structures are way overrated compared to "just don't mutate the data structure".
Vector is an amazing feat of engineering. The cool thing is not "immutable", the cool thing is "O(1) snapshots", plus niche applications like good write performance for ZFS on tape drives and hard drives (writes are sequential!) and on SSDs (no write amplification, because nothing is overwritten in place).
That being said, we only very rarely make active use of O(1) snapshots, and we are running on SRAM/DRAM that supports overwriting, as opposed to flash (which cannot be overwritten).