I got curious and poked around at how I would try it. Basically, none of the above strategies worked, and I ended up using match type reduction, which I'm not sure is ideal. I fear I have a habit of underestimating the difficulty of working with this.
-
As a follow-up to this PR (#42), a discussion.
Userland Solution
First and most importantly, note that this can be solved fairly easily in userland, simply by renaming the columns with straightforward use of the standard library and named tuples.

If it is unclear how or why that works, it could be worth going a level deeper into named tuples and the standard library.
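The sketch below shows the kind of userland renaming meant here. It assumes Scala 3.7+, where named tuples are a standard feature. `toTuple` drops the field names, and the resulting plain tuple can then be given a named tuple type with new names, as long as the element types line up. `Original`, `Renamed` and their field names are hypothetical. This is not necessarily the exact approach the post had in mind.

```scala
// Assumes Scala 3.7+, where named tuples are a standard language feature.
object RenameColumns:
  // Hypothetical row shape, e.g. as read from a CSV.
  type Original = (col1: String, col2: String, col3: String)
  // Same element types, different field names.
  type Renamed = (id: String, name: String, note: String)

  def rename(row: Original): Renamed =
    // toTuple strips the names; a plain tuple with matching element types
    // is accepted where the renamed named tuple type is expected.
    row.toTuple

@main def renameDemo(): Unit =
  val row: RenameColumns.Original = (col1 = "1", col2 = "alice", col3 = "x")
  val renamed = RenameColumns.rename(row)
  println(renamed.name) // alice
```

Only the types change here. At runtime the values are the same tuple, so the rename costs nothing.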
So: do we need something built in?
I would still argue yes. Spotting such duplicates is easy with 3 columns; at a realistic scale, it is not such a simple task.
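Here is a small sketch of the check that gets tedious by hand at scale. Given the raw header row, it reports every name that occurs more than once. `DuplicateHeaders` and `duplicates` are hypothetical names, not part of the library.

```scala
object DuplicateHeaders:
  /** Header names that occur more than once, in order of first appearance. */
  def duplicates(headers: Seq[String]): Seq[String] =
    // Count occurrences of each name in a single pass.
    val counts: Map[String, Int] = headers.groupMapReduce(identity)(_ => 1)(_ + _)
    headers.distinct.filter(name => counts(name) > 1)

@main def duplicatesDemo(): Unit =
  val headers = Seq("col1", "col2", "col1", "col3", "col2")
  println(DuplicateHeaders.duplicates(headers)) // List(col1, col2)
```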
Options
Compile time
The user-facing API looks like:
def dups: CsvIterator[("col1", "col2", "col1_1")] = CSV.resource("dups.csv", deduplicatedHeaders = true)
Pros:
Probably the "neatest" solution
Since deduplication could be the default, it potentially hides complexity from the user
Implementation is probably easier than dealing with runtime constructs inside macro constraints.
Cons:
Pushes significant complexity into the highest risk part of the design
Silently tampers with users' data via a complex deduplication algorithm.
If the algorithm needed configuration, that would be problematic, given there is no compound compile-time literal that can be passed to the macro.
Runtime
The user-facing API looks like:
Pros:
Probably more closely aligned with user expectations
Doesn't tamper with user data without an explicit opt-in
Configurable
Cons:
Requires user to read documentation or otherwise discover the method
Not entirely sure about the implementation complexity, as the names have to be compile-time constants. In extremis, it might have to be implemented via some sort of match type construct...
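This sketch gives a feel for what a match type construct involves here. It only *detects* duplicate header names at the type level, using match types over a tuple of string literal types (Scala 3 assumed). `HeaderChecks`, `Contains` and `HasDuplicates` are hypothetical names. Actually producing renamed headers such as `"col1_1"` at the type level would also need operations on literal types (e.g. `scala.compiletime.ops.string.+`) and is considerably more involved.

```scala
import scala.compiletime.constValue
import scala.compiletime.ops.boolean.||

object HeaderChecks:
  // Reduces to true if the tuple of literal types T contains the type X.
  type Contains[T <: Tuple, X] <: Boolean = T match
    case X *: _     => true
    case _ *: tail  => Contains[tail, X]
    case EmptyTuple => false

  // Reduces to true if any element of T occurs more than once.
  type HasDuplicates[T <: Tuple] <: Boolean = T match
    case EmptyTuple => false
    case h *: tail  => Contains[tail, h] || HasDuplicates[tail]

@main def headerChecksDemo(): Unit =
  // constValue turns the fully reduced singleton Boolean type into a value.
  println(constValue[HeaderChecks.HasDuplicates[("col1", "col2", "col1")]])   // true
  println(constValue[HeaderChecks.HasDuplicates[("col1", "col2", "col1_1")]]) // false
```

Reduction only succeeds when every header is a concrete literal type. This fits the concern above that the names have to be compile-time constants.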
Notes:
Probably implement as an extension method on CsvIterator, with logic that could easily be extracted for use with other sources, e.g. ExcelIterator.
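Here is one possible shape for the runtime option. The deduplication rule lives in a plain, source-agnostic function, configurable here via a separator. A single extension method then exposes it on any source that has headers. `HeaderSource`, `CsvSource` and `ExcelSource` are hypothetical stand-ins, not the library's real `CsvIterator` or `ExcelIterator`. The `_1`, `_2` suffix scheme is an assumption chosen to match the `col1_1` example above. This only covers the value-level renaming; lifting the new names into the types is the separate compile-time problem.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for the library's iterator types.
trait HeaderSource:
  def headers: Seq[String]

final case class CsvSource(headers: Seq[String]) extends HeaderSource
final case class ExcelSource(headers: Seq[String]) extends HeaderSource

object HeaderDedup:
  /** Renames repeated headers to name + sep + n, skipping any name already taken. */
  def dedup(headers: Seq[String], sep: String = "_"): Seq[String] =
    val taken = mutable.Set.empty[String]
    val out = Vector.newBuilder[String]
    for h <- headers do
      var candidate = h
      var n = 0
      // Bump the suffix until the name does not clash with an earlier header.
      while taken.contains(candidate) do
        n += 1
        candidate = s"$h$sep$n"
      taken += candidate
      out += candidate
    out.result()

// One thin entry point, shared by every source type.
extension (src: HeaderSource)
  def deduplicatedHeaders(sep: String = "_"): Seq[String] =
    HeaderDedup.dedup(src.headers, sep)

@main def dedupDemo(): Unit =
  println(CsvSource(Seq("col1", "col2", "col1")).deduplicatedHeaders()) // Vector(col1, col2, col1_1)
  println(ExcelSource(Seq("a", "a")).deduplicatedHeaders(sep = "."))     // Vector(a, a.1)
```

Even this simple scheme has to handle suffixed names that already exist in the file (e.g. `a`, `a_1`, `a` becomes `a`, `a_1`, `a_2`). That is part of why making it configurable and opt-in seems attractive.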
Other thoughts?