rafaqz commented on Oct 24, 2022

This PR reduces array allocations, which gives some nice performance improvements. It turns `missing_omit` into `_maybe_missing_omit`, which does nothing when none of the columns have an eltype that allows `Missing`.
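As a rough illustration of that fast path (not the code in this PR), the idea can be sketched as below. The name `maybe_omit_sketch` and the choice of a `NamedTuple` of column vectors as input are assumptions made only for this example.

```julia
# Hypothetical sketch; `maybe_omit_sketch` is not the PR's function.
function maybe_omit_sketch(cols::NamedTuple)
    # Fast path: no column's eltype admits `missing`, so return the input as-is
    # without allocating anything.
    any(c -> Missing <: eltype(c), values(cols)) || return cols

    # Slow path: keep only the rows in which every column is non-missing.
    keep = trues(length(first(cols)))
    for c in cols
        keep .&= .!ismissing.(c)
    end

    # Copy the kept rows into vectors whose eltype no longer allows `missing`.
    return map(c -> convert(Vector{nonmissingtype(eltype(c))}, c[keep]), cols)
end

clean = (a = [1, 2, 3], b = [1.0, 2.0, 3.0])
dirty = (a = [1, 2, 3], b = [1.0, missing, 3.0])

same    = maybe_omit_sketch(clean)   # same object, nothing copied
dropped = maybe_omit_sketch(dirty)   # second row removed
```

Note that the fast path returns the caller's own arrays, so mutating the result also mutates the input.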

I suspect `modelcols` can be optimized much further by preallocating the output and filling it in a loop rather than using `reduce(vcat, x)`. I may add that too, since this is the main performance bottleneck after this PR.
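A minimal sketch of the preallocate-and-loop idea, shown on a plain vector of vectors rather than on the actual `modelcols` code. The helper names `concat_reduce` and `concat_prealloc` are hypothetical, and whether the loop is actually faster for a given input would need benchmarking.

```julia
# Hypothetical helper names; neither is taken from the PR.

# Baseline: concatenate the pieces with reduce(vcat, ...).
concat_reduce(xs) = reduce(vcat, xs)

# Alternative: allocate the output once, then copy each piece into place.
function concat_prealloc(xs::AbstractVector{<:AbstractVector{T}}) where {T}
    out = Vector{T}(undef, sum(length, xs; init=0))
    offset = 0
    for x in xs
        # copyto!(dest, dest_start, src, src_start, n) copies n elements.
        copyto!(out, offset + 1, x, firstindex(x), length(x))
        offset += length(x)
    end
    return out
end

xs = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
joined = concat_prealloc(xs)
```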

rafaqz force-pushed the reduce_allocations branch from e01b872 to 5c2fecf (October 24, 2022 12:48)
rafaqz marked this pull request as draft (October 24, 2022 12:49)
@@ -53,22 +53,26 @@ end
_missing_omit(x::AbstractVector{T}) where T = copyto!(similar(x, nonmissingtype(T)), x)
Member

If I recall correctly, making a copy even when `T` doesn't allow `Missing` was intentional: it ensures that every column gets copied, so downstream users of this function can mutate the result without worrying about affecting the original data. If we don't care about that, we can do

Suggested change
- _missing_omit(x::AbstractVector{T}) where T = copyto!(similar(x, nonmissingtype(T)), x)
+ _missing_omit(x::AbstractVector{T}) where {T>:Missing} = copyto!(similar(x, nonmissingtype(T)), x)
+ _missing_omit(x::AbstractVector{T}) where {T} = x

without needing any other changes.
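To make the trade-off concrete, here is a small sketch of how the two suggested methods dispatch. The function is renamed `sketch_missing_omit` (a hypothetical name) so it is not mistaken for the package's own definition.

```julia
# Same two methods as in the suggestion, under a hypothetical name.
sketch_missing_omit(x::AbstractVector{T}) where {T>:Missing} =
    copyto!(similar(x, nonmissingtype(T)), x)
sketch_missing_omit(x::AbstractVector{T}) where {T} = x

a = [1.0, 2.0, 3.0]                         # eltype Float64, cannot hold missing
b = Union{Missing,Float64}[1.0, 2.0, 3.0]   # allows missing, but none present

ra = sketch_missing_omit(a)   # no-copy method: returns `a` itself
rb = sketch_missing_omit(b)   # copying method: new Vector{Float64}

# Because `ra` aliases `a`, mutating the result changes the original data.
ra[1] = 10.0
```

After the last line `a[1]` is also `10.0`, which is exactly the aliasing the original always-copy definition avoided.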
