-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat(cubesql): Add XIRR aggregate function
#9508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Alex Qyoun-ae <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #9508 +/- ##
==========================================
- Coverage 83.89% 80.52% -3.38%
==========================================
Files 229 383 +154
Lines 83569 96892 +13323
Branches 0 2223 +2223
==========================================
+ Hits 70111 78018 +7907
- Misses 13458 18559 +5101
- Partials 0 315 +315
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
KSDaemon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻 LGTM!
mcheshkov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left some notes, nothing critical
| }; | ||
| let signature = Signature::one_of( | ||
| type_signatures, | ||
| Volatility::Volatile, // due to the usage of [`f64::powf`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be Immutable - given same inputs it should generate same output, so it is OK to lift that calculation. It's safe to keep it Volatile, it should just deny some optimizations in DF.
Non-determinism in floating point precision does not turn functions into voltaile:
BuiltinScalarFunction::Log10 or BuiltinScalarFunction::Cos are Volatility::Immutable, but implemented with regular f64::log10 and f64::cos, which are non-deterministic due to different host and target platforms, compiler version and what not. It's not like now or random, where results are expected to be different.
https://doc.rust-lang.org/std/primitive.f64.html#method.log10
| date_type.clone(), | ||
| ])); | ||
| // Signatures with `initial_guess` argument; only [`DataType::Float64`] is accepted | ||
| const INITIAL_GUESS_TYPE: DataType = DataType::Float64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit - I'd lift INITIAL_GUESS_TYPE closer to NUMERIC_TYPES
| } | ||
|
|
||
| fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> { | ||
| let payments = cast(&values[0], &DataType::Float64)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that implementation always casts payments to Float64 would not be enough to just declare first argument as Float64 in signature, and rely on DF to insert coercion in calls?
Same for dates and on_errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mcheshkov DF would throw errors when using non-specified types in this case. If, say, on_error could only accept Float64, then specifying argument as 0 instead of 0.0 would produce Int32 and would error on logical plan building. I wanted to avoid forcing the user to explicitly cast all the arguments, especially with dates, considering Cube's time type is Timestamp.
With signature, there doesn't seem to be any automatic coercion, but I might be wrong.
| self.add_pair(payment, date)?; | ||
| } | ||
| let values_len = values.len(); | ||
| if values_len < 3 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit - feels like it could be a bit simpler with values.get(2).map(...)
| Arc::new(|| Ok(Box::new(XirrAccumulator::new()))); | ||
| let state_type: StateTypeFunction = Arc::new(|_| { | ||
| Ok(Arc::new(vec![ | ||
| DataType::List(Box::new(Field::new("item", DataType::Float64, true))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use different field names here for different components of state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The item field name seems to be static for DataType::List in DF. Looking through the code, there doesn't seem to be any List where the field name would differ; I believe different field names are for maps or similar structures.
| for (payment, date) in payments.into_iter().zip(dates) { | ||
| self.add_pair(payment, date)?; | ||
| } | ||
| let states_len = states.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to check different states length here, it should always be 4, as returned from Accumulator::state()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct. I also noticed this mistake when extending the function with optional arguments, and then forgot to fix it before publishing the PR. I'll get rid of those checks in some future chore.
| const MAX_ITERATIONS: usize = 100; | ||
| const TOLERANCE: f64 = 1e-6; | ||
| const DEFAULT_INITIAL_GUESS: f64 = 0.1; | ||
| let Some(min_date) = self.pairs.iter().map(|(_, date)| *date).min() else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit - IMO min_by_key() looks nicer
| } | ||
| let rate_positive = 1.0 + rate_of_return; | ||
| let denominator = rate_positive.powf(*year_difference); | ||
| net_present_value += *payment / denominator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, this can be more accurate (and, probably, performant) with FMA: net_present_value = payment.mul_add(denominator.recip(), net_present_value);
IDK if LLVM can do this for us, but I think it would not.
And same for derivative_value, but I'm not sure how to handle multiple multiplication
| if *payment == 0.0 { | ||
| continue; | ||
| } | ||
| let rate_positive = 1.0 + rate_of_return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit - rate_positive is same for all pairs, can be lifter out of inner loop
| let rate_positive = 1.0 + rate_of_return; | ||
| let denominator = rate_positive.powf(*year_difference); | ||
| net_present_value += *payment / denominator; | ||
| derivative_value -= *year_difference * *payment / denominator / rate_positive; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an observation - *year_difference * *payment is same for pair, and does not change between iterations, so we can trade memory for CPU here
Check List
Description of Changes Made
This PR adds support for
XIRRaggregate function. Related test is included.