-
-
Notifications
You must be signed in to change notification settings - Fork 341
Eliminate error calls in report periods #2457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
169fdf7
to
97a0494
Compare
I'll hold this one for now as I don't have an open bug that needs it. If you have time, #2454 would be a great one to look at next for 1.50.1. |
Hola @Xitian9.. This PR has two commits, so I don't understand git log's showing me just one when I check out this branch:
could you rebase when you have a chance. |
Rebased. |
_ -> d | ||
|
||
|
||
intToDay = ModifiedJulianDay . toInteger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Modified Julian Day is a standard count of days, with zero being the day 1858-11-17.
I think this is from an earlier PR, but it sounds like it might not fit with our support of all years ? (CE currently, maybe BCE some day)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is from an earlier commit, and yes it technically does not support all years.
- On a 64-bit system this will support all years between 252 quadrillion BC (well before the development of currency) and 252 quadrillion AD (at which point we'll be more worried about powering our computers since all stars will have become black holes or white dwarfs).
- On a 32-bit system, it will support all years between 5 877 751 BC and 5 881 468 AD.
We could extend support to all integers. This would incur a small performance penalty as we'd have to switch from IntMap to Map internally. I personally think the supported date ranges are fine, but happy for you to make a call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow. So 0 to 1858 is not a problem then. Great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AI gave this overview of the latest date types:
Date-related Types Overview
Basic date types (Types.hs:103-123):
- EFDay - exact or flexible date (Exact Day | Flex Day)
- DateSpan - optional start/end dates (DateSpan (Maybe EFDay) (Maybe EFDay))
Report period types (Types.hs:129-158):
- Period - high-level periods (DayPeriod, MonthPeriod, etc.)
- Interval - report intervals (Days, Weeks, Months, Quarters, Years, etc.)
Period data structures:
- PeriodData a (Types.hs:756) - data for report periods + pre-period
- pdpre :: a - historical/pre-report data
- pdperiods :: IntMap a - data per period (Day as IntMap key)
- DayPartition (DayPartition.hs:37) - well-formed time partition
- Newtype wrapping PeriodData Day
- Opaque constructor enforces contiguous spans
- Invariants: non-empty, contiguous, end = start-1 of next
Key functions:
- splitSpan :: Interval -> DateSpan -> Maybe DayPartition - split span into
periods - lookupDayPartition :: Day -> DayPartition -> (Maybe Day, Day) - find period
containing day - dayPartitionToDateSpans :: DayPartition -> [DateSpan] - convert to list of
spans
Does it sound about right to you ?
DayPartition is PeriodData with mandatory smart constructors to guarantee invariants. Is it valuable to still have the separate PeriodData type ?
If so we should probably expand the description of both types, making things more approachable.
I have installed this locally and will try to exercise it as well.
Now also available as https://github.com/simonmichael/hledger/releases/tag/nightly binaries. |
Yes, I think that description is basically accurate, though I think only We would still need the separate |
This allows us to guarantee that the report periods are well-formed and don't contain errors (e.g. empty spans, spans not contiguous, spans not a partition). Note the underlying representation is now for disjoint spans, whereas previously the end date of a span was equal to the start date of the next span, and then was adjusted backwards one day when needed.
This eliminates all error calls from the chain calculating report periods.
-- | ||
-- Data is stored in an 'IntMap' for efficiency, where Days are stored as as | ||
-- Int representing the underlying modified Julian date. | ||
data PeriodData a = PeriodData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind replacing that last line with a mention of how the keys are "days since 1858-whatever".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, disregard; I am having trouble viewing the latest version of changes with VS Code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, let's spell out how exactly the Int key represents the date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the best place to do that, as that's more or less an internal detail of the Data.Time
module. In the unlikely event that they decide to change their internal representation, our documentation would be out of date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, a pointer to the authoritative doc would be fine then. So a person wanting to understand this type can do so, right ?
-- | ||
-- Data is stored in an 'IntMap' for efficiency, where Days are stored as as | ||
-- Int representing the underlying modified Julian date. | ||
data PeriodData a = PeriodData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, let's spell out how exactly the Int key represents the date.
-- This is a newtype wrapper around 'PeriodData Day', where the start dates are | ||
-- the keys and the end dates are the values. Spans are stored in inclusive format | ||
-- [start, end]. Note that this differs from 'DateSpan' which uses [start, end) | ||
-- format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for expanding these descriptions. I think we can do better still; as a person reading this, not familiar with PeriodData (or even if I am), I find it not too clear. "Start dates are keys, end dates are values." I don't see it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked Claude for some docs making it more approachable to contributors. It gave things like this which I found helpful - except for the description of the "pre-period date". Or is this slop ? Apologies if so.
-- | A partition of time into contiguous non-overlapping periods.
--
-- Each period is a consecutive span of days with an inclusive start date
-- and inclusive end date. Periods must be:
--
-- * Contiguous: period N's end date is exactly one day before period N+1's start date
-- * Ordered: periods are in chronological order
-- * Non-empty: at least one period exists
--
-- The constructor is hidden to maintain these invariants. Use 'boundariesToDayPartition'
-- or 'splitSpan' to construct.
--
-- Internally, a DayPartition wraps 'PeriodData Day', which stores:
--
-- * A map of period start dates to period end dates (as IntMap Day for efficiency)
-- * A "pre-period" historical date (one day before the first period starts)
--
-- ==== Examples
-- >>> let Just p = boundariesToMaybeDayPartition [fromGregorian 2024 01 01, fromGregorian 2024 02 01, fromGregorian 2024 03 01]
-- >>> dayPartitionToList p
-- [(2024-01-01,2024-01-31),(2024-02-01,2024-02-29)]
newtype DayPartition = ...
-- Internal invariant checker (not exported)
-- Verifies that a DayPartition maintains the required properties.
-- All smart constructors should produce values that pass this check.
isValidDayPartition :: DayPartition -> Bool
-- 1. The value stored in pdperiods has at least one key. | ||
-- 2. The value stored in pdpre equals one day before the smallest key in pdperiods. | ||
-- 3. The value stored in each entry of pdperiods equals one day before the | ||
-- next largest key, except for the value associated to the largest key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who needs to follow these rules - a user of this module ? When using which functions ? Would it be better to mention these as invariants of DayPartition, in DayPartition's doc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, users of this module do not need to know these details. The only people who need to know are those who make constructors for DayPartition
, as they need to make sure their constructors satisfy these invariants. After that point they should be invisible to users, and can interact with DayPartition
through the provided interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it's worth mentioning that that means only people who make commits here in this module need to know (and even not most of them).
I'm going to merge this and maybe follow up with another pass at haddocks. Thank you! |
This eliminates the last few error calls that can occur in calculating the report spans of a multi-balance report.
DayPartition
forPeriodData Day
. By not exporting the constructor, we can ensure that allDayPartition
s are well-formed.splitSpan
is moved intoHledger.Data.DayPartition
and now producesMaybe DayPartition
directly, rather than passing through[DateSpan]
.That said, this introduced a fair amount of code churn in order to eliminate the last few error calls. If you feel that the benefit is not worth the cost, let me know.