Skip to content

Conversation

Xitian9
Copy link
Collaborator

@Xitian9 Xitian9 commented Sep 12, 2025

This eliminates the last few error calls that can occur in calculating the report spans of a multi-balance report.

  1. We introduce a newtype wrapper DayPartition for PeriodData Day. By not exporting the constructor, we can ensure that all DayPartitions are well-formed.
  2. splitSpan is moved into Hledger.Data.DayPartition and now produces Maybe DayPartition directly, rather than passing through [DateSpan].

That said, this introduced a fair amount of code churn in order to eliminate the last few error calls. If you feel that the benefit is not worth the cost, let me know.

@Xitian9 Xitian9 force-pushed the splitSpan branch 2 times, most recently from 169fdf7 to 97a0494 Compare September 13, 2025 07:24
@simonmichael
Copy link
Owner

I'll hold this one for now as I don't have an open bug that needs it.

If you have time, #2454 would be a great one to look at next for 1.50.1.

@simonmichael
Copy link
Owner

Hola @Xitian9..

This PR has two commits, so I don't understand git log's showing me just one when I check out this branch:

  • 97a0494 (grafted, HEAD -> splitSpan, Xitian9/splitspan, pr/Xitian9/2457) dev!: lib: Refactor splitSpan to return Maybe DayPartition. (Stephen Morgan, Sat Sep 13 09:24)

could you rebase when you have a chance.

@Xitian9
Copy link
Collaborator Author

Xitian9 commented Oct 2, 2025

Rebased.

_ -> d


intToDay = ModifiedJulianDay . toInteger
Copy link
Owner

@simonmichael simonmichael Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Modified Julian Day is a standard count of days, with zero being the day 1858-11-17.

I think this is from an earlier PR, but it sounds like it might not fit with our support of all years ? (CE currently, maybe BCE some day)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is from an earlier commit, and yes it technically does not support all years.

  • On a 64-bit system this will support all years between 252 quadrillion BC (well before the development of currency) and 252 quadrillion AD (at which point we'll be more worried about powering our computers since all stars will have become black holes or white dwarfs).
  • On a 32-bit system, it will support all years between 5 877 751 BC and 5 881 468 AD.

We could extend support to all integers. This would incur a small performance penalty as we'd have to switch from IntMap to Map internally. I personally think the supported date ranges are fine, but happy for you to make a call.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow. So 0 to 1858 is not a problem then. Great.

Copy link
Owner

@simonmichael simonmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI gave this overview of the latest date types:

Date-related Types Overview

Basic date types (Types.hs:103-123):

  • EFDay - exact or flexible date (Exact Day | Flex Day)
  • DateSpan - optional start/end dates (DateSpan (Maybe EFDay) (Maybe EFDay))

Report period types (Types.hs:129-158):

  • Period - high-level periods (DayPeriod, MonthPeriod, etc.)
  • Interval - report intervals (Days, Weeks, Months, Quarters, Years, etc.)

Period data structures:

  • PeriodData a (Types.hs:756) - data for report periods + pre-period
    • pdpre :: a - historical/pre-report data
    • pdperiods :: IntMap a - data per period (Day as IntMap key)
  • DayPartition (DayPartition.hs:37) - well-formed time partition
    • Newtype wrapping PeriodData Day
    • Opaque constructor enforces contiguous spans
    • Invariants: non-empty, contiguous, end = start-1 of next

Key functions:

  • splitSpan :: Interval -> DateSpan -> Maybe DayPartition - split span into
    periods
  • lookupDayPartition :: Day -> DayPartition -> (Maybe Day, Day) - find period
    containing day
  • dayPartitionToDateSpans :: DayPartition -> [DateSpan] - convert to list of
    spans

Does it sound about right to you ?

DayPartition is PeriodData with mandatory smart constructors to guarantee invariants. Is it valuable to still have the separate PeriodData type ?

If so we should probably expand the description of both types, making things more approachable.

I have installed this locally and will try to exercise it as well.

@simonmichael
Copy link
Owner

Now also available as https://github.com/simonmichael/hledger/releases/tag/nightly binaries.

@Xitian9
Copy link
Collaborator Author

Xitian9 commented Oct 2, 2025

Yes, I think that description is basically accurate, though I think only splitSpan winds up being a key function in practice.

We would still need the separate PeriodData constructor, as it also holds balance information in the Account data type. I'll have a go at expanding the descriptions.

This allows us to guarantee that the report periods are well-formed and
don't contain errors (e.g. empty spans, spans not contiguous, spans not
a partition).

Note the underlying representation is now for disjoint spans, whereas
previously the end date of a span was equal to the start date of the
next span, and then was adjusted backwards one day when needed.
This eliminates all error calls from the chain calculating report
periods.
--
-- Data is stored in an 'IntMap' for efficiency, where Days are stored as as
-- Int representing the underlying modified Julian date.
data PeriodData a = PeriodData {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind replacing that last line with a mention of how the keys are "days since 1858-whatever".

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, disregard; I am having trouble viewing the latest version of changes with VS Code.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, let's spell out how exactly the Int key represents the date.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the best place to do that, as that's more or less an internal detail of the Data.Time module. In the unlikely event that they decide to change their internal representation, our documentation would be out of date.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, a pointer to the authoritative doc would be fine then. So a person wanting to understand this type can do so, right ?

--
-- Data is stored in an 'IntMap' for efficiency, where Days are stored as as
-- Int representing the underlying modified Julian date.
data PeriodData a = PeriodData {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, let's spell out how exactly the Int key represents the date.

-- This is a newtype wrapper around 'PeriodData Day', where the start dates are
-- the keys and the end dates are the values. Spans are stored in inclusive format
-- [start, end]. Note that this differs from 'DateSpan' which uses [start, end)
-- format.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for expanding these descriptions. I think we can do better still; as a person reading this, not familiar with PeriodData (or even if I am), I find it not too clear. "Start dates are keys, end dates are values." I don't see it.

Copy link
Owner

@simonmichael simonmichael Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked Claude for some docs making it more approachable to contributors. It gave things like this which I found helpful - except for the description of the "pre-period date". Or is this slop ? Apologies if so.

  -- | A partition of time into contiguous non-overlapping periods.
  --
  -- Each period is a consecutive span of days with an inclusive start date
  -- and inclusive end date. Periods must be:
  --
  -- * Contiguous: period N's end date is exactly one day before period N+1's start date
  -- * Ordered: periods are in chronological order
  -- * Non-empty: at least one period exists
  --
  -- The constructor is hidden to maintain these invariants. Use 'boundariesToDayPartition'
  -- or 'splitSpan' to construct.
  --
  -- Internally, a DayPartition wraps 'PeriodData Day', which stores:
  -- 
  --   * A map of period start dates to period end dates (as IntMap Day for efficiency)
  --   * A "pre-period" historical date (one day before the first period starts)
  --
  -- ==== Examples
  -- >>> let Just p = boundariesToMaybeDayPartition [fromGregorian 2024 01 01, fromGregorian 2024 02 01, fromGregorian 2024 03 01]
  -- >>> dayPartitionToList p
  -- [(2024-01-01,2024-01-31),(2024-02-01,2024-02-29)]
  newtype DayPartition = ...
  -- Internal invariant checker (not exported)
  -- Verifies that a DayPartition maintains the required properties.
  -- All smart constructors should produce values that pass this check.
  isValidDayPartition :: DayPartition -> Bool

-- 1. The value stored in pdperiods has at least one key.
-- 2. The value stored in pdpre equals one day before the smallest key in pdperiods.
-- 3. The value stored in each entry of pdperiods equals one day before the
-- next largest key, except for the value associated to the largest key.
Copy link
Owner

@simonmichael simonmichael Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who needs to follow these rules - a user of this module ? When using which functions ? Would it be better to mention these as invariants of DayPartition, in DayPartition's doc ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, users of this module do not need to know these details. The only people who need to know are those who make constructors for DayPartition, as they need to make sure their constructors satisfy these invariants. After that point they should be invisible to users, and can interact with DayPartition through the provided interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's worth mentioning that that means only people who make commits here in this module need to know (and even not most of them).

@simonmichael
Copy link
Owner

I'm going to merge this and maybe follow up with another pass at haddocks. Thank you!

@simonmichael simonmichael merged commit 4e9fa16 into simonmichael:master Oct 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants