Skip to content

Fill chunk refs#4434

Open
Xinlu-Y wants to merge 47 commits intomainfrom
fill-chunk-refs
Open

Fill chunk refs#4434
Xinlu-Y wants to merge 47 commits intomainfrom
fill-chunk-refs

Conversation

@Xinlu-Y
Copy link
Collaborator

@Xinlu-Y Xinlu-Y commented Nov 4, 2025

#4349

  • Implement real HasChunkRefs lookups for ConceptChunk, ConceptInstance, and DefinedQuantityDict so referenced UIDs flow up from definitions and short names

  • Rework the base chunk DB to seed the few math citations up front, drop duplicate concept chunks pulled in via DQDs, and keep citation UIDs out of the reference table

  • Add inReqDesc and thread it through every SRS so the input-values requirement now references the actual table; this includes injecting the table into each LabelledContent list and putting SRS.sectionReferences into allRefs

  • Minor cleanups triggered by the new wiring, such as reordering SI derived units and adjusting assumption lists so the generated refs resolve

Copy link
Owner

@JacquesCarette JacquesCarette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be multiple logically independent sets of changes in this PR. Why all in here? It feels like this should be broken up into smaller PRs.

derived = [becquerel, calorie, centigrade, coulomb, farad, gray, henry, hertz, joule,
katal, kilopascal, kilowatt, litre, lumen, lux, millimetre, newton, ohm,
pascal, radian, siemens, sievert, steradian, tesla, volt, watt, weber]
derived = [becquerel, centigrade, coulomb, hertz, joule, katal, litre,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have these changed order? We should make such changes unless there is a good reason to. Previous were in strict alphabetical order.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this change ensures every derived unit is inserted after the units it depends on.
For example, pascal precedes kilopascal, volt precedes farad/ohm, and weber precedes tesla/henry.

-- | Internal function to insert a chunk into the 'ChunkDB'. This function
-- assumes that the chunk is not already registered in the database, and quietly
-- break table synchronicity if it is.
insert0 :: IsChunk a => ChunkDB -> a -> ChunkDB
checks that all referenced units already exist when a new unit is inserted. The old order triggered missing-dependency errors.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent reason, thank you! You should put in a comment to that effect. I would make sure that the base units are in alphabetical order, and then by dependency (+alphabetical) order after that.

standardValuesDesc mainIdea = foldlSent [atStartNP' (the value), S "provided in",
refS $ SRS.valsOfAuxCons ([]::[Contents]) ([]::[Section]), S "are assumed for the", phrase mainIdea,
sParen (ch mainIdea) `sC` S "and the", plural materialProprty `S.of_`
foldlList Comma List (map (ch . view defLhs) assumptionConstants)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take 3 was a hack, but I'm concerned that this will actually change stable?

Constraints EmptyS inputsUC]],

ReqrmntSec $ ReqsProg [FReqsSub EmptyS [], NonFReqsSub], LCsSec,
ReqrmntSec $ ReqsProg [FReqsSub inputValuesDescription [], NonFReqsSub], LCsSec,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure this is a good change, but it is going to lead to changes in stable. So it is usually best to do such changes in isolation (even if it means five 1-line change PRs) instead of being bundled in a larger PR.

-- | Holds all references and links used in the document.
allRefs :: [Reference]
allRefs = [externalLinkRef]
allRefs = externalLinkRef : SRS.sectionReferences ++ map ref labelledContentWithInputs
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this change, which is likely also good, will also change stable. We want to review these as separately as possible.

assumptions = [twoDMotion, cartSyst, yAxisGravity, launchOrigin, targetXAxis,
posXDirection, constAccel, accelXZero, accelYGravity, neglectDrag, pointMass,
freeFlight, neglectCurv, timeStartZero, gravAccelValue]
assumptions = [twoDMotion, neglectCurv, cartSyst, yAxisGravity, accelYGravity,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the reorder?

Copy link
Collaborator Author

@Xinlu-Y Xinlu-Y Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reorder the assumptions so that each assumption appears after any other assumptions it refers to in its description.

In Assumptions.hs, some assumptions use fromSource or fromSources to refer to other assumptions.
These functions automatically create clickable links in the generated SRS document.
However, in the old order, some of those links pointed forward to assumptions that hadn’t been listed yet, so the links appeared but didn’t work when clicked.

The new order ensures that all referenced assumptions are introduced before they are linked.
For example:

  • neglectCurv now comes before cartSyst
  • accelXZero, accelYGravity, neglectDrag, and freeFlight now come before constAccel

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Please add a comment to that effect. (And everywhere else you did this.)

assumptions :: [ConceptInstance]
assumptions = [assumpTEO, assumpHTCC, assumpCWTAT, assumpTPCAV, assumpDWPCoV, assumpSHECoV,
assumpLCCCW, assumpTHCCoT, assumpTHCCoL, assumpLCCWP, assumpCTNOD, assumpSITWP,
assumpLCCCW, assumpTHCCoT, assumpTHCCoL, assumpLCCWP, assumpSITWP, assumpCTNOD,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-order?

-- nounPhrases instead of strings
-- Another capitalization hack.
progName' = commonIdea "swhsPCM" (nounPhrase''
(S "solar water heating systems incorporating PCM")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first part (merging "incorporation") is fine, but inlining PCM is not a good change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll revert the inlined PCM.

[algorithm, errMsg, program] ++ srsDomains ++ mathcon

basisCitations :: [Citation]
basisCitations = [cartesianWiki, lineSource, pointSource]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, those citations are in basisCitations because the base math concepts: cartesian, line, and point already hard-code those references through fromSource.
So once any of those concepts are pulled into a ChunkDB, the matching citations have to be there too, otherwise the generator complains about missing references.

Putting them in basisCitations guarantees everything builds cleanly, but it’s a bit clunky. I have to manually update that global list. maybe there’s a better way to handle this. I’d love to hear from you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so they're requirements of other chunks already in the basisChunkDB then? That sounds sensible to me then. This one could probably be pulled out into its own PR.

where
baseDB = basisCDB { labelledcontentTable = idMap lc,
refTable = idMap r }
withIdeaDicts = insertAll t baseDB
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why you would find this more readable... if you are going to make this change, at least line up the = signs!

I don't actually think this is really an improvement though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous form made it very hard to track the registration order of different chunk types.
even a slight ordering mistake could trigger the “Missing dependency” errors I kept running into.
During this round of fixes, I repeatedly ran into such issues:

  • GlassBR: assumpGC referenced astm2009, but the citations were inserted later.
  • SWHS / SWHSNoPCM / PDController / SSP: functional requirements were registered before ReqInputs or the data-constraint tables they reference.
  • Projectile: constAccel cited assumptions that weren’t yet in the database.
  • DoublePendulum: inputValues referred to a table that hadn’t been inserted.

To fix these missing-chunk errors, I think i had to keep verifying which kinds of chunks were inserted first.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, interesting. This is likely to happen again.

I think the conclusion is that inserting into the chunk database by chunk type is very fragile. We're lucky it worked at all. Please create an issue for this: I think we should change our mechanism for insertion to be more dependency-aware. But that's a much bigger change, so should not be done now. So your changes here can stay, as they'll end up being moot when this gets changed.

@Xinlu-Y
Copy link
Collaborator Author

Xinlu-Y commented Nov 6, 2025

I’m splitting this PR into smaller ones for easier review, stable updates will follow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we use insertAllOutOfOrder12, the first call to cdb … concIns labCon … registers both the requirement (instance:inputValues) and its table (doc:ReqInputs) together, so the dependency is satisfied.
But when fillReqs runs later, it calls insertAll again for the concept instances only, without re-inserting their labelled content.
At that point insertRefExpectingExistence looks in the chunkTable for doc:ReqInputs, can’t find it, and throws error.

So here I just skip fillReqs when the system already provides its own requirements to avoid the duplicate insertion and keep the dependency check happy.

As a result, the requirement order in the chunk table becomes existing FRs first, followed by new NFRs

@Xinlu-Y
Copy link
Collaborator Author

Xinlu-Y commented Nov 13, 2025

This PR currently includes too many commits, including merges and some testing from @balacij‘s branch. I’ll split them up later to make the review easier.

@Xinlu-Y Xinlu-Y marked this pull request as ready for review November 18, 2025 16:12
Comment on lines +264 to +270
-- Calculate what chunks are depended on (i.e., UID -> Dependants), but only
-- keep dependencies that correspond to chunks we are actually inserting or
-- that already exist in the chunk table. References/LabelledContent live in
-- separate tables and are handled elsewhere.
allowedDeps = S.fromList (M.keys (chunkTable strtr) ++ map (^. uid) calt)
chDpdts = invert $ M.fromList $
map (\c -> (c ^. uid, S.toList $ S.filter (`S.member` allowedDeps) (chunkRefs c))) calt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure I'm understanding this change. Does it not make it so that any chunk dependancy that is neither already in the ChunkDB nor in the list of chunks being added, is ignored? I think this would mean that we can insert chunks into the ChunkDB that have unsatisfied dependencies?

-- | Holds all references and links used in the document.
allRefs :: [Reference]
allRefs = [externalLinkRef]
allRefs = externalLinkRef : SRS.sectionReferences ++ map ref (labelledContent ++ funcReqsTables)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this really needed? The fillcdbSRS function should be doing this already for any LabelledContent registered in the ChunkDB. (I'm not saying it should be doing that, only that it currently does that.)

Comment on lines +79 to +83
chunkRefs ci =
let conceptRefs = chunkRefs (ci ^. cc)
shortNameRefs = collectSentenceRefs (getSentSN (shortname ci))
domainRefs = S.fromList (cdom ci)
in conceptRefs `S.union` shortNameRefs `S.union` domainRefs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good! And your code now makes me think: Can we instantiate HasChunkRefs for Sentence? I think that would help simplify your code and make it so that you don't need to import anything here. It would only be union of the 3 lists of chunkRefs -- e.g.,

              chunkRefs (ci ^. cc)
    `S.union` (S.fromList (cdom ci))
    `S.union` chunkRefs (ci ^. defn)
    
...
(elsewhere)...

instance HasChunkRefs Sentence where
  chunkRefs s = ...

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this makes me think: if all of this code is going to be more or less following the same structure, I think we can use deriving-via to auto-generate these instances.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants