Skip to content

Create bib. sections using an automated search for referenced citations#4700

Draft
Copilot wants to merge 47 commits intomainfrom
copilot/remove-findallcitations-function
Draft

Create bib. sections using an automated search for referenced citations#4700
Copilot wants to merge 47 commits intomainfrom
copilot/remove-findallcitations-function

Conversation

Copy link
Contributor

Copilot AI commented Jan 28, 2026

Closes #4699

Blocked on #4731

Avoid using findAllCitations to gather bibliography entries.

Why? findAllCitations gets all instances of Citations from a ChunkDB which means we gather more than we need for the documents.

This PR uses Set whenever possible, but does not clean up code related to the existing 'gather all referenced symbols' code for the table of symbols creation. That should be done in a separate PR.

Co-authored-by: balacij <1627302+balacij@users.noreply.github.com>
Copilot AI changed the title [WIP] Remove Drasil.Database.SearchTools' findAllCitations Remove findAllCitations and scan documents for referenced citations Jan 28, 2026
Copilot AI requested a review from balacij January 28, 2026 19:28
balacij

This comment was marked as outdated.

Previously only extracted citations from Ref constructors in sentences.
Now also extracts from DecRefs stored in DataDefinition, GenDefn,
InstanceModel, and TheoryModel structures. This fixes missing citations
like koothoor2013 in SWHSNoPCM which are referenced via dRefInfo in
InstanceModels.

Co-authored-by: balacij <1627302+balacij@users.noreply.github.com>

This comment was marked as outdated.

Copilot AI requested a review from balacij January 28, 2026 23:11
Copilot seems to forget to add dependencies...
balacij

This comment was marked as outdated.

Pull fixes from b76a720 that implement thorough sentence extraction:
- Extract sentences from IOrgSec's Section parameter
- Extract from PDSub subsections
- Extract from CorrSolnPpties and FReqsSub LabelledContent
- Extract from Defini and Graph comprehensively
- Add helper functions getIntroSub, getPDSub, getC, getLC

Revert GetChunks back to using findAllCitations - the plate-based
approach was incomplete. The real issue was missing sentence extraction
in ExtractDocDesc, which is now fixed.

Remove ExtractNotebook.hs as it's no longer needed.

Co-authored-by: balacij <1627302+balacij@users.noreply.github.com>

This comment was marked as outdated.

Copilot AI requested a review from balacij January 28, 2026 23:28
…xtraction

- Re-add getCitations, citeDB, and modelRefPlate functions to ExtractDocDesc
- Restore ExtractNotebook.hs for notebook citation extraction
- Update GetChunks to export lookupCitations helper (not findAllCitations)
- Keep comprehensive sentence extraction improvements from upstream
- Remove commented ciPlate code as requested

This combines the plate-based citation extraction approach with the improved
sentence traversal logic, ensuring all citations are found while maintaining
the goal of only including references actually used in documents.

Co-authored-by: balacij <1627302+balacij@users.noreply.github.com>
Copilot AI and others added 2 commits January 28, 2026 23:48
The issue was that some citations (like smithLai2005 in orgOfDocIntro) are
only created when DocDesc is converted to Sections during mkSections. The
previous extraction only scanned DocDesc, missing these citations.

Solution:
- Added getCitationsFromSections to extract from generated Section list
- Added citeDBFromSections to look up citations from sections
- Modified fillReferences to extract from both DocDesc AND generated sections
- Updated mkSections to accept optional BibRef for second pass
- In mkDoc, compute full bibliography after fillReferences and pass to mkSections

This ensures all citations are found, including those in sentences generated
during document construction like the organization section intro.

Co-authored-by: balacij <1627302+balacij@users.noreply.github.com>
Co-authored-by: balacij <1627302+balacij@users.noreply.github.com>
@balacij balacij marked this pull request as ready for review February 4, 2026 19:25
@balacij
Copy link
Collaborator

balacij commented Feb 4, 2026

@JacquesCarette @smiths This PR is ready for review now. Thanks!

@Xinlu-Y If you don't mind reviewing it as well, that would be appreciated!

@balacij balacij requested a review from Xinlu-Y February 4, 2026 19:30
getContList = concatMap getCon'

-- | Extracts 'Sentence's from raw content.
getCon :: RawContent -> [Sentence]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this called getCon when it extracts sentences? getSent or extractSent would be better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in b964415.

in l : ls ++ rs
getCon (Defini _ ics) = concatMap (concatMap getCon' . snd) ics

-- | Get the bibliography from something that has a field.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getBib should return [CiteField] and not prematurely extract the inner sentences.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/JacquesCarette/Drasil/pull/4700/changes/BASE..8fe8cc2c8e88f12a66f1ac35e2cc97dfc2bbb5ce#r2774381197

With your other comment in mind, it ended up being that getBib could be removed entirely.

getBib a = map getField $ concatMap (^. getFields) a

-- | Unwraps a 'CiteField' into a 'Sentence'.
getField :: CiteField -> Sentence
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this called getField? Plus, why does it extract those pieces and not others? This seems like a very weird piece of code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never really looked into this code closely, I just moved it here.

It's useless.

getField is used exclusively in getBib. getBib is used exclusively in getCon and and egetCon. In both situations, the extracted "sentences" are always of the form S s, where s :: String, i.e., where there are no UIDs or ModelExprs.

We can remove getField and getBib code without affecting stable.

Removed in 6b1aca8 .

getField Volume{} = EmptyS
getField Year{} = EmptyS

-- | Translates different types of lists into a 'Sentence' form.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next 3 functions: why is all this flattening needed? Who is the consumer of this? This seems weird.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 functions are used in getCon. getCon is used for finding all referenced (a) all chunks (looking specifically for citations here and things mentioned with their abbreviations in #4299) and (b) all units referenced in an about-to-be-rendered SRS and/or Lesson Plan.

Yes, this code is all very fishy. That's why I've been very adamant on #3864, but maybe I was wrong about that direction specifically. Perhaps using HasChunkRefs is our real solution to the greater issue of knowing what chunks are referenced by "atoms" and "molecules". However, HasChunkRefs exposes a list of UIDs, which are meant to be "as internal as possible" to drasil-database and chunk types. They should no longer be used as "chunk references." So, either UIDRefs or UnitypedUIDRefs would need to be returned by HasChunkRefs somehow, preferrably retaining type information in both for when code such as this and the aforementioned areas. Are we okay for renderers to "know" that much about specific types? I'm not sure. Is there a real design discussion to be had here? Yes, for sure.

I would like to return to #2895 soon as well, which is useful to think about in the context of a re-design of how the renderers work and how we should know about what chunks/things are referenced within a produced document.

Copy link
Collaborator

@balacij balacij Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all the functions into extractSents's where-block in 9f3627d

import Data.List (nub, sortBy)
import qualified Data.Set as Set
import Data.Maybe (mapMaybe)
import qualified Data.Set as S
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I rather prefer Set over S !

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted in 9b07be7

It has some benefits. I would like to see the type name unqualified, but I don't like adding two imports from the same module.

<div class="section">
<h1>References</h1>
<dl class="reference-list">
<dt id="jfBeucheIntro">[<b>jfBeucheIntro</b>]</dt>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that this reference to a Physics book is no longer needed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to #4731

<dd>
Wikipedia Contributors. <em>Damping</em>. July, 2019. <a href="https://en.wikipedia.org/wiki/Damping_ratio">https://en.wikipedia.org/wiki/Damping_ratio</a>.
</dd>
<dt id="sciComp2013">[<b>sciComp2013</b>]</dt>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to #4731

@balacij
Copy link
Collaborator

balacij commented Feb 6, 2026

Because of the unexpected, removed bib. entries from GamePhysics, I will mark this PR as a draft (so we don't merge it accidentally) until the entries are investigated in #4731.

@balacij balacij marked this pull request as draft February 6, 2026 16:18
@Xinlu-Y
Copy link
Collaborator

Xinlu-Y commented Feb 9, 2026

Thanks for cleaning up the citation extraction structure! I’m handling the follow-up separately via case-local citation registration. Would you mind taking a look at #4746?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Drasil.Database.SearchTools: Remove findAllCitations

4 participants