This document outlines the strategy for parsing the NodeBook CNL. We will follow a strict Test-Driven Development (TDD) model. For each function, we will first write a test that defines the expected output, and only then will we write the implementation.
Core Principle: Write small, focused utility functions that do one thing and do it well.
The first step is to parse the raw CNL text into a structured tree that represents the visual hierarchy of the document. This is the most critical step, because every later pass walks this tree.
- Function: `buildStructuralTree(cnlText)`
- Input: The raw CNL string.
- Output: An array of `NodeBlock` objects. Each `NodeBlock` will have the following structure: `{ "heading": "# Hydrogen [Element]", "description": "A chemical element...", "content": [ "has number of protons: 1;" ], "morphs": [ { "heading": "## Hydrogen ion", "description": null, "content": [ "has charge: 1;", "<part of> Water;" ] } ] }`
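As a rough illustration of the grouping logic, here is a minimal sketch of `buildStructuralTree`. It assumes that `# ` lines start a node, `## ` lines start a morph of the most recent node, and every other non-empty line is content. The syntax for descriptions is not specified above, so this sketch always leaves `description` as `null`; treat it as a starting point for the first test, not the final implementation.

```javascript
// Sketch only: groups CNL lines into NodeBlock objects by heading level.
// Assumption: "# " opens a node, "## " opens a morph, anything else is content.
function buildStructuralTree(cnlText) {
  const tree = [];
  let currentNode = null;  // most recent "# " block
  let currentBlock = null; // block that receives content lines (node or morph)

  for (const rawLine of cnlText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;

    if (line.startsWith("## ")) {
      if (!currentNode) {
        throw new Error(`Morph heading without a parent node: ${line}`);
      }
      currentBlock = { heading: line, description: null, content: [] };
      currentNode.morphs.push(currentBlock);
    } else if (line.startsWith("# ")) {
      currentNode = { heading: line, description: null, content: [], morphs: [] };
      currentBlock = currentNode;
      tree.push(currentNode);
    } else if (currentBlock) {
      currentBlock.content.push(line);
    }
    // Lines that appear before the first heading are ignored in this sketch.
  }
  return tree;
}
```

Keeping a single `currentBlock` pointer means content lines never need to know whether they belong to a node or a morph, which keeps the loop small and easy to test.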
Once we have the `structuralTree`, we will walk it in two passes to generate a flat list of operations.
The first pass creates nodes and morphs. It ensures that all entities exist before we try to connect them.
- Function: `generateNodeAndMorphOps(structuralTree)`
- Input: The `structuralTree` from the previous step.
- Output: An array of `addNode` and `addMorph` operations.
  - `addNode` Payload: `{ base_name: "Hydrogen", options: { id: "hydrogen", role: "Element", ... } }`
  - `addMorph` Payload: `{ nodeId: "hydrogen", morphName: "Hydrogen ion" }`
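A possible shape for this pass is sketched below. The `{ type, payload }` wrapper around each operation, the `splitHeading` helper, and the id rule (lowercase, spaces replaced by underscores) are assumptions made for illustration. Only the `id` and `role` options are filled in, since the remaining options are not listed above.

```javascript
// Hypothetical helper: "# Hydrogen [Element]" -> { baseName, role, id }.
function splitHeading(heading) {
  const match = /^#\s+(.*?)(?:\s*\[([^\]]+)\])?\s*$/.exec(heading);
  if (!match) throw new Error(`Not a node heading: ${heading}`);
  const baseName = match[1].trim();
  return {
    baseName,
    role: match[2] ?? null,
    // Assumed id rule: lowercase, whitespace runs replaced by underscores.
    id: baseName.toLowerCase().replace(/\s+/g, "_"),
  };
}

// Sketch only: emits one addNode per block, then one addMorph per morph.
function generateNodeAndMorphOps(structuralTree) {
  const ops = [];
  for (const block of structuralTree) {
    const { baseName, role, id } = splitHeading(block.heading);
    ops.push({
      type: "addNode",
      payload: { base_name: baseName, options: { id, role } },
    });
    for (const morph of block.morphs) {
      const morphName = morph.heading.replace(/^##\s+/, "").trim();
      ops.push({ type: "addMorph", payload: { nodeId: id, morphName } });
    }
  }
  return ops;
}
```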
The second pass adds attributes and relations. It connects the entities created in the first pass.
- Function: `generateNeighborhoodOps(structuralTree)`
- Input: The `structuralTree`.
- Output: An array of `addAttribute` and `addRelation` operations.
  - `addAttribute` Payload: `{ source: "hydrogen", name: "charge", value: "1", options: { morph: "Hydrogen ion" } }`
  - `addRelation` Payload: `{ source: "hydrogen", target: "water", name: "part of", options: { morph: "Hydrogen ion" } }`
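The second pass could look roughly like the sketch below. Several details are assumptions not stated above: the `{ type, payload }` wrapper, the `toId` and `lineToOps` helpers, an empty `options` object for lines that belong to the node itself rather than a morph, and `;` as the separator between multiple relation targets.

```javascript
// Assumed id rule, inferred from the examples: lowercase, spaces -> underscores.
const toId = (name) => name.trim().toLowerCase().replace(/\s+/g, "_");

// Hypothetical helper: turn one content line into zero or more operations.
function lineToOps(line, source, options) {
  const relation = /^<([^>]+)>\s*(.*)$/.exec(line);
  if (relation) {
    return relation[2]
      .split(";") // assumption: targets are separated by semicolons
      .map((target) => target.trim())
      .filter((target) => target !== "")
      .map((target) => ({
        type: "addRelation",
        payload: { source, target: toId(target), name: relation[1].trim(), options: { ...options } },
      }));
  }
  const attribute = /^has\s+([^:]+):\s*(.*?);?$/.exec(line);
  if (attribute) {
    return [{
      type: "addAttribute",
      payload: { source, name: attribute[1].trim(), value: attribute[2].trim(), options: { ...options } },
    }];
  }
  return []; // Unrecognized lines are skipped in this sketch.
}

// Sketch only: walks node content first, then the content of each morph.
function generateNeighborhoodOps(structuralTree) {
  const ops = [];
  for (const block of structuralTree) {
    const name = block.heading.replace(/^#\s+/, "").replace(/\[[^\]]*\]\s*$/, "");
    const source = toId(name);
    for (const line of block.content) {
      ops.push(...lineToOps(line, source, {}));
    }
    for (const morph of block.morphs) {
      const morphName = morph.heading.replace(/^##\s+/, "").trim();
      for (const line of morph.content) {
        ops.push(...lineToOps(line, source, { morph: morphName }));
      }
    }
  }
  return ops;
}
```

Because this pass only reads the tree and never checks whether a target exists, it relies on the first pass having already emitted the matching `addNode` operations.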
The main functions above will be supported by a set of small, testable utility functions.
- Function: `processNodeHeading(headingLine)`
- Input: A string, e.g., `# **Red** Car [Vehicle]`
- Output: `{ id: "red_car", type: "Vehicle", payload: { ... } }`
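The sketch below shows one way the `id` and `type` fields might be derived from a heading line. It assumes that Markdown bold markers are simply stripped before the id is built and that a heading without a bracketed type yields `type: null`. The `payload` field is deliberately left out, since its contents are not spelled out above.

```javascript
// Sketch only: derives id and type from a node heading line.
// The payload field is omitted because its contents are not specified here.
function processNodeHeading(headingLine) {
  const match = /^#\s+(.+?)(?:\s*\[([^\]]+)\])?\s*$/.exec(headingLine.trim());
  if (!match) throw new Error(`Not a node heading: ${headingLine}`);

  // Drop Markdown bold markers so "**Red** Car" becomes "Red Car".
  const displayName = match[1].replace(/\*\*/g, "").replace(/\s+/g, " ").trim();

  return {
    // Assumed id rule: lowercase, spaces replaced by underscores.
    id: displayName.toLowerCase().replace(/ /g, "_"),
    type: match[2] ? match[2].trim() : null,
  };
}

console.log(processNodeHeading("# **Red** Car [Vehicle]"));
// → { id: 'red_car', type: 'Vehicle' }
```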
- Function: `parseAttribute(attributeString)`
- Input: A string, e.g., `has number of protons: 1;`
- Output: `{ name: "number of protons", value: "1" }`
- Function: `parseRelation(relationString)`
- Input: A string, e.g., `<part of> Water;`
- Output: `{ name: "part of", targets: ["Water"] }`
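Both line parsers can be sketched with a single regular expression each. The version below assumes that a non-matching line returns `null` and that multiple relation targets, if they occur, are separated by `;`. Neither behavior is specified above, so the tests should pin them down before the real implementation is written.

```javascript
// Sketch only: "has number of protons: 1;" -> { name, value }.
// Assumption: returns null when the line is not an attribute.
function parseAttribute(attributeString) {
  const match = /^has\s+([^:]+):\s*(.*?)\s*;?\s*$/.exec(attributeString.trim());
  if (!match) return null;
  return { name: match[1].trim(), value: match[2] };
}

// Sketch only: "<part of> Water;" -> { name, targets }.
// Assumption: targets are separated by semicolons; empty pieces are dropped.
function parseRelation(relationString) {
  const match = /^<([^>]+)>\s*(.*)$/.exec(relationString.trim());
  if (!match) return null;
  const targets = match[2]
    .split(";")
    .map((target) => target.trim())
    .filter((target) => target !== "");
  return { name: match[1].trim(), targets };
}

console.log(parseAttribute("has number of protons: 1;"));
// → { name: 'number of protons', value: '1' }
console.log(parseRelation("<part of> Water;"));
// → { name: 'part of', targets: [ 'Water' ] }
```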
This is a separate concern from parsing the CNL content itself, but it is an important part of the overall file format. We will handle this separately after the core parser is complete.
This is our finalized plan. I will now proceed with the first step: writing the test for the `buildStructuralTree` function.