Agent Guidelines for writings Repository

Build/Lint/Test Commands

just check - Check code (runs cargo check --all-targets --all-features)
just test - Run all tests (runs cargo test --all-targets --all-features)
just fix - Auto-fix code (runs cargo fix --allow-dirty --allow-staged)
just clean - Clean build artifacts

Code Style Guidelines

Edition: Rust 2024
Formatting:
- ALWAYS inline format args (e.g. info!("inline {display} {debug:?}"))
- ALWAYS prefer guard clauses and early returns/continues/breaks to nesting if/else
Imports: Use workspace dependencies when available
Serialization: Serde with camelCase JSON naming via #[serde(rename_all = "camelCase")]
Error Handling: Use wherror for custom error types and the Result<T, WritingsError> alias
Documentation: Comprehensive doc comments with examples
Tests: Inline in modules with #[cfg(test)], use #[test] attributes
Features: Use feature flags for optional functionality (embed-all, poem, utoipa, etc.)
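
As a rough illustration of the two formatting rules above, the sketch below combines a guard clause with inline format args. The function `describe_paragraphs` and its inputs are made up for this example and do not come from the repository; it uses `format!` from the standard library instead of `info!` so that it compiles without a logging crate.

```rust
/// Hypothetical helper (not part of the repository) that labels non-empty paragraphs.
pub fn describe_paragraphs(paragraphs: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for (index, text) in paragraphs.iter().enumerate() {
        let text = text.trim();
        // Guard clause: skip blank input early instead of nesting the main logic.
        if text.is_empty() {
            continue;
        }
        let number = index + 1;
        let words = text.split_whitespace().count();
        // Inline format args: variables are named directly inside the braces.
        out.push(format!("paragraph {number}: {words} words, text={text:?}"));
    }
    out
}
```

The same inline style applies to logging macros such as `info!`, which accept the same format string syntax.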

Visitors Technical Overview

The writings are parsed from bahai.org HTML sources using a visitor pattern. Each writing type (HiddenWords, Gleanings, Prayers, etc.) implements the WritingsVisitor trait:

Core Trait: WritingsVisitor with visit() method returning VisitorAction (VisitChildren, SkipChildren, Stop)
HTML Parsing: Uses scraper crate with CSS selectors to navigate DOM structure
Class-based Matching: Visitors match HTML elements by CSS classes (e.g., zd hb for prologues, dd zd for content)
State Management: Each visitor maintains parsing state (current section, paragraph count, etc.)
Citation Handling: Extracts and resolves footnotes/endnotes using CitationText and resolve_citations()
Text Extraction: ElementExt trait provides trimmed_text() methods for clean text extraction with citation support
Validation: Each visitor defines EXPECTED_COUNT and validates parsed content against known totals
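
The traversal contract described above can be sketched with a simplified, std-only model. Only the names `WritingsVisitor`, `visit`, `VisitorAction`, its three variants, and `EXPECTED_COUNT` come from this document; `Node`, `walk`, and `ToyVisitor` are hypothetical stand-ins, and the real trait works on scraper elements with a signature that may differ.

```rust
/// Traversal decision returned by a visitor (variant names taken from the overview).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VisitorAction {
    VisitChildren,
    SkipChildren,
    Stop,
}

/// Hypothetical stand-in for a parsed HTML element.
pub struct Node {
    pub classes: &'static str,
    pub text: &'static str,
    pub children: Vec<Node>,
}

/// Simplified version of the trait; the repository's signature may differ.
pub trait WritingsVisitor {
    const EXPECTED_COUNT: usize;
    fn visit(&mut self, node: &Node) -> VisitorAction;
}

/// Depth-first walk; returns false as soon as the visitor asks to stop.
pub fn walk<V: WritingsVisitor>(visitor: &mut V, node: &Node) -> bool {
    match visitor.visit(node) {
        VisitorAction::Stop => return false,
        VisitorAction::SkipChildren => return true,
        VisitorAction::VisitChildren => {}
    }
    for child in &node.children {
        if !walk(visitor, child) {
            return false;
        }
    }
    true
}

/// Toy visitor: collects "dd zd" paragraphs until it reaches a "wf" footer.
#[derive(Default)]
pub struct ToyVisitor {
    pub paragraphs: Vec<String>,
}

impl WritingsVisitor for ToyVisitor {
    const EXPECTED_COUNT: usize = 2;

    fn visit(&mut self, node: &Node) -> VisitorAction {
        match node.classes {
            "wf" => VisitorAction::Stop,
            "dd zd" => {
                self.paragraphs.push(node.text.to_string());
                VisitorAction::SkipChildren
            }
            _ => VisitorAction::VisitChildren,
        }
    }
}
```

After `walk` returns, the number of collected paragraphs can be compared against `EXPECTED_COUNT`, which mirrors the validation step listed above.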

CSS Class Patterns Analysis

Common Structural Classes (Used Across Multiple Visitors)

"wf" - Footer/stop condition (Gleanings, Meditations)
"c q" - Roman numeral section headers (Gleanings, Meditations)
"hb" - Author/header elements (appears in Hidden Words, Prayers, CDB)
"zd" - Content sections (Hidden Words: "zd hb", "dd zd hb", "dd zd")
"ub" - Section headers (Prayers: "ub c l", CDB: "ub w kf")

Document-Specific Classes

Hidden Words:

"w" - Top invocation text
"zd hb" - Prologue/epilogue sections
"dd zd hb" - Prelude text (special prefaces)
"dd zd" - Main hidden word content

Gleanings & Meditations (Similar Structure):

"wf" - Footer (stop parsing)
"c q" - Roman numeral section headers (I, II, III, etc.)

Prayers (Most Complex Structure):

"hb.ac" / "hb ac" - Author attribution
"bf wf" - Endnotes (stop condition)
"e" - Title elements
"g c" - Prayer kind/category
"ub c l" - Section headers
"xc jb c kf z nb zd ub" - Subsection headers
"c kf z nb zd ub" - Teaching sections
"cb" / "z" - Instructional text

Call of Divine Beloved (Poetry Structure):

"ic .g" - Work titles
"ic .hb" / "ic .j" - Work subtitles
"a.td" - Paragraph numbers
"ub w kf" - Invocation text
"span.dd" - Poetry containers
"span.ce" - Poetry lines

Pattern Recognition for New Parsers

Section Headers: Look for "c q" (roman numerals) or "ub" variants
Content Paragraphs: Usually "p" elements with specific class combinations
Stop Conditions: Typically "wf" (footer) or "bf wf" (endnotes)
Author Attribution: "hb" variants, often combined with "ac"
Special Text: "zd" variants for prologues/epilogues, "w" for invocations
Poetry: Look for "span.dd" containers with "span.ce" lines
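
As a sketch of class-based matching, the function below classifies a Hidden Words element from its class attribute, using the patterns listed under Document-Specific Classes. It compares whole sets of class tokens, so "dd zd hb" is never mistaken for "zd hb". Whether the repository's ClassList matches exactly or by subset is an assumption here, and the enum and function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical labels for the Hidden Words patterns listed earlier.
#[derive(Debug, PartialEq, Eq)]
pub enum HiddenWordsElement {
    Invocation,
    PrologueOrEpilogue,
    Prelude,
    HiddenWord,
    Other,
}

/// True when both strings contain exactly the same class tokens, in any order.
fn same_classes(attr: &str, pattern: &str) -> bool {
    let a: BTreeSet<&str> = attr.split_whitespace().collect();
    let b: BTreeSet<&str> = pattern.split_whitespace().collect();
    a == b
}

/// Classify an element by the full set of classes in its `class` attribute.
pub fn classify(class_attr: &str) -> HiddenWordsElement {
    if same_classes(class_attr, "w") {
        return HiddenWordsElement::Invocation;
    }
    if same_classes(class_attr, "dd zd hb") {
        return HiddenWordsElement::Prelude;
    }
    if same_classes(class_attr, "zd hb") {
        return HiddenWordsElement::PrologueOrEpilogue;
    }
    if same_classes(class_attr, "dd zd") {
        return HiddenWordsElement::HiddenWord;
    }
    HiddenWordsElement::Other
}
```

With subset matching instead of exact matching, the order of the checks would matter, because every "dd zd hb" element also contains both "zd hb" and "dd zd".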

Systematic Visitor Development Procedure

Phase 1: HTML Analysis & Pattern Discovery

Download HTML: Fetch the target document from bahai.org and save to writings/html/
Extract CSS Classes: Run grep -o 'class="[^"]*"' file.html | sort | uniq to list every distinct class attribute value in the file
Map Document Structure: Identify key structural elements:
- Navigation/TOC: Usually nav.gc with ul structure
- Main Content: Look for div.dd containers or similar content wrappers
- Content Start: Beginning of actual content (excluding preface, introductions, etc.)
- Section Headers: Search for patterns like "c q", "ub c l", etc.
- Content Paragraphs: Find p elements with meaningful class combinations
- Stop Conditions: End of content (usually just before end notes/footers)
- Reference IDs: Extract a.sf elements with id attributes for paragraph references
- Numbering Systems: Document-specific numbering (roman numerals, paragraph numbers, etc.)
- Special Elements: Identify citations (sup with a.sf), poetry structures, etc.
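
When counts per class combination are useful during analysis, the grep step can be approximated in Rust. The function below is a hypothetical helper, not part of the repository. Like the grep pattern, it is a plain text scan for double-quoted class attributes, not an HTML parser.

```rust
use std::collections::BTreeMap;

/// Count each distinct double-quoted `class="..."` attribute value in raw HTML.
pub fn class_counts(html: &str) -> BTreeMap<String, usize> {
    const NEEDLE: &str = "class=\"";
    let mut counts = BTreeMap::new();
    let mut rest = html;
    while let Some(start) = rest.find(NEEDLE) {
        let after = &rest[start + NEEDLE.len()..];
        // An unterminated attribute ends the scan.
        let end = match after.find('"') {
            Some(end) => end,
            None => break,
        };
        *counts.entry(after[..end].to_string()).or_insert(0) += 1;
        // Continue after the closing quote.
        rest = &after[end + 1..];
    }
    counts
}
```

Because the result is a `BTreeMap`, iterating over it yields the class combinations in sorted order, similar to the output of `sort | uniq -c`.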

Phase 2: Visitor Implementation

Create Struct: Define the Rust struct that will hold parsed data
Implement WritingsVisitor Trait:
- Set URL and EXPECTED_COUNT
- Implement visit() method with pattern matching
- Add state management fields (counters, current section, etc.)
CSS Class Constants: Define LazyLock<ClassList> constants for each pattern
State Machine Logic: Handle document flow:
- Start Conditions: When to begin parsing (after title, first section, etc.)
- Content Extraction: How to extract text and metadata
- Transition Logic: When to move between sections/works
- Stop Conditions: When to terminate parsing
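
The state machine logic above might look roughly like the following for a Gleanings-style document. This is a sketch under assumptions: `ClassSet` is a stand-in for the repository's `ClassList` (whose API is not shown in this document), `SectionCounter` and `State` are hypothetical, and the input is reduced to a stream of class attribute strings instead of real elements.

```rust
use std::collections::BTreeSet;
use std::sync::LazyLock;

/// Stand-in for the repository's `ClassList`; the real type is assumed to differ.
type ClassSet = BTreeSet<&'static str>;

static SECTION_HEADER: LazyLock<ClassSet> = LazyLock::new(|| BTreeSet::from(["c", "q"]));
static FOOTER: LazyLock<ClassSet> = LazyLock::new(|| BTreeSet::from(["wf"]));

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
enum State {
    #[default]
    BeforeContent,
    InSection,
    Done,
}

/// Hypothetical visitor state: counts sections and the elements inside them.
#[derive(Default)]
pub struct SectionCounter {
    state: State,
    pub sections: usize,
    pub paragraphs: usize,
}

impl SectionCounter {
    /// Feed one element's `class` attribute; returns false once parsing should stop.
    pub fn feed(&mut self, class_attr: &str) -> bool {
        // Stop condition: nothing is processed after the footer.
        if self.state == State::Done {
            return false;
        }
        let classes: BTreeSet<&str> = class_attr.split_whitespace().collect();
        if classes == *FOOTER {
            self.state = State::Done;
            return false;
        }
        // Transition: a section header opens the next section.
        if classes == *SECTION_HEADER {
            self.state = State::InSection;
            self.sections += 1;
            return true;
        }
        // Start condition: ignore everything before the first header.
        if self.state == State::BeforeContent {
            return true;
        }
        // Content extraction, reduced here to counting each element in a section.
        self.paragraphs += 1;
        true
    }
}
```

Keeping each condition as its own early return follows the guard-clause rule from the style guidelines and keeps the transitions easy to audit.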

Phase 3: Common Implementation Patterns

Reference ID Extraction: Always use self.get_ref_id(element) for a.sf elements
Text Extraction: Use element.trimmed_text(depth, strip_newlines) with appropriate depth
Citation Handling: For complex documents, implement citation extraction and resolution
Validation: Include EXPECTED_COUNT validation and test with known text samples
Error Handling: Use panic! for parsing errors during development, then refine into recoverable errors for production
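
To make the strip_newlines flag more concrete, here is a minimal whitespace-normalization sketch. It is not the repository's trimmed_text() implementation: the function name `normalize_text` is hypothetical, the depth parameter and citation handling are left out, and the exact behavior with and without the flag is an assumption.

```rust
/// Hypothetical normalizer: collapses runs of whitespace in extracted text.
/// With `strip_newlines`, line breaks become single spaces; without it,
/// each non-empty line is normalized on its own and line breaks are kept.
pub fn normalize_text(raw: &str, strip_newlines: bool) -> String {
    if strip_newlines {
        // split_whitespace treats newlines like any other whitespace.
        return raw.split_whitespace().collect::<Vec<_>>().join(" ");
    }
    raw.lines()
        .map(|line| line.split_whitespace().collect::<Vec<_>>().join(" "))
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping line breaks is mostly relevant for poetry, where each line carries meaning; prose paragraphs usually read better with the flag set.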

Phase 4: Testing & Validation

Unit Tests: Create #[tokio::test] with test_visitor::<Visitor>(EXPECTED_TEXTS).await
Expected Texts: Include 5-10 representative text samples from the document
Count Validation: Ensure EXPECTED_COUNT matches actual parsed items
Integration: Add to workspace and test with just test

Key Insights from Existing Visitors

Gleanings/Meditations: Simple structure with roman numeral sections and paragraph counting
Hidden Words: Complex state management with prelude/invocation tracking and part transitions
Prayers: Most complex, with nested sections, author detection, and citation resolution
CDB: Poetry-specific, with line-by-line parsing and work title detection
Common Pattern: All visitors use VisitorAction to control traversal flow
Text Processing: trimmed_text() handles citation extraction and clean text normalization
State Management: Each visitor maintains parsing state specific to document structure
