Commit 4f3bce2

Remove duplicate structures and functionality as well as removing accidentally included log files for single character and embedding size tests to clean up the repository.
1 parent 53b7483 commit 4f3bce2

76 files changed: +1178 -2464 lines changed
.gitignore

Lines changed: 6 additions & 0 deletions
@@ -20,6 +20,12 @@ Cargo.lock
 *.bin
 meta.json
 
+# Logs and temporary files
+/logs/
+src/logs/
+*.log
+*.log.*
+
 # Environment variables
 .env
 .env.local
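
To make the effect of the four new ignore rules concrete, here is a minimal sketch in Rust. It is only an approximation of gitignore matching for these specific patterns (real gitignore semantics also cover negation, `**`, and escapes), and the helper name `is_ignored_log_path` is hypothetical, not part of the repository.

```rust
/// Rough approximation of the four ignore rules added in this hunk.
/// Hypothetical helper for illustration; Git's real matcher handles more cases.
fn is_ignored_log_path(path: &str) -> bool {
    // `/logs/` is anchored to the repository root and `src/logs/` is a fixed
    // path, so both only match directories at exactly those locations.
    if path.starts_with("logs/") || path.starts_with("src/logs/") {
        return true;
    }
    // `*.log` and `*.log.*` contain no slash, so they apply to the file name
    // at any depth in the tree.
    let name = path.rsplit('/').next().unwrap_or(path);
    name.ends_with(".log") || name.contains(".log.")
}

fn main() {
    let samples = [
        "logs/run1.txt",
        "src/logs/embed.txt",
        "tests/out.log",
        "app.log.1",
        "src/main.rs",
    ];
    for p in samples {
        println!("{p}: {}", is_ignored_log_path(p));
    }
}
```

Note that a nested directory such as `docs/logs/` is not covered by `/logs/`, because the leading slash anchors the rule to the repository root.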

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 
 [workspace]
 members = [
-"rust_ingest"
+"src"
 ]
 resolver = "2"

README.md

Lines changed: 157 additions & 42 deletions
@@ -7,22 +7,10 @@
 
 ## Core Vision
 
-HeraldStack is an am## Directory Structure Overview
-
-For a detailed, canonical description of the project's directory structure, see:
-
-- [docs/DETAILED.md](docs/DETAILED.md)**Directory Structure and Naming Best
-  Practices** (includes a `tree` overview and rationale)
-- [docs/naming-conventions.md](docs/naming-conventions.md)**Directory and
-  file naming conventions**
-- [docs/DEVELOPMENT-PRINCIPLES.md](docs/DEVELOPMENT-PRINCIPLES.md)
-  **Development principles and migration history**
-
-Historical migration documents have been moved to
-[docs/migration/archive/](docs/migration/archive/) for reference.gence system
-that integrates memory, emotion, and modular execution across a trusted cohort
-of AI entities to restore momentum, anchor decisions, and evolve alongside
-Bryan's ongoing personal and professional journey.
+HeraldStack is an ambient intelligence system that integrates memory, emotion,
+and modular execution across a trusted cohort of AI entities to restore
+momentum, anchor decisions, and evolve alongside Bryan's ongoing personal and
+professional journey.
 
 ## 🚨 Critical Development Principles
 
@@ -33,7 +21,8 @@ for any application functionality. Instead:
 
 - **Add features to existing Rust binaries**
 - **Update documentation** (README.md, .md files)
-- **Add --help flags** to existing tools
+- **Add comprehensive --help flags** to existing tools for self-documenting
+  usage
 
 #### Exceptions: Scripts That Remain as Shell Scripts
 
@@ -64,9 +53,140 @@ Before manually fixing linting/formatting issues, run our automated tools:
 **See [CONTRIBUTING.md](docs/CONTRIBUTING.md) for complete development
 guidelines.**
 
+## Directory Structure Overview
+
+HARALD follows a standard project structure designed for maintainability and
+clear separation of concerns:
+
+### Recommended Project Layout
+
+```text
+HARALD/
+├── src/                    # Main source code directory
+│   ├── api/                # API endpoints and handlers
+│   ├── core/               # Core application logic
+│   │   ├── embedding/      # Embedding-related logic
+│   │   ├── entities/       # Entity management logic
+│   │   └── memory/         # Memory handling logic
+│   ├── ingest/             # Ingestion pipeline and tools
+│   └── utils/              # Shared utilities and helpers
+│       ├── json-tools/     # JSON formatting and validation
+│       ├── validation/     # Code validation utilities
+│       └── system/         # System utilities
+├── ai-entities/            # AI entity definitions and metadata
+├── config/                 # Configuration files and schemas
+│   ├── schemas/            # JSON schemas
+│   ├── ethics/             # Ethics guidelines
+│   └── models/             # Model configurations
+├── data/                   # Data files and registries
+│   ├── raw/                # Raw input data
+│   └── processed/          # Processed data and embeddings
+├── docs/                   # Documentation
+│   ├── migration/          # Migration documentation and archive
+│   └── vector-search/      # Vector search documentation
+├── scripts/                # Infrastructure and deployment scripts
+│   ├── deploy/             # Deployment scripts
+│   └── validation/         # Validation scripts (shell-based)
+└── tests/                  # Tests and fixtures
+    ├── unit/               # Unit tests
+    ├── integration/        # Integration tests
+    └── fixtures/           # Test fixtures and sample data
+```
+
+### Key Principles
+
+- **`src/`** - Central location for all application code written in Rust
+  - Sub-modules organized by functionality (api, core, ingest, utils)
+  - All Rust binaries built from this directory tree
+- **`config/`** - Centralized configuration management
+  - JSON schemas, model configs, ethics guidelines
+- **`data/`** - Raw and processed data with clear separation
+  - Vector store registries and embedded data
+- **`scripts/`** - Infrastructure scripts only (deployment, CI/CD)
+  - Application logic has been migrated to Rust in `src/`
+- **`tests/`** - Comprehensive testing structure
+  - Unit, integration, and fixture organization
+
+### Migration Status
+
+This structure represents the target state. Current migration progress:
+
+- ✅ Core Rust tools implemented in `src/utils/`
+- ✅ Ingestion pipeline moved to `src/ingest/`
+- ✅ Shell scripts migrated to Rust binaries
+- 🔄 Ongoing migration of remaining application logic to `src/`
+
+For detailed documentation:
+
+- [docs/DETAILED.md](docs/DETAILED.md) – Complete directory descriptions
+- [docs/naming-conventions.md](docs/naming-conventions.md) – Naming standards
+- [docs/DEVELOPMENT-PRINCIPLES.md](docs/DEVELOPMENT-PRINCIPLES.md) – Development
+  principles and migration history
+
+## Archive Policy
+
+### Historical Documents and Legacy Code
+
+HARALD maintains archived materials for historical reference and context. These
+archives are excluded from active development workflows:
+
+#### Archive Locations
+
+- **`docs/migration/archive/`** - Historical migration documentation
+  - Shell script prevention strategies
+  - Detailed migration plans and checklists
+  - Step-by-step cleanup procedures
+  - Legacy decision documentation
+
+- **`scripts/*.legacy`** - Archived shell scripts (when present)
+  - Backup copies of migrated scripts
+  - Reference implementations for comparison
+  - Historical functionality documentation
+
+- **Early experiments and prototypes** (project-specific locations)
+  - Proof-of-concept implementations
+  - Alternative approaches that were not adopted
+  - Research and exploration code
+
+#### Archive Characteristics
+
+- **Ignored by automation** - Archive directories are excluded from:
+  - Linting and formatting tools
+  - Build processes and validation
+  - Automated testing suites
+  - Code quality checks
+
+- **Historical reference only** - Archived materials:
+  - Preserve context for past decisions
+  - Document migration rationale and process
+  - Provide examples of previous approaches
+  - Should not be modified or actively maintained
+
+- **Documentation over deletion** - We prefer archiving to deletion because:
+  - Historical context aids future decision-making
+  - Migration patterns can be reused
+  - Past approaches inform current best practices
+  - Preserves institutional knowledge
+
+#### When to Archive
+
+Archive materials when:
+
+1. **Migrating functionality** from shell scripts to Rust implementations
+2. **Consolidating documentation** to eliminate duplication
+3. **Refactoring approaches** that replace previous patterns
+4. **Completing experiments** that informed current architecture
+
+#### Accessing Archives
+
+- Use archived materials for **historical context only**
+- Reference archives when **documenting decisions**
+- Consult archives to **understand migration patterns**
+- **Do not** use archived code in active development
+
 ## Key Components
 
-- **🦊 HARALD** – Default entity for emotional mirroring, decision anchoring,
+- **🦊 HARALD** – Default entity for emotional mirroring, decision anchoring,
   and continuity management
 - **🧠 Herald Entity Cohort** – Specialized assistants with distinct
   personalities and roles
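
The archive policy added in this hunk says archive locations are excluded from linting, builds, and tests. One way a tool might apply that rule is sketched below in Rust. This is an assumption about how such a filter could look, not the project's actual implementation; `is_archived` and `active_paths` are hypothetical names, and only the two concrete locations named in the README (`docs/migration/archive/` and `scripts/*.legacy`) are covered.

```rust
use std::path::Path;

/// Returns true when `path` falls under one of the archive locations named in
/// the README: `docs/migration/archive/` or a `scripts/*.legacy` file.
/// Hypothetical helper; paths are assumed to be relative to the project root.
fn is_archived(path: &Path) -> bool {
    // `Path::starts_with` compares whole components, so a sibling such as
    // `docs/migration/archive-notes.md` does not match.
    if path.starts_with("docs/migration/archive") {
        return true;
    }
    // `scripts/*.legacy` only matches files directly inside `scripts/`.
    let in_scripts = path.parent() == Some(Path::new("scripts"));
    let is_legacy = path.extension().map_or(false, |ext| ext == "legacy");
    in_scripts && is_legacy
}

/// Keeps only the paths that active tooling (linting, tests) should visit.
fn active_paths<'a>(paths: &[&'a str]) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| !is_archived(Path::new(p)))
        .collect()
}

fn main() {
    let all = [
        "src/ingest/mod.rs",
        "docs/migration/archive/plan.md",
        "scripts/check.sh.legacy",
        "scripts/deploy/deploy.sh",
    ];
    println!("{:?}", active_paths(&all));
}
```

The example file names are made up for illustration; only the directory and suffix conventions come from the README text.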
@@ -131,7 +251,16 @@ cd src && cargo build --release --features cli
 ### Using Rust Binaries
 
 All binaries are located in `src/target/release/` and should be run from the
-project root:
+project root. **Each tool includes comprehensive `--help` documentation**:
+
+```bash
+# Get detailed usage for any tool
+./src/target/release/format_json --help
+./src/target/release/validate_naming --help
+./src/target/release/text_chunker --help
+```
+
+#### Common Usage Examples
 
 ```bash
 # Format and validate JSON files
@@ -146,13 +275,15 @@ project root:
 # Check system status (Ollama services, models, etc.)
 ./src/target/release/status
 
-# Process text for embedding
-./src/target/release/text_chunker --input file.txt --mode char --size 250
-
-# Run any tool with --help to see available options
-./src/target/release/format_json --help
+# Process text for embedding with detailed options
+./src/target/release/text_chunker --char 250 --file input.txt --json
 ```
 
+**Self-Documenting Design**: Instead of maintaining separate documentation, each
+binary provides complete usage instructions via `--help`. This ensures usage
+information stays current with the code and reduces documentation maintenance
+overhead.
+
 **Note**: These Rust binaries have replaced the previous shell scripts for
 application logic. The old shell scripts in `scripts/validation/` have been
 migrated to these type-safe, performant Rust implementations.
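
This hunk changes the `text_chunker` invocation to `--char 250 --file input.txt --json` and describes a self-documenting `--help` design. The sketch below shows, under stated assumptions, how a binary could parse those three flags and split text into fixed-size character chunks using only the standard library. It is not the project's code: the real tool may use an argument-parsing crate, and `Options`, `parse_args`, `chunk_by_chars`, the help text, and the default of 250 are all hypothetical.

```rust
use std::env;

/// Hypothetical option set mirroring `--char 250 --file input.txt --json`.
#[derive(Debug, PartialEq)]
struct Options {
    chunk_chars: usize,
    file: Option<String>,
    json: bool,
}

const HELP: &str = "Usage: text_chunker --char <N> --file <PATH> [--json]";

/// Parses flags by hand. `--help` and any error both come back as `Err` with
/// the text to print, which keeps this sketch short.
fn parse_args(args: &[String]) -> Result<Options, String> {
    let mut opts = Options { chunk_chars: 250, file: None, json: false };
    let mut it = args.iter();
    while let Some(arg) = it.next() {
        match arg.as_str() {
            "--help" => return Err(HELP.to_string()),
            "--json" => opts.json = true,
            "--char" => {
                let value = it.next().ok_or("--char needs a value")?;
                opts.chunk_chars = value
                    .parse()
                    .map_err(|_| format!("invalid size: {value}"))?;
            }
            "--file" => {
                opts.file = Some(it.next().ok_or("--file needs a value")?.clone());
            }
            other => return Err(format!("unknown flag: {other}\n{}", HELP)),
        }
    }
    Ok(opts)
}

/// Splits text into chunks of at most `size` characters (not bytes), so
/// multi-byte UTF-8 characters are never cut in half.
fn chunk_by_chars(text: &str, size: usize) -> Vec<String> {
    if size == 0 {
        return Vec::new();
    }
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(size)
        .map(|c| c.iter().collect::<String>())
        .collect()
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_args(&args) {
        Ok(opts) => {
            // A real tool would read `opts.file`; a fixed string keeps this runnable.
            let chunks = chunk_by_chars("ambient intelligence", opts.chunk_chars);
            println!("{} chunk(s), json={}, file={:?}", chunks.len(), opts.json, opts.file);
        }
        Err(msg) => eprintln!("{msg}"),
    }
}
```

Chunking by `char` rather than by byte is a deliberate choice here: slicing a `&str` at an arbitrary byte offset can panic on non-ASCII input.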
@@ -191,12 +322,10 @@ handling.
   directories
 - **Build & Deploy**: Use `./scripts/deploy/deploy.sh` for deployment (see
   [DEPLOY.md](scripts/deploy/DEPLOY.md) for usage)
-- **JSON Tools**: Rust-based JSON processing utilities in `src/utils/json_tools`
-  (see [JSON-TOOLS.md](src/utils/json_tools/JSON-TOOLS.md))
+- **JSON Tools**: Rust-based JSON processing utilities in `src/utils/json-tools`
+  (see [JSON-TOOLS.md](src/utils/json-tools/JSON-TOOLS.md))
 - **Shell vs Rust**: Infrastructure scripts use shell, application logic uses
   Rust
-- [Project Structure](docs/migration/RECOMMENDED-STRUCTURE.md) - Recommended
-  organization
 - **Ingestion/Embedding Architecture:** All ingestion and embedding logic must
   follow the
   [Modular Ingest Refactor Plan](docs/migration/INGEST-MIGRATION-MODULAR-PLAN.md).
@@ -240,20 +369,6 @@ guidelines including those defined in
 - **Test Data**: Test fixtures are available in `tests/fixtures/` (see
   [FIXTURES.md](tests/fixtures/FIXTURES.md) for details)
 
-## Directory Structure Overview
-
-For a detailed, canonical description of the project’s directory structure, see:
-
-- [docs/DETAILED.md](docs/DETAILED.md)**Directory Structure and Naming Best
-  Practices** (includes a `tree` overview and rationale)
-- [docs/naming-conventions.md](docs/naming-conventions.md)**Directory and
-  file naming conventions**
-
-Other structure-related documents in `docs/migration/` (such as
-`RECOMMENDED-STRUCTURE.md`, `DIRECTORY-REORGANIZATION.md`, and
-`IMPLEMENTATION-PLAN.md`) are project planning artifacts and will be moved to a
-`docs/project-planning/` subdirectory.
-
 ---
 
 Shared under MIT Open License 2025 Bryan Chasko

ai-entities/entity-registry.json

Lines changed: 1 addition & 1 deletion
@@ -41,4 +41,4 @@
 "triggers": ["technical", "systems", "problems"]
 }
 ]
-}
+}

config/CONFIG.md

Lines changed: 38 additions & 10 deletions
@@ -1,18 +1,46 @@
 # HARALD Configuration
 
-This directory contains configuration files for the HARALD project.
+This directory contains all configuration files for the HARALD project,
+centralized per our development standards.
 
-## Files
+## Structure
 
-- `default.json` - Default configuration settings
+- `.markdownlint.json` - Markdown linting configuration
+- `vector-stores-registry.json` - Vector store configuration registry
 - `ethics/` - Ethical guidelines including Laws of Robotics
-- `models/` - Model configuration files like Modelfile
-- `schemas/` - JSON schemas for validation
+- `models/` - Model configuration files (Ollama Modelfile, etc.)
+- `schemas/` - JSON schemas for validation and data structure definitions
+
+## Configuration Files
+
+### Core Configuration
+
+- `vector-stores-registry.json` - Registry of all vector store configurations
+- `.markdownlint.json` - Markdown linting rules and formatting standards
+
+### Ethics Configuration
+
+- `ethics/LawsOfRobotics.json` - Asimov's Laws implementation for AI safety
+
+### Model Configuration
+
+- `models/Modelfile` - Ollama model configuration and parameters
+
+### Schema Definitions
+
+- `schemas/conversation-metadata.json` - Conversation data structure schema
+- `schemas/emotion-vectors.json` - Emotion representation schema
+- `schemas/entity-context.json` - Entity context data schema
+- `schemas/narrative-arc.json` - Narrative structure schema
 
 ## Best Practices
 
-1. Keep sensitive information in environment variables, not in config files
-2. Use kebab-case for configuration file names
-3. Document all configuration options
-4. Provide sensible defaults for all settings
-5. Validate configuration against schemas on application startup
+1. **Centralization**: All configuration files belong in this directory
+2. **Security**: Keep sensitive information in environment variables, not config
+   files
+3. **Naming**: Use kebab-case for configuration file names
+4. **Documentation**: Document all configuration options and their purposes
+5. **Defaults**: Provide sensible defaults for all settings
+6. **Validation**: Validate configuration against schemas on application startup
+7. **Version Control**: Configuration files should be version controlled for
+   consistency
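
Three of the best practices listed above (secrets in environment variables, kebab-case file names, sensible defaults) lend themselves to a small illustration. The Rust sketch below is a std-only example under stated assumptions: the function names and the environment variable names (`HARALD_API_KEY`, `HARALD_LOG_LEVEL`) are hypothetical, and the kebab-case rule used here (lowercase ASCII letters and digits separated by single hyphens, with a leading dot tolerated) is one possible reading of the convention rather than the project's definition.

```rust
use std::env;

/// True when a file name's stem is kebab-case: lowercase ASCII letters and
/// digits separated by single hyphens. Hypothetical rule for illustration.
fn is_kebab_case(file_name: &str) -> bool {
    // A leading dot (as in `.markdownlint.json`) is tolerated here.
    let name = file_name.strip_prefix('.').unwrap_or(file_name);
    // Only the part before the first dot is checked; extensions are ignored.
    let stem = name.split('.').next().unwrap_or("");
    !stem.is_empty()
        && stem.split('-').all(|part| {
            !part.is_empty()
                && part.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        })
}

/// Reads a secret from the environment instead of a config file.
fn secret_from_env(key: &str) -> Result<String, String> {
    env::var(key).map_err(|_| format!("missing environment variable {key}"))
}

/// Non-sensitive settings fall back to a default when the variable is unset.
fn setting_or_default(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn main() {
    // The first two names appear in CONFIG.md; the third is a made-up counterexample.
    for name in ["vector-stores-registry.json", "conversation-metadata.json", "My_Config.json"] {
        println!("{name}: kebab-case = {}", is_kebab_case(name));
    }
    // Hypothetical variable names, used only for this sketch.
    println!("log level: {}", setting_or_default("HARALD_LOG_LEVEL", "info"));
    if let Err(msg) = secret_from_env("HARALD_API_KEY") {
        eprintln!("{msg}");
    }
}
```

Schema validation at startup is left out of the sketch because it would need a JSON Schema library, which the source does not name.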