NeuroFed Node is a decentralized, federated AGI system based on pure hierarchical predictive coding: a biologically plausible, offline-first system built with Rust, the candle framework, and the Nostr protocol.
- Build a personal AI that can be used by other users and is not tied to any organization.
- Self-study via proxy use, local/remote books, YouTube subtitles, internet research, and fact checking.
- Strong coding capability with durable code knowledge; also strong text understanding and reasoning.
- Final package size under 500 MB.
- Use modern algorithms; support CPU/GPU/TPU when available; prioritize CPU cache locality.
- Avoid long-term degradation (stability of learned knowledge over time).
- TODO: Share database with other users (e.g., via Nostr). Support federated redundancy and multi-user desktop deployment to share compute resources and resist corporate centralization.
- Viral and eye-candy experience (memorable UI/UX and shareable story).
- License: GPL-3.0-or-later.
- Support fast and slow thinking modes.
- Primary target: x86-64-v3, 32 GB RAM.
- The authoritative architecture is the code under `src/`, not the aspirational module list below.
neuro-pc-node/
├── src/
│ ├── main.rs # Minimal executable path and smoke-test style startup
│ ├── lib.rs # Public module graph and compatibility re-exports
│ ├── config.rs # Primary runtime configuration types
│ ├── types.rs # Legacy/common DTO-style types; not the single source of truth
│ ├── persistence.rs # SQLite persistence for PC weights, peers, cache
│ ├── node_loop.rs # Event loop skeleton
│ ├── ml_engine.rs # GGUF/tokenizer loading and text embedding pipeline
│ ├── model_manager.rs # Model selection and download logic
│ ├── pc_hierarchy.rs # Predictive coding orchestration
│ ├── pc_level.rs # Per-level PC update logic
│ ├── pc_types.rs # Canonical PC config/error/stat types
│ ├── pc_decoder.rs # Belief decoding logic
│ ├── bootstrap.rs # Synthetic/bootstrap training utilities
│ ├── brain_manager.rs # Brain sharing workflow
│ ├── semantic_cache.rs # Semantic cache implementation
│ └── pow_verifier.rs # PoW verification support
├── docs/
│ ├── architecture.md # System architecture documentation
│ ├── equations.md # Mathematical foundations
│ ├── api.md # Public API documentation
│ └── installation.md # Installation and setup guide
├── examples/
│ ├── basic_usage.rs
│ ├── federated_demo.rs
│ └── performance_bench.rs
├── ui/ # Web UI assets (HTML/JS/CSS)
├── resources/
│ ├── default_config.toml
│ ├── models/ # GGUF model storage
│ └── schemas/ # Database schemas
├── scripts/
│ ├── build_release.sh
│ ├── cross_compile.sh
│ └── test_all.sh
├── .github/
│ ├── workflows/
│ │ ├── ci.yml
│ │ ├── release.yml
│ │ └── nightly.yml
│ └── ISSUE_TEMPLATE/
├── Cargo.toml # Rust package configuration
├── README.md
├── LICENSE
└── .gitignore
- Rust toolchain (recent stable or nightly)
- Git
# Clone the repository
git clone https://github.com/neuro-pc/neuro-pc-node.git
cd neuro-pc-node
# Install Rust if not already installed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install dependencies
cargo install cargo-watch cargo-audit

# Set up development environment
export RUST_LOG=debug
export RUST_BACKTRACE=1
# For GPU development (if available)
export GPU_BACKEND=cuda # or metal, vulkan, cpu
export GPU_DEVICE_ID=0
export GPU_MEMORY_FRACTION=0.8

- src/main.rs: Application entry point and main loop
- src/lib.rs: Public API exports
- src/config.rs: Configuration management
- src/types.rs: Legacy/common data types; avoid adding new canonical config here
- src/persistence.rs: SQLite database and state persistence
- src/node_loop.rs: Async processing loop skeleton
- src/ml_engine.rs: ML Engine using candle framework for pure Rust CPU/GPU operations
- src/pc_hierarchy.rs: Pure Predictive Coding implementation
For every code update, enforce the following before considering the step complete:
- Parse learning log: run `cargo run --bin learning_benchmark -- --skip-run` and review the updated `learning_feedback.csv`.
- Investigate anomalies: if losses, trajectories, or counts look worse than the previous step, inspect `detail.log` and explain the regression.
- Apply fixes: do not proceed to the next change until learning/generation is at least stable or improved.
- Smoke coverage: keep learning/generation unbroken by running lightweight tests when feasible (e.g., `cargo test --lib` and `cargo test --test integration_tests`).
Proceed through the current plan step-by-step without asking for confirmation:
- Stabilize flaky learning-related tests (deterministic inputs and tolerances).
- Expand learning log parsing to include replay/sleep-phase entries.
- Add a JSONL reasoning dataset loader for replay.

After each step, run the Learning/Generation Quality Gate and only advance if metrics are stable or improved.
For every future plan, the final step must always be:
- create the next best plan from the new project state and continue on it
Operationally, each completed step should follow this loop:
- implement the current best step(s)
- run the Learning/Generation Quality Gate
- create a non-amended commit for the validated step
- create the next best plan from the new state and continue without stopping
Use this as the default long-horizon execution order for all future plans. Do not try to finish all of it in one step; always pick the highest-leverage next slice that can end in a working validated state.
- Keep the current build, tests, and learning gate green at every step.
- Preserve deterministic reasoning tasks as a non-regressing baseline.
- Keep investigation, code, and text outputs structurally sectioned.
- Score structured assistant sections in replay and benchmark logs.
- Add heuristic evaluators for investigation findings/evidence quality.
- Add heuristic evaluators for code implementation/verification/risk quality.
- Add heuristic evaluators for text rewrite/quality-check fidelity.
- Reuse prior investigation notes through persistent retrieval.
- Reuse prior code workflow notes through persistent retrieval.
- Reuse prior text workflow notes through persistent retrieval.
- Strengthen code-task workflow contracts with touched-area summaries.
- Strengthen code-task workflow contracts with explicit verification commands.
- Strengthen code-task workflow contracts with residual-risk summaries.
- Strengthen text-task workflow contracts with tone/length constraints.
- Strengthen text-task workflow contracts with fidelity checks.
- Strengthen investigation workflows with evidence/open-question persistence.
- Convert successful assistant episodes into richer replay rows.
- Add benchmark cases for structured investigation outputs.
- Add benchmark cases for structured code-task outputs.
- Add benchmark cases for structured text-task outputs.
- Fail the gate when structured outputs collapse into generic chat.
- Fail the gate when code verification disappears from code-task outputs.
- Fail the gate when evidence/open-questions disappear from investigations.
- Fail the gate when text quality checks disappear from rewrites.
- Add planner state that can mark substeps pending/in-progress/completed.
- Split planner state from execution state.
- Allow plan revision when new evidence invalidates assumptions.
- Make the assistant summarize why a plan changed.
- Add a bounded software-development executor loop.
- Let the assistant inspect code paths before proposing edits.
- Let the assistant choose the narrowest relevant verification command.
- Let the assistant store verified coding patterns for reuse.
- Add a bounded text-editing executor loop.
- Let the assistant store reusable text transformation patterns.
- Add a bounded investigation executor loop.
- Let the assistant carry unresolved questions across sessions.
- Add user-preference memory for writing/help style.
- Add reminder/task continuity for personal-assistant behavior.
- Add durable project-specific architecture notes for coding assistance.
- Expand dataset normalization to include structured assistant outputs.
- Expand replay scoring to use section-level evaluator signals.
- Expand sleep consolidation to learn from structured workflow episodes.
- Add anti-cheat metrics for missing or trivial workflow structure.
- Add evaluator summaries back into the stored workflow memories.
- Add queueable background tasks for deferred study and follow-up.
- Start integrating real executor behavior into `node_loop.rs`.
- Reduce runtime/documentation drift after each integrated subsystem step.
- Reduce type drift between config/types/pc_types while preserving behavior.
- Prefer changes that compound verification, memory, and replay quality together.
- End every completed plan by creating the next best plan from the new state and continuing automatically.
Selection rule:
- If several possible next steps exist, prefer the smallest step that improves at least one of:
- output structure
- evaluation quality
- retrieval memory
- replay/benchmark coverage
- autonomous software-development usefulness
To exercise reasoning → state → output paths via `learning_benchmark --reasoning-replay`, provide JSONL with these fields:
- `task`: one of `multiply`, `reverse_string`, `sum_even`, `max`, `sort_list`
- Task fields:
  - `multiply`: `a`, `b`
  - `reverse_string`: `input`
  - `sum_even`/`max`/`sort_list`: `values` (array of integers)
- Optional:
  - `ops`: array of ThoughtOps (e.g., `PLAN`, `DECOMPOSE`, `INITIALIZE_VARIABLE`, `COMPUTE_MATH`, `RETURN_VALUE`, `EOF`)
  - `expected_output`: string used for the text-loss check
  - query aliases: `raw_query`, `query`, `problem`, or `instruction`
  - `expected` is also accepted as an alias for `expected_output`, including numeric values for solver tasks

If `ops` is omitted, the benchmark fills in the canonical recommended ThoughtOp chain for the task automatically. If no query field is provided, it synthesizes a stable default query string from the task payload so replay logs stay readable and comparable.
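The fill-in behavior can be pictured as a task-to-chain mapping. This is an illustrative sketch, not the benchmark's actual code; the exact chains live in the Rust source, and `default_ops` is a made-up name:

```rust
/// Sketch of synthesizing a canonical ThoughtOp chain when `ops` is
/// omitted. The real chains live in the learning_benchmark source;
/// the op lists here are illustrative assumptions.
fn default_ops(task: &str) -> Vec<&'static str> {
    match task {
        // Arithmetic tasks plan, compute, then return.
        "multiply" => vec![
            "PLAN", "DECOMPOSE", "INITIALIZE_VARIABLE",
            "COMPUTE_MATH", "RETURN_VALUE", "EOF",
        ],
        // List/string tasks decompose the payload before acting on it.
        "reverse_string" | "sum_even" | "max" | "sort_list" => {
            vec!["PLAN", "DECOMPOSE", "INITIALIZE_VARIABLE", "RETURN_VALUE", "EOF"]
        }
        // Unknown tasks fall back to the minimal legal chain.
        _ => vec!["PLAN", "RETURN_VALUE", "EOF"],
    }
}

fn main() {
    let ops = default_ops("multiply");
    // Every synthesized chain starts with PLAN and ends with EOF.
    assert_eq!(ops.first(), Some(&"PLAN"));
    assert_eq!(ops.last(), Some(&"EOF"));
    println!("{:?}", ops);
}
```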
Example line:
{"task":"multiply","a":17,"b":23,"ops":["PLAN","DECOMPOSE","INITIALIZE_VARIABLE","COMPUTE_MATH","REFINE","RETURN_VALUE","EOF"],"expected_output":"391"}

Use scripts/generate_learning_dataset.py to normalize multi-type JSONL into a single structured stream:
python scripts/generate_learning_dataset.py --input assistant.jsonl,reasoning.jsonl,code.jsonl,agent.jsonl --output merged_learning.jsonl

Expected input record types (`type` field):
- `assistant`: `user`, `assistant`
- `reasoning`: `problem`, `thoughts` (array), `solution`
- `code`: `instruction`, `code`, `tests`, optional `final`
- `agent`: `goal`, `tool_call`, `observation`, `next_action`
Additional reasoning task types for replay JSONL:
- `sympy_eval`: `expression`, `operation`, optional `expected`
- `z3_solve`: `var`, `constraints` (array of strings), optional `expected`
Use scripts/query_learning_dataset.py to inspect and filter the merged JSONL:
.venv/bin/python scripts/query_learning_dataset.py --input data/merged_learning.jsonl --stats
.venv/bin/python scripts/query_learning_dataset.py --input data/merged_learning.jsonl --type reasoning --max-chars 3000 --output data/reasoning_filtered.jsonl
.venv/bin/python scripts/query_learning_dataset.py --input data/merged_learning.jsonl --contains "toxicity|abuse" --output data/cleaned.jsonl
.venv/bin/python scripts/query_learning_dataset.py --input data/merged_learning.jsonl --preset alpaca --output data/alpaca_filtered.jsonl
.venv/bin/python scripts/query_learning_dataset.py --input data/merged_learning.jsonl --preset openassistant --min-score 0.6 --output data/oa_filtered.jsonl
.venv/bin/python scripts/query_learning_dataset.py --input data/merged_learning.jsonl --preset reasoning --output data/reasoning_preset.jsonl --stats
.venv/bin/python scripts/query_learning_dataset.py --input data/merged_learning.jsonl --max-chars 2500 --output data/merged_filtered.jsonl --stats

For reasoning training, the merged dataset is not sufficient by itself. Use the dedicated preparation step:
.venv/bin/python scripts/prepare_reasoning_dataset.py --input data/merged_learning.jsonl --output data/reasoning_ready.jsonl --validate --require-validation-ok

This keeps reasoning/code/agent rows and retains only assistant rows that look reasoning-relevant, while attaching `reasoning_score` metadata and synthesizing minimal thought traces when safe.
If you want an OpenAI-compatible remote LLM to preprocess rows further, use:
.venv/bin/python scripts/llm_prepare_reasoning_dataset.py \
--input data/merged_learning.jsonl \
--output data/reasoning_ready_llm.jsonl \
--base-url http://YOUR_OPENAI_COMPATIBLE_HOST/v1 \
--api-key YOUR_KEY \
--model YOUR_MODEL \
--validate --require-validation-ok

Recommended flow:
- run `prepare_reasoning_dataset.py` first for cheap local filtering
- then run `llm_prepare_reasoning_dataset.py` on the merged or filtered dataset
- keep `--min-heuristic-score` above 2 so the remote LLM only sees likely reasoning rows
If you want the local non-LLM path only:
.venv/bin/python scripts/llm_prepare_reasoning_dataset.py \
--input data/merged_learning.jsonl \
--output data/reasoning_ready_dryrun.jsonl \
--dry-run --validate

Validate dataset quality before training:
.venv/bin/python scripts/validate_reasoning_dataset.py --input data/merged_learning.jsonl --apply-prepare
.venv/bin/python scripts/validate_reasoning_dataset.py --input data/reasoning_ready.jsonl
.venv/bin/python scripts/validate_reasoning_dataset.py --input data/reasoning_ready_llm.jsonl

Use this as the minimum gate:
- raw merged dataset report
- heuristic-prepared reasoning dataset report
- optional LLM-prepared reasoning dataset report

Do not start a reasoning-focused training run if generic assistant contamination is still high or reasoning-ready coverage is weak.
Use scripts/fetch_datasets.py to download and convert the required datasets into raw JSONL:
.venv/bin/python scripts/fetch_datasets.py --datasets alpaca,dolly,openassistant,gsm8k,strategyqa,hotpotqa,codesearchnet,humaneval --limit 5000 --streaming

Notes:
- Requires a local venv with `datasets` + `huggingface_hub`:

  python3 -m venv .venv
  .venv/bin/pip install datasets huggingface_hub

- Set `HF_TOKEN` in `.env` for higher rate limits.
- For CodeSearchNet or The Stack, pass `--language python` (or `rust`, `go`, `java`, `javascript`).
- For The Stack subset, add `the_stack` to `--datasets` and set a small `--limit`.
- Agent datasets:
  - ToolBench: add `toolbench`
  - WebArena: add `webarena`
.venv/bin/python scripts/fetch_datasets.py --datasets alpaca,dolly,openassistant,gsm8k,strategyqa,hotpotqa,codesearchnet,humaneval --limit 5000 --streaming
.venv/bin/python scripts/fetch_datasets.py --datasets toolbench,webarena --limit 200 --streaming
.venv/bin/python scripts/generate_learning_dataset.py --input data/raw/alpaca.jsonl,data/raw/dolly.jsonl,data/raw/openassistant.jsonl,data/raw/gsm8k.jsonl,data/raw/strategyqa.jsonl,data/raw/hotpotqa.jsonl,data/raw/codesearchnet.jsonl,data/raw/humaneval.jsonl,data/raw/toolbench.jsonl,data/raw/webarena.jsonl --output data/merged_learning.jsonl
.venv/bin/python scripts/prepare_reasoning_dataset.py --input data/merged_learning.jsonl --output data/reasoning_ready.jsonl
.venv/bin/python scripts/query_learning_dataset.py --input data/merged_learning.jsonl --max-chars 2500 --output data/merged_filtered.jsonl --stats
.venv/bin/python scripts/augment_reasoning_dataset.py --input data/merged_filtered.jsonl --output data/merged_augmented.jsonl

To automatically add simple reasoning traces to assistant rows:
.venv/bin/python scripts/augment_reasoning_dataset.py --input data/merged_learning.jsonl --output data/merged_learning_augmented.jsonl
.venv/bin/python scripts/augment_reasoning_dataset.py --input data/merged_filtered.jsonl --output data/merged_augmented.jsonl

This only augments simple arithmetic (a + b, a - b, a * b) when a thought trace is missing.
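The augmentation heuristic amounts to detecting an `a OP b` pattern and synthesizing a minimal trace. A Rust sketch of the idea (the real logic is in scripts/augment_reasoning_dataset.py; `synthesize_thought` and the exact trace wording are hypothetical):

```rust
/// Hypothetical sketch of the arithmetic augmentation heuristic:
/// find an "<int> <op> <int>" pattern in a prompt and synthesize a
/// minimal thought trace with the computed result.
fn synthesize_thought(prompt: &str) -> Option<String> {
    let tokens: Vec<&str> = prompt.split_whitespace().collect();
    // Scan every 3-token window for: <int> <op> <int>.
    for w in tokens.windows(3) {
        let (a, b) = match (w[0].parse::<i64>(), w[2].parse::<i64>()) {
            (Ok(a), Ok(b)) => (a, b),
            _ => continue,
        };
        let result = match w[1] {
            "+" => a + b,
            "-" => a - b,
            "*" => a * b,
            _ => continue, // only the three supported operators
        };
        return Some(format!(
            "Compute {} {} {} step by step: result {}",
            a, w[1], b, result
        ));
    }
    None // no simple arithmetic found: leave the row untouched
}

fn main() {
    let t = synthesize_thought("What is 17 * 23 ?").unwrap();
    assert!(t.contains("391"));
    // Non-arithmetic prompts are never augmented.
    assert_eq!(synthesize_thought("Tell me a story"), None);
}
```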
- Z3 integration lives in `src/reasoning_tools.rs` and is gated by the `z3-tools` feature flag.
- SymPy checks use a Python subprocess (`python3 -c ...`); set the `PYTHON` env var to override.
- src/model_manager.rs: Model detection, recommendation, and downloading
- src/bootstrap.rs: Bootstrap and synthetic training utilities
- src/brain_manager.rs: Brain sharing and import/export workflow
Enable the minimal, stable predictive-coding loop (simple inference + learning rule):
# config.toml
[pc_config]
minimal_pc_mode = true

Use the small reasoning dataset:

study/minimal_pc/data/minimal_pc_sum.jsonl

Run a quick learning benchmark on the minimal set:

rm -f neurofed.db detail.log && \
cargo run --bin learning_benchmark -- \
  --study-paths study/minimal_pc/data/minimal_pc_sum.jsonl

Or use the helper script:

scripts/run_minimal_pc.sh

Optional smoke test (disabled by default in CI/sandbox):

RUN_MINIMAL_PC_SCRIPT_SMOKE=1 cargo test --test minimal_pc_script_smoke

- Type drift across modules: `config.rs`, `types.rs`, and `pc_types.rs` define overlapping concepts. New work should consolidate around one canonical type per concept instead of adding more adapters.
- Runtime/documentation drift: the binary currently exercises only a narrow startup path. Do not document subsystems as production-ready unless they are actually invoked from `main.rs` or an equivalent entrypoint.
- Single-process lock contention: `Arc<Mutex<...>>` around core ML and PC state is acceptable for the prototype, but it will serialize work and limit throughput once proxy and federation paths become active.
- Blocking/external side effects during model init: tokenizer/model fallback logic may trigger filesystem- or network-dependent behavior at construction time. Keep initialization deterministic where possible.
- Stubbed orchestration: `node_loop.rs` currently proves lifecycle shape, not business behavior. Avoid building new assumptions on top of its placeholder handlers without implementing them first.
- Integrate `node_loop.rs` into the actual runtime and replace placeholder handlers with real user/file/Nostr processing.
- Wire `brain_manager.rs` into the executable path and document the operational workflow only after end-to-end integration exists.
- Reduce compatibility/placeholder reliance in `types.rs` by moving callers to canonical types in `config.rs` and `pc_types.rs`.
- Promote currently standalone infrastructure such as persistence and model management into tested end-to-end flows.
# Build in debug mode
cargo build
# Build in release mode
cargo build --release
# Build with web UI (Phase 3)
cargo build --features web-ui
# Build for specific target
cargo build --target x86_64-unknown-linux-gnu

# Run in debug mode
cargo run
# Run with specific configuration
cargo run -- --config config.toml
# Run with web UI
cargo run --features web-ui
# Run tests
cargo test
# Run with specific test
cargo test ml_engine::tests::test_embedding_creation

# Ensure config.toml has: web_ui_enabled=true, bootstrap_on_start=true,
# require_thought_ops=true, min_thought_ops=2, inference_steps=8
cargo run --features web-ui --bin neuro-fed-node -- --config config.toml

Then open:

http://localhost:8080/ui

The current UI should expose:
- mode chips for `Chat`, `Investigate`, `Code`, and `Write`
- a structured answer panel on the right
- visible assistant intent and memory-hit counters
- live steps and telemetry
- reusable quick prompts for the active mode
- remembered mode, per-mode drafts, and ThoughtOps toggle state across refreshes
- `Ctrl+1`/`Ctrl+2`/`Ctrl+3`/`Ctrl+4` mode switching and `/` prompt focus
- `Reuse in Prompt`, `Copy Answer`, and `Reset Workspace` controls for session ergonomics

Ask a question and verify the response includes ThoughtOps and a coherent answer. Use Ask Once for a single-shot query without storing chat history in local storage.
Seeded demo content (user stories):
- study/user_stories_seed.txt
- study/user_stories_thoughtops.jsonl

Tune these for better "thinking" in full mode:
- `pc_config.inference_steps` (e.g., 8–16). Higher = more iterative reasoning.
- `proxy_config.require_thought_ops = true` and `min_thought_ops = 2`.
- If you see DB lock errors: run `rm -f neurofed.db detail.log` before a fresh run.
# Format code
cargo fmt
# Check code style
cargo clippy
# Run security audit
cargo audit
# Check for outdated dependencies
cargo outdated
# Generate documentation
cargo doc --open

Execute the following command to troubleshoot learning:

rm -f neurofed.db detail.log && cargo build && timeout 180 target/debug/neuro-fed-node 2>&1 | tee output.log ; cat detail.log
Use the dedicated learning benchmark binary and helper script to rerun specific HumanEval/GSM8K slices and collect plan vs canonical comparisons. Example workflow:
# Rerun only the targeted dataset, then export enriched CSV data
rm -f neurofed.db detail.log && \
cargo build && \
cargo run --bin learning_benchmark -- --study-paths study/human-eval/data/HumanEval.jsonl --output learning_feedback.csv --skip-run=false && \
python scripts/collect_learning_feedback.py --log detail.log --output learning_feedback.csv

Adjust `--study-paths` (comma-separated) to focus on other subsets; guided replay will automatically trigger for HumanEval/48 and /72 when loss exceeds 150.
# Run all tests
cargo test
# Run unit tests only
cargo test --lib
# Run integration tests only
cargo test --test integration
# Run tests with specific features
cargo test --features web-ui
# Run tests in release mode
cargo test --release

# Run clippy with all features
cargo clippy --all-features -- -D warnings
# Run clippy with pedantic
cargo clippy -- -D clippy::pedantic
# Check for unsafe code
grep -rn "unsafe" src/
# Check for missing documentation
cargo doc --no-deps

- Follow Rust standard conventions
- Use `rustfmt` for formatting
- Use `clippy` for linting
- Add documentation comments (`///`) for public APIs
- Add inline comments (`//`) for complex logic
- Keep files below 300 lines so they stay easy to maintain
- Avoid deeply nested `if` ladders: do checks at the beginning of a function with early returns
- Follow the single-responsibility principle (SRP)
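The early-return rule can be sketched as follows; `normalize_learning_rate` is an invented example, not project code:

```rust
// Sketch of the guard-clause style: validate at the top with early
// returns so the happy path stays unindented, instead of nesting ifs.
fn normalize_learning_rate(raw: Option<f64>) -> Result<f64, String> {
    let Some(lr) = raw else {
        return Err("learning rate missing".into());
    };
    if !lr.is_finite() {
        return Err("learning rate must be finite".into());
    }
    if lr <= 0.0 || lr > 1.0 {
        return Err(format!("learning rate {} out of (0, 1]", lr));
    }
    // Happy path: no further nesting needed.
    Ok(lr)
}

fn main() {
    assert_eq!(normalize_learning_rate(Some(0.01)), Ok(0.01));
    assert!(normalize_learning_rate(None).is_err());
    assert!(normalize_learning_rate(Some(2.0)).is_err());
}
```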
- Use `thiserror` for custom error types
- Implement the `Display` and `Error` traits
- Use `Result<T, E>` for fallible operations
- Handle errors gracefully with meaningful messages
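For reference, this is roughly the boilerplate `thiserror` derives for you, written by hand with std only so the sketch stays self-contained (`NodeError` and its variants are illustrative, not project types):

```rust
use std::error::Error;
use std::fmt;

// Hand-written equivalent of a `thiserror`-derived error enum.
#[derive(Debug)]
enum NodeError {
    ConfigMissing(String),
    #[allow(dead_code)]
    ModelLoad { path: String, reason: String },
}

impl fmt::Display for NodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            NodeError::ConfigMissing(key) => write!(f, "missing config key: {}", key),
            NodeError::ModelLoad { path, reason } => {
                write!(f, "failed to load model {}: {}", path, reason)
            }
        }
    }
}

impl Error for NodeError {}

// Fallible operations return Result with a meaningful error value.
fn require_key(key: &str, present: bool) -> Result<(), NodeError> {
    if !present {
        return Err(NodeError::ConfigMissing(key.to_string()));
    }
    Ok(())
}

fn main() {
    let err = require_key("pc_config", false).unwrap_err();
    assert_eq!(err.to_string(), "missing config key: pc_config");
    assert!(require_key("pc_config", true).is_ok());
}
```

With `thiserror`, the `Display` impl collapses to `#[error("missing config key: {0}")]` attributes on the variants.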
- Use `candle-core` with hardware acceleration for tensor operations
- Implement proper memory management
- Use async/await for I/O operations
- Profile with `cargo flamegraph`
- Validate all input data
- Use secure random number generation
- Implement proper error handling
- Follow Rust security best practices
- Create a new module in the `src/` directory
- Add the module declaration to `src/lib.rs`
- Implement core functionality
- Add comprehensive tests
- Update documentation
- Add integration tests
- Use TDD
- Ensure high test coverage; add coverage collection to the test process
- Check whether a concept already exists in `config.rs`, `types.rs`, and `pc_types.rs`
- Pick one canonical location for the concept and route callers there
- Verify the runtime entrypoint actually composes the new component
- Prefer explicit integration tests over adding more placeholder modules
- Add to `[dependencies]` in Cargo.toml
- Run `cargo update`
- Update documentation
- Use `#[cfg(feature = "feature_name")]` for conditional compilation
- Define features in the `[features]` section of Cargo.toml
- Use the `--features` flag when building
- `main`: Production-ready code
- `develop`: Integration branch for new features
- `feature/*`: Feature branches
- `bugfix/*`: Bug fix branches
- `hotfix/*`: Critical bug fixes
feat(component): add new functionality
fix(component): resolve issue with description
docs(component): update documentation
refactor(component): improve code structure
perf(component): optimize performance
ci: update CI configuration
- Create from feature branch to develop
- Include comprehensive description
- Add tests for new functionality
- Ensure all checks pass
- Request review from team members
- ci.yml: Run tests and linting on all pushes
- release.yml: Build and publish releases
- nightly.yml: Test with nightly Rust
- Rustfmt check
- Clippy linting
- Security audit
- Unit and integration tests
- Documentation generation
# Build release binaries
cargo build --release
# Cross-compile for different platforms
cargo build --release --target x86_64-unknown-linux-gnu
cargo build --release --target aarch64-apple-darwin
cargo build --release --target x86_64-pc-windows-msvc
# Create installer packages
# (Scripts in scripts/ directory)

- Update version in Cargo.toml
- Update changelog
- Tag release in git
- Build release binaries
- Create release on GitHub
- Publish to package registries if applicable
# Generate API docs
cargo doc --open
# Generate docs with private items
cargo doc --document-private-items --open

- Update README.md with installation and usage instructions
- Update docs/ directory with comprehensive guides
- Include examples in examples/ directory
- Update architecture.md with system design
- Update equations.md with mathematical foundations
- Update api.md with public API documentation
# Generate flamegraph
cargo flamegraph
# Check memory usage (cargo has no built-in memcheck; use an external profiler)
valgrind --tool=massif target/release/neuro-fed-node
# Benchmark performance
cargo bench

- Use the `metrics` crate for performance metrics
- Export to Prometheus for monitoring
- Include in the web UI dashboard
- Use `cargo audit` regularly
- Follow Rust security best practices
- Validate all input data
- Use secure random number generation
- Encrypt sensitive data at rest
- Use secure communication protocols
- Implement proper access controls
- Regular security audits
- Fork the repository
- Create feature branch
- Make changes with tests
- Ensure all checks pass
- Submit pull request
- Address review comments
- Use GitHub issues for bug reports
- Include detailed reproduction steps
- Add relevant logs and error messages
- Specify affected components
- Use GitHub issues for feature requests
- Include use cases and requirements
- Discuss with community
- Prioritize based on impact
This guide provides a comprehensive overview of the development process for NeuroFed Node, ensuring consistent, high-quality code and efficient collaboration.
- Run `cargo check` to ensure no compilation errors
- Run `cargo build` to verify the project builds correctly
- Run `cargo test` to ensure all tests pass
- Run `cargo clippy` to check for code style issues
- Run `cargo fmt` to format the code
- Run `cargo audit` to check for security vulnerabilities
- Create a feature branch from `develop`
- Make changes with comprehensive tests
- Ensure all CI checks pass locally
- Submit a pull request to `develop`
- Address review comments
- Merge to `develop` and eventually to `main`
- Unit tests for individual components
- Integration tests for component interactions
- End-to-end tests for complete workflows
- Performance tests for critical paths
- Security tests for vulnerabilities
- Check for functionality correctness
- Verify code follows style guidelines
- Ensure comprehensive test coverage
- Review security implications
- Check for performance issues
- Verify documentation is up-to-date
High-level but engineer-actionable plan to make Predictive Coding the primary reasoning source (ThoughtOps become mandatory, not optional).
- Gate text output until at least one ThoughtOp has been emitted.
- Add two decoder modes: `MODE_REASONING` then `MODE_OUTPUT`.
- Test: `17 * 23` must emit `COMPUTE_MATH` before the final answer.
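A minimal sketch of the gate, assuming a decoder that records emitted ThoughtOps (`GatedDecoder` is an invented name, not the project's decoder type):

```rust
// Sketch of the output gate: text emission is refused until at least
// one ThoughtOp has been recorded on the reasoning channel.
struct GatedDecoder {
    thought_ops: Vec<String>,
}

impl GatedDecoder {
    fn new() -> Self {
        Self { thought_ops: Vec::new() }
    }

    fn emit_op(&mut self, op: &str) {
        self.thought_ops.push(op.to_string());
    }

    /// Fails until the reasoning channel has fired at least once.
    fn emit_text(&self, text: &str) -> Result<String, &'static str> {
        if self.thought_ops.is_empty() {
            return Err("gated: no ThoughtOps emitted yet");
        }
        Ok(text.to_string())
    }
}

fn main() {
    let mut d = GatedDecoder::new();
    assert!(d.emit_text("391").is_err()); // blocked before reasoning
    d.emit_op("COMPUTE_MATH");
    assert_eq!(d.emit_text("391").unwrap(), "391"); // 17 * 23 after reasoning
}
```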
- Introduce a State Engine that applies ThoughtOps to a real mutable state.
- Update loss: include `state_error` in addition to text error.
- Test: variable init/update yields the correct state after the sequence.
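A minimal State Engine sketch, assuming variables live in a map and ops mutate them; the op semantics shown here are simplified assumptions:

```rust
use std::collections::HashMap;

// Sketch of a State Engine that applies ThoughtOps to a real mutable
// state. Op names follow the replay format; the struct is illustrative.
struct StateEngine {
    vars: HashMap<String, i64>,
}

impl StateEngine {
    fn new() -> Self {
        Self { vars: HashMap::new() }
    }

    fn apply(&mut self, op: &str, name: &str, value: i64) {
        match op {
            "INITIALIZE_VARIABLE" => {
                self.vars.insert(name.to_string(), value);
            }
            // Simplified: COMPUTE_MATH accumulates into a variable.
            "COMPUTE_MATH" => {
                *self.vars.entry(name.to_string()).or_insert(0) += value;
            }
            _ => {} // other ops leave the state untouched
        }
    }
}

fn main() {
    let mut engine = StateEngine::new();
    engine.apply("INITIALIZE_VARIABLE", "acc", 0);
    engine.apply("COMPUTE_MATH", "acc", 17);
    engine.apply("COMPUTE_MATH", "acc", 23);
    // state_error would compare this against the expected state.
    assert_eq!(engine.vars["acc"], 40);
}
```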
- Train on tasks where direct output is impossible without ThoughtOps.
- Include arithmetic, symbolic transforms, and mini-programs.
- Dataset format must include `INPUT`, `THOUGHT`, `STATE`, `OUTPUT`.
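One possible record shape for that format; the struct and field types are assumptions, only the four channel names come from the plan above:

```rust
// Illustrative record for the INPUT/THOUGHT/STATE/OUTPUT format.
// Field names mirror the required channels; types are a sketch.
struct ReasoningRecord {
    input: String,
    thought: Vec<String>,          // ThoughtOp sequence
    state: Vec<(String, i64)>,     // variable snapshot after the ops
    output: String,
}

fn main() {
    let rec = ReasoningRecord {
        input: "17 * 23".to_string(),
        thought: vec!["PLAN".into(), "COMPUTE_MATH".into()],
        state: vec![("result".to_string(), 391)],
        output: "391".to_string(),
    };
    // Direct output is impossible without the thought/state channels:
    assert_eq!(rec.input, "17 * 23");
    assert!(!rec.thought.is_empty());
    assert_eq!(rec.state[0].1, 391);
    assert_eq!(rec.output, "391");
}
```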
- Separate reasoning tokens from text tokens (distinct channels).
- Enforce: ThoughtOps are not just decoded text.
- Total loss = `state_error + reasoning_error + text_error`.
- Penalize correct text with incorrect ThoughtOps/state.
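A sketch of the combined loss with the consistency penalty; the weights and thresholds here are invented placeholders, not tuned values:

```rust
// Sketch of the combined loss: sum the three error channels, then
// penalize "lucky" answers where the text is right but the
// ThoughtOps/state are wrong. Thresholds and the 10.0 penalty are
// placeholder assumptions.
fn total_loss(state_error: f64, reasoning_error: f64, text_error: f64) -> f64 {
    let base = state_error + reasoning_error + text_error;
    let text_right = text_error < 0.1;
    let reasoning_wrong = state_error > 1.0 || reasoning_error > 1.0;
    if text_right && reasoning_wrong {
        return base + 10.0; // discourage correct text without reasoning
    }
    base
}

fn main() {
    // Honest run: all three errors low, no penalty.
    assert!(total_loss(0.05, 0.05, 0.05) < 1.0);
    // Lucky text with broken state: penalty applied.
    assert!(total_loss(5.0, 0.0, 0.0) > 10.0);
    // Uniformly wrong: plain sum, no extra penalty.
    assert_eq!(total_loss(2.0, 2.0, 2.0), 6.0);
}
```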
- Allow variable-length ThoughtOp chains (N steps).
- Test with max/argmax-style tasks.
- Add `PLAN` / `DECOMPOSE` / `REFINE` operations before execution ops.
- Track `reasoning_usage_rate`, `state_accuracy`, `steps_per_task`.
- Penalize trivial or missing chains.
- Add external tool ops: `SYMPY_EVAL`, `Z3_SOLVE`.
- Loop: `THINK → ACT → OBSERVE → UPDATE`.
- Always compare the tool result vs the predicted state and backpropagate the error.
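The loop and the tool-vs-prediction comparison can be sketched as follows; the closure stands in for a real SymPy/Z3 call and all names are illustrative:

```rust
// Sketch of one THINK → ACT → OBSERVE → UPDATE iteration: predict a
// result, run the external tool, and derive the error signal from
// the discrepancy. Returns (observed, error).
fn tool_loop(predicted: i64, tool_call: impl Fn() -> i64) -> (i64, i64) {
    // THINK: the model already produced `predicted`.
    // ACT: run the external tool (stand-in for SYMPY_EVAL / Z3_SOLVE).
    let observed = tool_call();
    // OBSERVE + UPDATE: the discrepancy drives the learning update.
    let error = observed - predicted;
    (observed, error)
}

fn main() {
    // Stand-in "tool" computes 17 * 23 exactly.
    let (observed, error) = tool_loop(390, || 17 * 23);
    assert_eq!(observed, 391);
    assert_eq!(error, 1); // nonzero error: the prediction must be corrected

    // A correct prediction yields zero error and no update.
    assert_eq!(tool_loop(391, || 17 * 23), (391, 0));
}
```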
- ThoughtOp gating test (no output before reasoning).
- State Engine update test.
- Multi-step reasoning test.
- Tool-integrated validation test (SymPy/Z3).