
Scaffold AI Frequently Asked Questions (FAQ)

Kevin Mastascusa edited this page Jun 30, 2025 · 2 revisions


Last updated: 2025-06-30


General

What is Scaffold AI?

Scaffold AI is a curriculum recommendation tool designed to help educators integrate sustainability and climate resilience topics into academic programs using state-of-the-art AI models, including retrieval-augmented generation (RAG), semantic search, and LLMs.

Who should use this project?

Educators, curriculum designers, and researchers seeking literature-backed, transparent curriculum recommendations for sustainability and engineering education.


Installation & Setup

What are the system requirements?

  • Python 3.11 or newer (3.11 itself is recommended)
  • 16GB+ RAM
  • NVIDIA GPU (recommended but not required)
  • Windows, Linux, or macOS

How do I set up the project?

  1. Clone the repo: `git clone https://github.com/kevinmastascusa/scaffold_ai.git`
  2. Create and activate a virtual environment, e.g. `python -m venv venv` followed by `source venv/bin/activate` (`venv\Scripts\activate` on Windows).
  3. Install dependencies: `pip install -r requirements.txt`
  4. Run the setup script: `python setup.py`
  5. Place your PDF files in the `data/` directory.

See Local Setup Guide for detailed steps.

Do I need a Hugging Face token?

Yes. Most of the supported LLMs require a Hugging Face token, and gated models additionally require approved access (see the migration guide).
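For example, the token can be supplied through an environment variable. The variable name below is an assumption for illustration; check `scaffold_core/config.py` or the setup guide for the one the project actually reads:

```shell
# Create a token at https://huggingface.co/settings/tokens, then export it.
export HUGGINGFACE_TOKEN="hf_..."      # Linux/macOS
# setx HUGGINGFACE_TOKEN "hf_..."     # Windows (takes effect in new shells)
```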


Data Processing

How are PDFs processed?

  • PDFs are split into page-based chunks (one chunk per complete page).
  • Text is cleaned, Unicode-normalized, and analyzed for technical terms.
  • Math-aware and Unicode-aware chunking is available for advanced use.
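The cleaning and normalization step can be pictured with a minimal standard-library sketch; the project's actual cleaning logic may differ:

```python
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize extracted PDF text: fold ligatures/odd code points, tidy whitespace."""
    text = unicodedata.normalize("NFKC", raw)  # e.g. the ligature "ﬁ" becomes "fi"
    text = " ".join(text.split())              # collapse whitespace runs and newlines
    return text


print(clean_text("Climate  resilience in ﬁnance"))
```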

What is the "combined words" issue?

PDF text extraction sometimes drops the whitespace between adjacent words, producing merged tokens such as `environmentalsustainability` instead of "environmental sustainability".


Vectorization & Search

How are documents searched?

  • Each chunk is embedded using all-MiniLM-L6-v2 (sentence-transformers).
  • Chunks are indexed using FAISS for efficient vector search.
  • Queries are embedded and matched to relevant chunks.
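Conceptually, this retrieval step is nearest-neighbor search over embeddings. The brute-force sketch below shows the idea in plain Python; FAISS performs the same ranking far more efficiently at scale, and the short vectors here are toy stand-ins for real MiniLM embeddings:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k chunks most similar to the query embedding."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]


chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k([1.0, 0.05], chunks))  # indices of the two closest chunks
```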

How is the final answer generated?

  • Top matches are reranked with a cross-encoder.
  • The LLM generates a grounded answer using only retrieved content.
  • Citations to source documents are included (citation layer in progress).
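The reranking stage amounts to re-scoring each retrieved chunk against the query with a stronger model and keeping the best. In this sketch, `overlap_score` is a toy word-overlap stand-in for a real cross-encoder, which would score each (query, chunk) pair jointly:

```python
def rerank(query: str, candidates: list[str], score_fn, keep: int = 2) -> list[str]:
    """Re-score retrieved chunks with a (query, chunk) scorer and keep the best."""
    ranked = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return ranked[:keep]


def overlap_score(query: str, chunk: str) -> int:
    """Toy scorer: count words shared between query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


hits = ["solar power basics", "climate resilience curricula", "campus parking"]
print(rerank("climate resilience teaching", hits, overlap_score))
```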

LLM & Model Usage

Which LLM is used?

  • Default: mistralai/Mistral-7B-Instruct-v0.2 (Hugging Face)
  • Alternatives: OpenHermes, TinyLlama, others (see model_summary.md)

Can I change the model?

Yes!

  • Edit `LLM_MODEL` in `scaffold_core/config.py`.
  • Make sure the model supports text-generation and that you have access to it on Hugging Face.
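For example, an illustrative excerpt of what `scaffold_core/config.py` might contain (the alternative model id shown is an assumption; see model_summary.md for the supported list):

```python
# scaffold_core/config.py (illustrative excerpt)
LLM_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"    # default
# LLM_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # lighter-weight alternative
```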

Testing & Validation

Is there a test suite?

Yes.

  • Run `python scaffold_core/scripts/run_tests.py` for comprehensive tests.
  • Generate a detailed report with `python scaffold_core/scripts/generate_test_report.py`.
  • See `documentation/query_system_test_report.md`.

Troubleshooting

I'm getting out of memory (OOM) errors. What should I do?

  • Use a smaller model (e.g., TinyLlama).
  • Reduce the batch size or `max_length` in `scaffold_core/config.py`.
  • Run on CPU if GPU memory is insufficient.
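A sketch of memory-saving settings to try. Apart from `LLM_MODEL`, the variable names below are illustrative assumptions; check `scaffold_core/config.py` for the actual names:

```python
# scaffold_core/config.py — settings to try when memory is tight (names illustrative).
LLM_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # smaller model
MAX_LENGTH = 512   # shorter generations use less memory
BATCH_SIZE = 1     # process one item at a time
DEVICE = "cpu"     # fall back to CPU if GPU memory is insufficient
```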

My model download fails or access is denied.

  • Check your Hugging Face token and model access permissions.
  • Request access to gated models as described in the migration guide.
  • Make sure you have internet connectivity.

The UI isn't available yet. When will it launch?

A pilot UI for querying and feedback is planned for the early testing phase.


Contributing & Support

How can I contribute?

  • Fork the repo and submit a pull request
  • See CONTRIBUTING.md (to be added)
  • Open an issue for bugs or feature requests

Where do I get support?

  • Open a GitHub issue
  • Contact the project maintainers via GitHub

Quick Links