# Scaffold AI Frequently Asked Questions (FAQ)
Kevin Mastascusa edited this page Jun 30, 2025 · 2 revisions

Last updated: 2025-06-30
### What is Scaffold AI?

Scaffold AI is a curriculum recommendation tool designed to help educators integrate sustainability and climate-resilience topics into academic programs. It uses state-of-the-art AI techniques, including retrieval-augmented generation (RAG), semantic search, and large language models (LLMs).
### Who is it for?

Educators, curriculum designers, and researchers seeking literature-backed, transparent curriculum recommendations for sustainability and engineering education.
### What are the system requirements?

- Python 3.11+ (3.11 recommended)
- 16GB+ RAM
- NVIDIA GPU (recommended but not required)
- Windows, Linux, or macOS
### How do I install Scaffold AI?

- Clone the repo: `git clone https://github.com/kevinmastascusa/scaffold_ai.git`
- Create and activate a virtual environment.
- Install dependencies: `pip install -r requirements.txt`
- Run the setup script: `python setup.py`
- Place your PDF files in the `data/` directory.

See the Local Setup Guide for detailed steps.
### Do I need a Hugging Face token?

Yes, for most LLM models.

- Get your token at https://huggingface.co/settings/tokens
- Add it to your environment or `.env` file as `HUGGINGFACE_TOKEN=your_token_here`

See the Hugging Face Migration Guide.
### How are PDFs processed?

- PDFs are split into page-based chunks (one chunk per complete page).
- Text is cleaned, Unicode-normalized, and analyzed for technical terms.
- Math-aware and Unicode-aware chunking is available for advanced use.
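The cleaning and page-chunking steps above can be sketched with the standard library. The function names here are illustrative, not the pipeline's actual API; NFKC normalization is one common choice for Unicode cleanup (e.g. expanding the "ﬁ" ligature), though the project may use a different form.

```python
import unicodedata

def clean_page_text(raw: str) -> str:
    """Normalize Unicode (e.g. ligatures) and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    return " ".join(text.split())

def chunk_pages(pages: list[str]) -> list[dict]:
    """One chunk per complete page, as described above; empty pages are skipped."""
    return [
        {"page": i + 1, "text": clean_page_text(page)}
        for i, page in enumerate(pages)
        if page.strip()
    ]
```

Keeping the page number with each chunk is what makes citations back to the source PDF possible later in the pipeline.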
### How are merged words handled?

PDF extraction can merge words (e.g., "environmentalsustainability").

- The pipeline detects and splits these using domain-specific rules and wordninja.
- See `outputs/combined_words_analysis_report.txt`.
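The splitting idea can be illustrated with a greedy dictionary-based sketch. The real pipeline uses wordninja plus domain-specific rules; this simplified stand-in just shows how a merged token can be recovered against a small sustainability vocabulary (the vocabulary contents here are invented for the example).

```python
def split_merged(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-prefix split of a merged word against a vocabulary.

    Simplified illustration only; wordninja uses word frequencies instead.
    """
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j].lower() in vocab:
                parts.append(word[i:j])
                i = j
                break
        else:
            return [word]  # no prefix matched: give up and keep the word whole
    return parts

# Toy domain vocabulary (illustrative)
vocab = {"environmental", "sustainability", "climate", "resilience"}
```

A frequency-based splitter like wordninja handles general English better, but a domain vocabulary catches technical compounds that frequency lists miss.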
### How does retrieval and answering work?

- Each chunk is embedded using `all-MiniLM-L6-v2` (sentence-transformers).
- Chunks are indexed with FAISS for efficient vector search.
- Queries are embedded and matched to relevant chunks.
- Top matches are reranked with a cross-encoder.
- The LLM generates a grounded answer using only retrieved content.
- Citations to source documents are included (citation layer in progress).
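The embed-and-match steps above can be sketched with a toy bag-of-words similarity. The real system uses `all-MiniLM-L6-v2` dense embeddings, a FAISS index, and a cross-encoder reranker; this stdlib-only sketch only demonstrates the ranking idea, and the function names are invented for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real pipeline uses all-MiniLM-L6-v2."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (FAISS does this at scale)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The same interface scales up by swapping the toy embedding for a sentence-transformer and the sort for an approximate-nearest-neighbor index, which is exactly the role FAISS plays in the pipeline.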
### Which LLM is used?

- Default: `mistralai/Mistral-7B-Instruct-v0.2` (Hugging Face)
- Alternatives: OpenHermes, TinyLlama, and others (see model_summary.md)
### Can I use a different model?

Yes!

- Edit `LLM_MODEL` in `scaffold_core/config.py`.
- Make sure the model supports text generation and that you have access.
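Concretely, swapping models is a one-line change. The default value below comes from this FAQ; the commented alternative is one example of a Hugging Face model ID, and any surrounding contents of `config.py` are not shown here.

```python
# scaffold_core/config.py (illustrative excerpt)

# Default model from this FAQ:
LLM_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# Example alternative (make sure your token has access to the model):
# LLM_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```

After changing the value, re-run your query script so the new model is downloaded and loaded.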
### Can I run tests?

Yes.

- Run `python scaffold_core/scripts/run_tests.py` for comprehensive tests.
- Generate a detailed report with `python scaffold_core/scripts/generate_test_report.py`.
- See `documentation/query_system_test_report.md`.
### What if I run out of memory?

- Use a smaller model (e.g., TinyLlama)
- Reduce the batch size or `max_length` in `config.py`
- Run on CPU if GPU memory is insufficient
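The memory-saving tips above might look like the following in `config.py`. Only `LLM_MODEL` and `max_length` are named in this FAQ; `batch_size` and `device`, and all the specific values, are assumptions for illustration.

```python
# Illustrative memory-saving overrides for scaffold_core/config.py.
# batch_size and device are hypothetical setting names.
LLM_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # smaller model
batch_size = 1      # smaller batches lower peak memory use
max_length = 512    # shorter generations need less memory
device = "cpu"      # fall back to CPU if GPU memory is insufficient
```

Try the smaller model first; it usually gives the largest memory reduction for the least quality loss on short curriculum queries.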
### What if a model fails to load or access is denied?

- Check your Hugging Face token and model access permissions.
- Request access to gated models as described in the migration guide.
- Make sure you have internet connectivity.
### Is there a user interface?

A pilot UI for querying and feedback is planned for the early testing phase.

- Track progress in GitHub Issue #8.
### How can I contribute?

- Fork the repo and submit a pull request.
- See `CONTRIBUTING.md` (to be added).
- Open an issue for bugs or feature requests.
### Where can I get help?

- Open a GitHub issue.
- Contact the project maintainers via GitHub.