NeuralScholar: ML/DL Concept Engine for AI-Assisted Research

A specialized small language model that serves as an ML/DL concept engine -- designed to research, explain, and generate ideas around machine learning and deep learning, which can then be handed off to large models (Claude, GPT, Gemini) for code implementation. Think of it as the "brain" that understands the theory, paired with large models as the "hands" that write the code.

Base Model: Qwen2.5-3B-Instruct
Method: QLoRA (4-bit quantized LoRA fine-tuning)
Domain: ML/DL concept research, ideation, and explanation


Why NeuralScholar?

Large language models like Claude, GPT, and Gemini are excellent at writing code -- but they work best when given precise, well-articulated ML/DL concepts as input. NeuralScholar fills that gap.

The Problem

When building ML/DL systems, the bottleneck is rarely the code -- it's knowing what to build and why. Practitioners often spend hours reading papers, comparing architectures, and understanding the mathematical intuition before writing a single line of code. General-purpose LLMs can help, but they lack the deep, focused expertise of a domain specialist.

The Solution: A Two-Stage AI Workflow

                    STAGE 1: THINK                          STAGE 2: BUILD
            ┌─────────────────────────┐            ┌─────────────────────────┐
            │     NeuralScholar       │            │   Claude / GPT / Gemini │
            │     (3B, local)         │     -->    │   (Large model, API)    │
            │                         │            │                         │
            │  "Explain how multi-    │            │  "Implement multi-head  │
            │   head attention works, │            │   attention with the    │
            │   compare it with       │            │   scaled dot-product    │
            │   cross-attention, and  │            │   approach, using       │
            │   when to use each"     │            │   PyTorch, with these   │
            │                         │            │   design choices..."    │
            │   --> Detailed concept  │            │                         │
            │      explanation with   │            │   --> Production-ready  │
            │      mathematical       │            │      implementation    │
            │      intuition          │            │                         │
            └─────────────────────────┘            └─────────────────────────┘
                Runs locally, free                   API call with precise context
                Private, no data leaks               Higher quality output
                Fast concept iteration               Less token waste

What This Enables

  • Research ideas locally -- Explore ML/DL concepts on your machine without API costs or data privacy concerns. Iterate on ideas freely before committing to expensive API calls.
  • Generate precise prompts for large models -- NeuralScholar's concept explanations serve as high-quality context for Claude/GPT/Gemini, resulting in better code output with fewer iterations.
  • Bridge theory and implementation -- Ask NeuralScholar "What loss function should I use for imbalanced multi-label classification and why?" -- then feed its answer to a large model with "Implement this in PyTorch."
  • Rapid literature review -- Trained on ~100K samples from ArXiv papers, StackExchange, Wikipedia, and curated datasets, the model provides grounded, up-to-date ML/DL knowledge.
  • Offline-first ML assistant -- Runs on a laptop GPU (8GB VRAM) via GGUF quantization. No internet required for concept exploration.

Example Workflow

You:            "Compare batch normalization vs layer normalization.
                 When should I use each in a transformer?"

NeuralScholar:  [Detailed explanation covering:
                 - Mathematical formulation of both
                 - Why LayerNorm is preferred in transformers (sequence length invariance)
                 - BatchNorm's dependency on batch statistics
                 - Pre-norm vs post-norm transformer variants
                 - Practical trade-offs]

You → Claude:   "Based on this analysis: [paste NeuralScholar output]
                 Implement a transformer block with pre-LayerNorm in PyTorch,
                 with an option to swap in RMSNorm."

Claude:         [Clean, production-ready PyTorch code informed by precise context]


Architecture

DATA COLLECTION          PROCESSING              TRAINING            DEPLOYMENT

  ArXiv API        -->                      -->                -->  llama.cpp
  StackExchange    -->  Code Filter         -->  QLoRA         -->  Ollama
  Wikipedia        -->  Deduplication (LSH) -->  Fine-tuning   -->  Gradio UI
  Distill.pub      -->  Quality Filter      -->  on Qwen2.5-3B -->  Python API
  HuggingFace      -->  Train/Val Split     -->                -->

Project Structure

NeuralScholar/
├── configs/
│   └── config.yaml                  # Pipeline configuration
│
├── collectors/                      # Data collection modules
│   ├── arxiv_collector.py           # ArXiv papers (cs.LG, cs.CV, cs.CL, cs.AI)
│   ├── stackexchange_collector.py   # Stats & AI StackExchange Q&A
│   ├── wikipedia_collector.py       # ML/DL Wikipedia articles
│   ├── distill_collector.py         # Distill.pub research articles
│   └── huggingface_datasets_collector.py  # Open-Platypus, SciQ, ARC, Flan
│
├── processors/                      # Data processing pipeline
│   ├── code_filter.py               # Remove code-heavy content
│   ├── deduplicator.py              # MinHash LSH deduplication
│   ├── quality_filter.py            # Language, length, coherence checks
│   └── processors.py               # Instruction formatting & train/val split
│
├── utils/
│   ├── io_utils.py                  # JSONL streaming, config, progress tracking
│   └── text_utils.py               # Text cleaning, LaTeX removal, chunking
│
├── data/
│   ├── raw/                         # Collected data per source
│   ├── processed/                   # Intermediate processing stages
│   └── final/                       # Final train.jsonl & validation.jsonl
│
├── outputs/
│   ├── ml-slm-qwen3b/              # QLoRA checkpoints & final model
│   ├── ml-slm-merged/              # Merged LoRA + base (full HF model)
│   └── ml-slm-gguf/                # Quantized GGUF models
│
├── llama.cpp/                       # Submodule for GGUF conversion
│
├── train_qlora.py                   # QLoRA fine-tuning script
├── inference.py                     # Inference (interactive, eval, compare)
├── export_gguf.py                   # GGUF export & quantization
├── webui.py                         # Gradio web interface
├── analyze_dataset.py               # Dataset statistics & analysis
├── run_pipeline.py                  # Pipeline orchestration
├── requirements.txt                 # Data collection dependencies
└── requirements_training.txt        # Training & inference dependencies

Setup

Prerequisites

  • Python 3.10+
  • CUDA-capable GPU (8GB+ VRAM recommended)
  • llama.cpp (for GGUF export only)

Installation

# Clone the repository
git clone https://github.com/<your-username>/NeuralScholar.git
cd NeuralScholar

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install data collection & processing dependencies
pip install -r requirements.txt
python -m spacy download en_core_web_sm

# Install training & inference dependencies
pip install -r requirements_training.txt

Usage

Full Pipeline

# Run everything: collect, process, finalize
python run_pipeline.py --full

# Skip collection, only process & split
python run_pipeline.py --full --skip-collection

1. Data Collection

Collect from individual sources or all at once:

# All sources
python run_pipeline.py collect --all

# Individual sources
python run_pipeline.py collect --source arxiv
python run_pipeline.py collect --source stackexchange
python run_pipeline.py collect --source wikipedia
python run_pipeline.py collect --source distill
python run_pipeline.py collect --source huggingface

Sources and scale:

Source          Content                                     Samples
ArXiv           Paper summaries & explanations (2018+)      ~95K
Wikipedia       40+ ML/DL topic articles                    ~4.5K
StackExchange   Stats & AI forum Q&A (score >= 5)           ~1K
HuggingFace     Open-Platypus, SciQ, ARC, Flan-CoT          ~2K
Distill.pub     Interactive research explanations           ~400
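
As an illustration of how a collector might query the public ArXiv Atom API for the categories listed above, here is a minimal URL builder. The helper name is hypothetical and the real collector may construct its requests differently:

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(categories, start=0, max_results=100):
    """Build an ArXiv API query URL covering the given category list.

    Hypothetical helper for illustration only.
    """
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search,
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_arxiv_query(["cs.LG", "cs.CV", "cs.CL", "cs.AI"], max_results=200)
print(url)
```

The Atom feed returned by this endpoint can then be parsed for titles, abstracts, and categories.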

2. Data Processing

# Run processing pipeline: merge → code filter → dedup → quality → format → split
python run_pipeline.py process
python run_pipeline.py finalize

# Analyze the final dataset
python analyze_dataset.py

Processing stages:

Stage            Method                               Purpose
Code Filter      Pattern-based detection              Remove code snippets, preserve algorithm descriptions
Deduplication    MinHash LSH (threshold: 0.85)        Remove near-duplicate samples
Quality Filter   Language, length, coherence checks   Ensure minimum quality (50-2048 tokens, English only)
Formatting       Template standardization             Uniform instruction-output format
Finalization     Stratified shuffle split (95/5)      Train/validation split with seed 42
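
The core idea behind the MinHash deduplication stage can be sketched in pure Python. A production pipeline would typically use a library such as datasketch for the LSH index; this simplified sketch only shows how MinHash signatures approximate Jaccard similarity between word shingles:

```python
import hashlib

def minhash_signature(text, num_perm=128, shingle_size=3):
    """Approximate a MinHash signature over word shingles of the text."""
    words = text.lower().split()
    shingles = {" ".join(words[i:i + shingle_size])
                for i in range(max(1, len(words) - shingle_size + 1))}
    sig = []
    for seed in range(num_perm):
        # One "hash permutation" per seed: keep the minimum hash over all shingles.
        sig.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles
        ))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash_signature("batch normalization stabilizes training of deep networks")
b = minhash_signature("batch normalization stabilizes training of deep neural networks")
c = minhash_signature("attention lets the model weigh tokens by relevance")

# Near-duplicates share many signature slots; unrelated text shares almost none.
print(estimated_jaccard(a, b), estimated_jaccard(a, c))
```

In the actual pipeline, an LSH index buckets signatures so that only likely matches (similarity above the 0.85 threshold) are compared, avoiding all-pairs comparison.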

3. Training

# Quick test run (100 samples, 1 epoch)
python train_qlora.py --test

# Full training
python train_qlora.py

# Resume from checkpoint
python train_qlora.py --resume

# Override hyperparameters
python train_qlora.py --epochs 2 --lr 1e-4 --output outputs/custom-run

4. Inference

# Interactive chat
python inference.py

# Single question
python inference.py --question "Explain the attention mechanism in transformers"

# Evaluate on benchmark questions
python inference.py --eval

# Compare fine-tuned vs base model
python inference.py --compare --question "Why does dropout help prevent overfitting?"

5. GGUF Export

# Export with recommended quantization (q4_k_m)
python export_gguf.py

# Choose quantization format
python export_gguf.py --quant q5_k_m

Available quantization formats:

Format   Size      Quality     Use Case
q4_k_m   ~2.5 GB   Good        Recommended for 8GB VRAM
q5_k_m   ~3.5 GB   Better      Higher quality, more VRAM
q6_k     ~4 GB     Very good   Larger systems
q8_0     ~5 GB     Excellent   Best quality among quantized formats
f16      ~6 GB     Best        Unquantized (half precision)

6. Web UI

python webui.py
# Opens Gradio interface at http://localhost:7860

Data Pipeline

ArXiv / StackExchange / Wikipedia / Distill / HuggingFace
                        |
                        v
              data/raw/[source]/*.jsonl
                        |
                  merge_all_sources
                        |
                        v
              data/processed/merged.jsonl
                        |
          code_filter -> dedup -> quality -> format
                        |
                        v
              data/processed/formatted.jsonl
                        |
                shuffle + split (95/5)
                        |
              +---------+---------+
              |                   |
              v                   v
    data/final/train.jsonl  data/final/validation.jsonl
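
The final step amounts to a seeded shuffle followed by a 95/5 cut. A simplified, non-stratified sketch (the pipeline's actual split is stratified by category; the function name here is illustrative):

```python
import random

def split_dataset(records, val_fraction=0.05, seed=42):
    """Shuffle records with a fixed seed and cut off a validation slice."""
    records = list(records)
    random.Random(seed).shuffle(records)
    n_val = max(1, int(len(records) * val_fraction))
    return records[n_val:], records[:n_val]  # (train, validation)

# Illustrative usage with in-memory records; the real pipeline streams JSONL files.
records = [{"instruction": f"q{i}", "input": "", "output": f"a{i}"} for i in range(100)]
train, val = split_dataset(records)
print(len(train), len(val))  # 95 5
```

Fixing the seed (42, as in the table above) makes the split reproducible across runs.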

Data format (each line in JSONL):

{
  "instruction": "Explain the concept of attention in transformers.",
  "input": "",
  "output": "Attention is a mechanism that allows a model to focus on...",
  "source": "arxiv",
  "category": "concept_explanation/architectures"
}
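
Reading this format and rendering each record into a single training text might look like the following. The instruction/input/response template shown here is a generic one, not necessarily the exact template used in training:

```python
import io
import json

def iter_jsonl(fp):
    """Yield one parsed record per non-empty JSONL line."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

def render_prompt(record):
    """Render an instruction/input/output record into one prompt string."""
    parts = [f"### Instruction:\n{record['instruction']}"]
    if record.get("input"):  # the input field is often empty
        parts.append(f"### Input:\n{record['input']}")
    parts.append(f"### Response:\n{record['output']}")
    return "\n\n".join(parts)

sample = ('{"instruction": "Explain attention.", "input": "", '
          '"output": "Attention weighs tokens...", "source": "arxiv", '
          '"category": "concept_explanation/architectures"}')
record = next(iter_jsonl(io.StringIO(sample)))
print(render_prompt(record))
```

The `source` and `category` fields are carried through for dataset analysis but would not appear in the rendered prompt.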

Model Details

Training Configuration

Parameter              Value
Base model             Qwen2.5-3B-Instruct
Method                 QLoRA (4-bit NF4, double quantization)
LoRA rank (r)          64
LoRA alpha             128
LoRA target modules    q, k, v, o, gate, up, down projections
Trainable parameters   ~120M (6.58% of 3B)
Max sequence length    1024 tokens
Epochs                 3
Effective batch size   16 (batch size 1 x 16 gradient accumulation steps)
Learning rate          2e-4 (cosine schedule, 3% warmup)
Optimizer              Paged AdamW 8-bit
Precision              BFloat16
VRAM requirement       ~7.5 GB
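
The table above maps onto transformers/peft configuration objects roughly as follows. This is a sketch under the stated hyperparameters, not the exact contents of train_qlora.py; in particular, lora_dropout and bias are assumptions not listed in the table:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 base weights with double quantization, bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on all attention and MLP projections (r=64, alpha=128).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumed value, not taken from the table
    bias="none",
    task_type="CAUSAL_LM",
)
```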

System Prompt

The model is trained with the following system prompt enforced at both training and inference:

You are an expert ML/DL teaching assistant. Explain concepts clearly and accurately, use intuitive analogies, provide mathematical intuition, compare and contrast methods, and focus on conceptual understanding. Do NOT write code.
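
Enforcing the prompt at inference amounts to prepending it to every conversation before applying the tokenizer's chat template. A minimal sketch (the function name is illustrative):

```python
SYSTEM_PROMPT = (
    "You are an expert ML/DL teaching assistant. Explain concepts clearly and "
    "accurately, use intuitive analogies, provide mathematical intuition, compare "
    "and contrast methods, and focus on conceptual understanding. Do NOT write code."
)

def build_messages(question, history=None):
    """Prepend the fixed system prompt to every conversation."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_messages("What is gradient clipping?")
print(msgs[0]["role"], msgs[-1]["role"])  # system user
```

The resulting list would then be passed to `tokenizer.apply_chat_template` before generation.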


Deployment

With llama.cpp

./llama-cli -m outputs/ml-slm-gguf/ml-slm-q4_k_m.gguf \
  -p "Explain the attention mechanism in transformers"

With Ollama

echo 'FROM ./outputs/ml-slm-gguf/ml-slm-q4_k_m.gguf' > Modelfile
ollama create neural-scholar -f Modelfile
ollama run neural-scholar

With Python (transformers + peft)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "outputs/ml-slm-qwen3b/final")
tokenizer = AutoTokenizer.from_pretrained("outputs/ml-slm-qwen3b/final")

Pairing with Large Models (Recommended Workflow)

# Step 1: Get concept explanation from NeuralScholar (local, free)
# (`neural_scholar` and `claude` below are illustrative client objects)
concept = neural_scholar.generate("Explain how LoRA works and why it's parameter-efficient")

# Step 2: Feed concept to Claude/GPT for implementation (API call with rich context)
prompt = f"""Based on this technical explanation:

{concept}

Implement a LoRA layer in PyTorch that can wrap any nn.Linear module.
Include forward pass, weight merging, and rank selection."""

code = claude.generate(prompt)  # One precise API call instead of many vague ones

License

This project is for educational and research purposes. The base model (Qwen2.5-3B-Instruct) is subject to its own license terms.