Commit 9977891

Update pyproject file and docker files
1 parent 958a5ff commit 9977891

4 files changed

+69
-12
lines changed

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ LABEL maintainer="RAFT Toolkit Team" \
 org.opencontainers.image.version="${VERSION:-0.2.0}" \
 org.opencontainers.image.created="${BUILD_DATE}" \
 org.opencontainers.image.revision="${VCS_REF}" \
-org.opencontainers.image.source="https://github.com/microsoft/raft-toolkit" \
+org.opencontainers.image.source="https://github.com/makercorn/raft-toolkit" \
 org.opencontainers.image.licenses="MIT"
 
 # Set environment variables

Dockerfile.windows

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ LABEL maintainer="RAFT Toolkit Team" \
 org.opencontainers.image.version="${VERSION:-0.2.0}" \
 org.opencontainers.image.created="${BUILD_DATE}" \
 org.opencontainers.image.revision="${VCS_REF}" \
-org.opencontainers.image.source="https://github.com/microsoft/raft-toolkit" \
+org.opencontainers.image.source="https://github.com/makercorn/raft-toolkit" \
 org.opencontainers.image.licenses="MIT"
 
 # Set environment variables

README.md

Lines changed: 64 additions & 7 deletions
@@ -12,6 +12,12 @@
 - [📦 Installation](#-installation)
 - [Quick Start](#quick-start)
 - [Installation Options](#installation-options)
+- [🚀 **Core Installation** (Fastest - ~30-60 seconds)](#-core-installation-fastest---30-60-seconds)
+- [📊 **Standard Installation** (Recommended)](#-standard-installation-recommended)
+- [🌐 **Complete Installation**](#-complete-installation)
+- [🛠️ **Development Installation**](#️-development-installation)
+- [🎯 **Custom Combinations**](#-custom-combinations)
+- [🐳 **Docker Installation**](#-docker-installation)
 - [🌐 Usage](#-usage)
 - [Web Interface](#web-interface)
 - [Command Line Interface](#command-line-interface)
@@ -64,10 +70,24 @@
 - [Docker Testing](#docker-testing)
 - [Code Quality](#code-quality)
 - [Security Scanning](#security-scanning)
+- [🛠️ Command Line Tools](#️-command-line-tools)
+- [Available Tools](#available-tools)
+- [Quick Examples](#quick-examples)
+- [Complete Workflow](#complete-workflow)
 - [🛠️ Fine-tuning \& Evaluation](#️-fine-tuning--evaluation)
 - [Model Fine-tuning](#model-fine-tuning)
-- [Evaluation Tools](#evaluation-tools)
+- [Legacy Tool Usage](#legacy-tool-usage)
 - [🚀 Deployment](#-deployment)
+- [📚 Documentation](#-documentation)
+- [Getting Started](#getting-started)
+- [Architecture \& Design](#architecture--design)
+- [Usage \& Reference](#usage--reference)
+- [Development \& Testing](#development--testing)
+- [Deployment \& Operations](#deployment--operations)
+- [Releases \& Changes](#releases--changes)
+- [Technical Guides](#technical-guides)
+- [Troubleshooting \& Fixes](#troubleshooting--fixes)
+- [Other Documentation](#other-documentation)
 
 ## 🚀 Overview
 
@@ -149,13 +169,15 @@ graph TD
 **When to Use RAFT vs Traditional RAG:**
 
 **Use RAFT Fine-Tuning When:**
+
 - You have consistent document types/formats
 - Performance on document reasoning is critical
 - You can invest time in data generation and training
 - You need predictable, high-quality outputs
 - Latency optimization is important
 
 **Use Traditional RAG When:**
+
 - Working with diverse, changing document types
 - Quick prototyping or proof-of-concept needed
 - Limited resources for training data generation
@@ -192,34 +214,43 @@ python -m cli.main --datapath sample_data/sample.pdf --output ./output --preview
 Choose the installation that best fits your needs:
 
 #### 🚀 **Core Installation** (Fastest - ~30-60 seconds)
+
 ```bash
 pip install .
 ```
+
 **Includes:** Basic CLI, document processing, OpenAI integration
 **Use cases:** Quick testing, lightweight deployments, basic CI
 
 #### 📊 **Standard Installation** (Recommended)
+
 ```bash
 pip install .[standard]
 ```
+
 **Includes:** Full AI/ML functionality, embeddings, LangChain ecosystem
 **Use cases:** Production deployments, full RAFT functionality
 
 #### 🌐 **Complete Installation**
+
 ```bash
 pip install .[complete]
 ```
+
 **Includes:** Standard + cloud services + observability
 **Use cases:** Enterprise deployments, cloud integration
 
 #### 🛠️ **Development Installation**
+
 ```bash
 pip install .[all]
 ```
+
 **Includes:** Everything + development tools
 **Use cases:** Contributing, local development, full testing
 
 #### 🎯 **Custom Combinations**
+
 ```bash
 # Web interface with AI
 pip install .[standard,web]
@@ -232,17 +263,20 @@ pip install .[standard,dev]
 ```
 
 #### 🐳 **Docker Installation**
+
 ```bash
 docker compose up -d
 ```
 
 **🚀 Performance Note:** The optimized dependency structure provides **70-80% faster CI builds** compared to previous versions. See [CI Optimization Guide](docs/CI_OPTIMIZATION.md) for details.
 
 **📚 Installation Resources:**
+
 - [Complete Installation Guide](docs/INSTALLATION_GUIDE.md) - Detailed setup instructions
 - [Requirements Management](docs/REQUIREMENTS.md) - Dependency structure and installation patterns
 
 **📚 CLI Documentation:**
+
 - [CLI Reference Guide](docs/CLI-Reference.md) - Comprehensive CLI parameter documentation
 - [CLI Quick Reference](docs/CLI-Quick-Reference.md) - Quick reference card for CLI parameters
 
@@ -263,6 +297,7 @@ python run_web.py --host 0.0.0.0 --port 8080 --debug
 ```
 
 **Web UI Features:**
+
 - 📤 **Dataset Generation**: Drag & drop file upload with visual configuration
 - 🛠️ **Analysis Tools**: Six powerful evaluation and analysis tools
 - ⚙️ **Visual Configuration**: Interactive forms for all settings
@@ -272,6 +307,7 @@ python run_web.py --host 0.0.0.0 --port 8080 --debug
 - 📈 **Results Visualization**: Comprehensive display of metrics and statistics
 
 **Analysis Tools Available:**
+
 - **Dataset Evaluation**: Evaluate model performance with configurable metrics
 - **Answer Generation**: Generate high-quality answers using various LLMs
 - **PromptFlow Analysis**: Multi-dimensional evaluation (relevance, groundedness, fluency, coherence)
@@ -314,6 +350,7 @@ See the [tools/README.md](tools/README.md) for comprehensive documentation on al
 4. **Dataset Export**: Data is saved in the specified format for fine-tuning
 
 **Tips:**
+
 - Use a `.env` file for OpenAI/Azure keys
 - For Azure, set deployment names with `--completion-model` and `--embedding-model`
 - Use `--chunking-strategy` and `--chunking-params` for best results on your data
@@ -551,6 +588,7 @@ RAFT Toolkit includes a comprehensive template system for customizing prompts us
 ### Default Template Behavior
 
 **No Configuration Required**: RAFT Toolkit works out of the box with intelligent defaults:
+
 - Automatically selects appropriate templates based on model type (GPT, Llama, etc.)
 - Provides robust fallback mechanisms if custom templates are not found
 - Includes multiple layers of default templates for different complexity levels
@@ -564,12 +602,14 @@ python raft.py --datapath docs/ --output training_data/
 ### Available Templates
 
 #### Embedding Templates
+
 - **`embedding_prompt_template.txt`**: Default template for embedding generation
 - Provides context and instructions for generating document embeddings
 - Supports variables: `{content}`, `{document_type}`, `{metadata}`
 - Customizable for domain-specific embedding optimization
 
 #### Question-Answer Generation Templates
+
 - **`gpt_template.txt`**: GPT-style question-answering template with reasoning and citations
 - **`gpt_qa_template.txt`**: GPT question generation template with content filtering
 - **`llama_template.txt`**: Llama-style question-answering template optimized for Llama models
@@ -578,6 +618,7 @@ python raft.py --datapath docs/ --output training_data/
 ### Template Configuration
 
 **Environment Variables:**
+
 ```bash
 # Custom prompt templates
 export RAFT_EMBEDDING_PROMPT_TEMPLATE="/path/to/templates/my_embedding_template.txt"
@@ -589,6 +630,7 @@ export RAFT_TEMPLATES="/path/to/templates/"
 ```
 
 **CLI Arguments:**
+
 ```bash
 # Use custom templates
 python raft.py --datapath docs/ --output training_data/ \
@@ -602,6 +644,7 @@ python raft.py --datapath docs/ --output training_data/ \
 ```
 
 **Programmatic Configuration:**
+
 ```python
 config = RAFTConfig(
 templates="./templates",
@@ -614,21 +657,24 @@ config = RAFTConfig(
 ### Template Variables
 
 #### Embedding Templates
+
 - `{content}`: The document content to be embedded
 - `{document_type}`: File type (pdf, txt, json, pptx, etc.)
 - `{metadata}`: Additional document metadata
 - `{chunk_index}`: Index of the current chunk within the document
 - `{chunking_strategy}`: The chunking method used
 
 #### QA Generation Templates
+
 - `{question}`: The question to be answered (for answer templates)
 - `{context}`: The context/chunk for question generation
 - `%s`: Placeholder for number of questions to generate
 
 ### Domain-Specific Examples
 
 #### Medical Documents
-```
+
+```text
 Generate embeddings for medical literature that capture:
 - Clinical terminology and procedures
 - Drug names and dosages
@@ -639,7 +685,8 @@ Content: {content}
 ```
 
 #### Legal Documents
-```
+
+```text
 Generate embeddings for legal documents focusing on:
 - Legal terminology and concepts
 - Case citations and precedents
@@ -651,7 +698,8 @@ Content: {content}
 ```
 
 #### Technical Documentation
-```
+
+```text
 Generate embeddings for technical documentation emphasizing:
 - API endpoints and parameters
 - Code examples and syntax
@@ -673,12 +721,14 @@ The RAFT Toolkit includes comprehensive rate limiting to handle the constraints
 #### Why Rate Limiting Matters
 
 **Common Issues Without Rate Limiting:**
+
 - API rate limit errors (HTTP 429) causing processing failures
 - Unexpected costs from burst API usage
 - Inconsistent processing times due to throttling
 - Failed batches requiring expensive reprocessing
 
 **Benefits of Rate Limiting:**
+
 - **Predictable Costs**: Control API spending with token and request limits
 - **Reliable Processing**: Avoid rate limit errors through intelligent throttling
 - **Optimized Performance**: Adaptive strategies adjust to service response times
@@ -687,6 +737,7 @@ The RAFT Toolkit includes comprehensive rate limiting to handle the constraints
 #### Quick Start Examples
 
 **Using Preset Configurations:**
+
 ```bash
 # OpenAI GPT-4 with recommended limits
 python raft.py --datapath docs/ --output training_data/ \
@@ -702,6 +753,7 @@ python raft.py --datapath docs/ --output training_data/ \
 ```
 
 **Custom Rate Limiting:**
+
 ```bash
 # Custom limits for your specific API tier
 python raft.py --datapath docs/ --output training_data/ \
@@ -757,6 +809,7 @@ The RAFT Toolkit features a comprehensive logging system designed for production
 #### 🚀 **Production Deployment**
 
 **Docker with Enhanced Logging:**
+
 ```yaml
 # docker-compose.yml
 version: '3.8'
@@ -772,6 +825,7 @@ services:
 ```
 
 **Kubernetes ConfigMap:**
+
 ```yaml
 apiVersion: v1
 kind: ConfigMap
@@ -784,7 +838,6 @@ data:
 RAFT_LOG_STRUCTURED: "true"
 ```
 
-
 ### File Utilities
 
 - **Split large JSONL files:**
@@ -861,7 +914,7 @@ raft-toolkit/
 
 This toolkit follows **12-factor app principles** with a modular architecture:
 
-```
+``` text
 raft-toolkit/
 ├── raft_toolkit/ # Main package
 │ ├── core/ # Shared business logic
@@ -878,6 +931,7 @@ raft-toolkit/
 ```
 
 **Benefits:**
+
 -**Separation of Concerns**: UI and business logic decoupled
 -**Environment Parity**: Same code for dev/prod
 -**Configuration via Environment**: 12-factor compliance
@@ -1078,6 +1132,7 @@ python answer.py --input questions.jsonl --output answers.jsonl --model gpt-4
 ```
 
 **Evaluation Metrics:**
+
 - **Relevance**: How relevant is the answer to the question?
 - **Groundedness**: Is the answer grounded in the provided context?
 - **Fluency**: How fluent and natural is the language?
@@ -1097,6 +1152,7 @@ python answer.py --input questions.jsonl --output answers.jsonl --model gpt-4
 - **🔒 Security**: Container scanning, network policies, secret management
 
 **Local Development:**
+
 ```bash
 # Development mode with auto-reload
 python run_web.py --debug
@@ -1164,4 +1220,5 @@ See the [Deployment Guide](docs/DEPLOYMENT.md) for comprehensive deployment inst
 
 ### Other Documentation
 
-- [Test Coverage Summary](docs/TEST_COVERAGE_SUMMARY.md)
+- [Test Coverage Summary](docs/TEST_COVERAGE_SUMMARY.md)
+

pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -160,9 +160,9 @@ raft-pfeval-completion = "raft_toolkit.tools.pfeval_completion:main"
 raft-pfeval-local = "raft_toolkit.tools.pfeval_local:main"
 
 [project.urls]
-"Homepage" = "https://github.com/microsoft/raft-toolkit"
-"Bug Reports" = "https://github.com/microsoft/raft-toolkit/issues"
-"Source" = "https://github.com/microsoft/raft-toolkit"
+"Homepage" = "https://visland.com"
+"Bug Reports" = "https://github.com/makercorn/raft-toolkit/issues"
+"Source" = "https://github.com/makercorn/raft-toolkit"
 
 [tool.setuptools.packages.find]
 include = ["raft_toolkit*"]
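
One way to double-check a repository rename like this one is to scan the touched files for leftover references to the old path. The following is a minimal sketch and not part of the commit: the function name `find_stale_references` and the default file list are assumptions chosen for illustration, and the old path is the one shown in the removed lines of this diff.

```python
from pathlib import Path

# Old repository path, as it appears in the removed ("-") lines of this diff.
OLD_REPO = "github.com/microsoft/raft-toolkit"

# Files touched by this commit; a hypothetical default, adjust as needed.
DEFAULT_FILES = ("Dockerfile", "Dockerfile.windows", "README.md", "pyproject.toml")


def find_stale_references(root, needle=OLD_REPO, names=DEFAULT_FILES):
    """Return (file name, line number, stripped line) for each line containing needle."""
    hits = []
    for name in names:
        path = Path(root) / name
        if not path.is_file():
            continue  # skip files that are missing from this checkout
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((name, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Print any leftovers in the current directory in a grep-like format.
    for name, lineno, line in find_stale_references("."):
        print(f"{name}:{lineno}: {line}")
```

Run from the repository root, an empty result would suggest the four files no longer mention the old path; it says nothing about files outside the list.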
