refactor: reorganize examples into structured folders

CodeWithKyrian · CodeWithKyrian · commit bc27f555e0eb · 2025-12-06T20:14:17.000+01:00
diff --git a/README.md b/README.md
@@ -475,6 +475,11 @@ $collection->delete(where: Where::field('category')->eq('outdated'));
 $collection->delete(whereDocument: Where::document()->contains('outdated'));
 ```
 
+## Examples
+
+- **[`basic-usage`](examples/basic-usage)** - Simple example demonstrating basic operations: connecting, adding documents, and querying
+- **[`document-chunking-cloud`](examples/document-chunking-cloud)** - Document chunking, embedding, and storage in Chroma Cloud with semantic search
+
 ## Testing
 
 Run the test suite using Pest.
diff --git a/examples/basic-usage/index.php b/examples/basic-usage/index.php
@@ -2,7 +2,7 @@
 
 declare(strict_types=1);
 
-require __DIR__ . '/../vendor/autoload.php';
+require __DIR__ . '/../../vendor/autoload.php';
 
 use Codewithkyrian\ChromaDB\ChromaDB;
 use Codewithkyrian\ChromaDB\Embeddings\JinaEmbeddingFunction;
@@ -21,9 +21,9 @@
 );
 
 $items = [
-    ["id" => 1, "content" => "He seems very happy" ],
-    ["id" => 2, "content"=> "He was very sad when we last talked"],
-    ["id" => 3, "content"=> "She made him angry"],
+    ["id" => 1, "content" => "He seems very happy"],
+    ["id" => 2, "content" => "He was very sad when we last talked"],
+    ["id" => 3, "content" => "She made him angry"],
 ];
 
 $collection->add(
@@ -37,5 +37,3 @@
 );
 
 dd($queryResponse->documents[0], $queryResponse->distances[0]);
-
-
diff --git a/examples/document-chunking-cloud/README.md b/examples/document-chunking-cloud/README.md
@@ -0,0 +1,198 @@
+# Document Chunking and Embedding Example
+
+This example demonstrates how to chunk a document, generate embeddings, and store them in Chroma Cloud for semantic search and retrieval.
+
+## Overview
+
+The example performs the following operations:
+
+1. **Ingestion Mode**: Chunks a document (`document.txt`) into smaller pieces, generates embeddings using Jina AI, and stores them in Chroma Cloud
+2. **Query Mode**: Performs semantic search on the stored documents using natural language queries
+
+## Prerequisites
+
+- PHP 8.1 or higher
+- Chroma Cloud account with API key
+- Jina AI API key (for embeddings)
+- Composer dependencies installed (run `composer install` in the repository root)
+
+## Setup
+
+Set your API keys as environment variables:
+
+```bash
+export CHROMA_API_KEY="your-chroma-cloud-api-key"
+export JINA_API_KEY="your-jina-api-key"
+```
+
+Alternatively, pass them as CLI arguments (`--api-key` and `--jina-key`; see Usage below).
+
+## Usage
+
+### Ingest Mode
+
+Chunk and store the document to Chroma Cloud:
+
+```bash
+php index.php -mode ingest
+```
+
+With custom options:
+
+```bash
+php index.php -mode ingest \
+  --api-key "your-chroma-api-key" \
+  --jina-key "your-jina-api-key" \
+  --tenant "my-tenant" \
+  --database "my-database"
+```
+
+### Query Mode
+
+Search the stored documents:
+
+```bash
+php index.php -mode query --query "What happened at the Dartmouth Workshop?"
+```
+
+With custom options:
+
+```bash
+php index.php -mode query \
+  --query "Who proposed the Turing Test?" \
+  --api-key "your-chroma-api-key" \
+  --jina-key "your-jina-api-key" \
+  --tenant "my-tenant" \
+  --database "my-database"
+```
+
+## CLI Arguments
+
+| Argument | Description | Default | Required |
+|----------|-------------|---------|----------|
+| `-mode` | Operation mode: `ingest` or `query` | - | Yes |
+| `--query` | Query text for search (query mode only) | `"Which event marked the birth of symbolic AI?"` | No |
+| `--api-key` | Chroma Cloud API key | `CHROMA_API_KEY` env var | Yes, unless the env var is set |
+| `--jina-key` | Jina AI API key for embeddings | `JINA_API_KEY` env var | Yes, unless the env var is set |
+| `--tenant` | Chroma Cloud tenant name | `default_tenant` | No |
+| `--database` | Chroma Cloud database name | `default_database` | No |
+| `--collection-name` | Collection name to use | `history_of_ai` | No |
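+
+The precedence implied by the table (CLI argument first, then environment variable, then built-in default) can be sketched as follows. `resolveOption()` is a hypothetical helper written for illustration; the actual argument handling in `index.php` may differ.
+
+```php
+<?php
+
+declare(strict_types=1);
+
+/**
+ * Hypothetical helper (not taken from index.php): resolve a setting from
+ * parsed CLI options, then an environment variable, then a default.
+ */
+function resolveOption(array $cliOptions, string $name, ?string $envVar = null, ?string $default = null): ?string
+{
+    // 1. An explicit, non-empty CLI argument wins.
+    if (isset($cliOptions[$name]) && is_string($cliOptions[$name]) && $cliOptions[$name] !== '') {
+        return $cliOptions[$name];
+    }
+
+    // 2. Otherwise fall back to the mapped environment variable, if set.
+    if ($envVar !== null) {
+        $value = getenv($envVar);
+        if ($value !== false && $value !== '') {
+            return $value;
+        }
+    }
+
+    // 3. Finally use the default; null means "required but missing".
+    return $default;
+}
+
+// Example: options as they might come back from getopt().
+$options = ['tenant' => 'my-tenant'];
+
+$tenant   = resolveOption($options, 'tenant', null, 'default_tenant');     // "my-tenant"
+$database = resolveOption($options, 'database', null, 'default_database'); // "default_database"
+$apiKey   = resolveOption($options, 'api-key', 'CHROMA_API_KEY');          // env value, or null
+```
+
+Returning `null` for a missing required key leaves the caller free to print an error such as the ones listed under Troubleshooting.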
+
+## Example Queries
+
+Try these example queries to test the semantic search:
+
+```bash
+# Historical events
+php index.php -mode query --query "What happened at the Dartmouth Workshop?"
+
+# People and contributions
+php index.php -mode query --query "Who proposed the Turing Test?"
+
+# Technical breakthroughs
+php index.php -mode query --query "What was the significance of AlexNet in 2012?"
+
+# Concepts and explanations
+php index.php -mode query --query "How do Large Language Models and Generative AI work?"
+
+# Historical figures
+php index.php -mode query --query "Who is considered the first computer programmer?"
+```
+
+## How It Works
+
+### Document Chunking
+
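+The document is chunked based on:
+- **CHAPTER markers**: A new chapter starts a new chunk
+- **PAGE markers**: A new page starts a new chunk
+- **Text accumulation**: Text between markers is accumulated into the current chunk
+
+For the bundled `document.txt` (3 chapters of 3 pages each), this yields the 9 chunks reported in the sample ingest output.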
+
+Each chunk includes:
+- Unique ID
+- Document text
+- Metadata (chapter and page information)
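+
+A minimal sketch of the marker-based strategy described above, assuming markers sit on their own lines in the form `CHAPTER n: Title` and `PAGE n`, as in `document.txt`. The function name `chunkByMarkers()`, the `chunk_N` IDs, and the `chapter`/`page` metadata keys are illustrative choices, not necessarily what `chunkDocument()` in `index.php` does.
+
+```php
+<?php
+
+declare(strict_types=1);
+
+/**
+ * Illustrative marker-based chunker: one chunk per PAGE, tagged with the
+ * CHAPTER it belongs to. ID format and metadata keys are assumptions.
+ *
+ * @return list<array{id: string, document: string, metadata: array{chapter: string, page: string}}>
+ */
+function chunkByMarkers(string $text): array
+{
+    $chunks  = [];
+    $buffer  = [];
+    $chapter = '';
+    $page    = '';
+
+    // Turn the accumulated lines into a chunk. Text that appears before the
+    // first PAGE marker (e.g. the document title) is dropped.
+    $flush = function () use (&$chunks, &$buffer, &$chapter, &$page): void {
+        $body = trim(implode("\n", $buffer));
+        if ($body !== '' && $page !== '') {
+            $chunks[] = [
+                'id'       => 'chunk_' . count($chunks),
+                'document' => $body,
+                'metadata' => ['chapter' => $chapter, 'page' => $page],
+            ];
+        }
+        $buffer = [];
+    };
+
+    foreach (preg_split('/\R/', $text) ?: [] as $line) {
+        $trimmed = trim($line);
+
+        if (str_starts_with($trimmed, 'CHAPTER ')) {
+            $flush();           // a new chapter closes the current chunk
+            $chapter = $trimmed;
+            $page    = '';
+        } elseif (preg_match('/^PAGE \d+$/', $trimmed) === 1) {
+            $flush();           // a new page closes the current chunk
+            $page = $trimmed;
+        } else {
+            $buffer[] = $line;  // accumulate text between markers
+        }
+    }
+
+    $flush(); // emit the final chunk
+
+    return $chunks;
+}
+
+// Usage: $chunks = chunkByMarkers(file_get_contents(__DIR__ . '/document.txt'));
+```
+
+Because the title line precedes any `PAGE` marker, it is skipped, so the bundled document produces one chunk per page.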
+
+### Embedding Generation
+
+- Uses Jina AI's embedding function to convert text chunks into vector embeddings
+- Embeddings are generated in batch for efficiency
+- All chunks are embedded before storage
+
+### Storage
+
+- Chunks are stored in a Chroma Cloud collection
+- Each ingestion deletes and recreates the collection, so any previously stored data is lost
+- Each chunk maintains its metadata for filtering and context
+
+### Querying
+
+- Natural language queries are converted to embeddings using the same Jina AI function
+- Vector similarity search finds the most relevant chunks
+- Results include distance scores, documents, and metadata
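+
+Conceptually, the similarity search ranks stored embeddings by their distance to the query embedding. The brute-force sketch below uses squared Euclidean (L2) distance, which is Chroma's default distance function; Chroma itself answers queries with an approximate nearest-neighbour index rather than a linear scan, and `rankByDistance()` is purely illustrative.
+
+```php
+<?php
+
+declare(strict_types=1);
+
+/**
+ * Squared Euclidean distance between two equal-length, list-style vectors.
+ *
+ * @param float[] $a
+ * @param float[] $b
+ */
+function squaredL2(array $a, array $b): float
+{
+    if (count($a) !== count($b)) {
+        throw new InvalidArgumentException('Vectors must have the same dimension.');
+    }
+
+    $sum = 0.0;
+    foreach ($a as $i => $value) {
+        $diff = $value - $b[$i];
+        $sum += $diff * $diff;
+    }
+
+    return $sum;
+}
+
+/**
+ * Illustrative brute-force search: the $k stored vectors closest to $query,
+ * nearest first, as ID => distance.
+ *
+ * @param array<string, float[]> $stored ID => embedding
+ * @param float[]                $query
+ * @return array<string, float>
+ */
+function rankByDistance(array $stored, array $query, int $k): array
+{
+    $distances = [];
+    foreach ($stored as $id => $embedding) {
+        $distances[$id] = squaredL2($embedding, $query);
+    }
+
+    asort($distances); // smaller distance = more similar
+
+    return array_slice($distances, 0, $k, true);
+}
+
+// Toy 2-D "embeddings", just to show the ordering.
+$stored = [
+    'chunk_0' => [0.9, 0.1],
+    'chunk_1' => [0.1, 0.9],
+    'chunk_2' => [0.7, 0.3],
+];
+
+print_r(array_keys(rankByDistance($stored, [1.0, 0.0], 2))); // lists chunk_0, then chunk_2
+```
+
+This is why a lower distance score in the query output means a closer match.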
+
+## Output
+
+### Ingest Mode
+
+```
+--- Chroma Cloud Example: ingest Mode ---
+Tenant: default_tenant, Database: default_database
+Connected to Chroma Cloud version: 0.1.0
+Starting Ingestion...
+Parsed 9 chunks from document.
+Embedding and adding 9 items...
+Ingestion Complete!
+```
+
+### Query Mode
+
+```
+--- Chroma Cloud Example: query Mode ---
+Tenant: default_tenant, Database: default_database
+Connected to Chroma Cloud version: 0.1.0
+Querying: "What happened at the Dartmouth Workshop?"
+
+--- Results ---
+[0] (Distance: 0.123)
+Location: CHAPTER 1: The Dawn of Thinking Machines, PAGE 3
+Content: The 1956 Dartmouth Workshop is widely considered the founding event of AI as a field. John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon brought together...
+---------------------------
+```
+
+## Customization
+
+### Using a Different Document
+
+Replace `document.txt` with your own document. The chunking logic splits it on `CHAPTER` and `PAGE` markers, so your document should use the same markers, or you should adapt the chunking strategy as described below.
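+
+Judging from the bundled `document.txt`, the expected layout is one marker per line, with each page's text following its `PAGE` marker (the angle-bracket placeholders stand for your own content):
+
+```
+CHAPTER 1: <chapter title>
+PAGE 1
+<text of page 1>
+PAGE 2
+<text of page 2>
+
+CHAPTER 2: <chapter title>
+PAGE 1
+<text of page 1>
+```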
+
+### Using a Different Embedding Function
+
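+Modify `index.php` to use a different embedding function, then re-run ingest mode: query embeddings are only comparable with stored embeddings produced by the same model.
+
+```php
+use Codewithkyrian\ChromaDB\Embeddings\OpenAIEmbeddingFunction;
+
+// `$config['openai_key']` is not one of the example's built-in CLI options;
+// supply your OpenAI API key however your copy of index.php loads configuration.
+$ef = new OpenAIEmbeddingFunction($config['openai_key']);
+```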
+
+### Custom Chunking Strategy
+
+Modify the `chunkDocument()` function to implement your own chunking logic (for example, splitting by sentence, by paragraph, or into fixed-size chunks).
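+
+As one alternative, the sketch below splits text into fixed-size windows with overlap. `chunkFixedSize()` and its character-based sizes are hypothetical; token- or sentence-aware splitting is often preferable for embeddings, and the resulting strings would still need IDs and metadata before being added to the collection.
+
+```php
+<?php
+
+declare(strict_types=1);
+
+/**
+ * Illustrative fixed-size chunker with overlap, measured in characters.
+ * Consecutive chunks share $overlap characters, so text near a boundary
+ * appears in both neighbouring chunks.
+ *
+ * @return list<string>
+ */
+function chunkFixedSize(string $text, int $size = 500, int $overlap = 50): array
+{
+    if ($size <= 0 || $overlap < 0 || $overlap >= $size) {
+        throw new InvalidArgumentException('Require size > 0 and 0 <= overlap < size.');
+    }
+
+    // Split into UTF-8 characters so multibyte text is never cut mid-character.
+    $chars  = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
+    $length = count($chars);
+    $step   = $size - $overlap; // how far the window advances each time
+    $chunks = [];
+
+    for ($start = 0; $start < $length; $start += $step) {
+        $chunks[] = implode('', array_slice($chars, $start, $size));
+
+        if ($start + $size >= $length) {
+            break; // this window already reaches the end of the text
+        }
+    }
+
+    return $chunks;
+}
+
+// Usage: $chunks = chunkFixedSize($text, 800, 100);
+```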
+
+## Troubleshooting
+
+**Error: Chroma Cloud API Key is required**
+- Set `CHROMA_API_KEY` environment variable or use `--api-key` argument
+
+**Error: Jina API Key is required**
+- Set `JINA_API_KEY` environment variable or use `--jina-key` argument
+
+**Error: Collection not found**
+- Run ingestion mode first to create and populate the collection
+
+**No results returned**
+- Ensure the collection was successfully ingested
+- Try different query phrasings
+- Check that the query is related to the document content
+
diff --git a/examples/document-chunking-cloud/document.txt b/examples/document-chunking-cloud/document.txt
@@ -0,0 +1,25 @@
+THE EVOLUTION OF ARTIFICIAL INTELLIGENCE
+
+CHAPTER 1: The Dawn of Thinking Machines
+PAGE 1
+The quest to create machines that can think is as old as storytelling itself. From the automatons of Greek mythology to the Golems of Jewish folklore, humanity has always dreamed of breathing life into the inanimate. However, it wasn't until the 20th century that the mathematical foundations for Artificial Intelligence were laid. Ada Lovelace, often considered the first computer programmer, speculated that the Analytical Engine might act upon other things besides numbers.
+PAGE 2
+In 1950, Alan Turing proposed the famous "Turing Test" as a measure of machine intelligence. He asked, "Can machines think?" and suggested that if a machine could converse with a human without being distinguished from another human, it could be said to "think". This period marked the birth of symbolic AI, where researchers believed that intelligence could be reduced to symbol manipulation.
+PAGE 3
+The 1956 Dartmouth Workshop is widely considered the founding event of AI as a field. John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon brought together researchers to discuss "thinking machines". Optimism was high; Minsky famously predicted that within a generation, the problem of creating 'artificial intelligence' would be substantially solved.
+
+CHAPTER 2: Deep Learning and Neural Networks
+PAGE 1
+While early AI focused on logic and rules, another approach was brewing: connectionism. Inspired by the human brain, artificial neural networks aimed to learn from data rather than following hard-coded instructions. The Perceptron, developed by Frank Rosenblatt in 1958, was an early model of a single neuron, capable of simple binary classification.
+PAGE 2
+However, neural networks faced a "winter" in the 1970s and 80s due to computational limitations and the inability to train deep networks. It wasn't until the mid-2000s, with the advent of powerful GPUs and big data, that "Deep Learning" re-emerged. Researchers like Geoffrey Hinton showed that multi-layered networks could learn complex patterns, leading to breakthroughs in image and speech recognition.
+PAGE 3
+The turning point came in 2012 with AlexNet, a deep convolutional neural network that dominated the ImageNet competition. This victory demonstrated the undeniable power of deep learning, sparking an explosion of investment and research. Suddenly, computers could see, hear, and translate languages with near-human accuracy.
+
+CHAPTER 3: The Generative Era
+PAGE 1
+In the 2020s, AI shifted from merely analyzing data to creating it. Generative AI, powered by architectures like the Transformer (introduced by Google in 2017), enabled models to understand and generate human-like text. The concept of "Attention" allowed these models to weigh the importance of different words in a sentence, capturing context like never before.
+PAGE 2
+Large Language Models (LLMs) like GPT-3 and GPT-4 demonstrated emergent abilities. They could write code, compose poetry, solve math problems, and even reason through complex tasks. This era also saw the rise of diffusion models in image generation, allowing users to create stunning visual art from simple text prompts.
+PAGE 3
+As we stand on the brink of Artificial General Intelligence (AGI), the focus shifts to alignment and safety. Ensuring that these powerful systems act in accordance with human values is the defining challenge of our time. The journey from the Dartmouth Workshop to ChatGPT has been long, but in many ways, it is just beginning.
diff --git a/examples/document-chunking-cloud/index.php b/examples/document-chunking-cloud/index.php