Commit bc27f55

refactor: reorganize examples into structured folders
1 parent acf3c7a commit bc27f55

5 files changed

+408
-6
lines changed

README.md

Lines changed: 5 additions & 0 deletions
@@ -475,6 +475,11 @@ $collection->delete(where: Where::field('category')->eq('outdated'));
 $collection->delete(whereDocument: Where::document()->contains('outdated'));
 ```
 
+## Examples
+
+- **[`basic-usage`](examples/basic-usage)** - Simple example demonstrating basic operations: connecting, adding documents, and querying
+- **[`document-chunking-cloud`](examples/document-chunking-cloud)** - Document chunking, embedding, and storage in Chroma Cloud with semantic search
+
 ## Testing
 
 Run the test suite using Pest.
Lines changed: 4 additions & 6 deletions
@@ -2,7 +2,7 @@
 
 declare(strict_types=1);
 
-require __DIR__ . '/../vendor/autoload.php';
+require __DIR__ . '/../../vendor/autoload.php';
 
 use Codewithkyrian\ChromaDB\ChromaDB;
 use Codewithkyrian\ChromaDB\Embeddings\JinaEmbeddingFunction;
@@ -21,9 +21,9 @@
 );
 
 $items = [
-["id" => 1, "content" => "He seems very happy" ],
-["id" => 2, "content"=> "He was very sad when we last talked"],
-["id" => 3, "content"=> "She made him angry"],
+["id" => 1, "content" => "He seems very happy"],
+["id" => 2, "content" => "He was very sad when we last talked"],
+["id" => 3, "content" => "She made him angry"],
 ];
 
 $collection->add(
@@ -37,5 +37,3 @@
 );
 
 dd($queryResponse->documents[0], $queryResponse->distances[0]);
-
-
Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
# Document Chunking and Embedding Example

This example demonstrates how to chunk a document, generate embeddings, and store them in Chroma Cloud for semantic search and retrieval.

## Overview

The example runs in one of two modes:

1. **Ingest Mode**: Chunks a document (`document.txt`) into smaller pieces, generates embeddings using Jina AI, and stores them in Chroma Cloud
2. **Query Mode**: Performs semantic search on the stored documents using natural language queries

## Prerequisites

- PHP 8.1 or higher
- Chroma Cloud account with API key
- Jina AI API key (for embeddings)
- Composer dependencies installed (`composer install`)

## Setup

1. Set your API keys as environment variables:

```bash
export CHROMA_API_KEY="your-chroma-cloud-api-key"
export JINA_API_KEY="your-jina-api-key"
```

Or pass them via CLI arguments (see Usage below).
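
The lookup order described here (CLI argument first, then environment variable, then a default) could be sketched roughly as follows. `resolveOption()` is a hypothetical helper written for illustration; it is not taken from `index.php` or from the library, and the real script may parse its arguments differently.

```php
<?php

/**
 * Hypothetical helper: return the value that follows $flag in $argv,
 * falling back to the environment variable $envVar, then to $default.
 */
function resolveOption(array $argv, string $flag, ?string $envVar = null, ?string $default = null): ?string
{
    // A CLI argument, when present, takes precedence.
    $index = array_search($flag, $argv, true);
    if ($index !== false && isset($argv[$index + 1])) {
        return (string) $argv[$index + 1];
    }

    // Otherwise use a non-empty environment variable.
    if ($envVar !== null) {
        $value = getenv($envVar);
        if ($value !== false && $value !== '') {
            return $value;
        }
    }

    return $default;
}

// Example argument list, as it might appear in $argv.
$args = ['index.php', '-mode', 'ingest', '--api-key', 'cli-key'];

$apiKey = resolveOption($args, '--api-key', 'CHROMA_API_KEY');      // CLI value wins
$tenant = resolveOption($args, '--tenant', null, 'default_tenant'); // falls back to the default
```

Keeping the lookup in one function makes the precedence explicit and easy to check in isolation.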

## Usage

### Ingest Mode

Chunk the document and store it in Chroma Cloud:

```bash
php index.php -mode ingest
```

With custom options:

```bash
php index.php -mode ingest \
--api-key "your-chroma-api-key" \
--jina-key "your-jina-api-key" \
--tenant "my-tenant" \
--database "my-database"
```

### Query Mode

Search the stored documents:

```bash
php index.php -mode query --query "What happened at the Dartmouth Workshop?"
```

With custom options:

```bash
php index.php -mode query \
--query "Who proposed the Turing Test?" \
--api-key "your-chroma-api-key" \
--jina-key "your-jina-api-key" \
--tenant "my-tenant" \
--database "my-database"
```

## CLI Arguments

| Argument | Description | Default | Required |
|----------|-------------|---------|----------|
| `-mode` | Operation mode: `ingest` or `query` | - | Yes |
| `--query` | Query text for search (query mode only) | "Which event marked the birth of symbolic AI?" | No |
| `--api-key` | Chroma Cloud API key | `CHROMA_API_KEY` env var | Yes |
| `--jina-key` | Jina AI API key for embeddings | `JINA_API_KEY` env var | Yes |
| `--tenant` | Chroma Cloud tenant name | `default_tenant` | No |
| `--database` | Chroma Cloud database name | `default_database` | No |
| `--collection-name` | Collection name to use | `history_of_ai` | No |

## Example Queries

Try these example queries to test the semantic search:

```bash
# Historical events
php index.php -mode query --query "What happened at the Dartmouth Workshop?"

# People and contributions
php index.php -mode query --query "Who proposed the Turing Test?"

# Technical breakthroughs
php index.php -mode query --query "What was the significance of AlexNet in 2012?"

# Concepts and explanations
php index.php -mode query --query "How do Large Language Models and Generative AI work?"

# Historical figures
php index.php -mode query --query "Who is considered the first computer programmer?"
```

## How It Works

### Document Chunking

The document is chunked based on:
- **CHAPTER markers**: New chapters create new chunks
- **PAGE markers**: New pages create new chunks
- **Text accumulation**: Text between markers is accumulated into chunks

Each chunk includes:
- Unique ID
- Document text
- Metadata (chapter and page information)
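
A minimal sketch of marker-based chunking along these lines is shown below. It is an illustration, not the `chunkDocument()` shipped in `index.php`: the ID format, the metadata keys, and the choice to ignore text that appears before the first `PAGE` marker of a chapter are all assumptions.

```php
<?php

/**
 * Split text into chunks at CHAPTER and PAGE markers.
 * Returns a list of ['id' => string, 'document' => string, 'metadata' => array].
 */
function chunkDocument(string $text): array
{
    $chunks = [];
    $chapter = null;
    $page = null;
    $buffer = [];

    // Emit the accumulated text as one chunk, tagged with the current location.
    $flush = function () use (&$chunks, &$buffer, &$chapter, &$page): void {
        $body = trim(implode(' ', $buffer));
        if ($body !== '') {
            $chunks[] = [
                'id' => 'chunk-' . (count($chunks) + 1),
                'document' => $body,
                'metadata' => ['chapter' => $chapter, 'page' => $page],
            ];
        }
        $buffer = [];
    };

    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (str_starts_with($line, 'CHAPTER ')) {
            $flush();          // a new chapter closes the current chunk
            $chapter = $line;
            $page = null;
        } elseif (preg_match('/^PAGE \d+$/', $line) === 1) {
            $flush();          // a new page closes the current chunk
            $page = $line;
        } elseif ($page !== null) {
            $buffer[] = $line; // accumulate text that belongs to a page
        }
        // Text before the first PAGE marker of a chapter (e.g. a title) is skipped.
    }
    $flush();

    return $chunks;
}

$sample = "TITLE\n\nCHAPTER 1: Intro\nPAGE 1\nFirst page text.\nPAGE 2\nSecond page text.\n\nCHAPTER 2: Next\nPAGE 1\nThird.";
$chunks = chunkDocument($sample);
```

Flushing before updating `$chapter` or `$page` ensures each chunk is tagged with the location where its text started, not the marker that ended it.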

### Embedding Generation

- Uses Jina AI's embedding function to convert text chunks into vector embeddings
- Embeddings are generated in batch for efficiency
- All chunks are embedded before storage
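
To illustrate what batching means here, the sketch below groups texts into fixed-size batches and calls an embedder once per batch. The `$embed` callable, the `embedInBatches()` name, and the batch size are assumptions made for the example; the stand-in embedder is not Jina AI and exists only so the code runs without an API key.

```php
<?php

/**
 * Embed texts in batches. $embed is a hypothetical callable that takes a list
 * of strings and returns one vector (list of floats) per string, in order.
 */
function embedInBatches(array $texts, callable $embed, int $batchSize = 32): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $embeddings = [];
    foreach (array_chunk($texts, $batchSize) as $batch) {
        $vectors = $embed($batch); // one call per batch instead of one per text
        if (count($vectors) !== count($batch)) {
            throw new RuntimeException('Embedding count does not match batch size.');
        }
        array_push($embeddings, ...$vectors);
    }

    return $embeddings;
}

// Stand-in embedder for illustration only: maps each text to [length, word count].
$fakeEmbed = fn (array $batch): array => array_map(
    fn (string $t): array => [(float) strlen($t), (float) str_word_count($t)],
    $batch
);

$vectors = embedInBatches(['a b', 'hello', 'x y z'], $fakeEmbed, 2);
```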

### Storage

- Chunks are stored in a Chroma Cloud collection
- The collection is recreated on each ingestion (previous data is deleted)
- Each chunk maintains its metadata for filtering and context

### Querying

- Natural language queries are converted to embeddings using the same Jina AI function
- Vector similarity search finds the most relevant chunks
- Results include distance scores, documents, and metadata
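
Chroma performs the similarity search on the server, so the example never computes distances itself. Purely to show what a distance score represents, here is a small sketch that ranks chunk vectors by cosine distance to a query vector. The function names are hypothetical, and the metric Chroma actually applies depends on how the collection is configured.

```php
<?php

/** Cosine distance (1 - cosine similarity) between two equal-length vectors. */
function cosineDistance(array $a, array $b): float
{
    if (count($a) === 0 || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }
    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for zero vectors.');
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

/** Return the $k nearest chunks as [id => distance], smallest distance first. */
function nearestChunks(array $queryVector, array $chunkVectors, int $k): array
{
    $distances = [];
    foreach ($chunkVectors as $id => $vector) {
        $distances[$id] = cosineDistance($queryVector, $vector);
    }
    asort($distances); // ascending: most similar first

    return array_slice($distances, 0, $k, true);
}

// Toy 2-dimensional vectors; real embeddings have many more dimensions.
$chunkVectors = ['a' => [1.0, 0.0], 'b' => [0.0, 1.0], 'c' => [1.0, 1.0]];
$nearest = nearestChunks([1.0, 0.1], $chunkVectors, 2);
```

A smaller distance means the chunk points in nearly the same direction as the query, which is why results are listed in ascending order of distance.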

## Output

### Ingest Mode

```
--- Chroma Cloud Example: ingest Mode ---
Tenant: default_tenant, Database: default_database
Connected to Chroma Cloud version: 0.1.0
Starting Ingestion...
Parsed 9 chunks from document.
Embedding and adding 9 items...
Ingestion Complete!
```

### Query Mode

```
--- Chroma Cloud Example: query Mode ---
Tenant: default_tenant, Database: default_database
Connected to Chroma Cloud version: 0.1.0
Querying: "What happened at the Dartmouth Workshop?"

--- Results ---
[0] (Distance: 0.123)
Location: CHAPTER 1: The Dawn of Thinking Machines, PAGE 3
Content: The 1956 Dartmouth Workshop is widely considered the founding event of AI as a field. John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon brought together...
---------------------------
```

## Customization

### Using a Different Document

Replace `document.txt` with your own document. The chunking logic splits the text on CHAPTER and PAGE markers, so the replacement document should use the same markers.

### Using a Different Embedding Function

Modify `index.php` to use a different embedding function:

```php
use Codewithkyrian\ChromaDB\Embeddings\OpenAIEmbeddingFunction;

// Note: 'openai_key' is not one of the CLI arguments listed above;
// add it to your own configuration before using this.
$ef = new OpenAIEmbeddingFunction($config['openai_key']);
```

### Custom Chunking Strategy

Modify the `chunkDocument()` function to implement your own chunking logic (e.g., by sentence, by paragraph, or in fixed-size chunks).
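
As one possible starting point, the sketch below splits text into fixed-size chunks with a small overlap between neighbours. `chunkFixedSize()` and its default sizes are assumptions for illustration; you would still need to attach IDs and metadata to each chunk to match the structure the example stores.

```php
<?php

/**
 * Split text into chunks of at most $size characters.
 * Consecutive chunks share $overlap characters.
 */
function chunkFixedSize(string $text, int $size = 500, int $overlap = 50): array
{
    if ($size < 1 || $overlap < 0 || $overlap >= $size) {
        throw new InvalidArgumentException('Require size >= 1 and 0 <= overlap < size.');
    }

    // Split into UTF-8 characters so multibyte characters are never cut in half.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Text is not valid UTF-8.');
    }

    $chunks = [];
    $length = count($chars);
    $step = $size - $overlap;
    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = implode('', array_slice($chars, $start, $size));
        if ($start + $size >= $length) {
            break; // the last chunk already reached the end of the text
        }
    }

    return $chunks;
}

$pieces = chunkFixedSize('abcdefghij', 4, 1);
```

The overlap keeps a little shared context at chunk boundaries, at the cost of storing some text twice.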

## Troubleshooting

**Error: Chroma Cloud API Key is required**
- Set `CHROMA_API_KEY` environment variable or use `--api-key` argument

**Error: Jina API Key is required**
- Set `JINA_API_KEY` environment variable or use `--jina-key` argument

**Error: Collection not found**
- Run ingestion mode first to create and populate the collection

**No results returned**
- Ensure the collection was successfully ingested
- Try different query phrasings
- Check that the query is related to the document content

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
THE EVOLUTION OF ARTIFICIAL INTELLIGENCE

CHAPTER 1: The Dawn of Thinking Machines
PAGE 1
The quest to create machines that can think is as old as storytelling itself. From the automatons of Greek mythology to the Golems of Jewish folklore, humanity has always dreamed of breathing life into the inanimate. However, it wasn't until the 20th century that the mathematical foundations for Artificial Intelligence were laid. Ada Lovelace, often considered the first computer programmer, speculated that the Analytical Engine might act upon other things besides numbers.
PAGE 2
In 1950, Alan Turing proposed the famous "Turing Test" as a measure of machine intelligence. He asked, "Can machines think?" and suggested that if a machine could converse with a human without being distinguished from another human, it could be said to "think". This period marked the birth of symbolic AI, where researchers believed that intelligence could be reduced to symbol manipulation.
PAGE 3
The 1956 Dartmouth Workshop is widely considered the founding event of AI as a field. John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon brought together researchers to discuss "thinking machines". Optimism was high; Minsky famously predicted that within a generation, the problem of creating 'artificial intelligence' would be substantially solved.

CHAPTER 2: Deep Learning and Neural Networks
PAGE 1
While early AI focused on logic and rules, another approach was brewing: connectionism. Inspired by the human brain, artificial neural networks aimed to learn from data rather than following hard-coded instructions. The Perceptron, developed by Frank Rosenblatt in 1958, was an early model of a single neuron, capable of simple binary classification.
PAGE 2
However, neural networks faced a "winter" in the 1970s and 80s due to computational limitations and the inability to train deep networks. It wasn't until the mid-2000s, with the advent of powerful GPUs and big data, that "Deep Learning" re-emerged. Researchers like Geoffrey Hinton showed that multi-layered networks could learn complex patterns, leading to breakthroughs in image and speech recognition.
PAGE 3
The turning point came in 2012 with AlexNet, a deep convolutional neural network that dominated the ImageNet competition. This victory demonstrated the undeniable power of deep learning, sparking an explosion of investment and research. Suddenly, computers could see, hear, and translate languages with near-human accuracy.

CHAPTER 3: The Generative Era
PAGE 1
In the 2020s, AI shifted from merely analyzing data to creating it. Generative AI, powered by architectures like the Transformer (introduced by Google in 2017), enabled models to understand and generate human-like text. The concept of "Attention" allowed these models to weigh the importance of different words in a sentence, capturing context like never before.
PAGE 2
Large Language Models (LLMs) like GPT-3 and GPT-4 demonstrated emergent abilities. They could write code, compose poetry, solve math problems, and even reason through complex tasks. This era also saw the rise of diffusion models in image generation, allowing users to create stunning visual art from simple text prompts.
PAGE 3
As we stand on the brink of Artificial General Intelligence (AGI), the focus shifts to alignment and safety. Ensuring that these powerful systems act in accordance with human values is the defining challenge of our time. The journey from the Dartmouth Workshop to ChatGPT has been long, but in many ways, it is just beginning.
