Commit 8ad5bdc

chore(example): consolidate example for S3 (#491)
1 parent c7bcba6 commit 8ad5bdc

4 files changed, 15 additions and 47 deletions

examples/amazon_s3_text_embedding/README.md

Lines changed: 8 additions & 41 deletions
@@ -8,41 +8,8 @@ Before running the example, you need to:

 1. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.

-2. Prepare for Amazon S3:
-
-- **Create an Amazon S3 bucket:**
-- Go to the [AWS S3 Console](https://s3.console.aws.amazon.com/s3/home) and click **Create bucket**. Give it a unique name and choose a region.
-- Or, use the AWS CLI:
-```sh
-aws s3 mb s3://your-s3-bucket-name
-```
-
-- **Upload your files to the bucket:**
-- In the AWS Console, click your bucket, then click **Upload** and add your `.md`, `.txt`, `.docx`, or other files.
-- Or, use the AWS CLI:
-```sh
-aws s3 cp localfile.txt s3://your-s3-bucket-name/
-aws s3 cp your-folder/ s3://your-s3-bucket-name/ --recursive
-```
-
-- **Set up AWS credentials:**
-- The easiest way is to run:
-```sh
-aws configure
-```
-Enter your AWS Access Key ID, Secret Access Key, region (e.g., `us-east-1`), and output format (`json`).
-- This creates a credentials file at `~/.aws/credentials` and config at `~/.aws/config`.
-- Alternatively, you can set environment variables:
-```sh
-export AWS_ACCESS_KEY_ID=your-access-key-id
-export AWS_SECRET_ACCESS_KEY=your-secret-access-key
-export AWS_DEFAULT_REGION=us-east-1
-```
-- If running on AWS EC2 or Lambda, you can use an [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) with S3 read permissions.
-
-- **(Optional) Specify a prefix** to restrict to a subfolder in the bucket by setting `AMAZON_S3_PREFIX` in your `.env`.
-
-See [AWS S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) for more details.
+2. Prepare for Amazon S3.
+See [Setup for AWS S3](https://cocoindex.io/docs/ops/sources#setup-for-amazon-s3) for more details.

 3. Create a `.env` file with your Amazon S3 bucket name and (optionally) prefix.
 Start from copying the `.env.example`, and then edit it to fill in your bucket name and prefix.
@@ -59,27 +26,27 @@ Before running the example, you need to:

 # Amazon S3 Configuration
 AMAZON_S3_BUCKET_NAME=your-bucket-name
-AMAZON_S3_PREFIX=optional/prefix/path
+AMAZON_S3_SQS_QUEUE_URL=https://sqs.us-west-2.amazonaws.com/123456789/S3ChangeNotifications
 ```

 ## Run

 Install dependencies:

 ```sh
-uv pip install -r requirements.txt
+pip install -e .
 ```

 Setup:

 ```sh
-uv run main.py cocoindex setup
+python main.py cocoindex setup
 ```

 Run:

 ```sh
-uv run main.py
+python main.py
 ```

 During running, it will keep observing changes in the Amazon S3 bucket and update the index automatically.
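
Reviewer note: the queue URL added to the `.env` snippet points at an SQS queue named `S3ChangeNotifications`. As a rough illustration of what such a queue typically carries, the sketch below parses a standard S3 event notification message. It assumes S3 delivers events directly to SQS (no SNS wrapper); the function name `changed_objects` and the sample key are hypothetical. It is not a description of how cocoindex consumes the queue internally.

```python
import json
from urllib.parse import unquote_plus


def changed_objects(message_body: str) -> list[tuple[str, str, str]]:
    """Return (event_name, bucket, key) for each record in an S3 event message."""
    payload = json.loads(message_body)
    changes = []
    # Messages without a "Records" list (such as the test event S3 sends when
    # notifications are first configured) yield no changes.
    for record in payload.get("Records", []):
        s3_info = record["s3"]
        changes.append((
            record["eventName"],
            s3_info["bucket"]["name"],
            # Object keys arrive URL-encoded, with spaces written as "+".
            unquote_plus(s3_info["object"]["key"]),
        ))
    return changes


# Hypothetical message body for a single uploaded file.
sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "your-bucket-name"},
            "object": {"key": "docs/release+notes.md"},
        },
    }]
})
print(changed_objects(sample))
```

Decoding the key matters in practice: a consumer that looks up the raw, encoded key would miss any object whose name contains spaces or other escaped characters.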
@@ -92,13 +59,13 @@ CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute vi
 Run CocoInsight to understand your RAG data pipeline:

 ```sh
-uv run main.py cocoindex server -ci
+python main.py cocoindex server -ci
 ```

 You can also add a `-L` flag to make the server keep updating the index to reflect source changes at the same time:

 ```sh
-uv run main.py cocoindex server -ci -L
+python main.py cocoindex server -ci -L
 ```

 Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).

examples/amazon_s3_text_embedding/main.py

Lines changed: 1 addition & 3 deletions
@@ -1,7 +1,6 @@
 from dotenv import load_dotenv

 import cocoindex
-import datetime
 import os

 @cocoindex.flow_def(name="AmazonS3TextEmbedding")
@@ -19,8 +18,7 @@ def amazon_s3_text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scop
 prefix=prefix,
 included_patterns=["*.md", "*.txt", "*.docx"],
 binary=False,
-sqs_queue_url=sqs_queue_url),
-refresh_interval=datetime.timedelta(minutes=1))
+sqs_queue_url=sqs_queue_url))

 doc_embeddings = data_scope.add_collector()

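
Reviewer note: the hunks above pass `prefix` and `sqs_queue_url` into the S3 source but do not show where those values come from. A minimal sketch of one way to read them from the environment follows. The variable names come from the README's `.env` snippet (spelled with underscores throughout); the `S3SourceSettings` class and `load_s3_settings` function are hypothetical and may not match what `main.py` does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3SourceSettings:
    """Hypothetical container for the values handed to the S3 source."""
    bucket_name: str
    prefix: Optional[str]
    sqs_queue_url: Optional[str]


def load_s3_settings(env: Mapping[str, str] = os.environ) -> S3SourceSettings:
    """Read S3 settings from an environment-like mapping.

    The bucket name is required. The prefix and queue URL are optional,
    and an empty string is treated the same as an unset variable.
    """
    bucket_name = env.get("AMAZON_S3_BUCKET_NAME", "").strip()
    if not bucket_name:
        raise ValueError("AMAZON_S3_BUCKET_NAME must be set")
    return S3SourceSettings(
        bucket_name=bucket_name,
        prefix=env.get("AMAZON_S3_PREFIX") or None,
        sqs_queue_url=env.get("AMAZON_S3_SQS_QUEUE_URL") or None,
    )
```

Taking the mapping as a parameter keeps the function easy to exercise without touching the real process environment, for example `load_s3_settings({"AMAZON_S3_BUCKET_NAME": "your-bucket-name"})`.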

examples/amazon_s3_text_embedding/pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+[project]
+name = "amazon-s3-text-embedding"
+version = "0.1.0"
+description = "Simple example for cocoindex: build embedding index based on Amazon S3 files."
+requires-python = ">=3.11"
+dependencies = ["cocoindex>=0.1.35", "python-dotenv>=1.0.1"]

examples/amazon_s3_text_embedding/requirements.txt

Lines changed: 0 additions & 3 deletions
This file was deleted.
