Commit 42777ab

Merge pull request #37 from werther41/bugfix-outdated-model
update embedding model
2 parents b53fc30 + 088b71c commit 42777ab

4 files changed: +20 -10 lines changed

documents/QUICK_REFERENCE.md

Lines changed: 2 additions & 2 deletions
@@ -15,8 +15,8 @@
 # Run tests
 ./scripts/test-all.sh
 
-# Trigger news ingestion
-curl http://localhost:3000/api/cron/retrieve-news
+# Trigger news ingestion (RSS fetch + text embedding generation)
+curl -X POST -H "Authorization: Bearer $CRON_SECRET" http://localhost:3000/api/cron/retrieve-news
 
 # Check topics
 curl http://localhost:3000/api/topics

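For callers that trigger ingestion from a script rather than from the shell, the updated curl command can be sketched in TypeScript roughly as follows. `buildCronRequest` is a hypothetical helper, not something this commit adds; only the path, the method, and the `Authorization: Bearer` header come from the diff.

```typescript
// Hypothetical helper mirroring the curl command in QUICK_REFERENCE.md.
// Only the path, method, and header format are taken from the commit.
interface CronRequest {
  url: string
  init: { method: "POST"; headers: Record<string, string> }
}

function buildCronRequest(baseUrl: string, cronSecret: string): CronRequest {
  if (!cronSecret) {
    // Without a secret the documented Authorization header cannot be built.
    throw new Error("CRON_SECRET is empty")
  }
  // Drop trailing slashes so the joined URL has exactly one separator.
  const base = baseUrl.replace(/\/+$/, "")
  return {
    url: `${base}/api/cron/retrieve-news`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${cronSecret}` },
    },
  }
}

// Usage (needs a running dev server, so it is left commented out):
// const { url, init } = buildCronRequest("http://localhost:3000", process.env.CRON_SECRET ?? "")
// const res = await fetch(url, init)
```
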
documents/api-docs.md

Lines changed: 11 additions & 4 deletions
@@ -606,13 +606,20 @@ Returns information about the most relevant news article without generating a fa
 
 ### 10. News Ingestion Cron Job
 
-**POST** `/api/cron/retrieve-news`
+**POST** `/api/cron/retrieve-news`
+**GET** `/api/cron/retrieve-news` (same behavior; for manual/testing use)
 
-Fetches and stores news articles from RSS feeds. Requires authorization header.
+Fetches articles from configured RSS feeds (last 48 hours), generates text embeddings (Gemini `gemini-embedding-001`), and stores them in the database. Use this to manually trigger the same pipeline that runs on schedule.
 
 **Headers:**
 
-- `Authorization: Bearer {CRON_SECRET}`
+- `Authorization: Bearer {CRON_SECRET}` (required for both POST and GET unless `x-vercel-cron` is present)
+
+**Example (manual retrieval):**
+
+```bash
+curl -X POST -H "Authorization: Bearer $CRON_SECRET" http://localhost:3000/api/cron/retrieve-news
+```
 
 **Response:**
 
@@ -775,7 +782,7 @@ Uses Turso's native vector search with cosine similarity to find the most releva
 
 ### AI Integration
 
-- **Embedding Model**: Google Gemini `text-embedding-004` (768 dimensions)
+- **Embedding Model**: Google Gemini `gemini-embedding-001` (768 dimensions)
 - **LLM**: Google Gemini `gemini-2.0-flash-lite`
 - **Streaming**: Real-time fact generation with word-by-word display

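The route handler itself is not part of this commit, so the authorization rule stated in the updated docs (Bearer token required unless `x-vercel-cron` is present) can only be sketched. The function name `isAuthorizedCronRequest` and the `HeaderLookup` type below are assumptions made for illustration; the lookup is shaped so that a standard `Headers.get` could be passed in.

```typescript
// Sketch of the documented rule; not the project's actual handler.
// HeaderLookup matches the shape of Headers.get: the value, or null if absent.
type HeaderLookup = (name: string) => string | null

function isAuthorizedCronRequest(
  getHeader: HeaderLookup,
  cronSecret: string | undefined
): boolean {
  // Per the docs, requests carrying `x-vercel-cron` skip the Bearer check.
  if (getHeader("x-vercel-cron") !== null) return true
  // With no secret configured, nothing can match, so reject.
  if (!cronSecret) return false
  // Otherwise require the exact `Bearer {CRON_SECRET}` value (POST and GET alike).
  return getHeader("authorization") === `Bearer ${cronSecret}`
}
```

In a handler this might be called as `isAuthorizedCronRequest((name) => request.headers.get(name), process.env.CRON_SECRET)`, again assuming a standard `Request` object.
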
lib/embeddings.ts

Lines changed: 6 additions & 3 deletions
@@ -4,7 +4,7 @@ import { GoogleGenerativeAI } from "@google/generative-ai"
 const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!)
 
 /**
- * Generate embedding for text using Gemini text-embedding-004 model
+ * Generate embedding for text using Gemini gemini-embedding-001 model
  * @param text - The text to generate embedding for
  * @returns Promise<number[]> - Array of float32 values representing the embedding
  */
@@ -14,8 +14,11 @@ export async function generateEmbedding(text: string): Promise<number[]> {
       throw new Error("GOOGLE_API_KEY environment variable not set")
     }
 
-    const model = genAI.getGenerativeModel({ model: "text-embedding-004" })
-    const result = await model.embedContent(text)
+    const model = genAI.getGenerativeModel({ model: "gemini-embedding-001" })
+    const result = await model.embedContent({
+      content: { parts: [{ text }] },
+      output_dimensionality: 768,
+    } as unknown as Parameters<typeof model.embedContent>[0])
 
     return result.embedding.values
   } catch (error) {

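Because the request object is cast through `unknown`, the compiler no longer checks the `output_dimensionality` field, so a misspelled or ignored parameter would only show up as vectors of the wrong length at query time. A small runtime guard is one way to catch that early. The sketch below is not part of this commit: `assertEmbeddingDimensions` is a hypothetical name, and 768 is taken from the docs change above.

```typescript
// Hypothetical guard, not part of the commit. 768 matches the dimension
// stated in documents/api-docs.md and requested via output_dimensionality.
const EXPECTED_DIMENSIONS = 768

function assertEmbeddingDimensions(
  values: number[],
  expected: number = EXPECTED_DIMENSIONS
): number[] {
  if (values.length !== expected) {
    throw new Error(
      `Expected a ${expected}-dimensional embedding, got ${values.length}`
    )
  }
  // Return the same array so the guard can wrap the existing return value.
  return values
}
```

Under these assumptions, the last line of `generateEmbedding` could become `return assertEmbeddingDimensions(result.embedding.values)`.
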
tsconfig.tsbuildinfo

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
