A Laravel package for building intelligent recommendation systems using OpenAI embeddings. Perfect for creating personalized content recommendations, job matching, product suggestions, and similar features where you need to find relevant matches based on user profiles or content similarity.
- Batch embedding processing using OpenAI's batch API
- Separate database connection support for vector operations
- Automatic vector extension creation for PostgreSQL
- Efficient batch processing with configurable chunk sizes
- Dual Contract System: Separate contracts for embedding generation and searchable models
- Smart Model Separation: Models can be either embedding sources or searchable targets
- Install the package via Composer:
composer require thesubhendu/embedvector-laravel
- Publish the configuration and migrations:
php artisan vendor:publish --provider="Subhendu\EmbedVector\EmbedVectorServiceProvider"
- Configure your environment variables:
OPENAI_API_KEY=your_openai_api_key_here
Database Requirements: This package requires PostgreSQL with the pgvector extension for vector operations.
Optional: To use a separate PostgreSQL connection for vector operations instead of your application's default database, set the EMBEDVECTOR_DB_CONNECTION environment variable:
EMBEDVECTOR_DB_CONNECTION=pgsql
- Run the migrations
php artisan migrate
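The optional separate connection mentioned in the configuration step needs a matching entry in config/database.php. The excerpt below is only a sketch: the connection name pgsql_vectors and the VECTOR_DB_* variable names are illustrative assumptions, not names defined by the package. Only EMBEDVECTOR_DB_CONNECTION comes from the package configuration.

```php
<?php

// config/database.php (excerpt): hypothetical second PostgreSQL connection
return [
    'connections' => [
        // ... your existing connections

        'pgsql_vectors' => [
            'driver' => 'pgsql',
            'host' => env('VECTOR_DB_HOST', '127.0.0.1'),
            'port' => env('VECTOR_DB_PORT', '5432'),
            'database' => env('VECTOR_DB_DATABASE', 'vectors'),
            'username' => env('VECTOR_DB_USERNAME', 'postgres'),
            'password' => env('VECTOR_DB_PASSWORD', ''),
            'charset' => 'utf8',
            'prefix' => '',
            'search_path' => 'public',
            'sslmode' => 'prefer',
        ],
    ],
];
```

With an entry like this in place, setting EMBEDVECTOR_DB_CONNECTION=pgsql_vectors would point the package at the dedicated database.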
This package uses two distinct contracts to separate concerns based on the direction of matching:
- EmbeddableContract: For models that generate embeddings (e.g., Customer/Candidate profiles)
- EmbeddingSearchableContract: For models that can be found using embeddings (e.g., Jobs)
If the system is designed to find matching jobs for customers/candidates, and not the other way around:
- Customer/Candidate implements EmbeddableContract → generates embeddings from their profile, skills, and preferences
- Job implements EmbeddingSearchableContract → can be found/recommended based on candidate embeddings
- Flow: Customer embeddings are used to find relevant Jobs that match their profile
For Bidirectional Matching: If you want matching in both directions (finding jobs for candidates AND finding candidates for jobs), both models need to implement EmbeddingSearchableContract.
use Subhendu\EmbedVector\Services\EmbeddingService;
$embeddingService = app(EmbeddingService::class);
$embedding = $embeddingService->createEmbedding('Your text here');
use Illuminate\Database\Eloquent\Model;
use Subhendu\EmbedVector\Contracts\EmbeddableContract;
use Subhendu\EmbedVector\Traits\EmbeddableTrait;
class Customer extends Model implements EmbeddableContract
{
use EmbeddableTrait;
public function toEmbeddingText(): string
{
return $this->name . ' ' . $this->department . ' ' . $this->skills;
}
}
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Factories\HasFactory;
use Subhendu\EmbedVector\Contracts\EmbeddingSearchableContract;
use Subhendu\EmbedVector\Traits\EmbeddingSearchableTrait;
class Job extends Model implements EmbeddingSearchableContract
{
use EmbeddingSearchableTrait;
use HasFactory;
public function toEmbeddingText(): string
{
return $this->title . ' ' . $this->description . ' ' . $this->requirements;
}
}
Note: EmbeddingSearchableContract extends EmbeddableContract, and EmbeddingSearchableTrait automatically includes EmbeddableTrait functionality, so you only need to use one trait.
// Find jobs that match a customer's profile
$customer = Customer::find(1);
$matchingJobs = $customer->matchingResults(Job::class, 10);
foreach ($matchingJobs as $job) {
echo "Job: {$job->title} - Match: {$job->match_percent}%";
echo "Distance: {$job->distance}";
}
Note: The matchingResults() method automatically uses getOrCreateEmbedding() internally, which means:
- If no embedding exists for the source model, it will be created
- If an embedding exists but needs sync (embedding_sync_required = true), it will be updated
- This ensures you always get accurate similarity results
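The get-or-create behaviour described in the note can be pictured with a small plain-PHP sketch. It is not the package's implementation: the in-memory $store array stands in for the embeddings table, and fakeEmbed() is a hypothetical placeholder for the OpenAI call.

```php
<?php

// In-memory stand-in for the embeddings table, keyed by model ID (illustrative only).
$store = [];

// Hypothetical embedder: a real implementation would call the OpenAI API.
function fakeEmbed(string $text): array
{
    return [strlen($text), substr_count($text, ' ') + 1];
}

// Return the stored embedding, creating it when missing and
// regenerating it when it is flagged as needing a sync.
function getOrCreate(array &$store, int $id, string $text): array
{
    $missing = !isset($store[$id]);
    $stale = !$missing && $store[$id]['embedding_sync_required'];

    if ($missing || $stale) {
        $store[$id] = [
            'vector' => fakeEmbed($text),
            'embedding_sync_required' => false,
        ];
    }

    return $store[$id];
}

$first = getOrCreate($store, 1, 'PHP developer');         // no record yet: created
$store[1]['embedding_sync_required'] = true;              // simulate a profile change
$second = getOrCreate($store, 1, 'Senior PHP developer'); // flagged: regenerated
```

In both cases the caller receives a current embedding without having to check its state first.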
You can add query filters to narrow down the search results before embedding similarity is calculated:
// Find only active jobs in specific locations
$customer = Customer::find(1);
$matchingJobs = $customer->matchingResults(
targetModelClass: Job::class,
topK: 10,
queryFilter: function ($query) {
$query->where('status', 'active')
->whereIn('location', ['New York', 'San Francisco'])
->where('salary', '>=', 80000);
}
);
- targetModelClass (string): The class name of the model you want to find matches for
- topK (int, default: 5): Maximum number of results to return
- queryFilter (Closure, optional): Custom query constraints to apply before similarity matching
Each returned model includes additional properties:
- match_percent (float): Similarity percentage (0-100, higher is better)
- distance (float): Vector distance (lower is better for similarity)
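To make the relationship between the two properties concrete, here is a plain-PHP sketch. cosineDistance() follows pgvector's definition of cosine distance (1 minus cosine similarity). matchPercent() is an assumed mapping to a 0-100 score, shown for illustration only; the package's actual formula may differ.

```php
<?php

// Cosine distance as pgvector defines it: 1 - cosine similarity (range 0..2).
function cosineDistance(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

// Assumed mapping from distance to a 0-100 score, clamped at both ends.
function matchPercent(float $distance): float
{
    return round(max(0.0, min(100.0, (1.0 - $distance) * 100.0)), 2);
}

$identical = cosineDistance([1.0, 0.0], [1.0, 0.0]);  // vectors pointing the same way
$orthogonal = cosineDistance([1.0, 0.0], [0.0, 1.0]); // unrelated vectors

echo matchPercent($identical), "\n";
echo matchPercent($orthogonal), "\n";
```

Under this mapping a smaller distance always yields a higher percentage, which matches the "lower is better" and "higher is better" notes above.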
The package publishes a configuration file to config/embedvector.php with the following options:
return [
'openai_api_key' => env('OPENAI_API_KEY', ''),
'embedding_model' => env('EMBEDVECTOR_MODEL', 'text-embedding-3-small'),
'distance_metric' => env('EMBEDVECTOR_DISTANCE', 'cosine'), // cosine | l2
'search_strategy' => env('EMBEDVECTOR_SEARCH_STRATEGY', 'auto'), // auto | optimized | cross_connection
'lot_size' => env('EMBEDVECTOR_LOT_SIZE', 50000),
'chunk_size' => env('EMBEDVECTOR_CHUNK_SIZE', 500),
'directories' => [
'input' => 'embeddings/input',
'output' => 'embeddings/output',
],
'database_connection' => env('EMBEDVECTOR_DB_CONNECTION', 'pgsql'),
'model_fields_to_check' => [
// Configure fields to monitor for automatic sync
// 'App\Models\Job' => ['title', 'description', 'requirements'],
],
];
- openai_api_key: Your OpenAI API key (required in production)
- embedding_model: OpenAI embedding model to use (text-embedding-3-small, text-embedding-3-large, etc.)
- distance_metric: Vector similarity calculation method
  - cosine: Better for semantic similarity (recommended)
  - l2: Euclidean distance for geometric similarity
- search_strategy: How to perform similarity searches
  - auto: Automatically choose the best strategy (recommended)
  - optimized: Use JOIN-based queries (same database only)
  - cross_connection: Two-step approach (works across different databases)
- lot_size: Maximum items per OpenAI batch (up to 50,000)
- chunk_size: Items processed per chunk during batch generation
- database_connection: PostgreSQL connection for vector operations
- model_fields_to_check: Configure fields to monitor for automatic sync with FireSyncEmbeddingTrait
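One way to read lot_size and chunk_size together is as two levels of partitioning. The sketch below uses small illustrative values and plain arrays. It assumes each lot becomes one batch submission and that chunks only bound how many records are handled at a time, which is an interpretation of the option descriptions rather than a description of the package's internals.

```php
<?php

// Illustrative values; the package defaults are 50000 and 500.
$lotSize = 6;
$chunkSize = 2;

$ids = range(1, 14); // stand-in for the IDs of the models to embed

// Split the IDs into lots, then split each lot into chunks.
$lots = array_map(
    fn (array $lot): array => array_chunk($lot, $chunkSize),
    array_chunk($ids, $lotSize)
);

foreach ($lots as $index => $chunks) {
    printf("lot %d: %d chunks\n", $index + 1, count($chunks));
}
```

Raising chunk_size trades memory for fewer passes, while lot_size is capped by the batch limit noted above.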
For processing large datasets efficiently, this package provides batch processing capabilities using OpenAI's batch API, which is more cost-effective for processing many embeddings at once.
- php artisan embedding:gen {model} {--type=sync|init} {--force} - Generate batch embeddings for a specific model
- php artisan embedding:proc {--batch-id=} {--all} - Process completed batch results
Arguments and options for embedding:gen:
- {model} - The model class name to generate embeddings for (e.g. App\\Models\\Job)
- --type=sync - Processing type (default: sync)
- --force - Force overwrite existing files
Options for embedding:proc:
- --batch-id= - Process a specific batch by ID
- --all - Process all completed batches
- No options - Check and process batches that are ready (default behavior)
# Generate embeddings for User model (init = first time, sync = update existing)
php artisan embedding:gen "App\\Models\\User" --type=init
# Generate embeddings for sync (only models that need updates)
php artisan embedding:gen "App\\Models\\Job" --type=sync
# Check and process ready batches (default)
php artisan embedding:proc
# Process all completed batches
php artisan embedding:proc --all
# Process specific batch
php artisan embedding:proc --batch-id=batch_abc123
// Product model (searchable)
class Product extends Model implements EmbeddingSearchableContract
{
use EmbeddingSearchableTrait;
public function toEmbeddingText(): string
{
return $this->name . ' ' . $this->description . ' ' . $this->category . ' ' . $this->tags;
}
}
// User model (generates embeddings from purchase history)
class User extends Model implements EmbeddableContract
{
use EmbeddableTrait;
public function toEmbeddingText(): string
{
$purchaseHistory = $this->orders()
->with('products')
->get()
->flatMap->products
->pluck('name')
->implode(' ');
return $this->preferences . ' ' . $purchaseHistory;
}
}
// Find recommended products for a user
$user = User::find(1);
$recommendations = $user->matchingResults(
targetModelClass: Product::class,
topK: 20,
queryFilter: function ($query) {
$query->where('in_stock', true)
->where('price', '<=', 500)
->whereNotIn('id', auth()->user()->purchased_product_ids);
}
);
// Find jobs for a candidate with filters
$candidate = Candidate::find(1);
$matchingJobs = $candidate->matchingResults(
targetModelClass: Job::class,
topK: 15,
queryFilter: function ($query) use ($candidate) {
$query->where('status', 'open')
->where('remote_allowed', $candidate->prefers_remote)
->whereIn('experience_level', $candidate->acceptable_levels)
->where('salary_min', '>=', $candidate->min_salary);
}
);
foreach ($matchingJobs as $job) {
echo "Match: {$job->match_percent}% - {$job->title} at {$job->company}";
}
// Article model
class Article extends Model implements EmbeddingSearchableContract
{
use EmbeddingSearchableTrait;
public function toEmbeddingText(): string
{
return $this->title . ' ' . $this->summary . ' ' . $this->tags . ' ' . $this->category;
}
}
// User reading history model
class UserProfile extends Model implements EmbeddableContract
{
use EmbeddableTrait;
public function toEmbeddingText(): string
{
$readingHistory = $this->user->readArticles()
->selectRaw('GROUP_CONCAT(title, " ", summary) as content')
->value('content');
return $this->interests . ' ' . $readingHistory;
}
}
// Get personalized article recommendations
$profile = UserProfile::where('user_id', auth()->id())->first();
$recommendations = $profile->matchingResults(
targetModelClass: Article::class,
topK: 10,
queryFilter: function ($query) use ($profile) {
$query->where('published', true)
->where('created_at', '>=', now()->subDays(7))
->whereNotIn('id', $profile->user->read_article_ids);
}
);
$job = Job::find(1);
// Check if an embedding exists without creating one
$embedding = $job->getEmbedding();
if ($embedding) {
echo "Embedding exists: " . ($embedding->embedding_sync_required ? "Needs sync" : "Up to date");
} else {
echo "No embedding found";
}
// Get or create embedding (will create if missing or update if sync required)
$embedding = $job->getOrCreateEmbedding();
echo "Embedding ready with match percentage calculation";
// Force create a fresh embedding (useful for testing or manual refresh)
$freshEmbedding = $job->createFreshEmbedding();
// Queue for syncing (mark for batch update later)
$job->queueForSyncing();
// 1. Mark multiple models for syncing
$jobs = Job::where('updated_at', '>', now()->subDays(1))->get();
foreach ($jobs as $job) {
$job->queueForSyncing(); // Queue each job for sync
}
// 2. Process all queued embeddings in batch (run from the shell):
// php artisan embedding:gen "App\\Models\\Job" --type=sync
// 3. Process the completed batch (run from the shell):
// php artisan embedding:proc --all
class JobController extends Controller
{
public function update(Request $request, Job $job)
{
$job->update($request->validated());
// Only queue for syncing if embedding-relevant fields changed
if ($job->wasChanged(['title', 'description', 'requirements'])) {
$job->queueForSyncing();
}
return response()->json($job);
}
}
// Real-time embedding (immediate, good for single updates)
$job = Job::create($data);
$embedding = $job->getOrCreateEmbedding(); // Creates immediately
// Batch embedding (efficient for bulk updates)
$jobs = Job::factory()->count(100)->create();
foreach ($jobs as $job) {
$job->queueForSyncing(); // Mark for batch processing
}
// Then run: php artisan embedding:gen "App\\Models\\Job" --type=sync
// Scenario 1: New model creation
$job = Job::create($data);
// Option A: Create embedding immediately
$embedding = $job->getOrCreateEmbedding();
// Option B: Queue for batch processing (more efficient)
$job->queueForSyncing();
// Scenario 2: Model updates
$job->update(['title' => 'Updated Title']);
// Option A: Update embedding immediately
$job->createFreshEmbedding();
// Option B: Queue for batch processing (recommended)
$job->queueForSyncing();
// Scenario 3: Checking embedding status
$embedding = $job->getEmbedding();
if (!$embedding) {
echo "No embedding exists";
} elseif ($embedding->embedding_sync_required) {
echo "Embedding needs update";
} else {
echo "Embedding is up to date";
}
// Scenario 4: Bulk operations
$jobs = Job::where('department', 'Engineering')->get();
foreach ($jobs as $job) {
$job->queueForSyncing(); // Queue all for batch processing
}
// Process in batch: php artisan embedding:gen "App\\Models\\Job" --type=sync
public function toEmbeddingText(): string
{
// ✅ Good: Concise, relevant information
return trim($this->title . ' ' . $this->description . ' ' . $this->tags);
// ❌ Avoid: Too much noise or irrelevant data
// return $this->created_at . ' ' . $this->id . ' ' . $this->long_legal_text;
}
// ✅ Good: Filter before similarity calculation
$matches = $user->matchingResults(
Product::class,
10,
fn($q) => $q->where('available', true)->where('price', '<=', $budget)
);
// ❌ Less efficient: Filtering after embedding calculation
$allMatches = $user->matchingResults(Product::class, 100);
$filtered = $allMatches->where('available', true);
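Beyond efficiency, the order of filtering and ranking can change what comes back. The plain-PHP sketch below uses a toy catalogue with precomputed distances; topK() is a hypothetical helper written for this example, not part of the package.

```php
<?php

// Toy catalogue: each item has a filterable attribute and a precomputed distance.
$products = [
    ['name' => 'A', 'available' => true,  'distance' => 0.40],
    ['name' => 'B', 'available' => false, 'distance' => 0.05],
    ['name' => 'C', 'available' => true,  'distance' => 0.10],
    ['name' => 'D', 'available' => false, 'distance' => 0.08],
    ['name' => 'E', 'available' => true,  'distance' => 0.30],
];

// Hypothetical helper: the K items with the smallest distance.
function topK(array $items, int $k): array
{
    usort($items, fn ($x, $y) => $x['distance'] <=> $y['distance']);

    return array_slice($items, 0, $k);
}

$isAvailable = fn (array $p): bool => $p['available'];

// Filter first, then rank: all K slots go to eligible items.
$filterFirst = topK(array_filter($products, $isAvailable), 3);

// Rank first, then filter: ineligible items take slots and are then discarded.
$filterAfter = array_values(array_filter(topK($products, 3), $isAvailable));

echo implode(',', array_column($filterFirst, 'name')), "\n";
echo implode(',', array_column($filterAfter, 'name')), "\n";
```

In this toy data the second approach returns fewer than three items, because two of the three closest products are unavailable.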
Understanding when to use each embedding method:
$job = Job::find(1);
// ✅ Use getEmbedding() when you just want to check if embedding exists
$embedding = $job->getEmbedding();
if ($embedding && !$embedding->embedding_sync_required) {
// Use existing embedding
}
// ✅ Use getOrCreateEmbedding() for similarity matching (recommended)
$matchingJobs = $customer->matchingResults(Job::class); // Uses getOrCreateEmbedding internally
// ✅ Use createFreshEmbedding() when you want to force regeneration
$job->update(['title' => 'New Title']);
$freshEmbedding = $job->createFreshEmbedding(); // Immediate update
// ✅ Use queueForSyncing() for deferred batch processing (most efficient)
$job->update(['title' => 'New Title']);
$job->queueForSyncing(); // Mark for later batch processing
// Method 1: Using queueForSyncing() (recommended)
class Job extends Model implements EmbeddingSearchableContract
{
use EmbeddingSearchableTrait;
protected static function booted()
{
static::updated(function ($job) {
if ($job->isDirty(['title', 'description', 'requirements'])) {
$job->queueForSyncing(); // Simpler and cleaner approach
}
});
}
}
// Method 2: Direct embedding update (legacy approach)
class Job extends Model implements EmbeddingSearchableContract
{
use EmbeddingSearchableTrait;
protected static function booted()
{
static::updated(function ($job) {
if ($job->isDirty(['title', 'description', 'requirements'])) {
$job->embedding()->update(['embedding_sync_required' => true]);
}
});
}
}
For automatic embedding sync management, use the FireSyncEmbeddingTrait:
use Subhendu\EmbedVector\Traits\FireSyncEmbeddingTrait;
class Job extends Model implements EmbeddingSearchableContract
{
use EmbeddingSearchableTrait, FireSyncEmbeddingTrait;
// No need for manual booted() method - trait handles it automatically
}
Configure which fields to monitor in your config/embedvector.php:
return [
// ... other config options
'model_fields_to_check' => [
'App\Models\Job' => ['title', 'description', 'requirements'],
'App\Models\Product' => ['name', 'description', 'category'],
'App\Models\User' => ['name', 'bio', 'skills'],
],
];
How it works:
- The trait automatically monitors specified fields for changes
- When any monitored field changes, it marks the embedding for re-sync
- Only triggers when fields actually change (compares old vs new values)
- Respects the configuration mapping for each model class
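The change detection described above can be approximated in plain PHP. This is a sketch of the idea only: needsResync() is a hypothetical function, and the trait's real implementation inside Eloquent model events may look quite different.

```php
<?php

// Same shape as the model_fields_to_check config option.
$fieldsToCheck = [
    'App\Models\Job' => ['title', 'description', 'requirements'],
];

// True when any monitored field differs between the old and new attribute sets.
function needsResync(string $class, array $old, array $new, array $config): bool
{
    foreach ($config[$class] ?? [] as $field) {
        if (($old[$field] ?? null) !== ($new[$field] ?? null)) {
            return true;
        }
    }

    return false;
}

$old = ['title' => 'PHP Developer', 'salary' => 80000];
$salaryOnly = ['title' => 'PHP Developer', 'salary' => 90000];
$retitled = ['title' => 'Senior PHP Developer', 'salary' => 80000];

var_dump(needsResync('App\Models\Job', $old, $salaryOnly, $fieldsToCheck));
var_dump(needsResync('App\Models\Job', $old, $retitled, $fieldsToCheck));
```

A salary change leaves the embedding untouched because salary is not monitored, while a new title marks it for re-sync.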
- "No embedding found in response"
  - Check your OpenAI API key is valid
  - Verify the embedding model exists
  - Ensure your toEmbeddingText() returns non-empty strings
- "Model class must implement EmbeddingSearchableContract"
  - Target models must implement EmbeddingSearchableContract
  - Source models only need EmbeddableContract
- Poor matching results
  - Review your toEmbeddingText() method - it should contain relevant, semantic information
  - Consider using cosine distance for semantic similarity
  - Try different embedding models (text-embedding-3-large for better quality)
- Performance issues
  - Use batch processing for large datasets
  - Consider using optimized search strategy for same-database scenarios
  - Add appropriate database indexes
- Cross-connection relationship limitations
  - The embedding() relationship only works when both models use the same database connection
  - For cross-connection setups (e.g., Jobs in MySQL, embeddings in PostgreSQL), use getEmbedding() or getOrCreateEmbedding() methods instead of the relationship
  - Direct relationship access ($model->embedding) will return null in cross-connection scenarios
- Embedding method confusion
  - Use getEmbedding() when you only want to check if an embedding exists (returns null if not found)
  - Use getOrCreateEmbedding() when you need an embedding for similarity matching (creates/updates as needed)
  - Use queueForSyncing() to defer embedding updates for batch processing (most efficient for bulk updates)
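The cross-connection and performance items above both depend on which search strategy is in effect. As a rough illustration of how an auto setting could pick between the two documented strategies, consider the sketch below. resolveStrategy() is a hypothetical function, and the rule it applies (compare connection names) is an assumption, not the package's documented logic.

```php
<?php

// Hypothetical resolver for the search_strategy option.
function resolveStrategy(string $configured, string $modelConnection, string $vectorConnection): string
{
    if ($configured !== 'auto') {
        return $configured; // an explicit setting is used as-is
    }

    // A JOIN needs both tables on one connection; otherwise use the two-step approach.
    return $modelConnection === $vectorConnection ? 'optimized' : 'cross_connection';
}

echo resolveStrategy('auto', 'pgsql', 'pgsql'), "\n";
echo resolveStrategy('auto', 'mysql', 'pgsql'), "\n";
```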
-- Add indexes for better performance
CREATE INDEX IF NOT EXISTS embeddings_model_type_idx ON embeddings (model_type);
CREATE INDEX IF NOT EXISTS embeddings_sync_required_idx ON embeddings (embedding_sync_required);
Add these to your .env file:
# Required
OPENAI_API_KEY=your_openai_api_key_here
# Optional - Customize behavior
EMBEDVECTOR_MODEL=text-embedding-3-small
EMBEDVECTOR_DISTANCE=cosine
EMBEDVECTOR_SEARCH_STRATEGY=auto
EMBEDVECTOR_LOT_SIZE=50000
EMBEDVECTOR_CHUNK_SIZE=500
EMBEDVECTOR_DB_CONNECTION=pgsql
┌──────────────────────┐    ┌───────────────────────────────┐    ┌──────────────────┐
│ Source Model         │    │ Target Model                  │    │ Embeddings Table │
│ (EmbeddableContract) │    │ (EmbeddingSearchableContract) │    │                  │
│                      │    │                               │    │                  │
│ • Customer           │───▶│ • Job                         │◀───│ • Vector data    │
│ • User Profile       │    │ • Product                     │    │ • Similarity     │
│ • Candidate          │    │ • Article                     │    │   calculations   │
└──────────────────────┘    └───────────────────────────────┘    └──────────────────┘
           │                                │                              │
           ▼                                ▼                              ▼
┌──────────────────────┐    ┌───────────────────────────────┐    ┌──────────────────┐
│ toEmbeddingText()    │    │ toEmbeddingText()             │    │ PostgreSQL       │
│ • Generate text      │    │ • Generate text               │    │ with pgvector    │
│   for embedding      │    │   for embedding               │    │ extension        │
└──────────────────────┘    └───────────────────────────────┘    └──────────────────┘
Find models similar to the current model.
Parameters:
- $targetModelClass: Fully qualified class name of the target model
- $topK: Maximum number of results (default: 5)
- $queryFilter: Optional closure to filter results before similarity calculation
Returns: Collection of models with match_percent and distance properties
Get the existing embedding for the current model without creating a new one.
Returns: Embedding model instance or null if no embedding exists
Get the existing embedding or create a new one if none exists. Also handles updating embeddings when embedding_sync_required is true.
Returns: Embedding model instance
Mark the model's embedding for re-generation on the next sync. This is useful when you want to defer embedding updates until a batch process runs.
Returns: void
Force create a new embedding for the model, bypassing any existing embedding.
Returns: Newly created Embedding model instance
Eloquent relationship to the embedding record.
Returns: MorphOne relationship
Get the base query for models to be embedded during initial processing.
Returns: Eloquent Builder instance
Get the query for models that need re-embedding (sync process).
Returns: Eloquent Builder instance
Get the database connection name for the model.
Returns: Database connection name or null for default
The package includes comprehensive tests. Run them with:
# Run all tests
vendor/bin/pest
# Run with coverage (requires Xdebug)
vendor/bin/pest --coverage
# Run static analysis
vendor/bin/phpstan analyse
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for your changes
- Ensure all tests pass (vendor/bin/pest)
- Run static analysis (vendor/bin/phpstan analyse)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
The MIT License (MIT). Please see License File for more information.
- Subhendu Bhatta
- Built with Laravel
- Powered by OpenAI Embeddings
- Uses pgvector for PostgreSQL vector operations