Commit e82bd7c

itsderek23 authored and andreibondarev committed
WIP pgvector support
1 parent 9f605d3 commit e82bd7c

10 files changed

+346
-10
lines changed

README.md

Lines changed: 84 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,17 +1,17 @@
11
💎🔗 Langchain.rb for Rails
22
---
3-
⚡ Building applications with LLMs through composability ⚡
4-
5-
👨‍💻👩‍💻 CURRENTLY SEEKING PEOPLE TO FORM THE CORE GROUP OF MAINTAINERS WITH
3+
The fastest way to sprinkle AI ✨ on top of your Rails app. Add OpenAI-powered question-and-answering in minutes.
64

75
![Tests status](https://github.com/andreibondarev/langchainrb_rails/actions/workflows/ci.yml/badge.svg?branch=main)
86
[![Gem Version](https://badge.fury.io/rb/langchainrb_rails.svg)](https://badge.fury.io/rb/langchainrb_rails)
97
[![Docs](http://img.shields.io/badge/yard-docs-blue.svg)](http://rubydoc.info/gems/langchainrb_rails)
108
[![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/andreibondarev/langchainrb_rails/blob/main/LICENSE.txt)
119
[![](https://dcbadge.vercel.app/api/server/WDARp7J2n8?compact=true&style=flat)](https://discord.gg/WDARp7J2n8)
1210

11+
## Dependencies
1312

14-
Langchain.rb is a library that's an abstraction layer on top many emergent AI, ML and other DS tools. The goal is to abstract complexity and difficult concepts to make building AI/ML-supercharged applications approachable for traditional software engineers.
13+
* Ruby 3.0+
14+
* Postgres 11+
1515

1616
## Table of Contents
1717

@@ -28,8 +28,88 @@ If bundler is not being used to manage dependencies, install the gem by executin
2828

2929
gem install langchainrb_rails
3030

31+
## Configuration w/PgVector (requires Postgres 11+)
32+
33+
1. Generate changes to support vectorsearch in your chosen ActiveRecord model
34+
35+
```
36+
rails generate langchainrb_rails:pg_vector --model=Product --llm=openai
37+
```
38+
39+
This adds the required dependencies to your Gemfile, creates the initializer file `config/initializers/langchainrb_rails.rb` and the database migrations that support vectorsearch, and adds the necessary code to the ActiveRecord model to enable vectorsearch.
40+
41+
2. Bundle && Migrate
42+
43+
```
44+
bundle && rails db:migrate
45+
```
46+
47+
3. Set the env var `OPENAI_API_KEY` to your OpenAI API key.
48+
49+
4. Generate embeddings for your model
50+
51+
```
52+
[YOUR MODEL].embed!
53+
```
54+
55+
This can take a while depending on the number of database records.
56+
57+
## Usage
58+
59+
### Question Answering
60+
61+
```
62+
Product.ask("list the brands of shoes that are in stock")
63+
```
64+
65+
Returns a `String` with a natural language answer. The answer is assembled using the following steps:
66+
67+
1. Turn the `question` into an embedding using the selected LLM.
68+
2. Find records that most closely match the question using Postgres vector similarity search (`#similarity_search`).
69+
3. Create a prompt using the question and insert the records (via `#as_vector`) into the prompt as context.
70+
4. Generate a completion using the prompt and the selected LLM.
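
The four steps above can be sketched in plain Ruby. This is only an illustration, not the gem's implementation: `FakeLLM`, `rag_answer`, the tiny vocabulary, and the sample records are all hypothetical stand-ins, and a real setup would call OpenAI for the embedding and completion and Postgres for the search.

```ruby
# Hypothetical stand-in for an LLM client (assumption: not part of the gem).
class FakeLLM
  VOCAB = %w[shoes shirt nike adidas stock].freeze

  # Step 1: turn text into a vector (here: word counts over a tiny vocabulary).
  def embed(text)
    words = text.downcase.scan(/[a-z]+/)
    VOCAB.map { |term| words.count(term).to_f }
  end

  # Step 4: produce a "completion" (here: simply echo the prompt back).
  def complete(prompt)
    "Answer based on:\n#{prompt}"
  end
end

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

def rag_answer(question, records:, llm:, k: 2)
  query_vector = llm.embed(question)
  # Step 2: keep the k records whose vectors are most similar to the question.
  nearest = records.max_by(k) { |record| cosine_similarity(query_vector, llm.embed(record)) }
  # Step 3: insert the records into the prompt as context.
  prompt = "Context:\n#{nearest.join("\n---\n")}\n---\nQuestion: #{question}"
  llm.complete(prompt)
end

RECORDS = [
  '{"name":"Air Max","brand":"Nike","category":"shoes"}',
  '{"name":"Plain Tee","brand":"Acme","category":"shirt"}',
  '{"name":"Runner","brand":"Adidas","category":"shoes"}'
].freeze

ANSWER = rag_answer("list the brands of shoes that are in stock", records: RECORDS, llm: FakeLLM.new)
puts ANSWER
```

With this toy data the two shoe records end up in the prompt and the shirt record does not, which is the effect the similarity search step is meant to have.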
71+
72+
### Similarity Search
73+
74+
```
75+
Product.similarity_search("t-shirt")
76+
```
77+
78+
Returns ActiveRecord records that most closely match the `query` using Postgres vector similarity search.
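
To give a feel for what "most closely match" means, here is a plain-Ruby sketch of the three distance measures the adapter names as operators (`cosine`, `euclidean`, `inner_product`). The helper names, the `nearest` method, and the two-dimensional sample vectors are assumptions for illustration; in the real setup the comparison happens inside Postgres, and its exact distance conventions may differ from this sketch.

```ruby
# Illustrative only: these helpers are not part of the gem or of pgvector.
def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Assumes neither vector is all zeros.
def cosine_distance(a, b)
  norms = Math.sqrt(inner_product(a, a)) * Math.sqrt(inner_product(b, b))
  1.0 - inner_product(a, b) / norms
end

# Return the names of the k items whose vectors are closest to the query.
def nearest(items, query, k:)
  items.min_by(k) { |_name, vector| cosine_distance(vector, query) }.map(&:first)
end

CATALOG = {
  "t-shirt" => [0.9, 0.1],
  "polo shirt" => [0.8, 0.3],
  "sneakers" => [0.1, 0.9]
}.freeze

puts nearest(CATALOG, [1.0, 0.0], k: 2).inspect
```

The query vector points in nearly the same direction as the two shirt vectors, so those two are selected and "sneakers" is left out.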
79+
80+
## Customization
81+
82+
### Changing the vector representation of a record
83+
84+
By default, embeddings are generated by calling the following within your model:
85+
86+
```
87+
to_json(except: :embedding)
88+
```
89+
90+
You can override this by defining an `#as_vector` method in your model:
91+
92+
```
93+
def as_vector
94+
res = as_json(except: [:embedding, :owner_id, :user_id, :category_id])
95+
res.merge({ "owner" => owner.name, "user" => user.name, "category" => category.name }).to_json
96+
end
97+
```
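
When overriding `#as_vector`, note that `to_json` returns a `String`, so any merging of extra fields has to happen on a `Hash` before serializing. A minimal plain-Ruby sketch of that idea follows; the `Product` struct and its fields are hypothetical stand-ins for an ActiveRecord model.

```ruby
require "json"

# Hypothetical stand-in for an ActiveRecord model (assumption for illustration).
Product = Struct.new(:name, :embedding, :category_id, :category_name, keyword_init: true) do
  def as_vector
    # Drop the embedding and foreign-key style fields while still working with a Hash.
    attrs = to_h.reject { |key, _| %i[embedding category_id category_name].include?(key) }
    # Merge in the human-readable value, then serialize once at the end.
    attrs.merge(category: category_name).to_json
  end
end

PRODUCT = Product.new(name: "Air Max", embedding: [0.1, 0.2], category_id: 7, category_name: "Shoes")
puts PRODUCT.as_vector
```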
98+
99+
Re-generate embeddings after modifying this method:
100+
101+
```
102+
[YOUR MODEL].embed!
103+
```
104+
31105
## Rails Generators
32106

107+
### PgVector Generator
108+
109+
```
110+
rails generate langchainrb_rails:pg_vector --model=Product --llm=openai
111+
```
112+
33113
### Pinecone Generator - adds vectorsearch to your ActiveRecord model
34114
```
35115
rails generate langchainrb_rails:pinecone --model=Product --llm=openai

langchainrb_rails.gemspec

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@ Gem::Specification.new do |spec|
3030
spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
3131
spec.require_paths = ["lib"]
3232

33-
spec.add_dependency "langchainrb", "~> 0.7.0"
33+
# TODO - add back ... loading locally
34+
#spec.add_dependency "langchainrb", "~> 0.7.0"
3435

3536
spec.add_development_dependency "pry-byebug", "~> 3.10.0"
3637
spec.add_development_dependency "yard", "~> 0.9.34"
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
1+
# frozen_string_literal: true
2+
3+
module Langchain::Vectorsearch
4+
class Pgvector < Base
5+
#
6+
# The PostgreSQL vector search adapter
7+
#
8+
# Gem requirements:
9+
# gem "pgvector", "~> 0.2"
10+
#
11+
# Usage:
12+
# pgvector = Langchain::Vectorsearch::Pgvector.new(llm:, model_name:)
13+
#
14+
15+
# The operators supported by the PostgreSQL vector search adapter
16+
OPERATORS = [
17+
"cosine",
18+
"euclidean",
19+
"inner_product"
20+
]
21+
DEFAULT_OPERATOR = "cosine"
22+
23+
attr_reader :db, :operator, :llm
24+
attr_accessor :model
25+
26+
# @param url [String] The URL of the PostgreSQL database
27+
# @param index_name [String] The name of the table to use for the index
28+
# @param llm [Object] The LLM client to use
29+
# @param namespace [String] The namespace to use for the index when inserting/querying
30+
def initialize(llm:)
31+
# If the line below is called, the generator fails as calls to
32+
# LangchainrbRails.config.vectorsearch will generate an exception.
33+
# These happen in the template files.
34+
# depends_on "neighbor"
35+
36+
@operator = DEFAULT_OPERATOR
37+
38+
super(llm: llm)
39+
end
40+
41+
# Add a list of texts to the index
42+
# @param texts [Array<String>] The texts to add to the index
43+
# @param ids [Array<String>] The ids to add to the index, in the same order as the texts
44+
# @return [Array<Integer>] The ids of the added texts.
45+
def add_texts(texts:, ids:)
46+
embeddings = texts.map do |text|
47+
llm.embed(text: text).embedding
48+
end
49+
50+
# I believe the records returned by #find must be in the
51+
# same order as the embeddings. I _think_ this works for uuid ids but didn't test
52+
# deeply.
53+
# TODO - implement find_each so we don't load all records into memory
54+
model.find(ids).each.with_index do |record, i|
55+
record.update_column(:embedding, embeddings[i])
56+
end
57+
end
58+
59+
def update_texts(texts:, ids:)
60+
add_texts(texts: texts, ids: ids)
61+
end
62+
63+
# Invoke a rake task that will create an initializer (`config/initializers/langchain.rb`) file
64+
# and db/migrations/* files
65+
def create_default_schema
66+
Rake::Task["pgvector"].invoke
67+
end
68+
69+
# Destroy default schema
70+
def destroy_default_schema
71+
# Tell the user to rollback the migration
72+
end
73+
74+
# Search for similar texts in the index
75+
# @param query [String] The text to search for
76+
# @param k [Integer] The number of top results to return
77+
# @return [Array<Hash>] The results of the search
78+
# TODO - drop the named "query:" param so it is the same interface as #ask?
79+
def similarity_search(query:, k: 4)
80+
embedding = llm.embed(text: query).embedding
81+
82+
similarity_search_by_vector(
83+
embedding: embedding,
84+
k: k
85+
)
86+
end
87+
88+
# Search for similar texts in the index by the passed in vector.
89+
# You must generate your own vector using the same LLM that generated the embeddings stored in the Vectorsearch DB.
90+
# @param embedding [Array<Float>] The vector to search for
91+
# @param k [Integer] The number of top results to return
92+
# @return [Array<Hash>] The results of the search
93+
# TODO - drop the named "embedding:" param so it is the same interface as #ask?
94+
def similarity_search_by_vector(embedding:, k: 4)
95+
model
96+
.nearest_neighbors(:embedding, embedding, distance: operator)
97+
.limit(k)
98+
end
99+
100+
# Ask a question and return the answer
101+
# @param question [String] The question to ask
102+
# @param k [Integer] The number of results to have in context
103+
# @yield [String] Stream responses back one String at a time
104+
# @return [String] The answer to the question
105+
def ask(question, k: 4, &block)
106+
# Noisy as the embedding column has a lot of data
107+
ActiveRecord::Base.logger.silence do
108+
search_results = similarity_search(query: question, k: k)
109+
110+
context = search_results.map do |result|
111+
result.as_vector
112+
end
113+
context = context.join("\n---\n")
114+
115+
prompt = generate_rag_prompt(question: question, context: context)
116+
117+
llm.chat(prompt: prompt, &block)
118+
end
119+
end
120+
end
121+
end
122+
123+
# Rails connection when configuring vectorsearch
124+
# Update READMEs
125+
# Rails migration to create a migration
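
Review note on `add_texts`: the comment there worries about whether the fetched records line up with the embeddings by position. One way to avoid depending on fetch order is to pair each id with its embedding explicitly. The sketch below shows the idea in plain Ruby; `Record` and `assign_embeddings` are hypothetical names, and a real version would write to the database instead of setting a struct field.

```ruby
# Hypothetical stand-in for a model row (assumption for illustration).
Record = Struct.new(:id, :embedding)

# Look up each record's embedding by id instead of by array position.
def assign_embeddings(records, ids:, embeddings:)
  embedding_by_id = ids.zip(embeddings).to_h
  records.each { |record| record.embedding = embedding_by_id.fetch(record.id) }
end

# Deliberately not in the same order as IDS, to mimic an unordered fetch.
FETCHED = [Record.new(3), Record.new(1), Record.new(2)]
IDS = [1, 2, 3].freeze
EMBEDDINGS = [[0.1], [0.2], [0.3]].freeze

assign_embeddings(FETCHED, ids: IDS, embeddings: EMBEDDINGS)
FETCHED.each { |record| puts "#{record.id}: #{record.embedding.inspect}" }
```

Using `fetch` rather than `[]` makes a missing id fail loudly instead of silently storing `nil`.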

lib/langchainrb_rails.rb

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
55
require_relative "langchainrb_rails/version"
66
require "langchainrb_rails/railtie"
77
require "langchainrb_rails/config"
8+
require_relative "langchainrb_overrides/vectorsearch/pgvector"
89

910
module LangchainrbRails
1011
class Error < StandardError; end

lib/langchainrb_rails/active_record/hooks.rb

Lines changed: 22 additions & 5 deletions
@@ -61,7 +61,9 @@ def upsert_to_vectorsearch
6161
#
6262
# @return [String] the text representation of the model
6363
def as_vector
64-
to_json
64+
# Exclude the embedding column from the text representation; otherwise an existing
65+
# embedding would itself be vectorized when the record is updated.
66+
to_json(except: :embedding)
6567
end
6668

6769
module ClassMethods
@@ -70,6 +72,21 @@ module ClassMethods
7072
# @param provider [Object] The `Langchain::Vectorsearch::*` instance
7173
def vectorsearch
7274
class_variable_set(:@@provider, LangchainrbRails.config.vectorsearch)
75+
76+
# Pgvector-specific configuration
77+
if LangchainrbRails.config.vectorsearch.is_a?(Langchain::Vectorsearch::Pgvector)
78+
has_neighbors(:embedding)
79+
end
80+
81+
LangchainrbRails.config.vectorsearch.model = self
82+
end
83+
84+
# Iterates over all records and generates an embedding for each.
85+
# Re-generates embeddings for ALL records, including those that already have one.
86+
def embed!
87+
find_each do |record|
88+
record.upsert_to_vectorsearch
89+
end
7390
end
7491

7592
# Search for similar texts
@@ -84,7 +101,7 @@ def similarity_search(query, k: 1)
84101
)
85102

86103
# We use "__id" when Weaviate is the provider
87-
ids = records.map { |record| record.dig("id") || record.dig("__id") }
104+
ids = records.map { |record| record.try("id") || record.dig("__id") }
88105
where(id: ids)
89106
end
90107

@@ -94,12 +111,12 @@ def similarity_search(query, k: 1)
94111
# @param k [Integer] The number of results to have in context
95112
# @yield [String] Stream responses back one String at a time
96113
# @return [String] The answer to the question
97-
def ask(question:, k: 4, &block)
114+
def ask(question, k: 4, &block)
98115
class_variable_get(:@@provider).ask(
99-
question: question,
116+
question,
100117
k: k,
101118
&block
102-
)
119+
).completion
103120
end
104121
end
105122
end
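
Review note on the id lookup in `similarity_search`: results can apparently be model instances (Pgvector) or hashes (the comment mentions `__id` for Weaviate). ActiveSupport's `try` returns `nil` when the receiver does not respond to the method, so a `Hash` result carrying an `"id"` key would no longer be picked up by `record.try("id")`. Whether any provider actually returns such hashes is an assumption worth checking. A minimal sketch of a hypothetical `extract_id` helper that handles each shape explicitly:

```ruby
# Hypothetical helper, not part of the gem: pull an id out of a search result
# that may be a Hash (keyed by "id" or "__id") or an object responding to #id.
def extract_id(record)
  if record.is_a?(Hash)
    record["id"] || record["__id"]
  elsif record.respond_to?(:id)
    record.id
  end
end

# Stand-in for a model instance (assumption for illustration).
Row = Struct.new(:id)

RESULTS = [Row.new(1), { "id" => 2 }, { "__id" => 3 }].freeze
puts RESULTS.map { |record| extract_id(record) }.inspect
```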
