
Commit dad27ef

Merge pull request #10 from andreibondarev/pgvector
Pgvector
2 parents 9f605d3 + c35399c commit dad27ef

11 files changed (+361, -16 lines)


README.md

Lines changed: 86 additions & 6 deletions
@@ -1,17 +1,17 @@
 💎🔗 Langchain.rb for Rails
 ---
-⚡ Building applications with LLMs through composability ⚡
-
-👨‍💻👩‍💻 CURRENTLY SEEKING PEOPLE TO FORM THE CORE GROUP OF MAINTAINERS WITH
+The fastest way to sprinkle AI ✨ on top of your Rails app. Add OpenAI-powered question-and-answering in minutes.
 
 ![Tests status](https://github.com/andreibondarev/langchainrb_rails/actions/workflows/ci.yml/badge.svg?branch=main)
 [![Gem Version](https://badge.fury.io/rb/langchainrb_rails.svg)](https://badge.fury.io/rb/langchainrb_rails)
 [![Docs](http://img.shields.io/badge/yard-docs-blue.svg)](http://rubydoc.info/gems/langchainrb_rails)
 [![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/andreibondarev/langchainrb_rails/blob/main/LICENSE.txt)
 [![](https://dcbadge.vercel.app/api/server/WDARp7J2n8?compact=true&style=flat)](https://discord.gg/WDARp7J2n8)
 
+## Dependencies
 
-Langchain.rb is a library that's an abstraction layer on top many emergent AI, ML and other DS tools. The goal is to abstract complexity and difficult concepts to make building AI/ML-supercharged applications approachable for traditional software engineers.
+* Ruby 3.0+
+* Postgres 11+
 
 ## Table of Contents
 
@@ -28,10 +28,90 @@ If bundler is not being used to manage dependencies, install the gem by executing:
 
 gem install langchainrb_rails
 
+## Configuration w/PgVector (requires Postgres 11+)
+
+1. Generate changes to support vectorsearch in your chosen ActiveRecord model
+
+```bash
+rails generate langchainrb_rails:pgvector --model=Product --llm=openai
+```
+
+This adds the required dependencies to your Gemfile, creates the initializer file `config/initializers/langchainrb_rails.rb` and the database migrations needed for vectorsearch, and adds the necessary code to the ActiveRecord model to enable vectorsearch.
+
+2. Bundle && Migrate
+
+```bash
+bundle install && rails db:migrate
+```
+
+3. Set the env var `OPENAI_API_KEY` to your OpenAI API key.
+
+4. Generate embeddings for your model
+
+```ruby
+[YOUR MODEL].embed!
+```
+
+This can take a while depending on the number of database records.
+
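The generated initializer is not shown in this diff; it points `LangchainrbRails.config.vectorsearch` at the Pgvector adapter introduced by this commit. A minimal sketch of what that configuration plausibly looks like (the `configure` block and the OpenAI constructor arguments here are assumptions, not the template verbatim):

```ruby
# config/initializers/langchainrb_rails.rb (illustrative sketch only)
LangchainrbRails.configure do |config|
  # Wire the Pgvector adapter to an LLM; --llm=openai implies the OpenAI client here.
  config.vectorsearch = Langchain::Vectorsearch::Pgvector.new(
    llm: Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
  )
end
```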
+## Usage
+
+### Question and Answering
+
+```ruby
+Product.ask("list the brands of shoes that are in stock")
+```
+
+Returns a `String` with a natural language answer. The answer is assembled using the following steps:
+
+1. Turn the `question` into an embedding using the selected LLM.
+2. Find records that most closely match the question using Postgres vector similarity search (`#similarity_search`).
+3. Create a prompt using the question and insert the records (via `#as_vector`) into the prompt as context.
+4. Generate a completion using the prompt and the selected LLM.
+
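Because `.ask` also accepts a block (each yielded chunk is a `String`, per the hooks documented later in this commit), the answer can be streamed as it is generated. A sketch, assuming the configured LLM supports streaming:

```ruby
# Print the answer incrementally instead of waiting for the full completion.
Product.ask("list the brands of shoes that are in stock") do |chunk|
  print(chunk)
end
```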
+### Similarity Search
+
+```ruby
+Product.similarity_search("t-shirt")
+```
+
+Returns ActiveRecord records that most closely match the `query` using Postgres vector similarity search.
+
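The class-level `similarity_search` added in this commit's hooks also takes a `k:` keyword controlling how many neighbours are returned (its default there is 1), so widening the result set looks like:

```ruby
# Return the 10 closest matches for the query.
Product.similarity_search("t-shirt", k: 10)
```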
+## Customization
+
+### Changing the vector representation of a record
+
+By default, embeddings are generated by calling the following within your model:
+
+```ruby
+to_json(except: :embedding)
+```
+
+You can override this by defining an `#as_vector` method in your model:
+
+```ruby
+def as_vector
+  # Exclude the embedding and foreign-key columns, then add human-readable associations.
+  res = as_json(except: [:embedding, :owner_id, :user_id, :category_id])
+  res.merge({ "owner" => owner.name, "user" => user.name, "category" => category.name }).to_json
+end
+```
+
+Re-generate embeddings after modifying this method:
+
+```ruby
+[YOUR MODEL].embed!
+```
+
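After overriding `#as_vector`, it is worth checking in the Rails console exactly what text will be embedded for a record. A quick sketch, assuming a `Product` model:

```ruby
# Inspect the text representation sent to the embedding model.
puts Product.first.as_vector
```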
 ## Rails Generators
 
-### Pinecone Generator - adds vectorsearch to your ActiveRecord model
+### PgVector Generator
+
+```bash
+rails generate langchainrb_rails:pgvector --model=Product --llm=openai
 ```
+
+### Pinecone Generator - adds vectorsearch to your ActiveRecord model
+```bash
 rails generate langchainrb_rails:pinecone --model=Product --llm=openai
 ```
 
@@ -45,6 +125,6 @@ Pinecone Generator does the following:
 3. Adds `pinecone` gem to the Gemfile
 
 ### Chroma Generator - adds vectorsearch to your ActiveRecord model
-```
+```bash
 rails generate langchainrb_rails:chroma --model=Product --llm=openai
 ```
lib/langchainrb_overrides/vectorsearch/pgvector.rb

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# frozen_string_literal: true

# Overriding Langchain.rb's Pgvector implementation to use ActiveRecord.
# Original implementation: https://github.com/andreibondarev/langchainrb/blob/main/lib/langchain/vectorsearch/pgvector.rb

module Langchain::Vectorsearch
  class Pgvector < Base
    #
    # The PostgreSQL vector search adapter
    #
    # Gem requirements:
    #     gem "pgvector", "~> 0.2"
    #
    # Usage:
    #     pgvector = Langchain::Vectorsearch::Pgvector.new(llm:)
    #

    # The operators supported by the PostgreSQL vector search adapter
    OPERATORS = [
      "cosine",
      "euclidean",
      "inner_product"
    ]
    DEFAULT_OPERATOR = "cosine"

    attr_reader :operator, :llm
    attr_accessor :model

    # @param url [String] The URL of the PostgreSQL database
    # @param index_name [String] The name of the table to use for the index
    # @param llm [Object] The LLM client to use
    # @param namespace [String] The namespace to use for the index when inserting/querying
    def initialize(llm:)
      # If the line below is called, the generator fails as calls to
      # LangchainrbRails.config.vectorsearch will generate an exception.
      # These happen in the template files.
      # depends_on "neighbor"

      @operator = DEFAULT_OPERATOR

      super(llm: llm)
    end

    # Add a list of texts to the index
    # @param texts [Array<String>] The texts to add to the index
    # @param ids [Array<String>] The ids to add to the index, in the same order as the texts
    # @return [Array<Integer>] The ids of the added texts.
    def add_texts(texts:, ids:)
      embeddings = texts.map do |text|
        llm.embed(text: text).embedding
      end

      # The records returned by #find must be in the same order as the embeddings.
      # This appears to work for uuid ids as well, but hasn't been tested deeply.
      # TODO - implement find_each so we don't load all records into memory
      model.find(ids).each.with_index do |record, i|
        record.update_column(:embedding, embeddings[i])
      end
    end

    def update_texts(texts:, ids:)
      add_texts(texts: texts, ids: ids)
    end

    # Invoke a rake task that will create an initializer (`config/initializers/langchain.rb`) file
    # and db/migrations/* files
    def create_default_schema
      Rake::Task["pgvector"].invoke
    end

    # Destroy default schema
    def destroy_default_schema
      # Tell the user to rollback the migration
    end

    # Search for similar texts in the index
    # @param query [String] The text to search for
    # @param k [Integer] The number of top results to return
    # @return [Array<Hash>] The results of the search
    # TODO - drop the named "query:" param so it is the same interface as #ask?
    def similarity_search(query:, k: 4)
      embedding = llm.embed(text: query).embedding

      similarity_search_by_vector(
        embedding: embedding,
        k: k
      )
    end

    # Search for similar texts in the index by the passed in vector.
    # You must generate your own vector using the same LLM that generated the embeddings stored in the Vectorsearch DB.
    # @param embedding [Array<Float>] The vector to search for
    # @param k [Integer] The number of top results to return
    # @return [Array<Hash>] The results of the search
    # TODO - drop the named "embedding:" param so it is the same interface as #ask?
    def similarity_search_by_vector(embedding:, k: 4)
      model
        .nearest_neighbors(:embedding, embedding, distance: operator)
        .limit(k)
    end

    # Ask a question and return the answer
    # @param question [String] The question to ask
    # @param k [Integer] The number of results to have in context
    # @yield [String] Stream responses back one String at a time
    # @return [String] The answer to the question
    def ask(question, k: 4, &block)
      # Silence the logger here: the embedding column makes these queries noisy.
      ActiveRecord::Base.logger.silence do
        search_results = similarity_search(query: question, k: k)

        context = search_results.map do |result|
          result.as_vector
        end
        context = context.join("\n---\n")

        prompt = generate_rag_prompt(question: question, context: context)

        llm.chat(prompt: prompt, &block)
      end
    end
  end
end
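For context, `similarity_search_by_vector` builds on the neighbor gem's `nearest_neighbors` scope, which the ActiveRecord hooks in this commit enable via `has_neighbors(:embedding)` when the provider is Pgvector. A rough sketch of the equivalent standalone query (the `Product` model and the OpenAI client are assumptions for illustration):

```ruby
# Embed the query text with the same LLM used to embed the records...
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
query_embedding = llm.embed(text: "running shoes").embedding

# ...then let neighbor order records by cosine distance to that vector.
Product.nearest_neighbors(:embedding, query_embedding, distance: "cosine").limit(4)
```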

lib/langchainrb_rails.rb

Lines changed: 2 additions & 0 deletions
@@ -1,10 +1,12 @@
 # frozen_string_literal: true
 
+require "forwardable"
 require "langchain"
 require "rails"
 require_relative "langchainrb_rails/version"
 require "langchainrb_rails/railtie"
 require "langchainrb_rails/config"
+require_relative "langchainrb_overrides/vectorsearch/pgvector"
 
 module LangchainrbRails
   class Error < StandardError; end

lib/langchainrb_rails/active_record/hooks.rb

Lines changed: 22 additions & 5 deletions
@@ -61,7 +61,9 @@ def upsert_to_vectorsearch
     #
     # @return [String] the text representation of the model
     def as_vector
-      to_json
+      # Don't vectorize the embedding ... this would happen if it already exists
+      # for a record and we update.
+      to_json(except: :embedding)
     end
 
     module ClassMethods
@@ -70,6 +72,21 @@ module ClassMethods
       # @param provider [Object] The `Langchain::Vectorsearch::*` instance
       def vectorsearch
         class_variable_set(:@@provider, LangchainrbRails.config.vectorsearch)
+
+        # Pgvector-specific configuration
+        if LangchainrbRails.config.vectorsearch.is_a?(Langchain::Vectorsearch::Pgvector)
+          has_neighbors(:embedding)
+        end
+
+        LangchainrbRails.config.vectorsearch.model = self
+      end
+
+      # Iterates over records and generates embeddings.
+      # Re-generates embeddings for ALL records, not just records that are missing them.
+      def embed!
+        find_each do |record|
+          record.upsert_to_vectorsearch
+        end
       end
 
       # Search for similar texts
@@ -84,7 +101,7 @@ def similarity_search(query, k: 1)
         )
 
         # We use "__id" when Weaviate is the provider
-        ids = records.map { |record| record.dig("id") || record.dig("__id") }
+        ids = records.map { |record| record.try("id") || record.dig("__id") }
         where(id: ids)
       end
 
@@ -94,12 +111,12 @@ def similarity_search(query, k: 1)
       # @param k [Integer] The number of results to have in context
       # @yield [String] Stream responses back one String at a time
       # @return [String] The answer to the question
-      def ask(question:, k: 4, &block)
+      def ask(question, k: 4, &block)
         class_variable_get(:@@provider).ask(
-          question: question,
+          question,
           k: k,
           &block
-        )
+        ).completion
       end
     end
   end
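The PgVector generator later in this commit injects this wiring into the target model via `inject_into_class`, so a model opted into vectorsearch ends up looking roughly like the sketch below (the `Product` name is borrowed from the README examples):

```ruby
class Product < ApplicationRecord
  # Registers the configured vectorsearch provider for this model
  # (and, for Pgvector, declares neighbor's `has_neighbors(:embedding)`).
  vectorsearch

  # Keep the stored embedding in sync whenever a record is created or updated.
  after_save :upsert_to_vectorsearch
end
```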
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# frozen_string_literal: true

module LangchainrbRails
  module Generators
    #
    # Usage:
    #     rails g langchainrb_rails:pgvector --model=Product --llm=openai
    #
    class PgvectorGenerator < LangchainrbRails::Generators::BaseGenerator
      desc "This generator adds Pgvector vectorsearch integration to your ActiveRecord model"
      source_root File.join(__dir__, "templates")

      def copy_migration
        migration_template "enable_vector_extension_template.rb", "db/migrate/enable_vector_extension.rb", migration_version: migration_version
        migration_template "add_vector_column_template.rb", "db/migrate/add_vector_column_to_#{table_name}.rb", migration_version: migration_version
      end

      def create_initializer_file
        template "pgvector_initializer.rb", "config/initializers/langchainrb_rails.rb"
      end

      def migration_version
        "[#{::ActiveRecord::VERSION::MAJOR}.#{::ActiveRecord::VERSION::MINOR}]"
      end

      def add_to_model
        inject_into_class "app/models/#{model_name.downcase}.rb", model_name do
          " vectorsearch\n\n after_save :upsert_to_vectorsearch\n\n"
        end
      end

      def add_to_gemfile
        # Dependencies for Langchain PgVector
        gem "neighbor"
        gem "ruby-openai"
      end

      def post_install_message
        say "Please do the following to start Q&A with your #{model_name} records:", :green
        say "1. Run `bundle install` to install the new gems."
        say "2. Set `OPENAI_API_KEY` environment variable to your OpenAI API key."
        say "3. Run `rails db:migrate` to apply the database migrations to enable pgvector and add the embedding column."
        say "4. In Rails console, run `#{model_name}.embed!` to set the embeddings for all records."
        say "5. Ask a question in the Rails console, ie: `#{model_name}.ask('[YOUR QUESTION]')`"
      end

      private

      # @return [String] Name of the model
      def model_name
        options["model"]
      end

      # @return [String] Table name of the model
      def table_name
        model_name.downcase.pluralize
      end

      # @return [String] LLM provider to use
      def llm
        options["llm"]
      end

      # @return [Langchain::LLM::*] LLM class
      def llm_class
        Langchain::LLM.const_get(LLMS[llm])
      end

      # @return [Integer] Dimension of the vector to be used
      def vector_dimension
        llm_class.default_dimension
      end
    end
  end
end
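The migration templates referenced in `copy_migration` are among the 11 changed files but are not shown here; based on the `neighbor` gem's conventions and the `vector_dimension` helper above, the generated migrations plausibly look like this sketch (class names, the ActiveRecord version tag, and the 1536 dimension are assumptions):

```ruby
# db/migrate/xxxxxxxxxxxxxx_enable_vector_extension.rb (sketch)
class EnableVectorExtension < ActiveRecord::Migration[7.0]
  def change
    enable_extension "vector" # requires the pgvector extension on the Postgres server
  end
end

# db/migrate/xxxxxxxxxxxxxx_add_vector_column_to_products.rb (sketch)
class AddVectorColumnToProducts < ActiveRecord::Migration[7.0]
  def change
    # The dimension comes from the chosen LLM, e.g. 1536 for OpenAI embeddings.
    add_column :products, :embedding, :vector, limit: 1536
  end
end
```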
