
Commit dad27ef

Merge pull request #10 from andreibondarev/pgvector
Pgvector
2 parents 9f605d3 + c35399c commit dad27ef

11 files changed (+361, -16 lines)


README.md

Lines changed: 86 additions & 6 deletions
@@ -1,17 +1,17 @@
 💎🔗 Langchain.rb for Rails
 ---
-⚡ Building applications with LLMs through composability ⚡
-
-👨‍💻👩‍💻 CURRENTLY SEEKING PEOPLE TO FORM THE CORE GROUP OF MAINTAINERS WITH
+The fastest way to sprinkle AI ✨ on top of your Rails app. Add OpenAI-powered question-and-answering in minutes.
 
 ![Tests status](https://github.com/andreibondarev/langchainrb_rails/actions/workflows/ci.yml/badge.svg?branch=main)
 [![Gem Version](https://badge.fury.io/rb/langchainrb_rails.svg)](https://badge.fury.io/rb/langchainrb_rails)
 [![Docs](http://img.shields.io/badge/yard-docs-blue.svg)](http://rubydoc.info/gems/langchainrb_rails)
 [![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/andreibondarev/langchainrb_rails/blob/main/LICENSE.txt)
 [![](https://dcbadge.vercel.app/api/server/WDARp7J2n8?compact=true&style=flat)](https://discord.gg/WDARp7J2n8)
 
+## Dependencies
 
-Langchain.rb is a library that's an abstraction layer on top many emergent AI, ML and other DS tools. The goal is to abstract complexity and difficult concepts to make building AI/ML-supercharged applications approachable for traditional software engineers.
+* Ruby 3.0+
+* Postgres 11+
 
 ## Table of Contents
 
@@ -28,10 +28,90 @@ If bundler is not being used to manage dependencies, install the gem by executing:
 
 gem install langchainrb_rails
 
+## Configuration w/PgVector (requires Postgres 11+)
+
+1. Generate changes to support vectorsearch in your chosen ActiveRecord model
+
+```bash
+rails generate langchainrb_rails:pgvector --model=Product --llm=openai
+```
+
+This adds the required dependencies to your Gemfile, creates the initializer file `config/initializers/langchainrb_rails.rb` and the database migrations needed for vectorsearch, and adds the necessary code to the ActiveRecord model to enable vectorsearch.
+
+2. Bundle && Migrate
+
+```bash
+bundle install && rails db:migrate
+```
+
+3. Set the env var `OPENAI_API_KEY` to your OpenAI API key.
+
+4. Generate embeddings for your model
+
+```ruby
+[YOUR MODEL].embed!
+```
+
+This can take a while depending on the number of database records.
+
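The generated initializer is not shown in this diff; it points `LangchainrbRails.config.vectorsearch` at the Pgvector adapter introduced by this commit. A minimal sketch of what that configuration plausibly looks like (the `configure` block and the OpenAI constructor arguments here are assumptions, not the template verbatim):

```ruby
# config/initializers/langchainrb_rails.rb (illustrative sketch only)
LangchainrbRails.configure do |config|
  # Wire the Pgvector adapter to an LLM; --llm=openai implies the OpenAI client here.
  config.vectorsearch = Langchain::Vectorsearch::Pgvector.new(
    llm: Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
  )
end
```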
+## Usage
+
+### Question and Answering
+
+```ruby
+Product.ask("list the brands of shoes that are in stock")
+```
+
+Returns a `String` with a natural language answer. The answer is assembled using the following steps:
+
+1. Turn the `question` into an embedding using the selected LLM.
+2. Find records that most closely match the question using Postgres vector similarity search (`#similarity_search`).
+3. Create a prompt using the question and insert the records (via `#as_vector`) into the prompt as context.
+4. Generate a completion using the prompt and the selected LLM.
+
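Because `.ask` also accepts a block (each yielded chunk is a `String`, per the hooks documented later in this commit), the answer can be streamed as it is generated. A sketch, assuming the configured LLM supports streaming:

```ruby
# Print the answer incrementally instead of waiting for the full completion.
Product.ask("list the brands of shoes that are in stock") do |chunk|
  print(chunk)
end
```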
+### Similarity Search
+
+```ruby
+Product.similarity_search("t-shirt")
+```
+
+Returns ActiveRecord records that most closely match the `query` using Postgres vector similarity search.
+
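The class-level `similarity_search` added in this commit's hooks also takes a `k:` keyword controlling how many neighbours are returned (its default there is 1), so widening the result set looks like:

```ruby
# Return the 10 closest matches for the query.
Product.similarity_search("t-shirt", k: 10)
```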
+## Customization
+
+### Changing the vector representation of a record
+
+By default, embeddings are generated by calling the following within your model:
+
+```ruby
+to_json(except: :embedding)
+```
+
+You can override this by defining an `#as_vector` method in your model:
+
+```ruby
+def as_vector
+  # Exclude the embedding and foreign-key columns, then add human-readable associations.
+  res = as_json(except: [:embedding, :owner_id, :user_id, :category_id])
+  res.merge({ "owner" => owner.name, "user" => user.name, "category" => category.name }).to_json
+end
+```
+
+Re-generate embeddings after modifying this method:
+
+```ruby
+[YOUR MODEL].embed!
+```
+
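After overriding `#as_vector`, it is worth checking in the Rails console exactly what text will be embedded for a record. A quick sketch, assuming a `Product` model:

```ruby
# Inspect the text representation sent to the embedding model.
puts Product.first.as_vector
```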
 ## Rails Generators
 
-### Pinecone Generator - adds vectorsearch to your ActiveRecord model
+### PgVector Generator
+
+```bash
+rails generate langchainrb_rails:pgvector --model=Product --llm=openai
 ```
+
+### Pinecone Generator - adds vectorsearch to your ActiveRecord model
+```bash
 rails generate langchainrb_rails:pinecone --model=Product --llm=openai
 ```
 
@@ -45,6 +125,6 @@ Pinecone Generator does the following:
 3. Adds `pinecone` gem to the Gemfile
 
 ### Chroma Generator - adds vectorsearch to your ActiveRecord model
-```
+```bash
 rails generate langchainrb_rails:chroma --model=Product --llm=openai
 ```
lib/langchainrb_overrides/vectorsearch/pgvector.rb

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# frozen_string_literal: true

# Overriding Langchain.rb's Pgvector implementation to use ActiveRecord.
# Original implementation: https://github.com/andreibondarev/langchainrb/blob/main/lib/langchain/vectorsearch/pgvector.rb

module Langchain::Vectorsearch
  class Pgvector < Base
    #
    # The PostgreSQL vector search adapter
    #
    # Gem requirements:
    #     gem "pgvector", "~> 0.2"
    #
    # Usage:
    #     pgvector = Langchain::Vectorsearch::Pgvector.new(llm:)
    #

    # The operators supported by the PostgreSQL vector search adapter
    OPERATORS = [
      "cosine",
      "euclidean",
      "inner_product"
    ]
    DEFAULT_OPERATOR = "cosine"

    attr_reader :operator, :llm
    attr_accessor :model

    # @param url [String] The URL of the PostgreSQL database
    # @param index_name [String] The name of the table to use for the index
    # @param llm [Object] The LLM client to use
    # @param namespace [String] The namespace to use for the index when inserting/querying
    def initialize(llm:)
      # If the line below is called, the generator fails as calls to
      # LangchainrbRails.config.vectorsearch will generate an exception.
      # These happen in the template files.
      # depends_on "neighbor"

      @operator = DEFAULT_OPERATOR

      super(llm: llm)
    end

    # Add a list of texts to the index
    # @param texts [Array<String>] The texts to add to the index
    # @param ids [Array<String>] The ids to add to the index, in the same order as the texts
    # @return [Array<Integer>] The ids of the added texts.
    def add_texts(texts:, ids:)
      embeddings = texts.map do |text|
        llm.embed(text: text).embedding
      end

      # The records returned by #find must be in the same order as the embeddings.
      # This appears to work for uuid ids as well, but hasn't been tested deeply.
      # TODO - implement find_each so we don't load all records into memory
      model.find(ids).each.with_index do |record, i|
        record.update_column(:embedding, embeddings[i])
      end
    end

    def update_texts(texts:, ids:)
      add_texts(texts: texts, ids: ids)
    end

    # Invoke a rake task that will create an initializer (`config/initializers/langchain.rb`) file
    # and db/migrations/* files
    def create_default_schema
      Rake::Task["pgvector"].invoke
    end

    # Destroy default schema
    def destroy_default_schema
      # Tell the user to rollback the migration
    end

    # Search for similar texts in the index
    # @param query [String] The text to search for
    # @param k [Integer] The number of top results to return
    # @return [Array<Hash>] The results of the search
    # TODO - drop the named "query:" param so it is the same interface as #ask?
    def similarity_search(query:, k: 4)
      embedding = llm.embed(text: query).embedding

      similarity_search_by_vector(
        embedding: embedding,
        k: k
      )
    end

    # Search for similar texts in the index by the passed in vector.
    # You must generate your own vector using the same LLM that generated the embeddings stored in the Vectorsearch DB.
    # @param embedding [Array<Float>] The vector to search for
    # @param k [Integer] The number of top results to return
    # @return [Array<Hash>] The results of the search
    # TODO - drop the named "embedding:" param so it is the same interface as #ask?
    def similarity_search_by_vector(embedding:, k: 4)
      model
        .nearest_neighbors(:embedding, embedding, distance: operator)
        .limit(k)
    end

    # Ask a question and return the answer
    # @param question [String] The question to ask
    # @param k [Integer] The number of results to have in context
    # @yield [String] Stream responses back one String at a time
    # @return [String] The answer to the question
    def ask(question, k: 4, &block)
      # Silence the logger here: the embedding column makes these queries noisy.
      ActiveRecord::Base.logger.silence do
        search_results = similarity_search(query: question, k: k)

        context = search_results.map do |result|
          result.as_vector
        end
        context = context.join("\n---\n")

        prompt = generate_rag_prompt(question: question, context: context)

        llm.chat(prompt: prompt, &block)
      end
    end
  end
end
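For context, `similarity_search_by_vector` builds on the neighbor gem's `nearest_neighbors` scope, which the ActiveRecord hooks in this commit enable via `has_neighbors(:embedding)` when the provider is Pgvector. A rough sketch of the equivalent standalone query (the `Product` model and the OpenAI client are assumptions for illustration):

```ruby
# Embed the query text with the same LLM used to embed the records...
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
query_embedding = llm.embed(text: "running shoes").embedding

# ...then let neighbor order records by cosine distance to that vector.
Product.nearest_neighbors(:embedding, query_embedding, distance: "cosine").limit(4)
```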

lib/langchainrb_rails.rb

Lines changed: 2 additions & 0 deletions
@@ -1,10 +1,12 @@
 # frozen_string_literal: true
 
+require "forwardable"
 require "langchain"
 require "rails"
 require_relative "langchainrb_rails/version"
 require "langchainrb_rails/railtie"
 require "langchainrb_rails/config"
+require_relative "langchainrb_overrides/vectorsearch/pgvector"
 
 module LangchainrbRails
   class Error < StandardError; end

lib/langchainrb_rails/active_record/hooks.rb

Lines changed: 22 additions & 5 deletions
@@ -61,7 +61,9 @@ def upsert_to_vectorsearch
     #
     # @return [String] the text representation of the model
     def as_vector
-      to_json
+      # Don't vectorize the embedding ... this would happen if it already exists
+      # for a record and we update.
+      to_json(except: :embedding)
     end
 
     module ClassMethods
@@ -70,6 +72,21 @@ module ClassMethods
       # @param provider [Object] The `Langchain::Vectorsearch::*` instance
       def vectorsearch
         class_variable_set(:@@provider, LangchainrbRails.config.vectorsearch)
+
+        # Pgvector-specific configuration
+        if LangchainrbRails.config.vectorsearch.is_a?(Langchain::Vectorsearch::Pgvector)
+          has_neighbors(:embedding)
+        end
+
+        LangchainrbRails.config.vectorsearch.model = self
+      end
+
+      # Iterates over records and generates embeddings.
+      # Re-generates embeddings for ALL records, not just records that are missing them.
+      def embed!
+        find_each do |record|
+          record.upsert_to_vectorsearch
+        end
       end
 
       # Search for similar texts
@@ -84,7 +101,7 @@ def similarity_search(query, k: 1)
         )
 
         # We use "__id" when Weaviate is the provider
-        ids = records.map { |record| record.dig("id") || record.dig("__id") }
+        ids = records.map { |record| record.try("id") || record.dig("__id") }
         where(id: ids)
       end
 
@@ -94,12 +111,12 @@ def similarity_search(query, k: 1)
       # @param k [Integer] The number of results to have in context
       # @yield [String] Stream responses back one String at a time
       # @return [String] The answer to the question
-      def ask(question:, k: 4, &block)
+      def ask(question, k: 4, &block)
         class_variable_get(:@@provider).ask(
-          question: question,
+          question,
           k: k,
           &block
-        )
+        ).completion
       end
     end
   end
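The PgVector generator later in this commit injects this wiring into the target model via `inject_into_class`, so a model opted into vectorsearch ends up looking roughly like the sketch below (the `Product` name is borrowed from the README examples):

```ruby
class Product < ApplicationRecord
  # Registers the configured vectorsearch provider for this model
  # (and, for Pgvector, declares neighbor's `has_neighbors(:embedding)`).
  vectorsearch

  # Keep the stored embedding in sync whenever a record is created or updated.
  after_save :upsert_to_vectorsearch
end
```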
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# frozen_string_literal: true

module LangchainrbRails
  module Generators
    #
    # Usage:
    #     rails g langchainrb_rails:pgvector --model=Product --llm=openai
    #
    class PgvectorGenerator < LangchainrbRails::Generators::BaseGenerator
      desc "This generator adds Pgvector vectorsearch integration to your ActiveRecord model"
      source_root File.join(__dir__, "templates")

      def copy_migration
        migration_template "enable_vector_extension_template.rb", "db/migrate/enable_vector_extension.rb", migration_version: migration_version
        migration_template "add_vector_column_template.rb", "db/migrate/add_vector_column_to_#{table_name}.rb", migration_version: migration_version
      end

      def create_initializer_file
        template "pgvector_initializer.rb", "config/initializers/langchainrb_rails.rb"
      end

      def migration_version
        "[#{::ActiveRecord::VERSION::MAJOR}.#{::ActiveRecord::VERSION::MINOR}]"
      end

      def add_to_model
        inject_into_class "app/models/#{model_name.downcase}.rb", model_name do
          " vectorsearch\n\n after_save :upsert_to_vectorsearch\n\n"
        end
      end

      def add_to_gemfile
        # Dependencies for Langchain PgVector
        gem "neighbor"
        gem "ruby-openai"
      end

      def post_install_message
        say "Please do the following to start Q&A with your #{model_name} records:", :green
        say "1. Run `bundle install` to install the new gems."
        say "2. Set `OPENAI_API_KEY` environment variable to your OpenAI API key."
        say "3. Run `rails db:migrate` to apply the database migrations to enable pgvector and add the embedding column."
        say "4. In Rails console, run `#{model_name}.embed!` to set the embeddings for all records."
        say "5. Ask a question in the Rails console, ie: `#{model_name}.ask('[YOUR QUESTION]')`"
      end

      private

      # @return [String] Name of the model
      def model_name
        options["model"]
      end

      # @return [String] Table name of the model
      def table_name
        model_name.downcase.pluralize
      end

      # @return [String] LLM provider to use
      def llm
        options["llm"]
      end

      # @return [Langchain::LLM::*] LLM class
      def llm_class
        Langchain::LLM.const_get(LLMS[llm])
      end

      # @return [Integer] Dimension of the vector to be used
      def vector_dimension
        llm_class.default_dimension
      end
    end
  end
end
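The migration templates referenced in `copy_migration` are among the 11 changed files but are not shown here; based on the `neighbor` gem's conventions and the `vector_dimension` helper above, the generated migrations plausibly look like this sketch (class names, the ActiveRecord version tag, and the 1536 dimension are assumptions):

```ruby
# db/migrate/xxxxxxxxxxxxxx_enable_vector_extension.rb (sketch)
class EnableVectorExtension < ActiveRecord::Migration[7.0]
  def change
    enable_extension "vector" # requires the pgvector extension on the Postgres server
  end
end

# db/migrate/xxxxxxxxxxxxxx_add_vector_column_to_products.rb (sketch)
class AddVectorColumnToProducts < ActiveRecord::Migration[7.0]
  def change
    # The dimension comes from the chosen LLM, e.g. 1536 for OpenAI embeddings.
    add_column :products, :embedding, :vector, limit: 1536
  end
end
```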
