|
| 1 | +# frozen_string_literal: true |
| 2 | + |
module Langchain::Vectorsearch
  class Pgvector < Base
    #
    # The PostgreSQL vector search adapter
    #
    # Gem requirements:
    #     gem "pgvector", "~> 0.2"
    #
    # Usage:
    #     pgvector = Langchain::Vectorsearch::Pgvector.new(llm:)
    #

    # Distance operators supported by the pgvector extension
    # (passed through to the neighbor gem's `nearest_neighbors` scope).
    OPERATORS = [
      "cosine",
      "euclidean",
      "inner_product"
    ].freeze
    DEFAULT_OPERATOR = "cosine"

    attr_reader :db, :operator, :llm
    attr_accessor :model

    # @param llm [Object] The LLM client used to generate embeddings
    def initialize(llm:)
      # If the line below is called, the generator fails as calls to
      # LangchainrbRails.config.vectorsearch will generate an exception.
      # These happen in the template files.
      # depends_on "neighbor"

      @operator = DEFAULT_OPERATOR

      super(llm: llm)
    end

    # Add a list of texts to the index
    # @param texts [Array<String>] The texts to add to the index
    # @param ids [Array<String>] The ids to add to the index, in the same order as the texts
    # @return [Array<ActiveRecord::Base>] The records whose embeddings were updated
    def add_texts(texts:, ids:)
      embeddings = texts.map do |text|
        llm.embed(text: text).embedding
      end

      # NOTE(review): assumes the records returned by #find are in the
      # same order as `ids`. This appears to hold for uuid ids too, but
      # wasn't tested deeply — verify before relying on it.
      # TODO: implement find_each so we don't load all records into memory
      model.find(ids).each.with_index do |record, i|
        record.update_column(:embedding, embeddings[i])
      end
    end

    # Re-embed the given texts. Identical to #add_texts because embeddings
    # are written onto already-existing records.
    # @param texts [Array<String>] The texts to update in the index
    # @param ids [Array<String>] The ids of the records, in the same order as the texts
    # @return [Array<ActiveRecord::Base>] The records whose embeddings were updated
    def update_texts(texts:, ids:)
      add_texts(texts: texts, ids: ids)
    end

    # Invoke a rake task that will create an initializer (`config/initializers/langchain.rb`) file
    # and db/migrations/* files
    def create_default_schema
      Rake::Task["pgvector"].invoke
    end

    # Destroy default schema
    def destroy_default_schema
      # Tell the user to rollback the migration
    end

    # Search for similar texts in the index
    # @param query [String] The text to search for
    # @param k [Integer] The number of top results to return
    # @return [Array<Hash>] The results of the search
    # TODO: drop the named "query:" param so it is the same interface as #ask?
    def similarity_search(query:, k: 4)
      embedding = llm.embed(text: query).embedding

      similarity_search_by_vector(
        embedding: embedding,
        k: k
      )
    end

    # Search for similar texts in the index by the passed in vector.
    # You must generate your own vector using the same LLM that generated the embeddings stored in the Vectorsearch DB.
    # @param embedding [Array<Float>] The vector to search for
    # @param k [Integer] The number of top results to return
    # @return [Array<Hash>] The results of the search
    # TODO: drop the named "embedding:" param so it is the same interface as #ask?
    def similarity_search_by_vector(embedding:, k: 4)
      model
        .nearest_neighbors(:embedding, embedding, distance: operator)
        .limit(k)
    end

    # Ask a question and return the answer
    # @param question [String] The question to ask
    # @param k [Integer] The number of results to have in context
    # @yield [String] Stream responses back one String at a time
    # @return [String] The answer to the question
    def ask(question, k: 4, &block)
      # Silence SQL logging: the embedding column makes the log lines noisy
      ActiveRecord::Base.logger.silence do
        search_results = similarity_search(query: question, k: k)

        context = search_results.map do |result|
          result.as_vector
        end
        context = context.join("\n---\n")

        prompt = generate_rag_prompt(question: question, context: context)

        llm.chat(prompt: prompt, &block)
      end
    end
  end
end
| 122 | + |
# TODO: establish the Rails connection when configuring vectorsearch
# TODO: update the READMEs
# TODO: add a Rails generator that creates the migration
0 commit comments