
Conversation

@drborges
Contributor

@drborges drborges commented Nov 28, 2025

Summary

Add support for the Stale-While-Revalidate (SWR) prompt cache strategy, based on the design document: docs/future-enhancements/STALE_WHILE_REVALIDATE_DESIGN.md

The GIF below shows the new feature in action against a self-hosted production instance of Langfuse using the following config:

Langfuse.configure do |config|
  config.public_key = ENV["LANGFUSE_PUB_KEY"]
  config.secret_key = ENV["LANGFUSE_SECRET_KEY"]
  config.base_url = ENV["LANGFUSE_BASE_URL"]

  config.cache_backend = :rails
  config.cache_ttl = 2 # cache entries are fresh for 2 seconds, after that, they become stale

  # Enable SWR
  config.cache_stale_while_revalidate = true
  config.cache_stale_ttl = 2 # stale cache entries are served for 2 seconds until they expire completely (grace period).
  config.cache_refresh_threads = 5
end

(demo GIF: 2025-11-28 14:23:40)

📊 Impact

  • Performance: Enables instant responses for 99% of requests by serving stale data while refreshing in the background
  • Backward Compatibility: Fully backward compatible - SWR is opt-in via configuration (requires :rails backend)
  • Dependencies: Requires concurrent-ruby gem for thread pool management

Usage

Configure the client:

Langfuse.configure do |config|
  config.public_key = ENV['LANGFUSE_PUBLIC_KEY']
  config.secret_key = ENV['LANGFUSE_SECRET_KEY']
  
  config.cache_backend = :rails
  config.cache_ttl = 300 # Fresh for 5 minutes
  
  # Enable SWR
  config.cache_stale_while_revalidate = true
  config.cache_stale_ttl = 300 # Grace period: 5 more minutes
  config.cache_refresh_threads = 10 # Background thread pool size
end

Once configured, SWR works transparently:

client = Langfuse.client

# First request - populates cache
prompt = client.get_prompt("greeting") # ~100ms (API call)

# Subsequent requests while fresh
prompt = client.get_prompt("greeting") # ~1ms (cache hit)

# After cache_ttl expires but within grace period
prompt = client.get_prompt("greeting") # ~1ms (stale data + background refresh)

# Background refresh completes, next request gets fresh data
prompt = client.get_prompt("greeting") # ~1ms (fresh cache)

More details in the proposed documentation.

@drborges drborges force-pushed the feature/stale-while-revalidate branch 3 times, most recently from 979705c to ae8746b Compare November 28, 2025 14:55
@drborges drborges marked this pull request as draft November 28, 2025 17:13
@drborges
Contributor Author

@NoahFisher I still have to deal with some RuboCop offenses and do one final review of the test coverage, but I could use some feedback on this draft. I also have one open question to double-check in the PR description.

@drborges drborges force-pushed the feature/stale-while-revalidate branch from da9e215 to d798182 Compare November 28, 2025 18:00
@drborges drborges force-pushed the feature/stale-while-revalidate branch 3 times, most recently from 43fcf9b to 7ff496b Compare December 3, 2025 20:03
Add three new configuration options to support SWR caching:
- cache_stale_while_revalidate: Enable/disable SWR (default: false)
- cache_stale_ttl: Grace period for serving stale data (default: 300s)
- cache_refresh_threads: Background thread pool size (default: 5)

SWR caching requires Rails cache backend and will be validated
during configuration. This enables serving stale data instantly
while refreshing in the background for better performance.
Restructure CacheEntry to support stale-while-revalidate pattern:
- Replace single expires_at with fresh_until and stale_until
- Add fresh? method: entry is fresh and can be served immediately
- Add stale? method: entry is stale but usable (revalidate in background)
- Update expired? method: entry must be revalidated synchronously

This three-state model enables SWR caching where stale entries
can be served instantly while a background refresh occurs.
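
The three-state model above can be sketched as follows. Field and method names (fresh_until, stale_until, fresh?, stale?, expired?) mirror the commit message, but this is an illustrative reconstruction, not the gem's actual code:

```ruby
# Illustrative CacheEntry with the fresh/stale/expired states described above.
CacheEntry = Struct.new(:value, :fresh_until, :stale_until) do
  def fresh?(now = Time.now)
    now < fresh_until                       # serve immediately
  end

  def stale?(now = Time.now)
    now >= fresh_until && now < stale_until # usable, revalidate in background
  end

  def expired?(now = Time.now)
    now >= stale_until                      # must revalidate synchronously
  end
end

now = Time.now
entry = CacheEntry.new("Hello!", now + 300, now + 600) # ttl = 300, stale_ttl = 300
entry.fresh?(now)         # => true
entry.stale?(now + 400)   # => true (within the grace period)
entry.expired?(now + 700) # => true
```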
Add full SWR caching support with concurrent background refresh:
- fetch_with_stale_while_revalidate: Main SWR fetch method
- Three-state handling: FRESH (instant), STALE (instant + refresh), MISS (sync fetch)
- Concurrent thread pool for background refresh operations
- Stampede protection during background refresh
- Logger integration for debugging cache states
- Graceful shutdown of thread pool

The adapter now accepts stale_ttl and refresh_threads parameters.
When stale_ttl is set, SWR is enabled. When nil, falls back to
standard fetch_with_lock behavior.

This provides instant responses for 99% of requests by serving
stale data while refreshing in the background.
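
A minimal sketch of the three-state fetch flow above. DemoCache, Entry, and fetch_with_swr are hypothetical names; the real adapter uses a concurrent-ruby thread pool and Rails.cache rather than a plain Thread and Hash:

```ruby
# Entries know their fresh and stale windows.
Entry = Struct.new(:value, :fresh_until, :stale_until) do
  def fresh?
    Time.now < fresh_until
  end

  def stale?
    !fresh? && Time.now < stale_until
  end
end

# Toy in-memory cache that derives both windows from ttl and stale_ttl.
class DemoCache
  def initialize(ttl:, stale_ttl:)
    @store, @ttl, @stale_ttl = {}, ttl, stale_ttl
  end

  def read(key)
    @store[key]
  end

  def store(key, value)
    now = Time.now
    @store[key] = Entry.new(value, now + @ttl, now + @ttl + @stale_ttl)
    value
  end
end

def fetch_with_swr(cache, key)
  entry = cache.read(key)
  if entry&.fresh?
    entry.value                            # FRESH: instant hit
  elsif entry&.stale?
    Thread.new { cache.store(key, yield) } # STALE: serve old value, refresh async
    entry.value
  else
    cache.store(key, yield)                # MISS/EXPIRED: synchronous fetch
  end
end

cache = DemoCache.new(ttl: 0.05, stale_ttl: 5)
fetch_with_swr(cache, "greeting") { "v1" } # miss: fetches and returns "v1"
sleep 0.1                                  # entry goes stale
fetch_with_swr(cache, "greeting") { "v2" } # stale: returns "v1", refreshes to "v2"
```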
Update Client to pass SWR configuration options to RailsCacheAdapter:
- Pass stale_ttl (only when cache_stale_while_revalidate is enabled)
- Pass refresh_threads for background thread pool size
- Pass logger for debugging and error reporting

Extract create_rails_cache_adapter method for clarity.
When SWR is disabled, stale_ttl is nil, keeping standard behavior.

Tests verify correct configuration propagation and thread pool
creation based on SWR settings.
Refactor get_prompt to choose optimal caching strategy based on
available cache capabilities:
1. SWR cache (fetch_with_stale_while_revalidate) - best performance
2. Distributed cache (fetch_with_lock) - stampede protection
3. Simple cache (get/set) - basic in-memory caching
4. No cache - direct API fetch

Extract separate methods for each strategy for better maintainability
and testability. The strategy is selected at runtime based on which
methods the cache adapter responds to.

This enables automatic use of SWR when available without breaking
existing cache implementations.
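
A hypothetical sketch of the runtime strategy selection described above. SimpleCache and fetch_prompt are illustrative; the real client checks the same respond_to? capabilities on its configured cache adapter:

```ruby
# Basic get/set cache used to demonstrate strategy 3.
class SimpleCache
  def initialize
    @h = {}
  end

  def get(key)
    @h[key]
  end

  def set(key, value)
    @h[key] = value
  end
end

def fetch_prompt(cache, name, &api_fetch)
  key = "prompt:#{name}"
  if cache.respond_to?(:fetch_with_stale_while_revalidate)
    cache.fetch_with_stale_while_revalidate(key, &api_fetch) # 1. SWR
  elsif cache.respond_to?(:fetch_with_lock)
    cache.fetch_with_lock(key, &api_fetch)                   # 2. stampede protection
  elsif cache.respond_to?(:get)
    cache.get(key) || cache.set(key, api_fetch.call)         # 3. simple cache
  else
    api_fetch.call                                           # 4. no cache
  end
end

cache = SimpleCache.new
fetch_prompt(cache, "greeting") { "hello" } # miss: calls the API block
fetch_prompt(cache, "greeting") { "hello" } # hit: served from the hash
fetch_prompt(nil, "greeting") { "hello" }   # no cache: direct fetch
```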
This was the pattern already in place. This commit simply expands its scope to some of the newly added spec examples; however, it would be worth discussing whether we need to allow a larger number of memoized helpers or whether some refactoring is needed for these specs.
@drborges drborges force-pushed the feature/stale-while-revalidate branch from 7ff496b to ab7628d Compare December 4, 2025 14:02
Add spec examples to verify that SWR (Stale-While-Revalidate) is correctly
enabled only when stale_ttl is positive, and disabled for zero or negative
values. Adds thread pool initialization coverage for both cache adapters.

All 785 tests pass with 97.36% coverage maintained.
@drborges drborges marked this pull request as ready for review December 4, 2025 18:08
@drborges
Contributor Author

drborges commented Dec 4, 2025

@NoahFisher, @kxzk I think this is in a good spot for some review. Let me know if there are any concerns or change suggestions you'd like me to address. Thanks.

@drborges
Contributor Author

drborges commented Dec 4, 2025

My last commit disables RuboCop's Metrics/ClassLength for the ApiClient class, but it may be worth moving parts of the class into mixin modules to keep the file under the desired length. Thoughts @NoahFisher?

@kxzk
Collaborator

kxzk commented Dec 4, 2025

@NoahFisher, @kxzk I think this is in a good spot for some review. Let me know if there are any concerns or change suggestions you'd like me to address. Thanks.

Will try and review today.

Collaborator

@kxzk kxzk left a comment


Few comments but overall looks solid

@drborges
Contributor Author

drborges commented Dec 9, 2025

@kxzk thanks for all the feedback. I'll address them next.

Consolidates all cache locking mechanisms to use a single, simplified API:
- Unified lock keys: Both fetch and refresh operations use the same lock
- Single timeout: All locks use lock_timeout (10s) instead of separate values
- Simplified method names: build_lock_key, acquire_lock, release_lock

Changes:
- StaleWhileRevalidate module now defines single set of abstract lock methods
- RailsCacheAdapter implements distributed locking with Rails.cache
- PromptCache implements in-memory thread-safe locking
- All tests updated to reflect unified lock key format (:lock suffix)

Benefits:
- Simpler, more maintainable code
- Single lock prevents duplicate work for both fetch and refresh scenarios
- Clearer intent and easier to reason about
- Reduced code duplication

All tests passing (785 examples, 97.58% coverage)
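
The unified in-memory locking the commit describes can be sketched as follows. The method names (build_lock_key, acquire_lock, release_lock) and the ":lock" suffix follow the commit message; the Rails adapter would implement the same API with Rails.cache instead of a Hash:

```ruby
# Thread-safe, non-blocking in-memory locks, illustrative only.
class InMemoryLocks
  def initialize
    @mutex = Mutex.new
    @locks = {}
  end

  def build_lock_key(key)
    "#{key}:lock" # unified ":lock" suffix for fetch and refresh operations
  end

  # Non-blocking: only the first caller gets the lock.
  def acquire_lock(key)
    @mutex.synchronize do
      lock_key = build_lock_key(key)
      next false if @locks[lock_key]

      @locks[lock_key] = true
    end
  end

  def release_lock(key)
    @mutex.synchronize { @locks.delete(build_lock_key(key)) }
  end
end

locks = InMemoryLocks.new
locks.acquire_lock("prompt:greeting") # => true, caller proceeds to fetch
locks.acquire_lock("prompt:greeting") # => false, duplicate work skipped
locks.release_lock("prompt:greeting")
```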
Resolves TODO by moving fetch_with_lock out of StaleWhileRevalidate module
and into RailsCacheAdapter where it belongs.

Rationale:
- StaleWhileRevalidate module is specifically for SWR functionality
- fetch_with_lock is for non-SWR scenarios (when stale_ttl is 0)
- Only RailsCacheAdapter needs distributed stampede protection
- PromptCache uses in-memory locking, not suitable for fetch_with_lock

Changes:
- Moved fetch_with_lock from module to RailsCacheAdapter (public method)
- Moved wait_for_cache helper to RailsCacheAdapter
- Removed cache_set_simple (no longer needed, uses set() directly)
- Kept build_lock_key in module (used by schedule_refresh)
- Updated tests to reflect PromptCache doesn't support fetch_with_lock

Benefits:
- Cleaner separation of concerns
- StaleWhileRevalidate module reduced from 370 to 283 lines
- No rubocop disable needed for module length
- Clearer API: only RailsCacheAdapter provides distributed locking

All tests passing (786 examples, 97.01% coverage)
Simplifies cache API by deriving expiration values from ttl and stale_ttl
instead of requiring explicit expires_in parameter.

Rationale:
- Cache implementations already know their ttl and stale_ttl values
- No need for callers to calculate and pass expires_in
- Expiration logic is encapsulated within each cache implementation
- Cleaner, simpler API with fewer parameters

Changes:
- Removed expires_in parameter from PromptCache#set
- Removed expires_in parameter from RailsCacheAdapter#set
- Removed expires_in parameter from cache_set abstract method
- PromptCache now calculates: fresh_until = ttl, stale_until = ttl + stale_ttl
- RailsCacheAdapter calculates: expires_in = ttl + stale_ttl (if SWR) else ttl
- Updated all call sites and tests

Benefits:
- Simpler API - fewer parameters to think about
- Encapsulation - expiration logic lives in cache implementation
- Less error-prone - no risk of passing wrong expires_in value
- More maintainable - changing TTL logic only affects cache class

All tests passing (786 examples, 96.93% coverage)
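
The derived-expiration rule above amounts to a one-liner: callers no longer pass expires_in, and each cache computes it from its own ttl and stale_ttl. The method name here is illustrative:

```ruby
# When SWR is on (stale_ttl set), the backing store must keep entries
# alive through the grace period; otherwise plain ttl applies.
def derived_expires_in(ttl:, stale_ttl:)
  stale_ttl ? ttl + stale_ttl : ttl
end

derived_expires_in(ttl: 300, stale_ttl: 300) # => 600 (fresh window + grace period)
derived_expires_in(ttl: 300, stale_ttl: nil) # => 300 (SWR disabled)
```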
Ensures cache thread pools are properly shutdown when client is disposed.

Rationale:
- Client#shutdown already calls score_client.shutdown for cleanup
- SWR-enabled caches use thread pools that need proper shutdown
- Thread pools should be gracefully terminated to prevent leaks
- Consistent cleanup pattern across all client components

Changes:
- Added ApiClient#shutdown method to shutdown cache if it supports it
- Client#shutdown now calls api_client.shutdown in addition to score_client
- Uses respond_to?(:shutdown) check for backward compatibility
- StaleWhileRevalidate#shutdown safely handles nil thread pools

Test Coverage:
- Added 5 new tests for ApiClient#shutdown (all scenarios)
- Added 3 new tests for Client#shutdown with cache scenarios
- Tests verify shutdown is called on SWR-enabled caches
- Tests verify shutdown is safe when cache is nil or doesn't support SWR
- All 794 examples passing, 96.86% coverage maintained

Benefits:
- Proper resource cleanup for long-running applications
- No thread pool leaks when client is shutdown
- Safe for all cache configurations (SWR, TTL-only, or no cache)
- Graceful degradation for caches without shutdown support
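
The respond_to?-guarded shutdown chain can be sketched as below. FakeSwrCache and shutdown_cache are illustrative names, not the gem's actual API:

```ruby
# Stand-in for an SWR-enabled cache that owns a thread pool.
class FakeSwrCache
  attr_reader :pool_drained

  def shutdown
    @pool_drained = true # a real SWR cache would drain its thread pool here
  end
end

def shutdown_cache(cache)
  cache.shutdown if cache.respond_to?(:shutdown) # safe for nil and non-SWR caches
end

shutdown_cache(nil)        # no-op
shutdown_cache(Object.new) # no-op for caches without shutdown support
cache = FakeSwrCache.new
shutdown_cache(cache)
cache.pool_drained # => true
```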
@drborges drborges force-pushed the feature/stale-while-revalidate branch from f325ea2 to dff1a15 Compare December 10, 2025 20:22
…or cache_stale_ttl

Improves API ergonomics by replacing Float::INFINITY with a more Ruby-idiomatic
:indefinite symbol for never-expiring cache configuration.

Rationale:
- :indefinite is more readable and Ruby-idiomatic than Float::INFINITY
- Symbols are the conventional way to represent special values in Ruby config
- Makes intent clearer in configuration code
- Aligns with Ruby community conventions (e.g., Rails cache :expires_in)

Changes:
- Renamed THOUSAND_YEARS_IN_SECONDS constant to INDEFINITE_SECONDS
- Updated constant comment to clarify intent ("indefinite cache duration")
- Changed normalize_stale_ttl from class method to instance method (normalized_stale_ttl)
- Instance method reads from cache_stale_ttl and returns normalized value on-demand
- Removed normalization from Config#initialize (now done lazily via method)
- Updated create_memory_cache to use config.normalized_stale_ttl
- Updated create_rails_cache_adapter to use config.normalized_stale_ttl
- Updated validate_swr_config! to handle :indefinite symbol properly
- Updated validation error message: "must be non-negative or :indefinite"

Documentation Updates:
- docs/CACHING.md: Changed Float::INFINITY to :indefinite in examples
- docs/CONFIGURATION.md: Changed Float::INFINITY to :indefinite in examples
- lib/langfuse/config.rb: Updated attr_accessor doc to mention :indefinite
- lib/langfuse/prompt_cache.rb: Updated comment to reflect :indefinite normalization

Test Coverage:
- Updated all tests to use :indefinite instead of Float::INFINITY
- Updated all tests to use INDEFINITE_SECONDS instead of THOUSAND_YEARS_IN_SECONDS
- Updated test expectations for new validation error message
- All 797 examples passing, 96.86% coverage maintained

Benefits:
- More intuitive API for developers (config.cache_stale_ttl = :indefinite)
- Clearer intent in configuration code
- Normalization happens on-demand via instance method
- Original config value preserved (not mutated during initialization)
- Better alignment with Ruby conventions and best practices
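
The lazy :indefinite normalization described above can be sketched like this. The constant name mirrors the commit message, but its exact value in the gem is an assumption here:

```ruby
# Roughly a thousand years in seconds: effectively a never-expiring cache.
INDEFINITE_SECONDS = 1000 * 365 * 24 * 60 * 60

# Normalization happens on-demand; the original config value is untouched.
def normalized_stale_ttl(cache_stale_ttl)
  cache_stale_ttl == :indefinite ? INDEFINITE_SECONDS : cache_stale_ttl
end

normalized_stale_ttl(:indefinite) # => INDEFINITE_SECONDS
normalized_stale_ttl(300)         # => 300
```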
@drborges drborges force-pushed the feature/stale-while-revalidate branch from dff1a15 to f7e9fdc Compare December 10, 2025 20:24
Adds specific validation to catch when cache_stale_while_revalidate is enabled
but cache_stale_ttl is nil, providing a helpful error message with recommended
solutions.

Rationale:
- When SWR is enabled, cache_stale_ttl must have a value
- nil is invalid regardless of SWR status
- Users need clear guidance on how to fix the configuration
- Prevents confusing runtime errors from occurring later

Changes:
- Added validate_swr_stale_ttl! private method to handle stale_ttl validation
- Added validate_refresh_threads! private method to validate refresh threads
- Split validate_swr_config! into smaller, focused methods
- New error message guides users to set cache_ttl or :indefinite
- Maintains existing validation that cache_stale_ttl cannot be nil
- Removed apply_swr_defaults! method from Config class
- Removed apply_swr_defaults! call from Config#initialize
- Removed test for auto-defaulting behavior
- Validation now catches missing cache_stale_ttl with helpful message

Error Message:
"cache_stale_ttl cannot be nil when cache_stale_while_revalidate is enabled.
Set it to cache_ttl for a logical default, or use :indefinite for never-expiring cache."

Test Coverage:
- Added test for SWR enabled with nil cache_stale_ttl scenario
- All 802 examples passing with 96.87% coverage maintained
- RuboCop: No offenses detected

Benefits:
- Clear, actionable error messages for misconfiguration
- Helps users understand the relationship between SWR and stale_ttl
- Reduces cyclomatic complexity through method extraction
- Easier to maintain and test individual validation rules
- More explicit, less magical configuration
- Easier to understand what values are actually set
- No hidden side effects during initialization
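
The focused validation above can be sketched as follows. The error message is quoted from the commit; the method signature is illustrative:

```ruby
# Reject the one invalid combination: SWR on, stale TTL unset.
def validate_swr_stale_ttl!(swr_enabled:, cache_stale_ttl:)
  return unless swr_enabled && cache_stale_ttl.nil?

  raise ArgumentError,
        "cache_stale_ttl cannot be nil when cache_stale_while_revalidate is enabled. " \
        "Set it to cache_ttl for a logical default, or use :indefinite for never-expiring cache."
end

validate_swr_stale_ttl!(swr_enabled: true, cache_stale_ttl: 300)  # passes
validate_swr_stale_ttl!(swr_enabled: false, cache_stale_ttl: nil) # passes (SWR off)
# validate_swr_stale_ttl!(swr_enabled: true, cache_stale_ttl: nil) would raise ArgumentError
```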
@drborges
Contributor Author

@kxzk I think I was able to address all your feedback. This is ready for another round of reviews. Thanks again.

@drborges
Contributor Author

@kxzk, @NoahFisher let me know if this looks good and what the next steps would be for getting this merged to main.

@drborges drborges requested a review from kxzk December 11, 2025 19:42
Member

@NoahFisher NoahFisher left a comment


Additional Considerations:

  1. Memory Leak Potential with In-Memory Locks

In PromptCache, locks are stored in a hash but only deleted on release_lock.

If a refresh thread crashes between acquire_lock and release_lock (despite the ensure block), the lock is never released. The ensure should handle this, but if the thread is killed externally (e.g., Thread#kill), the lock persists forever. Unlike Redis locks which have TTL expiration, these in-memory locks have no timeout.

Suggestion: Accept this as a known limitation of the in-memory backend, but document it.

  1. No Backpressure Signal to Callers

When the thread pool queue fills up and tasks are discarded, callers have no idea. This is fine for SWR (they get stale data), but it means:

  • Monitoring is blind to refresh failures
  • If the API is down for extended periods, stale data keeps being served with no signal

Question: Should there be a hook or callback for observability? e.g., on_refresh_dropped, on_refresh_failed?

  1. Cold Start Behavior

On application boot with an empty cache, the first request for each prompt will be a synchronous API call (cache miss). With SWR enabled, you've documented cache warming, which is good. But worth calling out:

SWR doesn't help cold starts - it only helps after the cache has been populated at least once.

  1. Ruby's GVL (Global VM Lock)

Worth noting that Ruby's GVL means these threads aren't truly parallel for CPU work. However, for I/O-bound work (HTTP calls to Langfuse API), threads do release the GVL during I/O waits, so this is fine. Just worth being aware that max_threads: 5 doesn't mean 5x throughput for CPU-bound work.
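
A quick demonstration of the GVL point: threads overlap I/O-style waits (simulated here with sleep), so five 0.1s "API calls" complete in roughly 0.1s of wall time rather than 0.5s:

```ruby
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = Array.new(5) { Thread.new { sleep 0.1 } } # stand-in for HTTP I/O
threads.each(&:join)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format("5 concurrent waits took %.2fs", elapsed) # well under the 0.5s serial time
```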

  1. Should SWR be Default-On for Rails Backend?

The PR makes SWR opt-in (cache_stale_while_revalidate = false by default). Given the latency benefits and that the distributed lock prevents thundering herd, is there an argument for making
it default-on for the :rails backend specifically? Users with Redis already have the
infrastructure for distributed locks.

Counter-argument: Surprising behavior changes are bad. Opt-in is safer.

Good work so far, I wouldn't consider the above blocking comments (I feel more strongly about the inline ones), but curious if we should document our thinking for next steps.

Comment on lines +43 to +45
| `cache_stale_while_revalidate` | Boolean | No | `false` | Enable stale-while-revalidate |
| `cache_stale_ttl` | Integer | No | `60` when SWR is enabled | Stale TTL (seconds) |
| `cache_refresh_threads` | Integer | No | `5` | Background refresh threads |
Member


On Configuration Complexity

Concern: This PR adds 3 new configuration options (cache_stale_while_revalidate, cache_stale_ttl, cache_refresh_threads). I'd push back on exposing all of these to users.

Principle: Every configuration option is a decision the user has to make. Good defaults mean fewer decisions.

Suggestion: Consider collapsing to a single option:

Instead of:

config.cache_stale_while_revalidate = true
config.cache_stale_ttl = 300
config.cache_refresh_threads = 5

Just:

config.cache_stale_while_revalidate = true  # Uses sensible defaults internally

Reasoning:

  • cache_stale_ttl defaulting to cache_ttl is already the right choice 99% of the time - why expose it?
  • cache_refresh_threads at 5 is fine for virtually all deployments - this is an implementation detail, not a user concern
  • Users who truly need to tune these can always subclass or we can add them later

The bar for adding config options should be: "Will a significant number of users need to change this from the default?" If not, hardcode it.


We absolutely need to set the TTL to infinite. Not exposing the TTL setting won't work for us.

Member


Can you explain what the use case for a non-infinite TTL?

Contributor Author

@drborges drborges Dec 12, 2025


Every configuration option is a decision the user has to make. Good defaults mean fewer decisions.

I agree, it certainly makes sense to hide cache_refresh_threads. Exposing cache_stale_ttl would be more of a flexibility in my opinion in case clients have very specific needs.

With that said, if I choose to enable SWR, I would more often than not want an infinite grace period, e.g. cache_stale_ttl = :indefinite for high availability reasons, then use cache_ttl to control how often cache is refreshed.

Based on that, only exposing config.cache_stale_while_revalidate would make sense to me if that defaults cache_stale_ttl to :indefinite as per the proposed implementation.

Does that make sense?

@kxzk kxzk added the enhancement New feature or request label Dec 12, 2025
@drborges
Contributor Author

Thanks for all of the great feedback so far @NoahFisher! I've addressed some of the points you raised and wanted to follow up on a few of them:

Point 2

No Backpressure Signal to Callers
When the thread pool queue fills up and tasks are discarded, callers have no idea. This is fine for SWR (they get stale data), but it means:

Monitoring is blind to refresh failures
If the API is down for extended periods, stale data keeps being served with no signal
Question: Should there be a hook or callback for observability? e.g., on_refresh_dropped, on_refresh_failed?

This is a great point and I think it would be interesting to provide such hooks. Is the following somewhat aligned with what you had in mind?

Langfuse.configure do |config|
  # Hook called whenever a refresh attempt fails for whatever reason
  config.on_cache_refresh_failed do |error, cache_key|
    # handle refresh error accordingly... e.g. publish to Sentry, log something out, etc...
  end
  
  # Hook called whenever a refresh attempt is dropped due to not being able to acquire the lock
  config.on_cache_refresh_dropped do |cache_key|
    # handle refresh drops accordingly...
  end
end

Can you think of any other scenario we'd like to cover?

Also, would it make sense to ship that as another enhancement PR?

Point 3

With SWR enabled, you've documented cache warming, which is good.

Could you point me to that doc entry so I can double-check? Cache warming should not be impacted by SWR, as you pointed out, so that would likely be a mishap on my part.

Point 5

Should SWR be Default-On for Rails Backend?
The PR makes SWR opt-in (cache_stale_while_revalidate = false by default). Given the latency benefits and that the distributed lock prevents thundering herd, is there an argument for making
it default-on for the :rails backend specifically? Users with Redis already have the infrastructure for distributed locks.

Counter-argument: Surprising behavior changes are bad. Opt-in is safer.

My initial thinking was exactly that "Opt-in seemed safer" and intentional. Now, you do raise a good point, if 99% of the time Rails clients end up turning SWR on, then perhaps it should be the default? I could go either way on this one to be honest.

@drborges
Contributor Author

drborges commented Dec 16, 2025

@NoahFisher, @kxzk do you think we may be able to cut a new release including the work in this PR at some point this week?

Also, one small snag I just ran into: the project where we plan on using langfuse-rb already has the faraday gem pinned to 1.0 due to some old dependency limitations we have yet to deal with.

Now the simplest solution to our problem seems to be having langfuse-rb depend on faraday >= 1.0 as it does not appear to require any specific feature from faraday 2.0. Would this be something we could consider for this PR or a follow-up?

@drborges
Contributor Author

Moving the faraday convo to a new issue: #38

@drborges
Contributor Author

@NoahFisher, @kxzk any thoughts on the last update? Do you think we could move forward with the solution as is and address anything non-blocking in a follow-up PR?

@benlangfeld

@kxzk @NoahFisher Happy new year. Could we please get this merged? It's been going quite a long time.

…ture/stale-while-revalidate

* 'main' of github.com:simplepractice/langfuse-rb:
  Allow a wider range of faraday versions to increase gem compatibility with older projects (simplepractice#37)
  docs(contributing): update file paths and pr template guidance
  chore(pr-template): standardize heading format for consistency
  chore: add pull request template
  docs(changelog): consolidate unreleased changes into v0.2.0 release
  chore(release): bump version to 0.2.0
  chore: release v0.2.0
  docs(changelog): restructure to reflect incremental releases
  docs(readme): improve visual hierarchy and readability
  docs(readme): improve contribution guidelines presentation
  docs: remove redundant license section from readme
  docs(readme): update gem badge and fix formatting
  Add create and update methods to client API (simplepractice#36)
@kxzk
Collaborator

kxzk commented Jan 14, 2026

@NoahFisher if you want to do one final review.

I created one-off test scripts to validate against a live Langfuse instance (all 13 tests pass).

Process

  • Create scripts/ directory

  • Place both of these scripts in scripts/ dir

  • You'll need to create test Langfuse account (can sign up via GitHub)

  • Copy env creds to .env (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST)

  • Create Prompts

    set -a && source .env && set +a && uv run scripts/setup_swr_prompts.py myrun

  • Run Tests

    set -a && source .env && set +a && bundle exec ruby scripts/validate_swr.rb myrun

Create Prompts

Python Script
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11,<3.14"
# dependencies = ["langfuse>=3.0.0"]
# ///
"""Create test prompts for SWR validation suite."""

import os
import sys
from langfuse import Langfuse

TEST_RUN_ID = sys.argv[1] if len(sys.argv) > 1 else "default"

PROMPT_SUFFIXES = [
    "fresh-test",
    "stale-test",
    "refresh-test",
    "expired-test",
    "dedup-test",
    "indefinite-test",
    "overflow-test",
    "shutdown-test",
    "concurrent-test",
    "evict-test",
    "isolation-test",
    "simple-test",
]

def main():
    langfuse = Langfuse()

    print(f"Creating test prompts for run: {TEST_RUN_ID}")

    for suffix in PROMPT_SUFFIXES:
        name = f"swr-{TEST_RUN_ID}-{suffix}"
        try:
            langfuse.create_prompt(
                name=name,
                type="text",
                prompt=f"Hello {{{{name}}}}! Prompt: {name}",
                labels=["production", "swr-test"],
            )
            print(f"  Created: {name}")
        except Exception as e:
            if "already exists" in str(e).lower():
                print(f"  Exists:  {name}")
            else:
                print(f"  Error:   {name} - {e}")

    langfuse.flush()
    print(f"\nDone. Created {len(PROMPT_SUFFIXES)} prompts.")
    print(f"Run ID: {TEST_RUN_ID}")

if __name__ == "__main__":
    main()

Run Tests

Ruby Script
#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/setup"
require "langfuse"
require "concurrent"
require "securerandom"

class SWRValidator
  attr_reader :results, :test_prompts

  def initialize(test_run_id = nil)
    @results = []
    @test_prompts = []
    @test_run_id = test_run_id || SecureRandom.hex(4)
    @base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
    @public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
    @secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
  end

  def run_all
    puts "\n" + "=" * 70
    puts "STALE-WHILE-REVALIDATE VALIDATION SUITE (LIVE API)"
    puts "=" * 70
    puts "Test run ID: #{@test_run_id}"

    setup_test_prompts

    validate_fresh_state
    validate_stale_state_serves_immediately
    validate_stale_triggers_background_refresh
    validate_expired_state_blocks
    validate_refresh_deduplication
    validate_indefinite_stale_ttl
    validate_thread_pool_queue_overflow
    validate_graceful_shutdown
    validate_config_validation_errors
    validate_concurrent_access_safety
    validate_cache_eviction_under_swr
    validate_version_label_cache_key_isolation
    validate_swr_disabled_falls_back_to_simple_cache

    print_summary
  end

  private

  def setup_test_prompts
    @test_prompts = %w[
      fresh-test stale-test refresh-test expired-test dedup-test
      indefinite-test overflow-test shutdown-test concurrent-test
      evict-test isolation-test simple-test
    ].map { |n| "swr-#{@test_run_id}-#{n}" }

    puts "\nUsing #{@test_prompts.size} prompts with prefix: swr-#{@test_run_id}-*"
    puts "Verifying prompts exist..."

    verify_prompts_exist
  end

  def verify_prompts_exist
    Langfuse.reset!
    Langfuse.configure do |c|
      c.public_key = @public_key
      c.secret_key = @secret_key
      c.base_url = @base_url
    end

    client = Langfuse.client
    first_prompt = @test_prompts.first

    retries = 0
    loop do
      begin
        client.get_prompt(first_prompt)
        puts "Prompts verified."
        break
      rescue Langfuse::NotFoundError
        retries += 1
        if retries > 5
          puts "\nERROR: Prompts not found."
          puts "Create prompts first: uv run scripts/setup_swr_prompts.py #{@test_run_id}"
          exit 1
        end
        puts "Waiting for prompts... (#{retries}/5)"
        sleep 2
      end
    end

    client.shutdown
  end

  def prompt_name(suffix)
    "swr-#{@test_run_id}-#{suffix}"
  end

  def validate_fresh_state
    test("Fresh state serves immediately without API call") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 60
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 120
      end

      client = Langfuse.client
      name = prompt_name("fresh-test")

      start1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      client.get_prompt(name)
      first_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start1

      start2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      client.get_prompt(name)
      second_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start2

      client.shutdown

      assert(second_elapsed < first_elapsed / 2, "Cached call should be much faster (#{ms(second_elapsed)}ms vs #{ms(first_elapsed)}ms)")
      assert(second_elapsed < 0.01, "Cached call should be < 10ms, got #{ms(second_elapsed)}ms")
    end
  end

  def validate_stale_state_serves_immediately
    test("Stale state returns data immediately (not blocked by refresh)") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 0.1
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 60
      end

      client = Langfuse.client
      name = prompt_name("stale-test")

      first_result = client.get_prompt(name)
      sleep 0.15

      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      result = client.get_prompt(name)
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

      client.shutdown

      assert(result, "Should return cached prompt")
      assert(result.prompt == first_result.prompt, "Should return same prompt content")
      assert(elapsed < 0.05, "Stale response should be < 50ms, got #{ms(elapsed)}ms")
    end
  end

  def validate_stale_triggers_background_refresh
    test("Stale state triggers background refresh that updates cache") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 0.1
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 60
      end

      client = Langfuse.client
      cache = client.instance_variable_get(:@api_client).instance_variable_get(:@cache)
      name = prompt_name("refresh-test")
      cache_key = Langfuse::PromptCache.build_key(name)

      # Prime the cache, then wait past cache_ttl so the entry goes stale
      client.get_prompt(name)
      sleep 0.15

      internal_cache = cache.instance_variable_get(:@cache)
      entry_before = internal_cache[cache_key]
      fresh_until_before = entry_before&.fresh_until

      client.get_prompt(name)
      sleep 0.5

      entry_after = internal_cache[cache_key]
      fresh_until_after = entry_after&.fresh_until

      client.shutdown

      assert(fresh_until_before, "Should have initial cache entry")
      assert(fresh_until_after, "Should have updated cache entry")
      assert(fresh_until_after > fresh_until_before, "Background refresh should update fresh_until")
    end
  end

  def validate_expired_state_blocks
    test("Expired state blocks until fresh data fetched") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 0.05
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 0.05
      end

      client = Langfuse.client
      name = prompt_name("expired-test")

      # Prime the cache, then wait past both cache_ttl and cache_stale_ttl
      # so the entry is fully expired
      client.get_prompt(name)
      sleep 0.15

      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      result = client.get_prompt(name)
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

      client.shutdown

      assert(result, "Should return prompt after blocking fetch")
      assert(elapsed >= 0.05, "Should block for API call, got #{ms(elapsed)}ms")
    end
  end

  def validate_refresh_deduplication
    test("Multiple stale requests all return quickly (deduplication working)") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 0.05
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 60
      end

      client = Langfuse.client
      name = prompt_name("dedup-test")

      client.get_prompt(name)
      sleep 0.1

      response_times = Concurrent::Array.new
      threads = 5.times.map do
        Thread.new do
          start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
          client.get_prompt(name)
          response_times << (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start)
        end
      end
      threads.each(&:join)

      client.shutdown

      max_response = response_times.max
      assert(max_response < 0.2, "All stale requests should return quickly (<200ms), slowest was #{ms(max_response)}ms")
    end
  end

  def validate_indefinite_stale_ttl
    test(":indefinite stale_ttl serves stale data without expiring") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 0.01
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = :indefinite
      end

      stale_ttl = Langfuse.configuration.normalized_stale_ttl
      assert(stale_ttl >= 31_536_000_000, ":indefinite should normalize to ~1000 years")

      client = Langfuse.client
      name = prompt_name("indefinite-test")

      client.get_prompt(name)
      sleep 0.05

      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      result = client.get_prompt(name)
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

      client.shutdown

      assert(result, "Should return cached prompt")
      assert(elapsed < 0.02, "Should serve stale immediately, got #{ms(elapsed)}ms")
    end
  end

  def validate_thread_pool_queue_overflow
    test("Thread pool handles queue overflow gracefully") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 0.01
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 60
        c.cache_refresh_threads = 1
      end

      client = Langfuse.client
      cache = client.instance_variable_get(:@api_client).instance_variable_get(:@cache)
      name = prompt_name("overflow-test")

      client.get_prompt(name)
      sleep 0.02

      error_raised = false
      begin
        30.times do
          cache.fetch_with_stale_while_revalidate("overflow-key-#{SecureRandom.hex(4)}") do
            sleep 0.1
            { data: "test" }
          end
        end
      rescue StandardError => e
        error_raised = true
        puts "    Unexpected error: #{e.message}"
      end

      client.shutdown

      assert(!error_raised, "Should not raise error on queue overflow")
    end
  end

  def validate_graceful_shutdown
    test("Shutdown completes gracefully with pending refreshes") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 0.01
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 60
      end

      client = Langfuse.client
      name = prompt_name("shutdown-test")

      client.get_prompt(name)
      sleep 0.02
      client.get_prompt(name)

      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      client.shutdown
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

      assert(elapsed < 6, "Shutdown should complete within timeout (got #{ms(elapsed)}ms)")
    end
  end

  def validate_config_validation_errors
    test("Configuration validation catches invalid SWR configs") do
      errors_caught = 0

      begin
        Langfuse.reset!
        Langfuse.configure do |c|
          c.public_key = "pk-test"
          c.secret_key = "sk-test"
          c.cache_stale_while_revalidate = true
          c.cache_stale_ttl = nil
        end
        Langfuse.configuration.validate!
      rescue Langfuse::ConfigurationError => e
        errors_caught += 1 if e.message.include?("cache_stale_ttl")
      end

      begin
        Langfuse.reset!
        Langfuse.configure do |c|
          c.public_key = "pk-test"
          c.secret_key = "sk-test"
          c.cache_stale_while_revalidate = true
          c.cache_stale_ttl = -5
        end
        Langfuse.configuration.validate!
      rescue Langfuse::ConfigurationError
        errors_caught += 1
      end

      begin
        Langfuse.reset!
        Langfuse.configure do |c|
          c.public_key = "pk-test"
          c.secret_key = "sk-test"
          c.cache_refresh_threads = 0
        end
        Langfuse.configuration.validate!
      rescue Langfuse::ConfigurationError
        errors_caught += 1
      end

      Langfuse.reset!

      assert(errors_caught == 3, "Should catch 3 validation errors, got #{errors_caught}")
    end
  end

  def validate_concurrent_access_safety
    test("Concurrent access is thread-safe") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 0.05
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 60
        c.cache_max_size = 100
      end

      client = Langfuse.client
      name = prompt_name("concurrent-test")
      errors = Concurrent::AtomicFixnum.new(0)
      success = Concurrent::AtomicFixnum.new(0)

      threads = 10.times.map do
        Thread.new do
          5.times do
            begin
              result = client.get_prompt(name)
              result ? success.increment : errors.increment
            rescue StandardError => e
              errors.increment
              puts "    Thread error: #{e.message}"
            end
          end
        end
      end

      threads.each(&:join)
      client.shutdown

      total = success.value + errors.value
      assert(errors.value == 0, "Should have 0 errors, got #{errors.value}/#{total}")
    end
  end

  def validate_cache_eviction_under_swr
    test("Cache eviction works correctly under SWR") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 60
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 120
        c.cache_max_size = 3
      end

      client = Langfuse.client
      cache = client.instance_variable_get(:@api_client).instance_variable_get(:@cache)

      # Fill the cache to max_size with three entries
      cache.set("evict-1", { data: 1 })
      sleep 0.01
      cache.set("evict-2", { data: 2 })
      sleep 0.01
      cache.set("evict-3", { data: 3 })

      # Touch evict-1 so it is most recently used, then insert a fourth
      # entry to force an eviction
      cache.get("evict-1")
      cache.set("evict-4", { data: 4 })

      cache_size = cache.size

      client.shutdown

      assert(cache_size <= 3, "Cache should respect max_size (got #{cache_size})")
    end
  end

  def validate_version_label_cache_key_isolation
    test("Version and label create isolated cache keys") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 60
        c.cache_stale_while_revalidate = true
        c.cache_stale_ttl = 120
      end

      client = Langfuse.client
      cache = client.instance_variable_get(:@api_client).instance_variable_get(:@cache)
      name = prompt_name("isolation-test")

      client.get_prompt(name)
      client.get_prompt(name, version: 1)
      client.get_prompt(name, label: "swr-test")

      internal_cache = cache.instance_variable_get(:@cache)

      latest_key = Langfuse::PromptCache.build_key(name)
      version_key = Langfuse::PromptCache.build_key(name, version: 1)
      label_key = Langfuse::PromptCache.build_key(name, label: "swr-test")

      has_latest = internal_cache.key?(latest_key)
      has_version = internal_cache.key?(version_key)
      has_label = internal_cache.key?(label_key)

      error_raised = false
      begin
        client.get_prompt(name, version: 1, label: "swr-test")
      rescue ArgumentError => e
        error_raised = e.message.include?("Cannot specify both")
      end

      client.shutdown

      assert(has_latest, "Should have cache key for latest (#{latest_key})")
      assert(has_version, "Should have cache key for version (#{version_key})")
      assert(has_label, "Should have cache key for label (#{label_key})")
      assert(error_raised, "Should raise ArgumentError when both version and label specified")
    end
  end

  def validate_swr_disabled_falls_back_to_simple_cache
    test("SWR disabled uses simple cache (no thread pool)") do
      Langfuse.reset!
      Langfuse.configure do |c|
        c.public_key = ENV.fetch("LANGFUSE_PUBLIC_KEY")
        c.secret_key = ENV.fetch("LANGFUSE_SECRET_KEY")
        c.base_url = ENV.fetch("LANGFUSE_HOST", "https://cloud.langfuse.com")
        c.cache_backend = :memory
        c.cache_ttl = 60
        c.cache_stale_while_revalidate = false
      end

      client = Langfuse.client
      cache = client.instance_variable_get(:@api_client).instance_variable_get(:@cache)
      name = prompt_name("simple-test")

      has_thread_pool = cache.instance_variable_get(:@thread_pool)&.running?

      start1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      client.get_prompt(name)
      first_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start1

      start2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      client.get_prompt(name)
      second_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start2

      client.shutdown

      assert(!has_thread_pool, "Thread pool should not be running when SWR disabled")
      assert(second_elapsed < first_elapsed / 2, "Simple cache should still work")
    end
  end

  def ms(seconds)
    (seconds * 1000).round(1)
  end

  def test(name)
    print "\n#{name}... "
    begin
      yield
      @results << { name: name, passed: true }
      puts "✓ PASS"
    rescue AssertionError => e
      @results << { name: name, passed: false, error: e.message }
      puts "✗ FAIL: #{e.message}"
    rescue StandardError => e
      @results << { name: name, passed: false, error: "#{e.class}: #{e.message}" }
      puts "✗ ERROR: #{e.class}: #{e.message}"
      puts e.backtrace.first(3).map { |l| "    #{l}" }.join("\n")
    end
  end

  def assert(condition, message = "Assertion failed")
    raise AssertionError, message unless condition
  end

  def print_summary
    puts "\n" + "=" * 70
    puts "SUMMARY"
    puts "=" * 70

    passed = @results.count { |r| r[:passed] }
    failed = @results.count { |r| !r[:passed] }

    puts "Passed: #{passed}"
    puts "Failed: #{failed}"

    if failed > 0
      puts "\nFailures:"
      @results.reject { |r| r[:passed] }.each do |r|
        puts "  - #{r[:name]}: #{r[:error]}"
      end
    end

    puts "\n" + (failed.zero? ? "ALL TESTS PASSED" : "SOME TESTS FAILED")
    puts "=" * 70 + "\n"

    exit(failed.zero? ? 0 : 1)
  end

  class AssertionError < StandardError; end
end

test_run_id = ARGV[0]
if test_run_id.nil? || test_run_id.empty?
  puts "Usage: ruby scripts/validate_swr.rb <test_run_id>"
  puts "First create prompts: uv run scripts/setup_swr_prompts.py <test_run_id>"
  exit 1
end

SWRValidator.new(test_run_id).run_all

@drborges (Contributor, Author) commented:

@NoahFisher any ETA on when we might expect this to be merged/released?
