A Proposal for Context Engineering for Quarkus Langchain4j #1979
Replies: 32 comments 89 replies
-
On the implementation side, I am not yet sure how
-
So first, if you start having things like this, you're doing recursive templates (a template inside a template string expression), and that's horrible: don't go there, don't do that. If you want to pass a variable value to a template, use the proper template syntax:
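For illustration, the Qute way is to reference a value with an ordinary value expression rather than rendering a template inside another template, for example:

```
Answer the following user question:
{userMessage}
```

(The `{userMessage}` expression is a standard Qute value expression; no nested template rendering is involved.)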
-
Second: the option you're listing here is via subtypes:

```java
public interface ContextFragmentProvider<T extends QuerySpec> {

    // This is super weird: you take a QuerySpec and you return a subtype of it?
    // Might as well build the proper subtype from the start, no?
    T buildQuerySpec(QuerySpec qs);

    // This is a bit annoying because it forces us to separate the list of parameters
    // from the method which is going to use them. It can be useful, but it can also be isolating.
    Uni<List<ContextFragment>> read(T spec, ContextBudget budget);
}
```

This seems like a problem we've already solved in Quarkus in many places:

```java
// I'm not even sure the interface is required anymore
public class FaqRagProvider implements ContextFragmentProvider {

    // This defines a fragment provider and all its parameters:
    // names, types, default values, required or not
    @FragmentProvider(/* not sure what this does but I've seen it in the spec */ required = true
        // the name could be a parameter here too, instead of a method on ContextFragmentProvider
    )
    Uni<List<ContextFragment>> read(String query,
            /* primitive implies required perhaps */ float minScore,
            @Required Integer maxResults,
            @DefaultsTo("true") boolean optionalParam,
            List<CategoryEnum> filter,
            ContextBudget budget) {
        // Now we have all the useful parameter values we can use
    }
}
```

Here I'm talking about the user-facing API. Of course, we will need to extract all this annotation/signature info into an API for implementation consumption. Now, if you do keep the
-
As to the Qute aspects:

```java
@RegisterAiService
public interface CustomerSupportBot {

    @SystemMessage("""
            You are a customer support assistant for Acme Corp.

            {#context slot="policies"
                      provider="static-policies"
                      maxTokens=500
                      category="customer-service"
                      required=true /}
            {#context slot="history"
                      provider="chat-memory"
                      maxTokens=2000
                      conversationId
                      maxMessages=20 /}
            {#context slot="knowledge"
                      provider="faq-rag"
                      maxTokens=1000
                      query=userMessage
                      minScore=0.7 /}
            {#context slot="user-profile"
                      provider="user-preferences"
                      maxTokens=300
                      userId /}

            ## Company Policies
            {context:policies}

            ## Conversation History
            {context:history}

            ## Relevant Knowledge Base Articles
            {context:knowledge}

            ## User Preferences
            {context:user-profile}

            Now respond to the user's question.
            """)
    String chat(@UserMessage String userMessage,
                @MemoryId String conversationId,
                String userId);
}
```

NOTE: in Qute we can simplify this. Is there any value in separating the context declaration from its insertion in the template? This looks awfully like we're defining variables (what you call "slots" for a reason I cannot immediately understand) and only using them once. If we can use them more than once, then fine. If not, we can simplify:

```java
@RegisterAiService
public interface CustomerSupportBot {

    @SystemMessage("""
            You are a customer support assistant for Acme Corp.

            ## Company Policies
            {#context provider="static-policies"
                      maxTokens=500
                      category="customer-service"
                      required=true /}

            ## Conversation History
            {#context provider="chat-memory"
                      maxTokens=2000
                      conversationId
                      maxMessages=20 /}

            ## Relevant Knowledge Base Articles
            {#context provider="faq-rag"
                      maxTokens=1000
                      query=userMessage
                      minScore=0.7 /}

            ## User Preferences
            {#context provider="user-preferences"
                      maxTokens=300
                      userId /}

            Now respond to the user's question.
            """)
    String chat(@UserMessage String userMessage,
                @MemoryId String conversationId,
                String userId);
}
```

If we absolutely must define new variables because we can reuse them, or we may want to call functions on them or whatever, then let's be clear about them being variables:

```java
@RegisterAiService
public interface CustomerSupportBot {

    @SystemMessage("""
            You are a customer support assistant for Acme Corp.

            {#define name="policies"
                     provider="static-policies"
                     maxTokens=500
                     category="customer-service"
                     required=true /}
            {#define name="history"
                     provider="chat-memory"
                     maxTokens=2000
                     conversationId
                     maxMessages=20 /}
            {#define name="knowledge"
                     provider="faq-rag"
                     maxTokens=1000
                     query=userMessage
                     minScore=0.7 /}
            {#define name="userProfile"
                     provider="user-preferences"
                     maxTokens=300
                     userId /}

            ## Company Policies
            {policies.toLowerCase}

            ## Conversation History
            {history.or("No history")}

            ## Relevant Knowledge Base Articles
            {knowledge}

            ## User Preferences
            {userProfile}

            Now respond to the user's question.
            """)
    String chat(@UserMessage String userMessage,
                @MemoryId String conversationId,
                String userId);
}
```

And finally, given that we know statically all the template fragment providers, we could just as well auto-declare their Qute tags and write:

```java
@RegisterAiService
public interface CustomerSupportBot {

    @SystemMessage("""
            You are a customer support assistant for Acme Corp.

            {#define-static-policies name="policies"
                     maxTokens=500
                     category="customer-service"
                     required=true /}
            {#define-chat-memory name="history"
                     maxTokens=2000
                     conversationId
                     maxMessages=20 /}
            {#define-faq-rag name="knowledge"
                     maxTokens=1000
                     query=userMessage
                     minScore=0.7 /}
            {#define-user-preferences name="userProfile"
                     maxTokens=300
                     userId /}

            ## Company Policies
            {policies.toLowerCase}

            ## Conversation History
            {history.or("No history")}

            ## Relevant Knowledge Base Articles
            {knowledge}

            ## User Preferences
            {userProfile}

            Now respond to the user's question.
            """)
    String chat(@UserMessage String userMessage,
                @MemoryId String conversationId,
                String userId);
}
```

But again, if all your fragment providers produce

```java
@RegisterAiService
public interface CustomerSupportBot {

    @SystemMessage("""
            You are a customer support assistant for Acme Corp.

            ## Company Policies
            {#context-static-policies
                     maxTokens=500
                     category="customer-service"
                     required=true /}

            ## Conversation History
            {#context-chat-memory
                     maxTokens=2000
                     conversationId
                     maxMessages=20 /}

            ## Relevant Knowledge Base Articles
            {#context-faq-rag
                     maxTokens=1000
                     query=userMessage
                     minScore=0.7 /}

            ## User Preferences
            {#context-user-preferences
                     maxTokens=300
                     userId /}

            Now respond to the user's question.
            """)
    String chat(@UserMessage String userMessage,
                @MemoryId String conversationId,
                String userId);
}
```
-
Your input is very much appreciated here @FroMage! I will take a close look on Monday.
-
I like the idea of aggregating various sources and of the budget control. I'm a bit skeptical that the prompt is the control center, though. It feels like an über-thing that could have multiple prompts as input.
-
You mention auditability, but that means logging not only the fragment composition but all fragments and the final prompt. Is that what you had in mind?
-
Using the system message for all that "seems wrong", as it gives equal priority to all contexts. I find it strange to set everything in the SystemMessage and have a UserMessage seemingly disconnected below. You might want to embed the user message in the final message (e.g. in the middle).
-
Could it be that one might want to compress a composed prompt? How would you do that?
-
Will prompts be externalised by some teams and thus be "untypesafe"?
-
Should maxTokens have the capacity to ask for compression (a wrapped ContextFragment)?
-
How are the attributes of a context fragment used to alter behavior? In Qute code?
-
I don't really understand the UpdateSpec behavior. When is it called?
-
Is the prompt the root of all things in a given request/operation? I doubt it.
-
Will the prompt composer be the same/similar and duplicated in a lot of places?
-
I am trying to compare and reconcile this with the current RetrievalAugmentor stuff that we have right now, because there are many similarities, and I'm wondering whether we should create something completely new and separate, or build on top of existing things.
I could envision using this template-based approach that would, at build time, automatically transform the template into an instance of
That way, we could offer the typesafe templating as proposed by Clement, but build on top of existing components instead of creating something completely new.
-
As for multi-modal payloads, I'm not sure how this fits into the templating approach. If you have some non-text content, it has to be submitted to the LLM as a separate
-
Nice! I especially like the idea of explicitly specifying "slots" and controlling token budgets.
-
Tools and structured outputs are missing in this proposal, but they are core LLM features and are included in the "final" prompt that the LLM eventually "sees" before generating the answer. Token budgets, priorities, ordering, observability, etc. are also applicable to them.
-
IIUC, this solution seems to assume that only two messages are always sent in the request to the LLM:
Not sure how tools will work here as well. Many LLM providers expect a specific ordering of specific message types, for example:
-
Some context providers might depend on others. For example, a RAG provider might need to know the whole conversational history and the user preferences in order to provide the most relevant pieces of information.
-
We need to make sure not to put questionable content in the system message (also in examples) to reduce the chances of prompt injection. Especially not conversation history (which contains user queries), and maybe not even RAG.
-
I would also include "efficiently taking advantage of prompt caching" in the list of requirements, as it is an important LLM feature and can dramatically impact latency and cost.
-
One thing that I just realized could be a nice thing about this work: @angelozerr is working on integrating the Qute debugger into Quarkus LangChain4j 😉
-
I'm preparing a revamped version of the proposal. I hope to be able to publish it tomorrow.
-
I've posted a new version of the proposal. It's hard to keep everything in sync, so please review. The main difference is the switch to a multi-message approach:
Previous Approach (Single-Message)
New Approach (Multi-Message)
-
I'm still seeing confusion as to the Qute syntax, on all the points I raised in the proposal. I also still don't see any reason to separate the
As for the caching, I am not seeing any reason why the caching should be defined in the
-
When applying multiple annotations of the same type (
-
Closing in favor of #2060.
-
Context Engineering Proposal
Version: 0.2
Executive Summary
This issue describes a proposed architecture for context engineering in Quarkus LangChain4j: a system for managing AI context in a composable, deterministic, and observable manner.
Core Vision
Context engineering is the practice of identifying and providing the most relevant information from the surrounding system, data, and interaction history to an LLM, so that inference is more accurate, reliable, and aligned with user intent.
This removes hidden magic, makes context assembly explicit and auditable, and places complete control in developers’ hands.
Characteristics
What Problems Does This Solve?
Introduction & Motivation
The Context Engineering Challenge
Modern AI applications need to combine context from multiple sources:
Existing langchain4j features (ChatMemory, ContentRetriever) handle individual sources well but lack a unified model for:
Three Mental Models for Context Engineering
Before diving into the solution, it's essential to understand the different perspectives on how context relates to prompts:
Model A: Context is Part of the Prompt
Mental Model: The prompt template is the container; context providers inject dynamic data into it.
Developer Perspective: "I'm writing a prompt, and I need to pull in some contextual data."
Structure:
Example:
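As an illustration of Model A, a template using the `{#context}` tag defined later in this proposal might look like:

```
You are a customer support assistant for Acme Corp.

{#context slot="knowledge" provider="faq-rag" maxTokens=1000 query=userMessage /}

## Relevant Knowledge Base Articles
{context:knowledge}
```

The template is the container; the `faq-rag` provider injects dynamic data into the `knowledge` slot.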
Characteristics:
Model B: Prompt is Part of the Context
Mental Model: Everything sent to the LLM is "context" (instructions, data, history, etc.)
Developer Perspective: "Context engineering encompasses everything the LLM sees."
Structure:
Example:
Characteristics:
Model C: Separate but Complementary
Mental Model: Prompt = static structure/instructions; Context = dynamic data
Developer Perspective: "Prompts define the task; context provides the information"
Clear separation between what the developer writes versus what's retrieved.
Example:
Characteristics:
In this proposal, we use Model A with a multi-message extension.
This proposal adopts Model A (context is part of the prompt) because it provides:
However, we extend it with multi-message support to enable:
Model B could be built as a higher-level API on top of this foundation in the future.
Why Multi-Message Prompt-First? (And Yes, I’m bad at naming things)
Traditional approaches hide context assembly in code or configuration:
Multi-message prompt-first design makes everything explicit:
Benefits:
Design Philosophy
Core Principles
1. Multi-Message Prompt-First Design
Principle: Message templates are the control plane for context engineering.
2. Determinism & Reproducibility
Principle: Same inputs → same context → same prompt (modulo time-based queries).
3. Composability
Principle: Context sources are independent, orthogonal building blocks.
4. Observability
Principle: Full visibility into what context was used and why.
5. Type Safety & Validation
(That's the hard one)
Principle: Provider parameters validated at build time when possible.
Architectural Overview
High-Level Architecture
```mermaid
flowchart TD
    A["Prompt Template (Qute)<br/>{#context slot='X' provider='Y' maxTokens=N query='...' /}"]
    B["Context Resolution Engine<br/>• Parse {#context} tags<br/>• Create QuerySpec from tag parameters<br/>• Validate parameters against provider schema<br/>• Route to appropriate ContextFragmentProvider"]
    C["ContextFragmentProvider (CDI Beans)"]
    C1["Chat Memory<br/>Provider"]
    C2["RAG<br/>Provider"]
    C3["User Prefs<br/>Provider"]
    C4["..."]
    D["ContextFragment[]<br/>• id, type, format, payload<br/>• attributes (score, metadata)<br/>• source (provenance)"]
    E["Budget Enforcement<br/>• Count tokens in fragments<br/>• Apply maxTokens limit<br/>• Truncate or reject overflow<br/>• Log budget decisions"]
    F["Rendered Context String<br/>• Injected into template as {context:slotName}<br/>• Final prompt sent to LLM"]
    A --> B
    B --> C
    C --> C1
    C --> C2
    C --> C3
    C --> C4
    C1 --> D
    C2 --> D
    C3 --> D
    C4 --> D
    D --> E
    E --> F
```

Component Responsibilities

Core Abstractions (incomplete list)
ContextFragment
A ContextFragment is the fundamental unit of context—an immutable, self-describing piece of information with provenance.
ContextFragmentProvider / ContextProvider
A ContextFragmentProvider/ContextProvider is a pluggable source/sink of context fragments, discovered via CDI.
QuerySpec
A QuerySpec encapsulates query parameters extracted from template tags.
Note that each context provider will provide an implementation of `QuerySpec` (and a factory method).
Creation from Template Tag:
Becomes:
This default query spec will be passed to the `buildQuerySpec` method of the provider to validate and create a custom `QuerySpec` object. This mechanism should allow build-time checking of the attributes.
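As a sketch of this mechanism (all class and method names below are illustrative, not an existing API): the engine builds a generic, stringly-typed spec from the tag parameters, and the provider's factory validates and converts it into a typed spec:

```java
import java.util.Map;

// Hypothetical: a generic spec holding the raw tag parameters as strings.
record QuerySpec(Map<String, String> params) {}

// Hypothetical provider-specific spec with typed, validated fields.
record RagQuerySpec(String query, double minScore, int maxResults) {

    // Validates the stringly-typed tag parameters and converts them to typed fields.
    static RagQuerySpec from(QuerySpec raw) {
        Map<String, String> p = raw.params();
        if (!p.containsKey("query")) {
            throw new IllegalArgumentException("'query' is required");
        }
        return new RagQuerySpec(
                p.get("query"),
                Double.parseDouble(p.getOrDefault("minScore", "0.0")),
                Integer.parseInt(p.getOrDefault("maxResults", "5")));
    }
}
```

A build-time check could run the same validation against the statically known tag parameters of each `{#context}` occurrence.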
ContextBudget
A ContextBudget represents resource limits for context operations.
Budget Creation from Template:
Creates:
`ContextBudget.ofTokens(2000)`
Budget Enforcement:
The context provider enforces budgets by:
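A minimal sketch of budget enforcement, assuming a whitespace-based token estimate (a real implementation would use the model's tokenizer); all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fragment: just an id and a text payload.
record ContextFragment(String id, String text) {}

class BudgetEnforcer {

    // Crude token estimate; a real implementation would use the model's tokenizer.
    static int estimateTokens(String text) {
        return text.isBlank() ? 0 : text.trim().split("\\s+").length;
    }

    // Keeps fragments in order until the budget is exhausted; drops the rest.
    static List<ContextFragment> enforce(List<ContextFragment> fragments, int maxTokens) {
        List<ContextFragment> kept = new ArrayList<>();
        int used = 0;
        for (ContextFragment f : fragments) {
            int cost = estimateTokens(f.text());
            if (used + cost > maxTokens) {
                break; // overflow: drop this and the remaining fragments (could also truncate)
            }
            kept.add(f);
            used += cost;
        }
        return kept;
    }
}
```

Whether overflow should truncate the offending fragment or drop it entirely is a policy decision the real implementation would have to make (and log).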
Source
A Source captures provenance metadata for fragments.
Multi-Message Prompt-First Design
The Central Principle
Message templates are the control plane for context engineering.
Instead of configuring context assembly in code or YAML files, developers declare their context requirements directly in message templates using Qute custom sections. Multiple `@SystemMessage` and `@UserMessage` annotations define the structure of the message sequence sent to the LLM.
Qute Integration: The `{#context}` Tag
The `{#context}` tag is the primary mechanism for declaring context requirements.
Syntax:
Key Attributes:
- `slot`: Unique identifier for this context within the template
- `provider`: Name of the ContextFragmentProvider to query
- `maxTokens`: Maximum tokens this context can consume
- `cacheable`: (Optional) Hint that this context is static and can be cached by the LLM
- `required`: (Optional) Whether this context must be present (see semantics below)
Template Variable Interpolation:
Tag parameters can reference template variables:
Complete Example
How this works:
- the `@ConversationMemory` annotation
The conversation history can be empty. It will not contain the second message. If empty, the second message and the current turn are merged (appended).
Observability with Prompt-First Design
Every `{#context}` tag invocation is observable:
Evolution from Single-Message Design
This proposal evolved from an initial single-message approach to the current multi-message model.
Original Approach: Single Message with Embedded History
Initial Concept:
Problems Identified:
Conversation History as Text: Chat history rendered as text rather than proper message structures
No Caching Strategy: Everything in one message means no clear separation between static and dynamic content
Poor Context Positioning: All context in one blob
Single-Turn Bias: Design worked for simple queries but broke down for conversations
Why Multi-Message is Better
The multi-message approach addresses all these issues:
Benefits:
- `@ConversationMemory`
LLM Prompt Caching Strategy
Modern LLM providers (Anthropic, OpenAI) offer prompt caching to reduce costs and latency by caching portions of the prompt that don't change between requests.
How LLM Prompt Caching Works
Caching is prefix-based: The LLM caches everything up to a cache breakpoint.
Example with Anthropic:
Cache hits:
Cost savings:
Designing for Caching
Principle: Structure messages so static content precedes dynamic content.
Bad (no caching benefit):
Good (caching optimized):
The `cacheable` Attribute
Usage: Mark contexts that are static across requests.
Semantics:
- `cacheable=true`: Hint to the framework that this context is static and should be included in the cached prefix (when it can be controlled programmatically)
Note: This is a future optimization. The initial implementation may not leverage caching, but the API is designed to support it.
Context Ordering and Positioning
The order and position of context in the message sequence significantly affects LLM behavior.
Known LLM Biases
1. Primacy Bias: LLMs pay more attention to content at the beginning
2. Recency Bias: LLMs pay more attention to content at the end
3. Lost in the Middle: LLMs pay less attention to content in the middle of long contexts
Recommended Message Structure
Attention Levels:
Mitigating "Lost in the Middle"
Strategy 1: Keep history concise
Strategy 2: Position critical info at boundaries
Strategy 3: Semantic filtering (future)
Context Rotting
Problem: Old information in cached messages becomes stale.
Example:
If cached, stock prices from the first call persist for the whole interaction.
Solutions:
- `{#context slot="stock-prices" provider="prices" cacheable=false /}`
- `{#context slot="policies" provider="policies" cacheable=true /}` ← Good (policies don't change)
- `{#context slot="user-prefs" provider="prefs" cacheable=true cacheTTL="1h" /}`
Guidelines:
- `cacheable=true`
Conversation Memory: The `@ConversationMemory` Annotation
The `@ConversationMemory` annotation controls how conversation history is injected into the message sequence.
Basic Usage
How It Works
Placement: The `@ConversationMemory` annotation marks where conversation history should be injected in the message sequence.
Turn 1:
Turn 2:
Turn 3:
Configuration Options
Parameters:
- `maxTokens`: Maximum token budget for conversation history
- `maxTurns`: Maximum number of turns to include
- `strategy` (future): How to select which turns to include
  - `"recent"`: Most recent turns (default)
  - `"semantic-similarity"`: Most relevant to the current query (future)
  - `"importance"`: Based on turn importance scoring (future)
Interaction with ChatMemory
The `@ConversationMemory` annotation works with the existing `ChatMemoryStore`:
- `ChatMemoryStore`
- `@ConversationMemory` retrieves relevant turns from the store
Budget Management
The framework enforces the token budget by:
- Fetching recent turns from the `ChatMemoryStore` (up to `maxTurns`)
- Dropping the oldest turns if `maxTokens` would be exceeded
Example:
Included: Turns 2, 3, 4 (1700 tokens total)
Dropped: Turn 1 (would exceed 2000)
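The selection sketched in this example (walk backwards from the most recent turn, keep turns while they fit, preserve chronological order) could look like the following; the `Turn` record and token counts are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical turn with a precomputed token count.
record Turn(int number, int tokens) {}

class MemoryBudget {

    // Walks backwards from the most recent turn, keeping turns while they fit the budget.
    static List<Turn> select(List<Turn> history, int maxTokens) {
        Deque<Turn> kept = new ArrayDeque<>();
        int used = 0;
        for (int i = history.size() - 1; i >= 0; i--) {
            Turn t = history.get(i);
            if (used + t.tokens() > maxTokens) {
                break; // the oldest turns are dropped first
            }
            kept.addFirst(t); // preserve chronological order
            used += t.tokens();
        }
        return List.copyOf(kept);
    }
}
```

With a 2000-token budget and four turns, this reproduces the shape of the example above: the most recent turns are included and the oldest one is dropped.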
Positioning Guidelines
Where to place `@ConversationMemory`?
Option 1: Before the current turn (recommended)
Benefits:
Option 2: After system message
Benefits:
Context Providers
This section describes the four context providers that can be used to better understand the configuration mechanism.
1. Episodic Long-Term Memory Provider
Purpose: Provides access to timestamped event history (what happened when).
Name:
`episodic-memory`
Behavior:
Configuration in Template:
Event Structure:
Events are stored in a structured format:
```json
{
  "id": "evt-12345",
  "userId": "user-789",
  "eventType": "task-completion",
  "timestamp": "2025-11-28T10:30:00Z",
  "data": {
    "taskId": "task-456",
    "outcome": "success",
    "duration": "5m"
  }
}
```
Fragment Format:
Or as formatted text:
2. User Preferences Provider (Long-Term Memory)
Purpose: Stores and retrieves user-specific preferences, settings, and profile data.
Name:
`user-preferences`
Behavior:
Configuration in Template:
Preference Structure:
Preferences stored as key-value pairs:
```json
{
  "userId": "user-789",
  "preferences": {
    "language": "en-US",
    "timezone": "America/New_York",
    "notification-settings": {
      "email": true,
      "sms": false
    },
    "theme": "dark"
  }
}
```
Fragment Format (text):
Write Behavior:
3. RAG Provider (Document Retrieval)
Purpose: Retrieves relevant documents from a vector store based on semantic similarity.
Name:
`rag` (or domain-specific names like `faq-rag`, `docs-rag`)
Behavior:
Configuration in Template:
Read Behavior:
Fragment Format:
Type Safety & Validation
The Challenge
Template tag parameters are inherently stringly-typed:
How do we:
Proposal: Type-Safe QuerySpec Subclasses
Instead of using a generic schema-based validation approach, each provider defines its own QuerySpec subclass with strongly-typed parameters.
This provides compile-time type safety for provider implementations while maintaining flexibility for template-based configuration.
// TODO It's still unclear how this will allow build-time validation. We need to extract some sort of schema.
Advanced Topics and Implementation Details
Special Context Providers
Toolbox Provider
Purpose: Provides the list of available tools/functions for the LLM to call.
Name:
`toolbox`
Usage:
How it works:
- `@Tool`-annotated methods in the application
Integration with Function Calling:
`{context:toolbox}`
Dynamic Tool Selection (future):
`{#context slot="toolbox" provider="tools" category="weather" cacheable=false /}`
Only include tools matching certain criteria.
Structured Output Provider
Purpose: Provides the JSON schema for structured output responses.
Name:
`structured-output`
Usage:
How it works:
(e.g., `Person.class`)
Build-time Generation:
The `required` Attribute Semantics
The `required` attribute controls what happens when a context provider returns empty or no results.
Syntax:
Semantics:
`required=true` (fail-fast)
Behavior:
Example:
`{#context slot="policies" provider="compliance-policies" required=true /}`
If the compliance policies can't be loaded:
- a `RequiredContextMissingException` is thrown
Use cases:
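A minimal sketch of the fail-fast semantics; the exception name mirrors the proposal, while the `RequiredCheck` helper is illustrative:

```java
import java.util.List;

// Exception named in the proposal: thrown when a required context yields no fragments.
class RequiredContextMissingException extends RuntimeException {
    RequiredContextMissingException(String slot) {
        super("Required context '" + slot + "' produced no fragments");
    }
}

class RequiredCheck {

    // Fail-fast when required=true and the provider returned nothing;
    // optional contexts simply render as empty.
    static List<String> resolve(String slot, List<String> fragments, boolean required) {
        if (required && fragments.isEmpty()) {
            throw new RequiredContextMissingException(slot);
        }
        return fragments;
    }
}
```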
CDI Scopes for Context Providers
Context providers are CDI beans, and their scope affects lifecycle and state management.
Recommended Scopes
`@ApplicationScoped` (default recommendation)
Use for:
Benefits:
`@RequestScoped`
Use for:
- `@RequestScoped` beans
Scope Considerations
Thread Safety:
- `@ApplicationScoped` providers must be thread-safe
Caching:
- `@ApplicationScoped` providers can maintain application-wide caches
- `@RequestScoped` providers can cache within a request
- `@CacheResult` for method-level caching
Provider Dependencies and Injection
Providers can depend on each other and inject services via CDI.
Injecting Other Providers
Use cases:
Verifying Context Provider Inclusion
How do you verify that a context provider was actually executed and included in the prompt?
Observability Events
Every context provider invocation emits telemetry:
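For illustration, such a telemetry event could carry fields like these (the record and field names are hypothetical, not an existing Quarkus LangChain4j API):

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical telemetry event emitted per {#context} resolution.
record ContextResolutionEvent(
        String slot,
        String provider,
        int fragmentsReturned,
        int tokensUsed,
        int tokensBudgeted,
        boolean truncated,
        Duration latency,
        Instant timestamp) {

    // A compact one-line summary suitable for structured logging.
    String summary() {
        return "slot=" + slot + " provider=" + provider
                + " fragments=" + fragmentsReturned
                + " tokens=" + tokensUsed + "/" + tokensBudgeted
                + " truncated=" + truncated;
    }
}
```

An observer (e.g. a CDI event listener or an OpenTelemetry span processor) could then assert in tests that the expected slots were resolved.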
Testing with Mocks
Unit test with mock provider:
Rendered Prompt Inspection (Dev Mode)
Dev Mode Feature (future): Render and display the final prompt before sending to LLM.
Output:
Passing Fragments as Method Parameters
Use case: Manually retrieve context fragments and pass them to the AI service method.
Why: Advanced scenarios where you want full control over fragment retrieval.
Example
Future Directions
Composer & Operator Pipeline
Idea: Allow cross-provider composition and transformation of fragments.
Features:
Example (Future):
The composer would:
Advanced Budget Management
Dynamic Allocation:
Quality-Based Allocation:
Caching & Performance Optimization
Fragment Caching:
Embedding Caching:
Multi-Modal Context
Idea: Support non-text context (images, audio, video).
Example:
Challenges:
Provider Stereotypes
Idea: Define reusable "stereotypes" that aggregate multiple context providers with predefined configuration.
Problem: Repetition across similar AI services:
Solution: Stereotypes - Named configurations that can be referenced:
Define a stereotype:
Use the stereotype:
Expansion: At build time, `@CustomerSupportBase` expands to the full message sequence.