
Add LLM inference for response generation#43

Merged
edkaya merged 9 commits into main from feature/setup-llm-inference
Jun 10, 2025

Conversation

Collaborator

@edkaya edkaya commented Jun 10, 2025

  • Added BaseChat model
  • Added webui service to make calls to llm
  • Added rag service to implement retrieval logic + prompt generation
  • Introduced the API endpoint in genai -> /genai/generate
    -- Body must contain the query and conversation_id (or another kind of id, which is necessary to make a DB call to fetch the user's chat history; to be discussed further this week).
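A minimal sketch of the request body the endpoint expects, based on the description above (the exact schema is still open, and `build_generate_request` is a hypothetical helper name):

```python
import json


def build_generate_request(query: str, conversation_id: str) -> str:
    """Serialize the body expected by POST /genai/generate.

    conversation_id is the id used server-side to fetch the user's
    chat history; per the PR description it may be replaced by
    another kind of id later.
    """
    body = {
        "query": query,
        "conversation_id": conversation_id,
    }
    return json.dumps(body)


example = build_generate_request("What can I cook with eggs?", "conv-123")
```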

@edkaya edkaya requested a review from esadakcam June 10, 2025 00:12
collection_name
)
# todo: retrieve messages from chat history as BaseMessage
messages = []
Collaborator

I think we have three options for fetching the user's history:

  • The request body contains the history. On the server side, before sending the request to the GenAI service, we can attach the history to the request. The chat flow is then managed by the server, and we can also attach the user's food preferences.
  • The GenAI service can query MongoDB.
  • The GenAI service can request the history from the server.

Personally, I prefer the first one. The flow is easier to manage.
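A rough sketch of the first option (all names hypothetical): the server enriches the outgoing GenAI request with the chat history and, optionally, the user's food preferences before forwarding it.

```python
from typing import Dict, List


def attach_history(request: Dict, history: List[Dict], preferences: List[str]) -> Dict:
    """Enrich the outgoing GenAI request with chat history and preferences.

    history entries follow the {"role": ..., "content": ...} shape
    discussed elsewhere in this PR.
    """
    enriched = dict(request)  # copy so the original request is untouched
    enriched["history"] = history
    enriched["preferences"] = preferences
    return enriched
```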

Collaborator Author

I also thought about the first two options, but the first one makes better sense to me too. That way we would not have to handle conversationId logic in the GenAI service either. Let's use the first one.


from genai.config import Config

BASE_URL = "https://gpu.aet.cit.tum.de/"
Collaborator

Let's make it a config value so that we can easily change the LLM provider.
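One way this could look (the `LLM_BASE_URL` attribute and environment-variable override are assumptions, not the project's actual `Config` shape): the URL moves out of the module constant and into configuration.

```python
import os


class Config:
    # Falls back to the current provider if no override is configured,
    # so behaviour is unchanged by default.
    LLM_BASE_URL = os.getenv("LLM_BASE_URL", "https://gpu.aet.cit.tum.de/")


BASE_URL = Config.LLM_BASE_URL
```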

stop=None,
**kwargs) -> ChatResult:
prompt = "\n".join([
msg.content for msg in messages if isinstance(msg, HumanMessage)
Collaborator

Does this mean we only add the user's messages as context? If so, shouldn't we include the whole chat?

Collaborator Author

Yes, correct. Including the LLM's messages sometimes has a downside: the LLM can hallucinate from its previous responses in the chat history and fail to generate a correct answer for an already-given query (if the earlier response in the history was wrong).

But if we think about our application, I don't think it will be a huge issue, since we prioritize the context first and use full RAG functionality. So I will add AI messages in response generation 👍
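A sketch of the agreed change, with small dataclasses standing in for LangChain's `HumanMessage`/`AIMessage`: the prompt is built from both user and assistant turns instead of user turns only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HumanMessage:
    content: str


@dataclass
class AIMessage:
    content: str


def build_prompt(messages: List[object]) -> str:
    """Join user and assistant turns into a single prompt string."""
    lines = []
    for msg in messages:
        if isinstance(msg, HumanMessage):
            lines.append(f"User: {msg.content}")
        elif isinstance(msg, AIMessage):
            lines.append(f"Assistant: {msg.content}")
    return "\n".join(lines)
```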

@edkaya edkaya requested a review from esadakcam June 10, 2025 11:13
messages (List[Dict]): Full conversation history, each with 'role' and 'content'
Example:
[
{"role": "user", "content": "I have eggs and tomatoes."},
Collaborator

On the server, role is an enum stored in all capital letters. It is better to check the role case-insensitively.
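A minimal example of case-insensitive role handling (`normalize_role` is a hypothetical helper, not existing project code):

```python
VALID_ROLES = {"user", "assistant", "system"}


def normalize_role(role: str) -> str:
    """Lower-case the role so server enums like "USER" match."""
    normalized = role.strip().lower()
    if normalized not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return normalized
```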

Collaborator Author

@esadakcam I changed it, thanks! I just checked the server code, and I think there is a typo in the server enums: it should be ASSISTANT instead of ASISTANT. Since I don't want to break the structure in the server code, could you change it in a follow-up?

@edkaya edkaya merged commit 8ce1842 into main Jun 10, 2025
8 checks passed
@edkaya edkaya deleted the feature/setup-llm-inference branch June 21, 2025 15:06