Merged
Commits (46 total; this view shows changes from 30 commits)
ff3fe68
feat: cache huggingface models
rti Feb 1, 2024
38a3bf9
fix: sentence_transformers version
rti Feb 1, 2024
3fb6fd0
chore: remove custom model based on modelfile
rti Feb 1, 2024
a4c7294
fix(frontend): do not filter by score for now TBD
rti Feb 1, 2024
d38c5f0
chore: remove debug/test code
rti Feb 1, 2024
dc4501a
fix: required sentence_transformers version was actually > 2.2.0
rti Feb 1, 2024
42cdcc5
docs: add notes about embedding models to readme
rti Feb 1, 2024
13bc12e
chore: add debug output to api.py
rti Feb 1, 2024
4933a9a
fix: question in prompt
rti Feb 1, 2024
b23833b
chore: top_k 3 results for now
rti Feb 1, 2024
da1017b
wip: embeddings cache
rti Feb 1, 2024
41ff046
feat: document splitter
rti Feb 1, 2024
4e69697
Update .dockerignore
exowanderer Feb 2, 2024
10103c6
Merge branch 'main' into integration
rti Feb 4, 2024
0ee6ed5
docs: note on how to dev locally
rti Feb 4, 2024
7a2c955
docs: add research_log.md
rti Feb 4, 2024
0a5e2be
feat: set top_k via api
rti Feb 5, 2024
332e3dc
feat: support en and de on the api to switch prompts
rti Feb 5, 2024
6225fcc
feat: cache embedding model during docker build
rti Feb 5, 2024
4877807
wip: smaller chunk size, 5 sentences for now
rti Feb 5, 2024
da9859d
chore: remove comment
rti Feb 5, 2024
291aaaf
feat: enable embeddings cache (for development)
rti Feb 9, 2024
936d83e
feat: add document cleaner
rti Feb 9, 2024
1b88437
Merge branch 'main' into integration
rti Feb 9, 2024
3e0b8f4
docs: long docker run options
rti Feb 9, 2024
edf5eb2
fix: access mode
rti Feb 9, 2024
63baf2b
fix: redraw loading animation on subsequent searches
rti Feb 9, 2024
56a7b8c
wip: workaround for runpod.io http port forwarding
rti Feb 9, 2024
8e05473
feat: switch to openchat 7b model
rti Feb 9, 2024
8276e35
Merge branch 'openchat' into integration
rti Feb 9, 2024
22b04d0
added logging via logger with Handler to api.py; PEP8 formatted api.py
exowanderer Feb 9, 2024
10f6b21
debugging use of homepage instead of hard coded endpoint values
exowanderer Feb 9, 2024
bfbd245
returning to previous to restart without errors
exowanderer Feb 9, 2024
7b6ba0a
renewed app.mount; bug fixed PEP8 changes in api.py; reformatted rag.…
exowanderer Feb 9, 2024
0428f87
returned to stablelm2 model for testing purposes. PEP8 upgrades in ap…
exowanderer Feb 9, 2024
8104dde
added OLLAMA_MODEL_NAME and OLLAMA_URL as environment variables; call…
exowanderer Feb 9, 2024
fbc4591
created logger.py to serve get_logger to all modules
exowanderer Feb 9, 2024
caecfd1
created a rag_pipeline in the rag.py based on the usage in api.py; re…
exowanderer Feb 9, 2024
5c0b4d0
Updated with PEP8 formatting in vector_store_interface.py
exowanderer Feb 9, 2024
8833af7
chore(Dockerfile): install python deps early
rti Feb 12, 2024
9ee8a32
fix(sentence-transformers): use cuda if available
rti Feb 12, 2024
b2357e3
fix(frontend): run from webserver root
rti Feb 12, 2024
b518abf
feat: store embedding cache in volume
rti Feb 12, 2024
69800b0
feat(start.sh): pull llm using ollama (if not built into container)
rti Feb 12, 2024
7803649
feat(ollama): use chat api to leverage prompt templates
rti Feb 12, 2024
ff1fcab
docs: fix run cmd
rti Feb 19, 2024
16 changes: 7 additions & 9 deletions Dockerfile
@@ -38,17 +38,10 @@ ENV PATH="/usr/local/ollama/bin:${PATH}"


# Pull a language model (see LICENSE_STABLELM2.txt)
-ARG MODEL=stablelm2:1.6b-zephyr
+ARG MODEL=openchat
ENV MODEL=${MODEL}
RUN ollama serve & while ! curl http://localhost:11434; do sleep 1; done; ollama pull $MODEL

-# Build a language model
-# ARG MODEL=discolm
-# ENV MODEL=${MODEL}
-# WORKDIR /tmp/model
-# COPY --chmod=644 Modelfile Modelfile
-# RUN curl --location https://huggingface.co/TheBloke/DiscoLM_German_7b_v1-GGUF/resolve/main/discolm_german_7b_v1.Q5_K_S.gguf?download=true --output discolm_german_7b_v1.Q5_K_S.gguf; ollama serve & while ! curl http://localhost:11434; do sleep 1; done; ollama create ${MODEL} -f Modelfile && rm -rf /tmp/model


# Setup the custom API and frontend
WORKDIR /workspace
@@ -58,6 +51,11 @@ COPY --chmod=755 requirements.txt requirements.txt
RUN pip install -r requirements.txt


+# Load sentence-transformers model once in order to cache it in the image
+# TODO: ARG / ENV for embedder model
+RUN echo "from haystack.components.embedders import SentenceTransformersDocumentEmbedder\nSentenceTransformersDocumentEmbedder(model='svalabs/german-gpl-adapted-covid').warm_up()" | python3


# Install frontend dependencies
COPY --chmod=755 frontend/package.json frontend/package.json
COPY --chmod=755 frontend/yarn.lock frontend/yarn.lock
@@ -69,7 +67,7 @@ COPY --chmod=755 json_input json_input


# Copy backend for production
-COPY --chmod=644 gswikichat gswikichat
+COPY --chmod=755 gswikichat gswikichat


# Copy and build frontend for production (into the frontend/dist folder)
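The added RUN line above warms up the sentence-transformers model at build time so the weights land in the image's Hugging Face cache. As a more readable equivalent of that echo-pipe one-liner, a standalone script could look like this (an editorial sketch, not part of the diff; it assumes warm_up() downloads the model into the default cache directory, and the ARG/ENV for the model name is still the TODO noted in the diff):

```python
# warm_model.py — build-time warm-up sketch mirroring the RUN line above.
# Assumes SentenceTransformersDocumentEmbedder.warm_up() populates the
# Hugging Face cache under ~/.cache/huggingface.
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

embedder = SentenceTransformersDocumentEmbedder(
    model='svalabs/german-gpl-adapted-covid'  # TODO in diff: make this an ARG/ENV
)
embedder.warm_up()  # downloads and loads the model once, filling the cache
```

A `COPY warm_model.py` plus `RUN python3 warm_model.py` pair would then replace the echo pipe.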
2 changes: 0 additions & 2 deletions Modelfile

This file was deleted.

27 changes: 24 additions & 3 deletions README.md
@@ -9,16 +9,36 @@
To build and run the container locally with hot reload on python files do:
```
DOCKER_BUILDKIT=1 docker build . -t gbnc
-docker run -v "$(pwd)/gswikichat":/workspace/gswikichat \
-   -p 8000:8000 --rm --name gbnc -it gbnc \
-   -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN
+docker run \
+    --env HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
+    --volume "$(pwd)/gswikichat":/workspace/gswikichat \
+    --volume "$(pwd)/cache":/root/.cache \
+    --publish 8000:8000 \
+    --rm \
+    --interactive \
+    --tty \
+    --name gbnc \
+    gbnc
```
Point your browser to http://localhost:8000/ and use the frontend.

### Runpod.io

The container works on [runpod.io](https://www.runpod.io/) GPU instances. A [template is available here](https://runpod.io/gsc?template=0w8z55rf19&ref=yfvyfa0s).

+### Local development
+#### Backend
+```
+python -m venv .venv
+. ./.venv/bin/activate
+pip install -r requirements.txt
+```
+#### Frontend
+```
+cd frontend
+yarn dev
+```
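For running the backend outside Docker after the steps above, something like the following should work (an editorial sketch, not part of the diff; it assumes the FastAPI instance is exposed as `app` via `from .api import *` in gswikichat/__init__.py, as shown further down, and that an Ollama server is reachable):

```python
# run_local.py — minimal dev-server sketch (hypothetical helper, not in the PR)
import uvicorn

if __name__ == "__main__":
    # reload=True mirrors the hot-reload behavior of the docker run setup
    uvicorn.run("gswikichat:app", host="127.0.0.1", port=8000, reload=True)
```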

## What's in the box

### Docker container
@@ -44,3 +64,4 @@ A [FastAPI](https://fastapi.tiangolo.com/) server is running in the container. I
### Frontend

A minimal frontend lets the user input a question and renders the response from the system.

Empty file added cache/.keep
Empty file.
2 changes: 1 addition & 1 deletion frontend/src/components/field/FieldAnswer.vue
@@ -12,7 +12,7 @@
<div v-else>
<div v-if="response && response.sources">
<div v-for="s in response.sources" :key="s.id">
-<div v-if="s.score > 2" class="mb-2">
+<div v-if="s.score > 0" class="mb-2">
<details
class="text-sm cursor-pointer text-light-distinct-text dark:text-dark-distinct-text"
>
1 change: 1 addition & 0 deletions frontend/src/views/ChatView.vue
@@ -95,6 +95,7 @@ const inputFocused = ref(false)
// }

function search() {
+  response.value = undefined;
displayResponse.value = true
fetch(`/api?q=${inputText.value}`)
.then((response) => response.json())
1 change: 0 additions & 1 deletion gswikichat/__init__.py
@@ -1,2 +1 @@
from .api import *
-# from .haystack2beta_tutorial_InMemoryEmbeddingRetriever import *
57 changes: 31 additions & 26 deletions gswikichat/api.py
@@ -2,7 +2,6 @@
from fastapi.staticfiles import StaticFiles
from fastapi import FastAPI

-# from .rag import rag_pipeline
from .rag import embedder, retriever, prompt_builder, llm, answer_builder
from haystack import Document

@@ -17,55 +16,61 @@

@app.get("/")
async def root():
-    return RedirectResponse(url="/frontend/dist", status_code=302)
+    # return RedirectResponse(url="/frontend/dist", status_code=308)
+    return {}


@app.get("/api")
-async def api(q):
+async def api(q, top_k = 3, lang = 'en'):
+    if not lang in ['en', 'de']:
+        raise Exception("language must be 'en' or 'de'")

+    embedder, retriever, prompt_builder, llm, answer_builder
+    print(f"{q=}")
+    print(f"{top_k=}")
+    print(f"{lang=}")

    # query = "How many languages are there?"
    query = Document(content=q)

-    result = embedder.run([query])
+    queryEmbedded = embedder.run([query])
+    queryEmbedding = queryEmbedded['documents'][0].embedding

-    results = retriever.run(
-        query_embedding=list(result['documents'][0].embedding),
+    retrieverResults = retriever.run(
+        query_embedding=list(queryEmbedding),
        filters=None,
-        top_k=None,
+        top_k=top_k,
        scale_score=None,
        return_embedding=None
    )
-    # .run(
-    #     result['documents'][0].embedding
-    # )

-    prompt = prompt_builder.run(documents=results['documents'])['prompt']
+    print("retriever results:")
[Review comment — Collaborator]: If we implement the logging as suggested above, we should include this as a debug statement:
    logging.debug('retriever results:')
[Reply — exowanderer, Feb 9, 2024]: Updated in #24 by adding the get_logger function in the logger.py file. If you confirm, then we can close this review comment.
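For reference, a minimal get_logger along the lines the reply describes might look like this (an editorial sketch; the actual logger.py added in #24 may differ):

```python
# logger.py — sketch of a shared get_logger helper (hypothetical; see #24)
import logging

def get_logger(name: str, level: int = logging.DEBUG) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # attach the handler only once per logger
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```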

+    for retrieverResult in retrieverResults:
+        print(retrieverResult)
[Review comment — Collaborator]: If we use the suggestion above to include a logger, then we should replace this print statement with a debug call:
    logging.debug(retriever_result_)
Note that this suggestion includes the trailing underscore that I prefer, but is non-standard.
[Reply — exowanderer, Feb 9, 2024]: Updated in #24 by adding the get_logger function in the logger.py file. If you confirm, then we can close this review comment.


-    response = llm.run(prompt=prompt, generation_kwargs=None)
-    # reply = response['replies'][0]
+    promptBuilder = prompt_builder[lang]
+    promptBuild = promptBuilder.run(question=q, documents=retrieverResults['documents'])
+    prompt = promptBuild['prompt']
[Review comment — Collaborator]: Following the PEP8 standards, it is highly recommended to use snake_case here:
    prompt = prompt_build['prompt']
[Reply — exowanderer, Feb 9, 2024]: Updated in #24 by renaming promptBuild to prompt_build. If you confirm, then we can close this review comment.


print(f"{prompt=}")

-    # rag_pipeline.connect("llm.replies", "answer_builder.replies")
-    # rag_pipeline.connect("llm.metadata", "answer_builder.meta")
-    # rag_pipeline.connect("retriever", "answer_builder.documents")
+    response = llm.run(prompt=prompt, generation_kwargs=None)

-    results = answer_builder.run(
+    answerBuild = answer_builder.run(
        query=q,
        replies=response['replies'],
        meta=response['meta'],
-        documents=results['documents'],
+        documents=retrieverResults['documents'],
        pattern=None,
        reference_pattern=None
    )
+    print("answerBuild", answerBuild)
[Review comment — Collaborator]: To follow the above suggestions of adding logging and using snake_case, we should change this line to:
    logging.debug(f'{answer_build=}')
[Reply — exowanderer, Feb 9, 2024]: Updated in #24 by renaming answerBuild to answer_build. If you confirm, then we can close this review comment.


+    answer = answerBuild['answers'][0]

+    sources = [{ "src": d.meta['src'], "content": d.content, "score": d.score } for d in answer.documents]

-    answer = results['answers'][0]
+    print("answer", answer)
[Review comment — Collaborator]: If we implement the logging suggestion above, we should change this line to:
    logging.debug(f'{answer=}')
[Reply — exowanderer, Feb 9, 2024]: Updated in #24 by adding the get_logger function in the logger.py file. If you confirm, then we can close this review comment.


    return {
        "answer": answer.data,
-        "sources": [{
-            "src": d.meta['src'],
-            "content": d.content,
-            "score": d.score
-        } for d in answer.documents]
+        "sources": sources
    }
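With the new q, top_k, and lang parameters, the endpoint can be exercised like this (an editorial sketch; it assumes the server is running on localhost:8000 and that the requests package is installed):

```python
# client.py — smoke test for the /api endpoint (hypothetical helper, not in the PR)
import requests

resp = requests.get(
    "http://localhost:8000/api",
    params={"q": "How many languages are there?", "top_k": 3, "lang": "en"},
)
resp.raise_for_status()
data = resp.json()
print(data["answer"])
for source in data["sources"]:  # each entry carries src, content, and score
    print(source["score"], source["src"])
```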
1 change: 0 additions & 1 deletion gswikichat/llm_config.py
@@ -1,7 +1,6 @@
import os
from haystack_integrations.components.generators.ollama import OllamaGenerator

-# TODO: discolm prompt https://huggingface.co/DiscoResearch/DiscoLM_German_7b_v1
print(f"Setting up ollama with {os.getenv('MODEL')}")
llm = OllamaGenerator(
model=os.getenv("MODEL"),
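The commit list mentions OLLAMA_MODEL_NAME and OLLAMA_URL environment variables; an env-driven setup in that spirit might look like the following (an editorial sketch — the defaults and the exact merged wiring are assumptions):

```python
# llm_config.py variant — env-driven Ollama setup (sketch; defaults are assumed)
import os
from haystack_integrations.components.generators.ollama import OllamaGenerator

llm = OllamaGenerator(
    model=os.getenv("OLLAMA_MODEL_NAME", os.getenv("MODEL", "openchat")),
    url=os.getenv("OLLAMA_URL", "http://localhost:11434/api/generate"),
)
```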
23 changes: 21 additions & 2 deletions gswikichat/prompt.py
@@ -9,7 +9,7 @@
# {% endfor %}
# """

-prompt_template = """
+prompt_template_en = """
<|system|>
You are a helpful assistant. You answer questions based on the given documents.
Answer based on the documents only. If the information is not in the documents,
@@ -25,6 +25,22 @@
<|assistant|>
"""

+prompt_template_de = """
+<|system|>
+Du bist ein hilfreicher Assistent. Du beantwortest Fragen basierend auf den vorliegenden Dokumenten.
+Beantworte basierend auf den Dokumenten nur. Wenn die Information nicht in den Dokumenten ist,
+sage, dass du sie nicht finden kannst.
+<|endoftext|>
+<|user|>
+Dokumente:
+{% for doc in documents %}
[Review comment — Collaborator]: As defined elsewhere, I am suggesting that we add a trailing underscore for temporary variables. For the Jinja used here, this suggestion would result in the following update:
    {% for doc_ in documents %}
        {{ doc_.content }}
If agreed, the same should be implemented in the prompt_template_en above.
[Reply — exowanderer, Feb 9, 2024]: Updated in #24 by renaming doc to doc_ in the jinja text. If you confirm, then we can close this review comment.

+{{ doc.content }}
+{% endfor %}
+Mit diesen Dokumenten, beantworte die folgende Frage: {{question}}
+<|endoftext|>
+<|assistant|>
+"""

# prompt_template = """
# Given these documents, answer the question. Answer in a full sentence. Give the response only, no explanation. Don't mention the documents.
# Documents:
@@ -33,4 +49,7 @@
# {% endfor %}
# """

-prompt_builder = PromptBuilder(template=prompt_template)
+prompt_builder = {
+    'en': PromptBuilder(template=prompt_template_en),
+    'de': PromptBuilder(template=prompt_template_de),
+}
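The builder dict is keyed by language, matching the api.py change above where prompt_builder[lang] selects the template. A minimal usage sketch (assuming gswikichat.prompt exports prompt_builder as defined here):

```python
# Usage sketch for the per-language prompt builders (editorial example)
from haystack import Document

docs = [Document(content="Es gibt rund 7000 Sprachen auf der Welt.")]
builder = prompt_builder['de']  # selected from the API's ?lang= parameter
prompt = builder.run(question="Wie viele Sprachen gibt es?", documents=docs)['prompt']
print(prompt)  # rendered <|system|>/<|user|>/<|assistant|> prompt text
```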