feat: add colpali retriever #25
Open: paknikolai wants to merge 110 commits into epam:development from paknikolai:f/colpali_retriever
Changes from 75 commits
Commits
a520953
reapplied commits
9f28efd
fixed path
9b3cd34
upgraded dependencies
0d9d598
fixed pathes
563fd45
added dependencies
4cfd258
added test for colpali retriever
e4dd82c
added code for caching results
85506b4
fixed the test cahce and added cache itsef
a0064a4
added e2e test code
a0a8f55
updated code of test to match the query
8e4c225
updated test and cache
628c6a3
updated test files
5d92ca6
updated caches for the tests
cbbc41d
updated cache
accf675
added config yaml for colpali and updated tests
ed060e1
added image size to colpali config
07c4886
added creation of colpaliresource on creation,
2162e91
moved colpali resource to colpali folder
bc4881b
partlt changed the calculation of scores
592ea07
updated retriever code to use page embeddings and score them and then…
a688b84
changed enum+str to strenum, added consistency check for the model na…
66fcdeb
added lock for gpu related operations
d6bc224
added cpu pools usage
3039f13
changed bloat16 to float16
dbac766
import fixes
9201739
fixed lint issues
b15ca56
fixed cache and refactored cache code
3acd205
removed ignores
bb52615
removed unnecessary processor caching
474eda4
switched device choice to existing function
07fab25
Merge branch 'development' into f/colpali_retriever
8e6ce6d
removed unnecessary setting of lib version
9ac5474
added comment for the colpali-engine dependency
e1f3002
put colpali config outside of the request config
5d93a2c
added batch that is being collected from different tasks and then bei…
1241dd7
added hasg based cache
9302672
added test for parallel queries and images
d3896d8
fixed moving to cpu embeddings
297166a
added additional pools not to block other pools
d5d1202
replaced progress bar
05a656c
caching models in docker on creating image
6b96ee4
fixed donwload model script
2728522
fixed docker file
f2bd680
added file with common models info and pathes, changed docker file to…
5c4a8a3
changed config according to fields in app config
137d828
added cache in model while saving it
f915106
fixed device
cea1cc1
fixed model cache save
e3aa186
added copying additional files for docker to download models
377ed97
fixed format
3dd64a9
removed from resource config model type, now it is being calculated b…
5394186
fixed tests
1453542
fixed format
6628b2c
Merge branch 'development' into f/colpali_retriever
6386c8a
changed doc_id to original index that was in from_doc_records
22e3544
poetry lock
0af4073
updated poetry lock with minimal changes comapring to original one
1b12f88
fixed test
6475b59
renamed file to match cache for test
c65faff
updated cahce method, and replaced cahce files
e4063e4
changed yaml name
ccfe1e3
added fair queue to process tasks
040cbd6
renamed function
e787ade
changed variables order
5d29e5f
Merge branch 'development' into f/colpali_retriever
2b367dc
changed colpali version to realeased version
6116d29
made separate docker file+changed script to download model
60b5b3c
added arg for base image
d0b45c3
added stage to donwload model and some logs to verify env variables
719f8f8
fixed model env variable and added perint to log when loading the model
d937b86
fixed env
e714b53
removed unseccessary variable from docker
e11eec9
Merge branch 'development' into f/colpali_retriever
paknikolai 5ccabd3
updated readme
becad27
Merge branch 'development' into f/colpali_retriever
f478ffc
removed image size field because it was unecessary
9864a8a
removed unused import
8139132
changed paramerers order to be consistent with embeddings
bb6db25
removed unnecessary print
5cac3be
removed unused dependency
8c682c5
moved load model to different function and removed set config from re…
ed020fa
replaced with map with progress query and image processing
f10a631
made a str enum and renamed map fro known models
ef9c231
removed batch prcessor
f1000da
fixed queries
90cd989
fixed format
111dc8e
fixed lint
1c73c81
fixed queries
a39bee5
added batch size to parameters
6c106a8
moved model call inside resource
fe92a1e
removed unusd config
90fda28
fixed docker file since folder with dial rag is already copied
f9b22b7
fixed downloading script
e381ae5
merged develop branch
af19431
fixed format
c796ffd
added some comments
06def61
Merge branch 'development' into f/colpali_retriever
paknikolai e2717ac
fixed type for images in calculate_images_embeddings
dc8d368
simplified replacing cached resource in tests
dc2beb5
created constabt for the test folder
b3ef125
removed colpali index from colpali resource
056722f
moved common embeddings to separate folder
ead9b7e
Merge branch 'development' into f/colpali_retriever
paknikolai 0532587
separated download scripts
95bf413
corrected argument line
5d3c6cd
replaced env variable for model path with config
b829ff7
removed padding of embeddings
73a05ed
changed hf_home to cache models there
53e09a8
updated donwload
73e51bb
Merge branch 'development' into f/colpali_retriever
    @@ -0,0 +1,39 @@
    # Set base image with default value
    ARG BASE_IMAGE_NAME=epam/ai-dial-rag:latest

    # Stage 1: Download ColPali model
    FROM ${BASE_IMAGE_NAME} AS colpali_downloader

    # Set environment variables for ColPali models
    ENV COLPALI_MODELS_BASE_PATH=/colpali_models

    # Set specific model to download with default value
    ARG COLPALI_MODEL_NAME=vidore/colSmol-256M
    ENV COLPALI_MODEL_NAME=${COLPALI_MODEL_NAME}

    # Switch to root user for model downloads
    USER root

    # Copy necessary files for ColPali model download
    COPY aidial_rag/__init__.py aidial_rag/
    COPY aidial_rag/retrievers/__init__.py aidial_rag/retrievers/
    COPY aidial_rag/retrievers/colpali_retriever/__init__.py aidial_rag/retrievers/colpali_retriever/
    COPY aidial_rag/retrievers/colpali_retriever/colpali_models.py aidial_rag/retrievers/colpali_retriever/
    COPY download_model.py ./

    # Download the specified ColPali model
    RUN python download_model.py colpali "$COLPALI_MODELS_BASE_PATH" "$COLPALI_MODEL_NAME"

    # Stage 2: Final image with downloaded model
    FROM ${BASE_IMAGE_NAME}

    # Set environment variables for ColPali models
    ENV COLPALI_MODELS_BASE_PATH=/colpali_models

    # Copy the downloaded ColPali model from the downloader stage
    COPY --from=colpali_downloader --chown=appuser "$COLPALI_MODELS_BASE_PATH" "$COLPALI_MODELS_BASE_PATH"

    # Switch back to appuser
    USER appuser

    # The base image already has EXPOSE 5000 and CMD, so we inherit those
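
The contents of `download_model.py` are not visible in this view of the diff. As a rough illustration of what the `RUN python download_model.py colpali "$COLPALI_MODELS_BASE_PATH" "$COLPALI_MODEL_NAME"` step could invoke, here is a minimal sketch. It assumes three positional arguments in that order, a target directory of `<base path>/<model name>`, and `huggingface_hub.snapshot_download` as the download call. The function names `parse_args`, `model_dir`, and `download` are hypothetical and are not taken from the PR.

```python
import argparse
from pathlib import Path


def parse_args(argv):
    """Parse the three positional arguments used in the Dockerfile RUN line."""
    parser = argparse.ArgumentParser(description="Download a model at image build time")
    # Only "colpali" appears in the Dockerfile; other model types are not assumed here.
    parser.add_argument("model_type", choices=["colpali"])
    parser.add_argument("base_path", help="e.g. the value of COLPALI_MODELS_BASE_PATH")
    parser.add_argument("model_name", help="e.g. vidore/colSmol-256M")
    return parser.parse_args(argv)


def model_dir(base_path, model_name):
    """Assumed layout: <base_path>/<org>/<model>. The real layout may differ."""
    return Path(base_path) / model_name


def download(args, fetch=None):
    """Download the model into its target directory and return that directory.

    `fetch` is injectable so the function can be exercised without network access.
    """
    target = model_dir(args.base_path, args.model_name)
    if fetch is None:
        # Imported lazily so argument parsing works without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        def fetch(repo_id, local_dir):
            snapshot_download(repo_id=repo_id, local_dir=local_dir)

    fetch(args.model_name, str(target))
    return target
```

A script entry point would then call `download(parse_args(sys.argv[1:]))`. Keeping the Hugging Face import inside `download` means the script only needs `huggingface_hub` at the moment it actually fetches the model.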
Why do we need to copy the aidial_rag files here?
The epam/ai-dial-rag base image should already have these files.
Indeed, these files are already copied in the base image. I left only the step that copies the download script.