Staging to Dev (#1291)

prakriti-solankey · kartikpersistent · kaustubh-darekar · web-flow · commit 51fa6c1fcc84 · 2025-05-19T18:23:31.000+05:30
* Read only mode for unauthenticated users (#1046)

* llm name changes

* build fix

* default mode fix

* ragas model names update

* lint fixes

* Chunk Entities API condition

* added the tooltip for unsupported llms for ragas metric loading

* removed unused imports

* multimode fix when we get error response

* mode changes for score display

* fix: Fixed the details state handling between multiple chats
feature: Added the warning banner if the selected llm model is not supported for ragas evaluation

* Fix: Entity Mode Width Fix

* diffbot fix for async (#797)

* Minor changes (#798)

* added config variable for default diffbot chat model

* fulltext index creation is skipped when the labels are empty

* entity vector change

* added optional to communities for entity mode

* updated the entity query

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* New: Added the supported llm models for ragas evaluation

* Fix: Communities Tab is displayed based on communities length

* added the conversation download button (#800)

* model name correction

* chatmode switch mode fix

* Add API payload GCP logging (#805)

* Adding Links to get neighboring nodes (#796)

* addition of link

* added neighbours query

* implemented with driver

* updated the query

* communitiesInfo name change

* communities.tsx removed

* api integration

* modified response

* entities change

* chunk and communities

* chunk space removal

* added element id to chunks

* loading on click

* format changes

* added file name for Document node

* chat token cut off model name update

* icon change

* duplicate sources removal

* Entity change

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* added error message for doc retriever (#807)

* copy row (#803)

* copy row

* column for copy

* column copy

* Ragas Evaluation For Multi Modes (#806)

* Updated models for ragas eval

* context utilization metrics removed

* updated supported llms for ragas

* removed context utilization

* Implemented Parallel API

* multi api calls error resolved

* MultiMode Metrics

* Fix: Metric Evaluation For Single Mode

* multi modes ragas evaluation

* api payload changes

* metric api output format changed

* multi mode ragas changes

* removed pre process dataset

* api response changes

* Multimode metrics api integration

* nan error for no answer resolved

* QA integration changes

---------

Co-authored-by: kaustubh-darekar <kaustubh_darekar@persistent.com>

* lint fixes

* fix: multimode metrics state handling
fix: lint fixes

* fix: Multimode metrics mode change state issue
fix: chunk list style issue

* fix: list style fix

* Correct TYPO mistake

* added new env for ragas embedding model

* Props name changes (#811)

* Props name changes

* removed the accesstoken from row on copy action

* props changes for dropzone component

* graph view changes

---------

Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>

* test

* view graph

* nodes count and relationship count update fix

* sourceUrl Fix

* empty string "" fix: to keep the default values, the value should be left blank instead of ""

* prop changes

* props changes

* retry condition update for failed files (#820)

* Chat modes name changes (#815)

* Props name changes

* removed the accesstoken from row on copy action

* updated chat mode names

* Chat Modes Name Changes

* lint fixes

* using readable format in UI

* removal of size to avoid console warning

* key add

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>

* Youtube transcript fix with proxy (#822)

* update script for async func

* ragas changes for graph retrieval mode. context added in api output (#825)

* Remove extract latency from logging and add LIMIT in duplicate nodes

* Document updates (#828)

* document updated with ragas evaluation information

* formatting changes

* chatbot api documentation updated

* api details added in document

* function name changed for drop create vector index api

* Update README.md

* updated api structure in docs (#827)

* Update backend_docs.adoc

* 821 llm model listing (#823)

* added logic for document filters

* LLM models

* message change

* link added

* removed the text

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Exclude session label node from duplicate nodes list

* Added the tooltip for disabled llm option (#835)

* node size changes

* mode removal of rows check

* formatting

* Exclude __Entity__ node label from duplicate node list

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* fixed the youtube link

* Security header and GZIPMiddleware (#847)

* Added security header all API

* Add GZipMiddleware

* Chunk Text Details (#850)

* Community title added

* Added api for fetching chunk text details

* output format changed for chunk text

* integrated the service layer for chunkdata

* added the chunks

* formatting output of llm call for title generation

* formatting llm output for title generation

* added flex row

* Changes related to pagination of fetch chunk api

* Integrated the pagination

* page changes error resolved for fetch chunk api

* for get neighbours api, community title added in properties

* moving community title related changes to separate branch

* Removed Query module from fastapi import statement

* icon changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Communities Id to Title (#851)

* Staging to main (#735)

* Dev (#537)

* format fixes and graph schema indication fix

* Update README.md

* added chat modes variable in env updated the readme

* spell fix

* added the chat mode in env table

* added the logos

* fixed the overflow issues

* removed the extra fix

* Fixed specific scenario  "when the text from schema closes it should reopen the previous modal"

* readme changes

* removed dev console logs

* added new retrieval query (#533)

* format fixes and tab rendering fix

* fixed the setting modal reopen issue

---------

Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* disabled the submit button on loading

* Deduplication tab (#566)

* de-duplication API

* Update De-Duplicate query

* created the Deduplication tab

* added the API service

* added the removeable tags for similar nodes in deduplication tab

* Integrate Tag

* added GraphLabel

* added loader state

* added the merge service

* integrated the merge API

* Merge Query issue fixed

* Auto refresh the duplicate nodes after merging operation

* added the description for de duplication

* reset on merging

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Update frontend_docs.adoc (#538)

* Update frontend_docs.adoc

* doc update

* Images

* Images folder change

* Images folder change

* test image

* Update frontend_docs.adoc

* image change

* Update frontend_docs.adoc

* Update frontend_docs.adoc

* added the Graph Mode SS

* added the Query SS

* Update frontend_docs.adoc

* conflicts fix

* conflict fix

* Update frontend_docs.adoc

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* updated langchain versions (#565)

* Update the De-Duplication query

* Node relationship id type none issue (#547)

* de-duplication API

* Update De-Duplicate query

* Issue fixed Nodes,Relationship Id and Type None or Blank

* added the tooltips

* type fix

* Unnecessary import

* added score threshold and added some error handling (#571)

* Update requirements.txt

* Tooltip and other UI fixes (#572)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructuredfiles to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with …
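
The change to `get_graph_from_llm` in `backend/src/llm.py` (see the diff that follows) parses `allowedRelationship` as comma-separated (source, relationship, target) triples and rejects any triple whose source or target is not in `allowedNodes`. A minimal standalone sketch of that check is given here for illustration. The helper name `parse_allowed_relationships` and the local `SchemaValidationError` (standing in for `LLMGraphBuilderException`) are assumptions, not names from the repository, and unlike the diff this sketch also tolerates `None` or empty inputs.

```python
from typing import List, Optional, Tuple


class SchemaValidationError(ValueError):
    """Hypothetical stand-in for LLMGraphBuilderException."""


def parse_allowed_relationships(
    allowed_nodes_csv: Optional[str],
    allowed_relationship_csv: Optional[str],
) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Split both CSV strings and validate the relationship triples."""
    # Strip whitespace and drop empty entries, as the diff does.
    allowed_nodes = [
        node.strip() for node in (allowed_nodes_csv or "").split(",") if node.strip()
    ]

    triples: List[Tuple[str, str, str]] = []
    if allowed_relationship_csv:
        items = [
            item.strip() for item in allowed_relationship_csv.split(",") if item.strip()
        ]
        # Every relationship needs exactly three parts.
        if len(items) % 3 != 0:
            raise SchemaValidationError(
                "allowedRelationship must be a multiple of 3 (source, relationship, target)"
            )
        for i in range(0, len(items), 3):
            source, relation, target = items[i:i + 3]
            # Both endpoints must be declared node labels.
            if source not in allowed_nodes or target not in allowed_nodes:
                raise SchemaValidationError(
                    f"Invalid relationship ({source}, {relation}, {target}): "
                    "source or target not in allowedNodes"
                )
            triples.append((source, relation, target))
    return allowed_nodes, triples
```

For example, `parse_allowed_relationships("Person, Company", "Person, WORKS_AT, Company")` yields the two node labels and one triple, while an input such as `"Person,KNOWS"` for the relationships raises the error because it does not form a complete triple.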
diff --git a/backend/score.py b/backend/score.py
@@ -4,6 +4,7 @@
 from src.main import *
 from src.QA_integration import *
 from src.shared.common_fn import *
+from src.shared.llm_graph_builder_exception import LLMGraphBuilderException
 import uvicorn
 import asyncio
 import base64
diff --git a/backend/src/llm.py b/backend/src/llm.py
@@ -14,8 +14,9 @@
 import boto3
 import google.auth
 from src.shared.constants import ADDITIONAL_INSTRUCTIONS
+from src.shared.llm_graph_builder_exception import LLMGraphBuilderException
 import re
-import json
+from typing import List
 
 def get_llm(model: str):
     """Retrieve the specified language model based on the model name."""
@@ -209,21 +210,45 @@ async def get_graph_document_list(
     return graph_document_list
 
 async def get_graph_from_llm(model, chunkId_chunkDoc_list, allowedNodes, allowedRelationship, chunks_to_combine, additional_instructions=None):
+   try:
+       llm, model_name = get_llm(model)
+       logging.info(f"Using model: {model_name}")
     
-    llm, model_name = get_llm(model)
-    combined_chunk_document_list = get_combined_chunks(chunkId_chunkDoc_list, chunks_to_combine)
+       combined_chunk_document_list = get_combined_chunks(chunkId_chunkDoc_list, chunks_to_combine)
+       logging.info(f"Combined {len(combined_chunk_document_list)} chunks")
     
-    allowedNodes = allowedNodes.split(',') if allowedNodes else []
+       allowed_nodes = [node.strip() for node in allowedNodes.split(',') if node.strip()] if allowedNodes else []
+       logging.info(f"Allowed nodes: {allowed_nodes}")
+    
+       allowed_relationships = []
+       if allowedRelationship:
+           items = [item.strip() for item in allowedRelationship.split(',') if item.strip()]
+           if len(items) % 3 != 0:
+               raise LLMGraphBuilderException("allowedRelationship must be a multiple of 3 (source, relationship, target)")
+           for i in range(0, len(items), 3):
+               source, relation, target = items[i:i + 3]
+               if source not in allowed_nodes or target not in allowed_nodes:
+                   raise LLMGraphBuilderException(
+                       f"Invalid relationship ({source}, {relation}, {target}): "
+                       f"source or target not in allowedNodes"
+                   )
+               allowed_relationships.append((source, relation, target))
+           logging.info(f"Allowed relationships: {allowed_relationships}")
+       else:
+           logging.info("No allowed relationships provided")
 
-    if not allowedRelationship:
-        allowedRelationship = []
-    else:
-        items = allowedRelationship.split(',')
-        allowedRelationship = [tuple(items[i:i+3]) for i in range(0, len(items), 3)]
-    graph_document_list = await get_graph_document_list(
-        llm, combined_chunk_document_list, allowedNodes, allowedRelationship, additional_instructions
-    )
-    return graph_document_list
+       graph_document_list = await get_graph_document_list(
+           llm,
+           combined_chunk_document_list,
+           allowed_nodes,
+           allowed_relationships,
+           additional_instructions
+       )
+       logging.info(f"Generated {len(graph_document_list)} graph documents")
+       return graph_document_list
+   except Exception as e:
+       logging.error(f"Error in get_graph_from_llm: {e}", exc_info=True)
+       raise LLMGraphBuilderException(f"Error in getting graph from llm: {e}")
 
 def sanitize_additional_instruction(instruction: str) -> str:
    """
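
The relationship validation added to `get_graph_from_llm` in the hunk above can be tried in isolation. The following is a minimal sketch rather than the project's code: `parse_allowed_relationships` is a hypothetical helper name, a plain `ValueError` stands in for `LLMGraphBuilderException` so the snippet is self-contained, and the `or ""` guards for missing input are an assumption of this sketch.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def parse_allowed_relationships(
    allowed_nodes_csv: Optional[str],
    allowed_relationship_csv: Optional[str],
) -> Tuple[List[str], List[Triple]]:
    """Hypothetical standalone version of the parsing shown in the diff."""
    # Split on commas, trim whitespace, and drop empty entries.
    allowed_nodes = [n.strip() for n in (allowed_nodes_csv or "").split(",") if n.strip()]
    items = [i.strip() for i in (allowed_relationship_csv or "").split(",") if i.strip()]

    # Relationships arrive as a flat list: source, relationship, target, ...
    if len(items) % 3 != 0:
        raise ValueError(
            "allowedRelationship must be a multiple of 3 (source, relationship, target)"
        )

    triples: List[Triple] = []
    for i in range(0, len(items), 3):
        source, relation, target = items[i:i + 3]
        # Both endpoints must be declared node labels.
        if source not in allowed_nodes or target not in allowed_nodes:
            raise ValueError(
                f"Invalid relationship ({source}, {relation}, {target}): "
                "source or target not in allowedNodes"
            )
        triples.append((source, relation, target))
    return allowed_nodes, triples


nodes, rels = parse_allowed_relationships(
    "Person, Company",
    "Person,WORKS_AT,Company, Company,EMPLOYS,Person",
)
print(nodes)  # ['Person', 'Company']
print(rels)   # [('Person', 'WORKS_AT', 'Company'), ('Company', 'EMPLOYS', 'Person')]
```

An input such as `"Person,WORKS_AT"` fails the multiple-of-3 check, and a triple whose source or target is not among the allowed nodes is rejected before any LLM call is made.
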
diff --git a/frontend/src/components/Layout/PageLayout.tsx b/frontend/src/components/Layout/PageLayout.tsx
@@ -22,7 +22,7 @@ import LoadDBSchemaDialog from '../Popups/GraphEnhancementDialog/EnitityExtracti
 import PredefinedSchemaDialog from '../Popups/GraphEnhancementDialog/EnitityExtraction/PredefinedSchemaDialog';
 import { SKIP_AUTH } from '../../utils/Constants';
 import { useNavigate } from 'react-router';
-import { deduplicateByRelationshipTypeOnly, deduplicateNodeByValue } from '../../utils/Utils';
+import { deduplicateByFullPattern, deduplicateNodeByValue } from '../../utils/Utils';
 
 const GCSModal = lazy(() => import('../DataSources/GCS/GCSModal'));
 const S3Modal = lazy(() => import('../DataSources/AWS/S3Modal'));
@@ -378,7 +378,7 @@ const PageLayout: React.FC = () => {
       setSchemaValRels(rels);
       setCombinedRelsVal((prevRels: OptionType[]) => {
         const combined = [...rels, ...prevRels];
-        return deduplicateByRelationshipTypeOnly(combined);
+        return deduplicateByFullPattern(combined);
       });
       setSchemaView('text');
       localStorage.setItem(LOCAL_KEYS.source, JSON.stringify(updatedSource));
@@ -418,7 +418,7 @@ const PageLayout: React.FC = () => {
       setDbRels(rels);
       setCombinedRelsVal((prevRels: OptionType[]) => {
         const combined = [...rels, ...prevRels];
-        return deduplicateByRelationshipTypeOnly(combined);
+        return deduplicateByFullPattern(combined);
       });
       localStorage.setItem(LOCAL_KEYS.source, JSON.stringify(updatedSource));
       localStorage.setItem(LOCAL_KEYS.type, JSON.stringify(updatedType));
@@ -456,7 +456,7 @@ const PageLayout: React.FC = () => {
       setPreDefinedRels(rels);
       setCombinedRelsVal((prevRels: OptionType[]) => {
         const combined = [...rels, ...prevRels];
-        return deduplicateByRelationshipTypeOnly(combined);
+        return deduplicateByFullPattern(combined);
       });
       localStorage.setItem(LOCAL_KEYS.source, JSON.stringify(updatedSource));
       localStorage.setItem(LOCAL_KEYS.type, JSON.stringify(updatedType));
diff --git a/frontend/src/components/Popups/GraphEnhancementDialog/EnitityExtraction/NewEntityExtractionSetting.tsx b/frontend/src/components/Popups/GraphEnhancementDialog/EnitityExtraction/NewEntityExtractionSetting.tsx
@@ -13,7 +13,7 @@ import {
   updateLocalStorage,
   extractOptions,
   parseRelationshipString,
-  deduplicateByRelationshipTypeOnly,
+  deduplicateByFullPattern,
   deduplicateNodeByValue,
 } from '../../../../utils/Utils';
 import TooltipWrapper from '../../../UI/TipWrapper';
@@ -175,15 +175,15 @@ export default function NewEntityExtractionSetting({
         });
         setUserDefinedRels((prev: OptionType[]) => {
           const combined = [...prev, ...relationshipTypeOptions];
-          return deduplicateByRelationshipTypeOnly(combined);
+          return deduplicateByFullPattern(combined);
         });
         setCombinedNodes((prev: OptionType[]) => {
           const combined = [...prev, ...nodeLabelOptions];
           return deduplicateNodeByValue(combined);
         });
         setCombinedRels((prev: OptionType[]) => {
           const combined = [...prev, ...relationshipTypeOptions];
-          return deduplicateByRelationshipTypeOnly(combined);
+          return deduplicateByFullPattern(combined);
         });
         setTupleOptions((prev) => [...updatedTuples, ...prev]);
       } else {
diff --git a/frontend/src/utils/Utils.ts b/frontend/src/utils/Utils.ts
@@ -881,14 +881,12 @@ export const deduplicateNodeByValue = (arrays: { value: any }[]) => {
   });
   return Array.from(map.values());
 };
-
-export const deduplicateByRelationshipTypeOnly = (arrays: { value: string; label: string }[]) => {
+export const deduplicateByFullPattern = (arrays: { value: string; label: string }[]) => {
   const seen = new Set<string>();
   const result: { value: string; label: string }[] = [];
   arrays.forEach((item) => {
-    const [, type] = item.value.split(',');
-    if (!seen.has(type)) {
-      seen.add(type);
+    if (!seen.has(item.value)) {
+      seen.add(item.value);
       result.push(item);
     }
   });
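
The behavioural difference between the removed `deduplicateByRelationshipTypeOnly` and the new `deduplicateByFullPattern` is easiest to see on a small input. The sketch below is a Python transliteration for illustration only (the actual change is the TypeScript above); the function names and the sample option values are assumptions, chosen to mirror the `source,type,target` shape of `item.value`.

```python
from typing import Dict, List

Option = Dict[str, str]


def dedupe_by_type_only(options: List[Option]) -> List[Option]:
    """Old behaviour: keep the first option for each relationship type."""
    seen = set()
    result = []
    for option in options:
        parts = option["value"].split(",")
        # The type is the second comma-separated field, if present.
        rel_type = parts[1] if len(parts) > 1 else None
        if rel_type not in seen:
            seen.add(rel_type)
            result.append(option)
    return result


def dedupe_by_full_pattern(options: List[Option]) -> List[Option]:
    """New behaviour: keep the first option for each full value string."""
    seen = set()
    result = []
    for option in options:
        if option["value"] not in seen:
            seen.add(option["value"])
            result.append(option)
    return result


# Hypothetical sample data: two distinct patterns share one type.
options = [
    {"value": "Person,WORKS_AT,Company", "label": "Person -[:WORKS_AT]-> Company"},
    {"value": "Person,WORKS_AT,University", "label": "Person -[:WORKS_AT]-> University"},
    {"value": "Person,WORKS_AT,Company", "label": "Person -[:WORKS_AT]-> Company"},
]

by_type = dedupe_by_type_only(options)
by_pattern = dedupe_by_full_pattern(options)
print(len(by_type))     # 1
print(len(by_pattern))  # 2
```

Deduplicating by type alone discards `Person,WORKS_AT,University` because `WORKS_AT` was already seen, whereas comparing the full value keeps both distinct patterns and drops only the exact repeat.
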