Skip to content

Commit 72253ab

Browse files
aashipandyakartikpersistentedenbuaaabhishekkumar-27vasanthasaikalluri
committed
Graph visualization improvement and chunk text details (#860)
* Update README.md * Dev to staging (#859) * processing count updated on cancel * format fixes * remove whitespace for enviroment variable which due to an error "xxx may not contain whitespace" (#707) * updated disconnected nodes * updated disconnected nodes * fix: Processed count update on failed condition * added disconnected and up nodes * removed __Entity__ labels * removed graph_object * removed graph object in the function * resetting the alert message on success scenario * Modified queries * populate graph schema * not clearing the password when there is error scenario * fixed the vector index loading issue * fix: empty credentials payload for recreate vector index api * chatbot status (#676) * chatbot status * connection status check for ASK button * refresh disable check * review comment resolved * format fixes * added properties and modified to entity labels * Post processing call after all files completion (#716) * Dev To STAGING (#532) * format fixes and graph schema indication fix * Update README.md * added chat modes variable in env updated the readme * spell fix * added the chat mode in env table * added the logos * fixed the overflow issues * removed the extra fix * Fixed specific scenario "when the text from schema closes it should reopen the previous modal" * readme changes * removed dev console logs * added new retrieval query (#533) * format fixes and tab rendering fix * fixed the setting modal reopen issue --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> * Dev (#535) * format fixes and graph schema indication fix * Update README.md * added chat modes variable in env updated the readme * spell fix * added the chat mode in env table * added the logos * fixed the overflow issues * removed the extra fix * Fixed specific scenario "when the text from schema closes it should reopen the previous modal" * readme changes * removed dev console logs * added new retrieval query (#533) * format fixes and tab rendering fix * fixed the setting modal reopen issue --------- Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> * Dev (#537) * format fixes and graph schema indication fix * Update README.md * added chat modes variable in env updated the readme * spell fix * added the chat mode in env table * added the logos * fixed the overflow issues * removed the extra fix * Fixed specific scenario "when the text from schema closes it should reopen the previous modal" * readme changes * removed dev console logs * added new retrieval query (#533) * format fixes and tab rendering fix * fixed the setting modal reopen issue --------- Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> * Fix typo: correct 'josn_obj' to 'json_obj' (#697) * Staging To Main (#495) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI * Delete * merge fixes * code suggestions --------- Co-authored-by: kartikpersistent <[email protected]> * chunks create before extraction using is_pre_process variable (#383) * chunks create before extraction using is_pre_process variable * Return total pages for Model * update requirement.txt * total pages on uplaod API * added the Confirmation Dialog * added the selected files into the confirmation modal * format and lint fixes * added the stop watch image * fileselection on alert dialog * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * recent merges * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * added the default page size * Convert is_cancelled value from string to bool * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * Save Total Pages in DB * Added total Pages * file selection when we didn't select anything from Main table * added the danger icon only for large files * added the overflow for more files and file selection for all new files * moved the interface to types * added the icon accoroding to the source * set total page for wiki and youtube * h3 heading * merge * updated the alert on basis if total pages * deleted chunks * polling based on total pages * isNan check * large file based on file size for s3 and gcs * file source in server side event * time calculation based on chunks for gcs and s3 --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: abhishekkumar-27 <[email protected]> Co-authored-by: aashipandya <[email protected]> * fixed the layout issue * Populate graph schema (#399) * crreate new endpoint populate_graph_schema and update the query for getting lables from DB * Added main.py changes * conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396) * added the condtion * removed llms * Fixed issue : Remove extra unused param * get emb only if used (#278) * Chatbot chunks (#402) * Added file name to the content sent to LLM * added chunk text in the response * increased the docs parts sent to llm * Modified graph query * mardown rendering * youtube starttime * icons * offset changes * removed the files due to codespace space issue --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user (#405) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * fixed css issue * fixed status blank issue * Modified response when no docs is retrived (#413) * Fixed env/docker-compose for local deployments + README doc (#410) * Fixed env/docker-compose for local deployments + README doc * wrong place for ENV in README * by default, removed langsmith + fixed knn score string to float * by default, removed langsmith + fixed knn score string to float * Fixed strings in docker-compose env * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop) * Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that * Support for all unstructured files (#401) * all unstructured files * responsiveness * added file type * added the extensions * spell mistake * ppt file changes --------- Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * Extract schema using direct ChatOpenAI API and Chain * integrated the checkbox for schema to text dialog * Update SettingModal.tsx --------- Co-authored-by: Pravesh Kumar <[email protected]> * gcs file content read via storage client (#417) * gcs file content read via storage client * added the access token the file state --------- Co-authored-by: kartikpersistent <[email protected]> * pypdf2 to read files from gcs (#420) * 407 remove driver from frontend (#416) * removed driver * removed API * connecting to database on page refresh --------- Co-authored-by: kartikpersistent <[email protected]> * Css handling of info modal and Tooltips (#418) * css change * toolTips * Sidebar Tooltips * copy to clip * css change * added image types * added gcs * type fix * docker changes * speech * added the toolip for dropzone sources --------- Co-authored-by: kartikpersistent <[email protected]> * Fixed retrival bugs (#421) * yarn format fixes * changed the delete message * added the cancel button * changed the message on tooltip * added space * UI fixes * tooltip for setting * updated req * wikipedia URL input (#424) * accept only wikipedia links * added wikipedia link * added wikilink regex * wikipedia single url only * changed the alert message * wording change * pushed validation state persist error --------- Co-authored-by: aashipandya <[email protected]> * speech and copy (#422) * speech and copy * startTime * added chunk properties * tooltips --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Fixed issue for out of range in KNN API * solved conflicts * conflict solved * Remove logging info from update KNN API * tooltip changes * format and lint fixes * responsiveness changes * Fixed issue for total pages GCS, S3 * UI polishing (#428) * button and tooltip changes * checking validation on change * settings module populate fix * format fixes * opening the modal after auth success * removed the limit * added the scrobar for dropdowns * speech state (#426) * speech state * Button Details changes * delete wording change * Total pages in buckets (#431) * page number NA for buckets * added N/A for gcs and s3 pages * total pages for gcs * remove unwanted logger --------- Co-authored-by: kartikpersistent <[email protected]> * removed the max width * Update FileTable.tsx * Update the docker file * Modified prompt (#438) * Update Dockerfile * Update Dockerfile * Update Dockerfile * rendering Fix * Local file upload gcs (#442) * Uplaod file to GCS * GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled * Add life cycle rule on uploaded bucket * pdf upload local and gcs bucket check * delete files when processed and extract changes --------- Co-authored-by: Pravesh Kumar <[email protected]> * Modified chat length and entities used (#443) * metadata for unstructured files (#446) * Unstructured file metadata (#447) * metadata for unstructured files * sleep in gcs upload * updated * icons added to chunks (#435) * icons added to chunks * info modal icons * Dev (#433) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI * Delete * merge fixes * code suggestions --------- Co-authored-by: kartikpersistent <[email protected]> * chunks create before extraction using is_pre_process variable (#383) * chunks create before extraction using is_pre_process variable * Return total pages for Model * update requirement.txt * total pages on uplaod API * added the Confirmation Dialog * added the selected files into the confirmation modal * format and lint fixes * added the stop watch image * fileselection on alert dialog * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * recent merges * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * added the default page size * Convert is_cancelled value from string to bool * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * Save Total Pages in DB * Added total Pages * file selection when we didn't select anything from Main table * added the danger icon only for large files * added the overflow for more files and file selection for all new files * moved the interface to types * added the icon accoroding to the source * set total page for wiki and youtube * h3 heading * merge * updated the alert on basis if total pages * deleted chunks * polling based on total pages * isNan check * large file based on file size for s3 and gcs * file source in server side event * time calculation based on chunks for gcs and s3 --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: abhishekkumar-27 <[email protected]> Co-authored-by: aashipandya <[email protected]> * fixed the layout issue * Populate graph schema (#399) * crreate new endpoint populate_graph_schema and update the query for getting lables from DB * Added main.py changes * conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396) * added the condtion * removed llms * Fixed issue : Remove extra unused param * get emb only if used (#278) * Chatbot chunks (#402) * Added file name to the content sent to LLM * added chunk text in the response * increased the docs parts sent to llm * Modified graph query * mardown rendering * youtube starttime * icons * offset changes * removed the files due to codespace space issue --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user (#405) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * fixed css issue * fixed status blank issue * Modified response when no docs is retrived (#413) * Fixed env/docker-compose for local deployments + README doc (#410) * Fixed env/docker-compose for local deployments + README doc * wrong place for ENV in README * by default, removed langsmith + fixed knn score string to float * by default, removed langsmith + fixed knn score string to float * Fixed strings in docker-compose env * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop) * Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that * Support for all unstructured files (#401) * all unstructured files * responsiveness * added file type * added the extensions * spell mistake * ppt file changes --------- Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * Extract schema using direct ChatOpenAI API and Chain * integrated the checkbox for schema to text dialog * Update SettingModal.tsx --------- Co-authored-by: Pravesh Kumar <[email protected]> * gcs file content read via storage client (#417) * gcs file content read via storage client * added the access token the file state --------- Co-authored-by: kartikpersistent <[email protected]> * pypdf2 to read files from gcs (#420) * 407 remove driver from frontend (#416) * removed driver * removed API * connecting to database on page refresh --------- Co-authored-by: kartikpersistent <[email protected]> * Css handling of info modal and Tooltips (#418) * css change * toolTips * Sidebar Tooltips * copy to clip * css change * added image types * added gcs * type fix * docker changes * speech * added the toolip for dropzone sources --------- Co-authored-by: kartikpersistent <[email protected]> * Fixed retrival bugs (#421) * yarn format fixes * changed the delete message * added the cancel button * changed the message on tooltip * added space * UI fixes * tooltip for setting * updated req * wikipedia URL input (#424) * accept only wikipedia links * added wikipedia link * added wikilink regex * wikipedia single url only * changed the alert message * wording change * pushed validation state persist error --------- Co-authored-by: aashipandya <[email protected]> * speech and copy (#422) * speech and copy * startTime * added chunk properties * tooltips --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Fixed issue for out of range in KNN API * solved conflicts * conflict solved * Remove logging info from update KNN API * tooltip changes * format and lint fixes * responsiveness changes * Fixed issue for total pages GCS, S3 * UI polishing (#428) * button and tooltip changes * checking validation on change * settings module populate fix * format fixes * opening the modal after auth success * removed the limit * added the scrobar for dropdowns * speech state (#426) * speech state * Button Details changes * delete wording change * Total pages in buckets (#431) * page number NA for buckets * added N/A for gcs and s3 pages * total pages for gcs * remove unwanted logger --------- Co-authored-by: kartikpersistent <[email protected]> * removed the max width * Update FileTable.tsx * Update the docker file * Modified prompt (#438) * Update Dockerfile * Update Dockerfile * Update Dockerfile * rendering Fix * Local file upload gcs (#442) * Uplaod file to GCS * GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled * Add life cycle rule on uploaded bucket * pdf upload local and gcs bucket check * delete files when processed and extract changes --------- Co-authored-by: Pravesh Kumar <[email protected]> * Modified chat length and entities used (#443) * metadata for unstructured files (#446) * Unstructured file metadata (#447) * metadata for unstructured files * sleep in gcs upload * updated * icons added to chunks (#435) * icons added to chunks * info modal icons --------- Co-authored-by: abhishekkumar-27 <[email protected]> Co-authored-by: Pravesh Kumar <[email protected]> Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: Ajay Meena <[email protected]> Co-authored-by: Morgan Senechal <[email protected]> Co-authored-by: karanchellani <[email protected]> * fixed gcs status message issue * added if check for failed count * Null issue Fixed from backend for upload API and graph_document when model name mismatch * added word break issue * Added neo4j-rust-ext * processing time estimation based on bytes * File extension upper case fixed, File delete from GCS or local based on env variable. * timer per byte * Update Dockerfile * Adding sort rows on the table (#451) * Gcs upload folder hashed (#453) * implement foldername hashed in GCS bucket uplaod * Raise exception if invalid model selected * folder name for gcs upload --------- Co-authored-by: aashipandya <[email protected]> * upload all unstructuredfiles to gcs (#455) * Mofified chunk query (#454) * Added libre office for fixing error -- soffice command was not found. Please install libreoffice on your system and try again. - Install instructions: https://www.libreoffice.org/get-help/install-howto/ - Mac: https://formulae.brew.sh/cask/libreoffice - Debian: https://wiki.debian.org/LibreOffice" * Fix the PARTIAL CONTENT issue * File-table no data found (#456) * 'file-table'' * review comment * Llm format change (#459) * changed the llm models format to lowercase * added the error message * llm model changes * format fixes * removed unused import * added the capitalize method * delete files from merged_file_path only if source is local file --------- Co-authored-by: aashipandya <[email protected]> * commented total page code (#460) * format fixes * removed the disabled check on dropdown * Large file env * DEV to STAGING (#461) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI * Delete * merge fixes * code suggestions --------- Co-authored-by: kartikpersistent <[email protected]> * chunks create before extraction using is_pre_process variable (#383) * chunks create before extraction using is_pre_process variable * Return total pages for Model * update requirement.txt * total pages on uplaod API * added the Confirmation Dialog * added the selected files into the confirmation modal * format and lint fixes * added the stop watch image * fileselection on alert dialog * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * recent merges * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * added the default page size * Convert is_cancelled value from string to bool * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * Save Total Pages in DB * Added total Pages * file selection when we didn't select anything from Main table * added the danger icon only for large files * added the overflow for more files and file selection for all new files * moved the interface to types * added the icon accoroding to the source * set total page for wiki and youtube * h3 heading * merge * updated the alert on basis if total pages * deleted chunks * polling based on total pages * isNan check * large file based on file size for s3 and gcs * file source in server side event * time calculation based on chunks for gcs and s3 --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: abhishekkumar-27 <[email protected]> Co-authored-by: aashipandya <[email protected]> * fixed the layout issue * Populate graph schema (#399) * crreate new endpoint populate_graph_schema and update the query for getting lables from DB * Added main.py changes * conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396) * added the condtion * removed llms * Fixed issue : Remove extra unused param * get emb only if used (#278) * Chatbot chunks (#402) * Added file name to the content sent to LLM * added chunk text in the response * increased the docs parts sent to llm * Modified graph query * mardown rendering * youtube starttime * icons * offset changes * removed the files due to codespace space issue --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user (#405) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * fixed css issue * fixed status blank issue * Modified response when no docs is retrived (#413) * Fixed env/docker-compose for local deployments + README doc (#410) * Fixed env/docker-compose for local deployments + README doc * wrong place for ENV in README * by default, removed langsmith + fixed knn score string to float * by default, removed langsmith + fixed knn score string to float * Fixed strings in docker-compose env * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop) * Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that * Support for all unstructured files (#401) * all unstructured files * responsiveness * added file type * added the extensions * spell mistake * ppt file changes --------- Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * Extract schema using direct ChatOpenAI API and Chain * integrated the checkbox for schema to text dialog * Update SettingModal.tsx --------- Co-authored-by: Pravesh Kumar <[email protected]> * gcs file content read via storage client (#417) * gcs file content read via storage client * added the access token the file state --------- Co-authored-by: kartikpersistent <[email protected]> * pypdf2 to read files from gcs (#420) * 407 remove driver from frontend (#416) * removed driver * removed API * connecting to database on page refresh --------- Co-authored-by: kartikpersistent <[email protected]> * Css handling of info modal and Tooltips (#418) * css change * toolTips * Sidebar Tooltips * copy to clip * css change * added image types * added gcs * type fix * docker changes * speech * added the toolip for dropzone sources --------- Co-authored-by: kartikpersistent <[email protected]> * Fixed retrival bugs (#421) * yarn format fixes * changed the delete message * added the cancel button * changed the message on tooltip * added space * UI fixes * tooltip for setting * updated req * wikipedia URL input (#424) * accept only wikipedia links * added wikipedia link * added wikilink regex * wikipedia single url only * changed the alert message * wording change * pushed validation state persist error --------- Co-authored-by: aashipandya <[email protected]> * speech and copy (#422) * speech and copy * startTime * added chunk properties * tooltips --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Fixed issue for out of range in KNN API * solved conflicts * conflict solved * Remove logging info from update KNN API * tooltip changes * format and lint fixes * responsiveness changes * Fixed issue for total pages GCS, S3 * UI polishing (#428) * button and tooltip changes * checking validation on change * settings module populate fix * format fixes * opening the modal after auth success * removed the limit * added the scrobar for dropdowns * speech state (#426) * speech state * Button Details changes * delete wording change * Total pages in buckets (#431) * page number NA for buckets * added N/A for gcs and s3 pages * total pages for gcs * remove unwanted logger --------- Co-authored-by: kartikpersistent <[email protected]> * removed the max width * Update FileTable.tsx * Update the docker file * Modified prompt (#438) * Update Dockerfile * Update Dockerfile * Update Dockerfile * rendering Fix * Local file upload gcs (#442) * Uplaod file to GCS * GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled * Add life cycle rule on uploaded bucket * pdf upload local and gcs bucket check * delete files when processed and extract changes --------- Co-authored-by: Pravesh Kumar <[email protected]> * Modified chat length and entities used (#443) * metadata for unstructured files (#446) * Unstructured file metadata (#447) * metadata for unstructured files * sleep in gcs upload * updated * icons added to chunks (#435) * icons added to chunks * info modal icons * fixed gcs status message issue * added if check for failed count * Null issue Fixed from backend for upload API and graph_document when model name mismatch * added word break issue * Added neo4j-rust-ext * processing time estimation based on bytes * File extension upper case fixed, File delete from GCS or local based on env variable. * timer per byte * Update Dockerfile * Adding sort rows on the table (#451) * Gcs upload folder hashed (#453) * implement foldername hashed in GCS bucket uplaod * Raise exception if invalid model selected * folder name for gcs upload --------- Co-authored-by: aashipandya <[email protected]> * upload all unstructuredfiles to gcs (#455) * Mofified chunk query (#454) * Added libre office for fixing error -- soffice command was not found. Please install libreoffice on your system and try again. - Install instructions: https://www.libreoffice.org/get-help/install-howto/ - Mac: https://formulae.brew.sh/cask/libreoffice - Debian: https://wiki.debian.org/LibreOffice" * Fix the PARTIAL CONTENT issue * File-table no data found (#456) * 'file-table'' * review comment * Llm format change (#459) * changed the llm models format to lowercase * added the error message * llm model changes * format fixes * removed unused import * added the capitalize method * delete files from merged_file_path only if source is local file --------- Co-authored-by: aashipandya <[email protected]> * commented total page code (#460) * format fixes * removed the disabled check on dropdown * Large file env --------- Co-authored-by: abhishekkumar-27 <[email protected]> Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: aashipandya <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: Ajay Meena <[email protected]> Co-authored-by: Morgan Senechal <[email protected]> Co-authored-by: karanchellani <[email protected]> * DEV to STAGING (#462) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI * Delete * merge fixes * code suggestions --------- Co-authored-by: kartikpersistent <[email protected]> * chunks create before extraction using is_pre_process variable (#383) * chunks create before extraction using is_pre_process variable * Return total pages for Model * update requirement.txt * total pages on uplaod API * added the Confirmation Dialog * added the selected files into the confirmation modal * format and lint fixes * added the stop watch image * fileselection on alert dialog * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * recent merges * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * added the default page size * Convert is_cancelled value from string to bool * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * Save Total Pages in DB * Added total Pages * file selection when we didn't select anything from Main table * added the danger icon only for large files * added the overflow for more files and file selection for all new files * moved the interface to types * added the icon accoroding to the source * set total page for wiki and youtube * h3 heading * merge * updated the alert on basis if total pages * deleted chunks * polling based on total pages * isNan check * large file based on file size for s3 and gcs * file source in server side event * time calculation based on chunks for gcs and s3 --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: abhishekkumar-27 <[email protected]> Co-authored-by: aashipandya <[email protected]> * fixed the layout issue * Populate graph schema (#399) * crreate new endpoint populate_graph_schema and update the query for getting lables from DB * Added main.py changes * conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396) * added the condtion * removed llms * Fixed issue : Remove extra unused param * get emb only if used (#278) * Chatbot chunks (#402) * Added file name to the content sent to LLM * added chunk text in the response * increased the docs parts sent to llm * Modified graph query * mardown rendering * youtube starttime * icons * offset changes * removed the files due to codespace space issue --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user (#405) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * fixed css issue * fixed status blank issue * Modified response when no docs is retrived (#413) * Fixed env/docker-compose for local deployments + README doc (#410) * Fixed env/docker-compose for local deployments + README doc * wrong place for ENV in README * by default, removed langsmith + fixed knn score string to float * by default, removed langsmith + fixed knn score string to float * Fixed strings in docker-compose env * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop) * Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that * Support for all unstructured files (#401) * all unstructured files * responsiveness * added file type * added the extensions * spell mistake * ppt file changes --------- Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * Extract schema using direct ChatOpenAI API and Chain * integrated the checkbox for schema to text dialog * Update SettingModal.tsx --------- Co-authored-by: Pravesh Kumar <[email protected]> * gcs file content read via storage client (#417) * gcs file content read via storage client * added the access token the file state --------- Co-authored-by: kartikpersistent <[email protected]> * pypdf2 to read files from gcs (#420) * 407 remove driver from frontend (#416) * removed driver * removed API * connecting to database on page refresh --------- Co-authored-by: kartikpersistent <[email protected]> * Css handling of info modal and Tooltips (#418) * css change * toolTips * Sidebar Tooltips * copy to clip * css change * added image types * added gcs * type fix * docker changes * speech * added the toolip for dropzone sources --------- Co-authored-by: kartikpersistent <[email protected]> * Fixed retrival bugs (#421) * yarn format fixes * changed the delete message * added the cancel button * changed the message on tooltip * added space * UI fixes * tooltip for setting * updated req * wikipedia URL input (#424) * accept only wikipedia links * added wikipedia link * added wikilink regex * wikipedia single url only * changed the alert message * wording change * pushed validation state persist error --------- Co-authored-by: aashipandya <[email protected]> * speech and copy (#422) * speech and copy * startTime * added chunk properties * tooltips --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Fixed issue for out of range in KNN API * solved conflicts * conflict solved * Remove logging info from update KNN API * tooltip changes * format and lint fixes * responsiveness changes * Fixed issue for total pages GCS, S3 * UI polishing (#428) * button and tooltip changes * checking validation on change * settings module populate fix * format fixes * opening the modal after auth success * removed the limit * added the scrobar for dropdowns * speech state (#426) * speech state * Button Details changes * delete wording change * Total pages in buckets (#431) * page number NA for buckets * added N/A for gcs and s3 pages * total pages for gcs * remove unwanted logger --------- Co-authored-by: kartikpersistent <[email protected]> * removed the max width * Update FileTable.tsx * Update the docker file * Modified prompt (#438) * Update Dockerfile * Update Dockerfile * Update Dockerfile * rendering Fix * Local file upload gcs (#442) * Uplaod file to GCS * GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled * Add life cycle rule on uploaded bucket * pdf upload local and gcs bucket check * delete files when processed and extract changes --------- Co-authored-by: Pravesh Kumar <[email protected]> * Modified chat length and entities used (#443) * metadata for unstructured files (#446) * Unstructured file metadata (#447) * metadata for unstructured files * sleep in gcs upload * updated * icons added to chunks (#435) * icons added to chunks * info modal icons * fixed gcs status message issue * added if check for failed count * Null issue Fixed from backend for upload API and graph_document when model name mismatch * added word break issue * Added neo4j-rust-ext * processing time estimation based on bytes * File extension upper case fixed, File delete from GCS or local based on env variable. * timer per byte * Update Dockerfile * Adding sort rows on the table (#451) * Gcs upload folder hashed (#453) * implement foldername hashed in GCS bucket uplaod * Raise exception if invalid model selected * folder name for gcs upload --------- Co-authored-by: aashipandya <[email protected]> * upload all unstructuredfiles to gcs (#455) * Mofified chunk query (#454) * Added libre office for fixing error -- soffice command was not found. Please install libreoffice on your system and try again. - Install instructions: https://www.libreoffice.org/get-help/install-howto/ - Mac: https://formulae.brew.sh/cask/libreoffice - Debian: https://wiki.debian.org/LibreOffice" * Fix the PARTIAL CONTENT issue * File-table no data found (#456) * 'file-table'' * review comment * Llm format change (#459) * changed the llm models format to lowercase * added the error message * llm model changes * format fixes * removed unused import * added the capitalize method * delete files from merged_file_path only if source is local file --------- Co-authored-by: aashipandya <[email protected]> * commented total page code (#460) * format fixes * removed the disabled check on dropdown * Large file env --------- Co-authored-by: abhishekkumar-27 <[email protected]> Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: aashipandya <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: Ajay Meena <[email protected]> Co-authored-by: Morgan Senechal <[email protected]> Co-authored-by: karanchellani <[email protected]> * added upload api * changed the dropzone error message * Dev to staging (#466) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI * Delete * merge fixes * code suggestions --------- Co-authored-by: kartikpersistent <[email protected]> * chunks create before extraction using is_pre_process variable (#383) * chunks create before extraction using is_pre_process variable * Return total pages for Model * update requirement.txt * total pages on uplaod API * added the Confirmation Dialog * added the selected files into the confirmation modal * format and lint fixes * added the stop watch image * fileselection on alert dialog * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * recent merges * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <[email protected]> * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * added the default page size * Convert is_cancelled value from string to bool * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <[email protected]> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <[email protected]> * Save Total Pages in DB * Added total Pages * file selection when we didn't select anything from Main table * added the danger icon only for large files * added the overflow for more files and file selection for all new files * moved the interface to types * added the icon accoroding to the source * set total page for wiki and youtube * h3 heading * merge * updated the alert on basis if total pages * deleted chunks * polling based on total pages * isNan check * large file based on file size for s3 and gcs * file source in server side event * time calculation based on chunks for gcs and s3 --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: abhishekkumar-27 <[email protected]> Co-authored-by: aashipandya <[email protected]> * fixed the layout issue * Populate graph schema (#399) * crreate new endpoint populate_graph_schema and update the query for getting lables from DB * Added main.py changes * conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396) * added the condtion * removed llms * Fixed issue : Remove extra unused param * get emb only if used (#278) * Chatbot chunks (#402) * Added file name to the content sent to LLM * added chunk text in the response * increased the docs parts sent to llm * Modified graph query * mardown rendering * youtube starttime * icons * offset changes * removed the files due to codespace space issue --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user (#405) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * fixed css issue * fixed status blank issue * Modified response when no docs is retrived (#413) * Fixed env/docker-compose for local deployments + README doc (#410) * Fixed env/docker-compose for local deployments + README doc * wrong place for ENV in README * by default, removed langsmith + fixed knn score string to float * by default, removed langsmith + fixed knn score string to float * Fixed strings in docker-compose env * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop) * Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that * Support for all unstructured files (#401) * all unstructured files * responsiveness * added file type * added the extensions * spell mistake * ppt file changes --------- Co-authored-by: kartikpersistent <[email protected]> * Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * Extract schema using direct ChatOpenAI API and Chain * integrated the checkbox for schema to text dialog * Update SettingModal.tsx --------- Co-authored-by: Pravesh Kumar <[email protected]> * gcs file content read via storage client (#417) * gcs file content read via storage client * added the access token the file state --------- Co-authored-by: kartikpersistent <[email protected]> * pypdf2 to read files from gcs (#420) * 407 remove driver from frontend (#416) * removed driver * removed API * connecting to database on page refresh --------- Co-authored-by: kartikpersistent <[email protected]> * Css handling of info modal and Tooltips (#418) * css change * toolTips * Sidebar Tooltips * copy to clip * css change * added image types * added gcs * type fix * docker changes * speech * added the toolip for dropzone sources --------- Co-authored-by: kartikpersistent <[email protected]> * Fixed retrival bugs (#421) * yarn format fixes * changed the delete message * added the cancel button * changed the message on tooltip * added space * UI fixes * tooltip for setting * updated req * wikipedia URL input (#424) * accept only wikipedia links * added wikipedia link * added wikilink regex * wikipedia single url only * changed the alert message * wording change * pushed validation state persist error --------- Co-authored-by: aashipandya <[email protected]> * speech and copy (#422) * speech and copy * startTime * added chunk properties * tooltips --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * Fixed issue for out of range in KNN API * solved conflicts * conflict solved * Remove logging info from update KNN API * tooltip changes * format and lint fixes * responsiveness changes * Fixed issue for total pages GCS, S3 * UI polishing (#428) * button and tooltip changes * checking validation on change * settings module populate fix * format fixes * opening the modal after auth success * removed the limit * added the scrobar for dropdowns * speech state (#426) * speech state * Button Details changes * delete wording change * Total pages in buckets (#431) * page number NA for buckets * added N/A for gcs and s3 pages * total pages for gcs * remove unwanted logger --------- Co-authored-by: kartikpersistent <[email protected]> * removed the max width * Update FileTable.tsx * Update the docker file * Modified prompt (#438) * Update Dockerfile * Update Dockerfile * Update Dockerfile * rendering Fix * Local file upload gcs (#442) * Uplaod file to GCS * GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled * Add life cycle rule on uploaded bucket * pdf upload local and gcs bucket check * delete files when processed and extract changes --------- Co-authored-by: Pravesh Kumar <[email protected]> * Modified chat length and entities used (#443) * metadata for unstructured files (#446) * Unstructured file metadata (#447) * metadata for unstructured files * sleep in gcs upload * updated * icons added to chunks (#435) * icons added to chunks * info modal icons * fixed gcs status message issue * added if check for failed count * Null issue Fixed from backend for upload API and graph_document when model name mismatch * added word break issue * Added neo4j-rust-ext * processing time estimation based on bytes * File extension upper case fixed, File delete from GCS or local based on env variable. * timer per byte * Update Dockerfile * Adding sort rows on the table (#451) * Gcs upload folder hashed (#453) * implement foldername hashed in GCS bucket uplaod * Raise exception if invalid model selected * folder name for gcs upload --------- Co-authored-by: aashipandya <[email protected]> * upload all unstructuredfiles to gcs (#455) * Mofified chunk query (#454) * Added libre office for fixing error -- soffice command was not found. Please install libreoffice on your system and try again. - Install instructions: https://www.libreoffice.org/get-help/install-howto/ - Mac: https://formulae.brew.sh/cask/libreoffice - Debian: https://wiki.debian.org/LibreOffice" * Fix the PARTIAL CONTENT issue * File-table no data found (#456) * 'file-table'' * review comment * Llm format change (#459) * changed the llm models format to lowercase * added the error message * llm model changes * format fixes * removed unused import * added the capitalize method * delete files f… --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: edenbuaa <[email protected]> Co-authored-by: abhishekkumar-27 <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: destiny966113 <[email protected]> Co-authored-by: Pravesh Kumar <[email protected]> Co-authored-by: Ajay Meena <[email protected]> Co-authored-by: Morgan Senechal <[email protected]> Co-authored-by: karanchellani <[email protected]> Co-authored-by: Ikko Eltociear Ashimine <[email protected]> Co-authored-by: Pravesh1988 <[email protected]> Co-authored-by: Jayanth T <[email protected]> Co-authored-by: Jerome Choo <[email protected]> Co-authored-by: jeromechoo <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: kaustubh-darekar <[email protected]> Co-authored-by: a-s-poorna <[email protected]> Co-authored-by: Jayanth T <[email protected]> Co-authored-by: Michael Hunger <[email protected]> Co-authored-by: Kain Shu <[email protected]>
1 parent 434df2e commit 72253ab

File tree

18 files changed

+219
-201
lines changed

18 files changed

+219
-201
lines changed

backend/score.py

Lines changed: 57 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@
1818
from src.communities import create_communities
1919
from src.neighbours import get_neighbour_nodes
2020
import json
21-
from typing import List
21+
from typing import List, Mapping, Union
2222
from starlette.middleware.sessions import SessionMiddleware
2323
from google.oauth2.credentials import Credentials
2424
import os
@@ -30,8 +30,8 @@
3030
from Secweb.XFrameOptions import XFrame
3131
from fastapi.middleware.gzip import GZipMiddleware
3232
from src.ragas_eval import *
33-
from starlette.types import ASGIApp, Receive, Scope, Send
34-
from langchain_neo4j import Neo4jGraph
33+
from starlette.types import ASGIApp, Message, Receive, Scope, Send
34+
import gzip
3535

3636
logger = CustomLogger()
3737
CHUNK_DIR = os.path.join(os.path.dirname(__file__), "chunks")
@@ -77,9 +77,10 @@ async def __call__(self, scope: Scope, receive: Receive, send: Send):
7777
await gzip_middleware(scope, receive, send)
7878
app = FastAPI()
7979
# SecWeb(app=app, Option={'referrer': False, 'xframe': False})
80-
# app.add_middleware(ContentSecurityPolicy, Option={'default-src': ["'self'"], 'base-uri': ["'self'"], 'block-all-mixed-content': []}, script_nonce=False, style_nonce=False, report_only=False)
80+
app.add_middleware(ContentSecurityPolicy, Option={'default-src': ["'self'"], 'base-uri': ["'self'"], 'block-all-mixed-content': []}, script_nonce=False, style_nonce=False, report_only=False)
8181
app.add_middleware(XContentTypeOptions)
8282
app.add_middleware(XFrame, Option={'X-Frame-Options': 'DENY'})
83+
#app.add_middleware(GZipMiddleware, minimum_size=1000, compresslevel=5)
8384
app.add_middleware(CustomGZipMiddleware, minimum_size=1000, compresslevel=5,paths=["/sources_list","/url/scan","/extract","/chat_bot","/chunk_entities","/get_neighbours","/graph_query","/schema","/populate_graph_schema","/get_unconnected_nodes_list","/get_duplicate_nodes","/fetch_chunktext"])
8485
app.add_middleware(
8586
CORSMiddleware,
@@ -845,5 +846,57 @@ async def retry_processing(uri=Form(), userName=Form(), password=Form(), databas
845846
finally:
846847
gc.collect()
847848

849+
@app.post("/fetch_chunktext")
850+
async def fetch_chunktext(
851+
uri: str = Form(),
852+
database: str = Form(),
853+
userName: str = Form(),
854+
password: str = Form(),
855+
document_name: str = Form(),
856+
page_no: int = Form(1)
857+
):
858+
try:
859+
payload_json_obj = {
860+
'api_name': 'fetch_chunktext',
861+
'db_url': uri,
862+
'userName': userName,
863+
'database': database,
864+
'document_name': document_name,
865+
'page_no': page_no,
866+
'logging_time': formatted_time(datetime.now(timezone.utc))
867+
}
868+
logger.log_struct(payload_json_obj, "INFO")
869+
start = time.time()
870+
result = await asyncio.to_thread(
871+
get_chunktext_results,
872+
uri=uri,
873+
username=userName,
874+
password=password,
875+
database=database,
876+
document_name=document_name,
877+
page_no=page_no
878+
)
879+
end = time.time()
880+
elapsed_time = end - start
881+
json_obj = {
882+
'api_name': 'fetch_chunktext',
883+
'db_url': uri,
884+
'document_name': document_name,
885+
'page_no': page_no,
886+
'logging_time': formatted_time(datetime.now(timezone.utc)),
887+
'elapsed_api_time': f'{elapsed_time:.2f}'
888+
}
889+
logger.log_struct(json_obj, "INFO")
890+
return create_api_response('Success', data=result, message=f"Total elapsed API time {elapsed_time:.2f}")
891+
except Exception as e:
892+
job_status = "Failed"
893+
message = "Unable to get chunk text response"
894+
error_message = str(e)
895+
logging.exception(f'Exception in fetch_chunktext: {error_message}')
896+
return create_api_response(job_status, message=message, error=error_message)
897+
finally:
898+
gc.collect()
899+
900+
848901
if __name__ == "__main__":
849902
uvicorn.run(app)

backend/src/graphDB_dataAccess.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -356,7 +356,7 @@ def get_duplicate_nodes_list(self):
356356
score_value = float(os.environ.get('DUPLICATE_SCORE_VALUE'))
357357
text_distance = int(os.environ.get('DUPLICATE_TEXT_DISTANCE'))
358358
query_duplicate_nodes = """
359-
MATCH (n:!Chunk&!Session&!Document&!`__Community__`) with n
359+
MATCH (n:!Chunk&!Session&!Document&!`__Community__`&!`__Entity__`) with n
360360
WHERE n.embedding is not null and n.id is not null // and size(toString(n.id)) > 3
361361
WITH n ORDER BY count {{ (n)--() }} DESC, size(toString(n.id)) DESC // updated
362362
WITH collect(n) as nodes

backend/src/graph_query.py

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -227,7 +227,6 @@ def get_graph_results(uri, username, password,database,document_names):
227227

228228
def get_chunktext_results(uri, username, password, database, document_name, page_no):
229229
"""Retrieves chunk text, position, and page number from graph data with pagination."""
230-
driver = None
231230
try:
232231
logging.info("Starting chunk text query process")
233232
offset = 10
@@ -256,5 +255,4 @@ def get_chunktext_results(uri, username, password, database, document_name, page
256255
logging.error(f"An error occurred in get_chunktext_results. Error: {str(e)}")
257256
raise Exception("An error occurred in get_chunktext_results. Please check the logs for more details.") from e
258257
finally:
259-
if driver:
260-
driver.close()
258+
driver.close()

backend/src/shared/constants.py

Lines changed: 0 additions & 84 deletions
Original file line numberDiff line numberDiff line change
@@ -173,90 +173,6 @@
173173
LIMIT $limit
174174
"""
175175

176-
NODEREL_COUNT_QUERY_WITH_COMMUNITY = """
177-
MATCH (d:Document)
178-
WHERE d.fileName IS NOT NULL
179-
OPTIONAL MATCH (d)<-[po:PART_OF]-(c:Chunk)
180-
OPTIONAL MATCH (c)-[he:HAS_ENTITY]->(e:__Entity__)
181-
OPTIONAL MATCH (c)-[sim:SIMILAR]->(c2:Chunk)
182-
OPTIONAL MATCH (c)-[nc:NEXT_CHUNK]->(c3:Chunk)
183-
OPTIONAL MATCH (e)-[ic:IN_COMMUNITY]->(comm:__Community__)
184-
OPTIONAL MATCH (comm)-[pc1:PARENT_COMMUNITY]->(first_level:__Community__)
185-
OPTIONAL MATCH (first_level)-[pc2:PARENT_COMMUNITY]->(second_level:__Community__)
186-
OPTIONAL MATCH (second_level)-[pc3:PARENT_COMMUNITY]->(third_level:__Community__)
187-
WITH
188-
d.fileName AS filename,
189-
count(DISTINCT c) AS chunkNodeCount,
190-
count(DISTINCT po) AS partOfRelCount,
191-
count(DISTINCT he) AS hasEntityRelCount,
192-
count(DISTINCT sim) AS similarRelCount,
193-
count(DISTINCT nc) AS nextChunkRelCount,
194-
count(DISTINCT e) AS entityNodeCount,
195-
collect(DISTINCT e) AS entities,
196-
count(DISTINCT comm) AS baseCommunityCount,
197-
count(DISTINCT first_level) AS firstlevelcommCount,
198-
count(DISTINCT second_level) AS secondlevelcommCount,
199-
count(DISTINCT third_level) AS thirdlevelcommCount,
200-
count(DISTINCT ic) AS inCommunityCount,
201-
count(DISTINCT pc1) AS parentCommunityRelCount1,
202-
count(DISTINCT pc2) AS parentCommunityRelCount2,
203-
count(DISTINCT pc3) AS parentCommunityRelCount3
204-
WITH
205-
filename,
206-
chunkNodeCount,
207-
partOfRelCount + hasEntityRelCount + similarRelCount + nextChunkRelCount AS chunkRelCount,
208-
entityNodeCount,
209-
entities,
210-
baseCommunityCount + firstlevelcommCount + secondlevelcommCount + thirdlevelcommCount AS commCount,
211-
inCommunityCount + parentCommunityRelCount1 + parentCommunityRelCount2 + parentCommunityRelCount3 AS communityRelCount
212-
CALL (entities) {
213-
UNWIND entities AS e
214-
RETURN sum(COUNT { (e)-->(e2:__Entity__) WHERE e2 in entities }) AS entityEntityRelCount
215-
}
216-
RETURN
217-
filename,
218-
COALESCE(chunkNodeCount, 0) AS chunkNodeCount,
219-
COALESCE(chunkRelCount, 0) AS chunkRelCount,
220-
COALESCE(entityNodeCount, 0) AS entityNodeCount,
221-
COALESCE(entityEntityRelCount, 0) AS entityEntityRelCount,
222-
COALESCE(commCount, 0) AS communityNodeCount,
223-
COALESCE(communityRelCount, 0) AS communityRelCount
224-
"""
225-
NODEREL_COUNT_QUERY_WITHOUT_COMMUNITY = """
226-
MATCH (d:Document)
227-
WHERE d.fileName = $document_name
228-
OPTIONAL MATCH (d)<-[po:PART_OF]-(c:Chunk)
229-
OPTIONAL MATCH (c)-[he:HAS_ENTITY]->(e:__Entity__)
230-
OPTIONAL MATCH (c)-[sim:SIMILAR]->(c2:Chunk)
231-
OPTIONAL MATCH (c)-[nc:NEXT_CHUNK]->(c3:Chunk)
232-
WITH
233-
d.fileName AS filename,
234-
count(DISTINCT c) AS chunkNodeCount,
235-
count(DISTINCT po) AS partOfRelCount,
236-
count(DISTINCT he) AS hasEntityRelCount,
237-
count(DISTINCT sim) AS similarRelCount,
238-
count(DISTINCT nc) AS nextChunkRelCount,
239-
count(DISTINCT e) AS entityNodeCount,
240-
collect(DISTINCT e) AS entities
241-
WITH
242-
filename,
243-
chunkNodeCount,
244-
partOfRelCount + hasEntityRelCount + similarRelCount + nextChunkRelCount AS chunkRelCount,
245-
entityNodeCount,
246-
entities
247-
CALL (entities) {
248-
UNWIND entities AS e
249-
RETURN sum(COUNT { (e)-->(e2:__Entity__) WHERE e2 in entities }) AS entityEntityRelCount
250-
}
251-
RETURN
252-
filename,
253-
COALESCE(chunkNodeCount, 0) AS chunkNodeCount,
254-
COALESCE(chunkRelCount, 0) AS chunkRelCount,
255-
COALESCE(entityNodeCount, 0) AS entityNodeCount,
256-
COALESCE(entityEntityRelCount, 0) AS entityEntityRelCount
257-
"""
258-
259-
260176
## CHAT SETUP
261177
CHAT_MAX_TOKENS = 1000
262178
CHAT_SEARCH_KWARG_SCORE_THRESHOLD = 0.5

frontend/src/components/ChatBot/ChunkInfo.tsx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -268,4 +268,4 @@ const ChunkInfo: FC<ChunkProps> = ({ loading, chunks, mode }) => {
268268
</>
269269
);
270270
};
271-
export default ChunkInfo;
271+
export default ChunkInfo;

frontend/src/components/ChatBot/CommunitiesInfo.tsx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,4 +79,4 @@ const CommunitiesInfo: FC<CommunitiesProps> = ({ loading, communities, mode }) =
7979
);
8080
};
8181

82-
export default CommunitiesInfo;
82+
export default CommunitiesInfo;

frontend/src/components/ChatBot/EntitiesInfo.tsx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -150,4 +150,4 @@ const EntitiesInfo: FC<EntitiesProps> = ({ loading, mode, graphonly_entities, in
150150
</>
151151
);
152152
};
153-
export default EntitiesInfo;
153+
export default EntitiesInfo;

frontend/src/components/ChatBot/chatInfo.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,4 +40,4 @@ export const handleGraphNodeClick = async (
4040
setLoadingGraphView(false);
4141
}
4242
}
43-
};
43+
};

frontend/src/components/Content.tsx

Lines changed: 66 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ import {
1111
CustomFile,
1212
OptionType,
1313
UserCredentials,
14+
chunkdata,
1415
connectionState,
1516
} from '../types';
1617
import deleteAPI from '../services/DeleteFiles';
@@ -36,7 +37,6 @@ import { useMessageContext } from '../context/UserMessages';
3637
import PostProcessingToast from './Popups/GraphEnhancementDialog/PostProcessingCheckList/PostProcessingToast';
3738
import { getChunkText } from '../services/getChunkText';
3839
import ChunkPopUp from './Popups/ChunkPopUp';
39-
import { isExpired, isFileReadyToProcess } from '../utils/Utils';
4040

4141
const ConfirmationDialog = lazy(() => import('./Popups/LargeFilePopUp/ConfirmationDialog'));
4242

@@ -56,13 +56,29 @@ const Content: React.FC<ContentProps> = ({
5656
// const [init, setInit] = useState<boolean>(false);
5757
const [openGraphView, setOpenGraphView] = useState<boolean>(false);
5858
const [inspectedName, setInspectedName] = useState<string>('');
59-
const { setUserCredentials, userCredentials, connectionStatus, setConnectionStatus } = useCredentials();
59+
const [documentName, setDocumentName] = useState<string>('');
60+
const {
61+
setUserCredentials,
62+
userCredentials,
63+
connectionStatus,
64+
setConnectionStatus,
65+
isGdsActive,
66+
setGdsActive,
67+
setIsReadOnlyUser,
68+
isReadOnlyUser,
69+
} = useCredentials();
6070
const [showConfirmationModal, setshowConfirmationModal] = useState<boolean>(false);
6171
const [showExpirationModal, setshowExpirationModal] = useState<boolean>(false);
6272
const [extractLoading, setextractLoading] = useState<boolean>(false);
6373
const [retryFile, setRetryFile] = useState<string>('');
6474
const [retryLoading, setRetryLoading] = useState<boolean>(false);
6575
const [showRetryPopup, toggleRetryPopup] = useReducer((state) => !state, false);
76+
const [showChunkPopup, toggleChunkPopup] = useReducer((state) => !state, false);
77+
const [chunksLoading, toggleChunksLoading] = useReducer((state) => !state, false);
78+
const [currentPage, setCurrentPage] = useState<number>(0);
79+
const [totalPageCount, setTotalPageCount] = useState<number | null>(null);
80+
const [textChunks, setTextChunks] = useState<chunkdata[]>([]);
81+
6682
const [alertStateForRetry, setAlertStateForRetry] = useState<BannerAlertProps>({
6783
showAlert: false,
6884
alertType: 'neutral',
@@ -90,13 +106,12 @@ const Content: React.FC<ContentProps> = ({
90106
setchatModes,
91107
model,
92108
} = useFileContext();
93-
const [viewPoint, setViewPoint] = useState<'tableView' | 'showGraphView' | 'chatInfoView' | 'neighborView'>(
94-
'tableView'
95-
);
109+
const [viewPoint, setViewPoint] = useState<'tableView' | 'showGraphView' | 'chatInfoView'|'neighborView'>('tableView');
96110
const [showDeletePopUp, setshowDeletePopUp] = useState<boolean>(false);
97111
const [deleteLoading, setdeleteLoading] = useState<boolean>(false);
98112
const [searchParams] = useSearchParams();
99113

114+
100115
const { updateStatusForLargeFiles } = useServerSideEvent(
101116
(inMinutes, time, fileName) => {
102117
showNormalToast(`${fileName} will take approx ${time} ${inMinutes ? 'Min' : 'Sec'}`);
@@ -107,7 +122,12 @@ const Content: React.FC<ContentProps> = ({
107122
}
108123
);
109124
const childRef = useRef<ChildRef>(null);
110-
125+
const incrementPage = () => {
126+
setCurrentPage((prev) => prev + 1);
127+
};
128+
const decrementPage = () => {
129+
setCurrentPage((prev) => prev - 1);
130+
};
111131
useEffect(() => {
112132
if (!init && !searchParams.has('connectURL')) {
113133
let session = localStorage.getItem('neo4j.connection');
@@ -128,7 +148,13 @@ const Content: React.FC<ContentProps> = ({
128148
setOpenConnection((prev) => ({ ...prev, openPopUp: true }));
129149
}
130150
}, []);
131-
151+
useEffect(() => {
152+
if (currentPage >= 1) {
153+
(async () => {
154+
await getChunks(documentName, currentPage);
155+
})();
156+
}
157+
}, [currentPage, documentName]);
132158
useEffect(() => {
133159
setFilesData((prevfiles) => {
134160
return prevfiles.map((curfile) => {
@@ -275,7 +301,15 @@ const Content: React.FC<ContentProps> = ({
275301
}
276302
toggleChunksLoading();
277303
};
278-
304+
const getChunks = async (name: string, pageNo: number) => {
305+
toggleChunksLoading();
306+
const response = await getChunkText(userCredentials as UserCredentials, name, pageNo);
307+
setTextChunks(response.data.data.pageitems);
308+
if (!totalPageCount) {
309+
setTotalPageCount(response.data.data.total_pages);
310+
}
311+
toggleChunksLoading();
312+
};
279313
const extractData = async (uid: string, isselectedRows = false, filesTobeProcess: CustomFile[]) => {
280314
if (!isselectedRows) {
281315
const fileItem = filesData.find((f) => f.id == uid);
@@ -535,7 +569,7 @@ const Content: React.FC<ContentProps> = ({
535569
.filter((f) => f.status === 'New' || f.status == 'Reprocess');
536570
addFilesToQueue(selectedNewFiles as CustomFile[]);
537571
}
538-
}
572+
};
539573

540574
const handleOpenGraphClick = () => {
541575
const bloomUrl = process.env.VITE_BLOOM_URL;
@@ -798,6 +832,18 @@ const Content: React.FC<ContentProps> = ({
798832
view='contentView'
799833
></DeletePopUp>
800834
)}
835+
{showChunkPopup && (
836+
<ChunkPopUp
837+
chunksLoading={chunksLoading}
838+
onClose={() => toggleChunkPopup()}
839+
showChunkPopup={showChunkPopup}
840+
chunks={textChunks}
841+
incrementPage={incrementPage}
842+
decrementPage={decrementPage}
843+
currentPage={currentPage}
844+
totalPageCount={totalPageCount}
845+
></ChunkPopUp>
846+
)}
801847
{showEnhancementDialog && (
802848
<GraphEnhancementDialog
803849
open={showEnhancementDialog}
@@ -897,6 +943,17 @@ const Content: React.FC<ContentProps> = ({
897943
setRetryFile(id);
898944
toggleRetryPopup();
899945
}}
946+
onChunkView={async (name) => {
947+
setDocumentName(name);
948+
if (name != documentName) {
949+
toggleChunkPopup();
950+
if (totalPageCount) {
951+
setTotalPageCount(null);
952+
}
953+
setCurrentPage(1);
954+
// await getChunks(name, 1);
955+
}
956+
}}
900957
ref={childRef}
901958
handleGenerateGraph={processWaitingFilesOnRefresh}
902959
></FileTable>

0 commit comments

Comments
 (0)