Any similar projects that aren't as resource-intensive? #186

endolith · 2023-06-26T14:33:44Z

My GPU can't run this because it doesn't have enough memory for the LLM. I can't run it in CPU-only mode either, because of a bug: #156. Even when that's fixed, I imagine it will be pretty slow.

Edit: It still doesn't work with that fixed; I just get a TimeoutError while trying to download the models, or "DefaultCPUAllocator: not enough memory" while trying to run them.

Are there any similar tools that I might be able to run?

I think I would be fine with something that uses the GPT API for the actual thinking part, and just does the document ingestion and embeddings locally.

Or even just something that doesn't think at all, and just does an embeddings-based search across many local documents. If I understand correctly, embeddings can enable searching for similar concepts, or they can be used to do Q&A-type queries of the documents. Whichever it does, I would want it to return exact quote snippets from the original document for each search result anyway, both so I can read them in their original context and so I can know it's not hallucinating or misunderstanding.

Oh, it should be omnilingual, too.

Basically I want to be able to search by concept instead of by keyword:

My library of social choice theory papers (including French and German papers) [PDF, .txt, .md]
My entire work email mbox archive of ~20 yrs (including Chinese content) [mbox plain text format]
My entire folder of USB specifications and application notes [PDF, .txt, .md]
My entire folder of electronics datasheets and application notes (including Chinese documents) [PDFs, html?]
My local ebook collection [.epub, PDF, txt]
etc.

I think I can roll my own with LangChain or SentenceTransformers, but I haven't learned them yet, and there are so many tools in this space that something nicer probably already exists?
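For reference, a roll-your-own version of the "no thinking, just search" idea could look roughly like the sketch below. It is only an illustration under loud assumptions: all function names are hypothetical, and `embed` is a toy bag-of-words stand-in, so it only matches shared words, not concepts or other languages. A real version would swap in a multilingual embedding model (for example one loaded through SentenceTransformers) and keep the rest of the structure. The part it does show is keeping character offsets for every chunk, so each hit is an exact quote that can be located in the original document.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector.

    ASSUMPTION: a real setup would return a dense vector from a
    multilingual model instead; this only matches literal shared words.
    """
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, size=200, overlap=50):
    """Yield (start, end, snippet) windows; snippet is an exact slice of text."""
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        end = min(start + size, len(text))
        yield start, end, text[start:end]


def build_index(docs):
    """docs maps a document name to its full text."""
    index = []
    for name, text in docs.items():
        for start, end, snippet in chunk(text):
            index.append((name, start, end, snippet, embed(snippet)))
    return index


def search(index, query, top_k=3):
    """Return the top_k (score, name, start, end, snippet) tuples for query."""
    q = embed(query)
    scored = [
        (cosine(q, vec), name, start, end, snippet)
        for name, start, end, snippet, vec in index
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    docs = {
        "usb.txt": "The USB device must enumerate within the specified time after reset.",
        "vote.txt": "Approval voting lets each voter approve any number of candidates.",
    }
    for score, name, start, end, snippet in search(build_index(docs), "voter candidates"):
        print(f"{score:.2f} {name}[{start}:{end}] {snippet!r}")
```

Because every result carries `name`, `start`, and `end`, the snippet can always be checked against the source with `docs[name][start:end]`, which is the "exact quote, not a paraphrase" property I'm after.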

endolith · 2023-07-05T15:14:43Z

Semantra looks promising: https://github.com/freedmand/semantra

Semantra is a multipurpose tool for semantically searching documents. Query by meaning rather than just by matching text.

The tool, made to run on the command line, analyzes specified text and PDF files on your computer and launches a local web search application for interactively querying them. The purpose of Semantra is to make running a specialized semantic search engine easy, friendly, configurable, and private/secure.

Semantra is built for individuals seeking needles in haystacks — journalists sifting through leaked documents on deadline, researchers seeking insights within papers, students engaging with literature by querying themes, historians connecting events across books, and so forth.

Can it use ChatGPT?

No, and this is by design.

Semantra does not use any generative models like ChatGPT. It is built only to query text semantically without any layers on top to attempt explaining, summarizing, or synthesizing results. Generative language models occasionally produce outwardly plausible but ultimately incorrect information, placing the burden of verification on the user. Semantra treats primary source material as the only source of truth and endeavors to show that a human-in-the-loop search experience on top of simpler embedding models is more serviceable to users.

2 replies

molnarj Jul 10, 2023

@endolith maybe this: https://github.com/su77ungr/CASALIOY (but I have no experience with it)

endolith Jul 11, 2023
Author

@molnarj Thanks. That helped me find this list: https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#what-is-h2ogpts-langchain-integration-like
