|
| 1 | +# Example langchain retriever |
| 2 | + |
| 3 | +This project demonstrates one approach for implementing a |
| 4 | +[langchain retriever](https://python.langchain.com/docs/modules/data_connection/) |
| 5 | +that allows for |
| 6 | +[Retrieval Augmented Generation (RAG)](https://python.langchain.com/docs/use_cases/question_answering/) |
| 7 | +to be supported via MarkLogic and the MarkLogic Python Client. This example uses the same data as in |
| 8 | +[the langchain RAG quickstart guide](https://python.langchain.com/docs/use_cases/question_answering/quickstart), |
| 9 | +but with the data having first been loaded into MarkLogic. |
| 10 | + |
| 11 | +**This is only intended as an example** of how easily a langchain retriever can be developed |
| 12 | +using the MarkLogic Python Client. The queries in this example are simple and naturally |
| 13 | +do not have any knowledge of how your data is modeled in MarkLogic. You are encouraged to use |
| 14 | +this as an example for developing your own retriever, where you can build a query based on a |
| 15 | +question submitted to langchain that fully leverages the indexes and data models in your MarkLogic |
| 16 | +application. Additionally, please see the |
| 17 | +[langchain documentation on splitting text](https://python.langchain.com/docs/modules/data_connection/document_transformers/). |
| 18 | +You may need to split your data into a larger number of smaller documents in your database so that |
| 19 | +you do not exceed the limit that langchain imposes on how much data a retriever can return. |
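
As a rough illustration of the restructuring described above, the following sketch splits a long string into smaller, overlapping chunks that could each be stored as a separate document. It is a plain character-based splitter and not one of langchain's text splitters; the function name `split_text` and the default sizes are assumptions chosen for this example.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk (when it fits).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Each returned chunk would then be written to MarkLogic as its own document, so that a retriever returning a fixed number of documents returns less text overall.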
| 20 | + |
| 21 | +# Setup |
| 22 | + |
| 23 | +To try out this project, use [docker-compose](https://docs.docker.com/compose/) to instantiate a new MarkLogic |
| 24 | +instance with port 8003 available. You can also use your own MarkLogic instance, as long as port 8003 |
| 25 | +is available on it: |
| 26 | + |
| 27 | + docker-compose up -d --build |
| 28 | + |
| 29 | +Then deploy a small REST API application to MarkLogic, which includes a basic non-admin MarkLogic user |
| 30 | +named `langchain-user`: |
| 31 | + |
| 32 | + ./gradlew -i mlDeploy |
| 33 | + |
| 34 | +Next, create a new Python virtual environment - [pyenv](https://github.com/pyenv/pyenv) is recommended for this - |
| 35 | +and install the |
| 36 | +[langchain example dependencies](https://python.langchain.com/docs/use_cases/question_answering/quickstart#dependencies), |
| 37 | +along with the MarkLogic Python Client: |
| 38 | + |
| 39 | + pip install -U langchain langchain_openai langchain-community langchainhub openai chromadb bs4 marklogic_python_client |
| 40 | + |
| 41 | +Then run the following Python program to load text data from the langchain quickstart guide |
| 42 | +into two different collections in the `langchain-test-content` database: |
| 43 | + |
| 44 | + python load_data.py |
| 45 | + |
| 46 | +Create a ".env" file to hold your OpenAI API key: |
| 47 | + |
| 48 | + echo "OPENAI_API_KEY=<your key here>" > .env |
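
The README does not show how the project reads the ".env" file. Libraries such as python-dotenv are commonly used for this; the sketch below instead uses only the standard library to show the general idea. The function name `load_env_file` is hypothetical and is not taken from this project.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a file and copy them into os.environ.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Surrounding quotes around a value are removed.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values

    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")

    os.environ.update(values)
    return values


if __name__ == "__main__":
    loaded = load_env_file()
    print("OPENAI_API_KEY found:", "OPENAI_API_KEY" in loaded)
```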
| 49 | + |
| 50 | +# Testing the retriever |
| 51 | + |
| 52 | +You are now ready to test the example retriever. Run the following to ask a question with the |
| 53 | +results augmented via the `marklogic_retriever.py` module in this project; you will be |
| 54 | +prompted for an OpenAI API key when you run this, which you can type or paste in: |
| 55 | + |
| 56 | + python ask.py "What is task decomposition?" posts |
| 57 | + |
| 58 | +The retriever uses a [cts.similarQuery](https://docs.marklogic.com/cts.similarQuery) to select from the documents |
| 59 | +loaded via `load_data.py`. It defaults to a page length of 10. You can change this by providing a command line |
| 60 | +argument - e.g.: |
| 61 | + |
| 62 | + python ask.py "What is task decomposition?" posts 15 |
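
The command-line shape used above (a question, a collection name, and an optional page length defaulting to 10) could be parsed along the following lines. This is a sketch only: the source of `ask.py` is not shown here, so the use of `argparse` and the argument names are assumptions.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse: <question> <collection> [page_length]."""
    parser = argparse.ArgumentParser(
        description="Ask a question against documents in a MarkLogic collection."
    )
    parser.add_argument("question", help="the question to ask")
    parser.add_argument("collection", help="collection to search, e.g. posts")
    # Optional positional argument; falls back to 10 when omitted.
    parser.add_argument("page_length", nargs="?", type=int, default=10)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["What is task decomposition?", "posts", "15"])
    print(args.question, args.collection, args.page_length)
```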
| 63 | + |
| 64 | +Example of a question for the "sotu" (State of the Union speech) collection: |
| 65 | + |
| 66 | + python ask.py "What are economic sanctions?" sotu 20 |
| 67 | + |
| 68 | +To use a word query instead of a similar query, along with a set of drop words, specify "word" as the fourth argument: |
| 69 | + |
| 70 | + python ask.py "What are economic sanctions?" sotu 20 word |
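
To illustrate what applying drop words to a question might look like, the sketch below removes common words before the remaining terms are handed to a word query. The drop-word list and the function name `significant_words` are assumptions made for this example; the list actually used by the project is not shown in this README.

```python
# Hypothetical drop-word list for illustration only.
DROP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "what"}


def significant_words(question: str, drop_words: set[str] = DROP_WORDS) -> list[str]:
    """Return the lower-cased words of the question that are not drop words."""
    words = []
    for token in question.split():
        # Remove surrounding punctuation such as the trailing question mark.
        word = token.strip(".,;:!?\"'()").lower()
        if word and word not in drop_words:
            words.append(word)
    return words


if __name__ == "__main__":
    print(significant_words("What are economic sanctions?"))
```

Filtering the question this way keeps very common words from dominating a word query, which matters more there than in a similar query that works from the question text as a whole.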