
Commit 4b940f0

[Docs] HCD and DSE with RAGStack (#582)
* initial-content
* add-langchain-hub
* dse-69-example
* typo
1 parent b1d2468 commit 4b940f0

File tree

3 files changed: +149 -0 lines changed

Lines changed: 36 additions & 0 deletions
= RAGStack and DataStax Enterprise (DSE) 6.9 example

. Pull the dse-server Docker image, start a container, and confirm it is in a running state (a minimal status check follows these steps).
+
[source,bash]
----
docker pull datastax/dse-server:6.9.0-rc.2
docker run -e DS_LICENSE=accept -p 9042:9042 -d datastax/dse-server:6.9.0-rc.2
----
+
. Install dependencies.
+
[source,bash]
----
pip install ragstack-ai-langchain python-dotenv langchainhub
----
+
. Create a `.env` file in the root directory of the project and add the following environment variable.
+
[source,bash]
----
OPENAI_API_KEY="sk-..."
----
+
. Create a Python script to embed and generate the results of a query.
+
include::examples:partial$hcd-quickstart.adoc[]
+
You should see output like this:
+
[source,plain]
----
Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable. Techniques like Chain of Thought and Tree of Thoughts help models decompose hard tasks and enhance performance by thinking step by step. This process allows for a better interpretation of the model's thinking process and can involve various methods such as simple prompting, task-specific instructions, or human inputs.
----
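Step 1 asks you to confirm the container is in a running state; a minimal check is to list the containers started from the image pulled above (the `--filter` value simply matches that image tag):

[source,bash]
----
# The dse-server container should be listed with a status of "Up"
docker ps --filter "ancestor=datastax/dse-server:6.9.0-rc.2"
----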

docs/modules/examples/pages/hcd.adoc

Lines changed: 44 additions & 0 deletions
= RAGStack and Hyper Converged Database (HCD) example

. Clone the HCD example repository.
+
[source,bash]
----
git clone git@github.com:datastax/astra-db-java.git
cd astra-db-java
----
+
. Build and start the containers with Docker Compose, and confirm they are in a running state (see the log check after these steps).
+
[source,bash]
----
docker compose up -d
docker compose ps
----
+
. Install dependencies.
+
[source,bash]
----
pip install ragstack-ai-langchain python-dotenv langchainhub
----
+
. Create a `.env` file in the root directory of the project and add the following environment variable.
+
[source,bash]
----
OPENAI_API_KEY="sk-..."
----
+
. Create a Python script to embed and generate the results of a query.
+
include::examples:partial$hcd-quickstart.adoc[]
+
You should see output like this:
+
[source,plain]
----
Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable. Techniques like Chain of Thought and Tree of Thoughts help models decompose hard tasks and enhance performance by thinking step by step. This process allows for a better interpretation of the model's thinking process and can involve various methods such as simple prompting, task-specific instructions, or human inputs.
----
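If `docker compose ps` shows the containers but you want to watch them finish initializing before connecting, you can also follow the logs (a minimal sketch; the service names come from the compose file in the cloned repository):

[source,bash]
----
# Stream logs from every service in the compose file; press Ctrl+C to stop
docker compose logs -f
----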
Lines changed: 69 additions & 0 deletions
.Python
[%collapsible%open]
====
[source,python]
----
import os
from dotenv import load_dotenv
import bs4
from langchain import hub
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
import cassio
from cassio.table import MetadataVectorCassandraTable
from langchain_community.vectorstores import Cassandra

# Load environment variables
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialize Cassandra
cassio.init(contact_points=['localhost'], username='cassandra', password='cassandra')
cassio.config.resolve_session().execute(
    "create keyspace if not exists my_vector_keyspace with replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
)

# Create metadata Vector Cassandra Table
mvct = MetadataVectorCassandraTable(table='my_vector_table', vector_dimension=1536, keyspace='my_vector_keyspace')

# Web loader configuration
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Document splitting
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Vector store setup
# vector_dimension must match the 1536-dimensional OpenAI embeddings and the table created above
vectorstore = Cassandra.from_documents(documents=splits, embedding=OpenAIEmbeddings(), table_name='my_vector_table', keyspace='my_vector_keyspace', vector_dimension=1536)
retriever = vectorstore.as_retriever()

# Language model setup
llm = ChatOpenAI()

# Chain components
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | hub.pull("rlm/rag-prompt")
    | llm
    | StrOutputParser()
)

# Invocation
result = rag_chain.invoke("What is Task Decomposition?")
print(result)
----
====
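To run the script, save it in the project root next to the `.env` file so `load_dotenv()` can find your API key (the filename below is only an example):

[source,bash]
----
# Run from the project root so load_dotenv() picks up the .env file
python rag_quickstart.py
----

If `import bs4` fails in your environment, install the `beautifulsoup4` package with pip; it may already be pulled in by `ragstack-ai-langchain`, so treat this as a fallback.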
