+ "'We used beam search as described in the previous section, but no\\ncheckpoint averaging. We present these results in Table 3.\\nIn Table 3 rows (A), we vary the number of attention heads and the attention key and value dimensions,\\nkeeping the amount of computation constant, as described in Section 3.2.2. While single-head\\nattention is 0.9 BLEU worse than the best setting, quality also drops off with too many heads.\\nIn Table 3 rows (B), we observe that reducing the attention key size dk hurts model quality. This\\nsuggests that determining compatibility is not easy and that a more sophisticated compatibility\\nfunction than dot product may be beneficial. We further observe in rows (C) and (D) that, as expected,\\nbigger models are better, and dropout is very helpful in avoiding over-fitting. In row (E) we replace our\\nsinusoidal positional encoding with learned positional embeddings [9], and observe nearly identical\\nresults to the base model.\\n6.3 English Constituency Parsing\\nTo evaluate if the Transformer can generalize to other tasks we performed experiments on English\\nconstituency parsing. This task presents specific challenges: the output is subject to strong structural\\nconstraints and is significantly longer than the input. Furthermore, RNN sequence-to-sequence\\nmodels have not been able to attain state-of-the-art results in small-data regimes [37].\\nWe trained a 4-layer transformer with dmodel = 1024 on the Wall Street Journal (WSJ) portion of the\\nPenn Treebank [25], about 40K training sentences. We also trained it in a semi-supervised setting,\\nusing the larger high-confidence and BerkleyParser corpora from with approximately 17M sentences\\n[37]. We used a vocabulary of 16K tokens for the WSJ only setting and a vocabulary of 32K tokens\\nfor the semi-supervised setting.\\nWe performed only a small number of experiments to select the dropout, both attention and residual\\n(section 5.4), learning rates and beam size on the Section 22 development set, all other parameters\\nremained unchanged from the English-to-German base translation model. During inference, we\\n 9\\n---\\nTable 4: The Transformer generalizes well to English constituency parsing (Results are on Section 23\\nof WSJ)\\n Parser Training WSJ 23 F1\\n Vinyals & Kaiser el al. (2014) [37] WSJ only, discriminative 88.3\\n Petrov et al. (2006) [29] WSJ only, discriminative 90.4\\n Zhu et al. (2013) [40] WSJ only, discriminative 90.4\\n Dyer et al. (2016) [8] WSJ only, discriminative 91.7\\n Transformer (4 layers) WSJ only, discriminative 91.3\\n Zhu et al. (2013) [40] semi-supervised 91.3\\n Huang & Harper (2009) [14] semi-supervised 91.3\\n McClosky et al. (2006) [26] semi-supervised 92.1\\n Vinyals & Kaiser el al. (2014) [37] semi-supervised 92.1\\n Transformer (4 layers) semi-supervised 92.7\\n Luong et al. (2015) [23] multi-task 93.0\\n Dyer et al. (2016) [8] generative 93.3\\nincreased the maximum output length to input length + 300. 
Our results in Table 4 show that despite the lack of task-specific tuning our model performs surprisingly well, yielding better results than all previously reported models with the exception of the Recurrent Neural Network Grammar [8].

In contrast to RNN sequence-to-sequence models [37], the Transformer outperforms the BerkeleyParser [29] even when training only on the WSJ training set of 40K sentences.

7 Conclusion

In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.

For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers.
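For readers who want a concrete reference point for the multi-headed self-attention summarized above, and for the constant-computation head sizing varied in Table 3 rows (A), the following is a minimal NumPy sketch, not the reference implementation: the weights are random placeholders, the function name is hypothetical, and per-head sizes follow d_k = d_v = d_model / h.

```python
# Minimal illustrative sketch (not the reference implementation) of
# multi-head scaled dot-product self-attention with d_k = d_v = d_model / h,
# the sizing that keeps total computation roughly constant as the number of
# heads h is varied in Table 3 rows (A). Weights are random placeholders.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, h=8, seed=0):
    """x: (seq_len, d_model) array. Returns a (seq_len, d_model) array."""
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    d_k = d_model // h                              # per-head query/key/value size
    heads = []
    for _ in range(h):
        w_q = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
        w_k = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
        w_v = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        weights = softmax(q @ k.T / np.sqrt(d_k))   # scaled dot-product attention
        heads.append(weights @ v)                   # (seq_len, d_k)
    w_o = rng.standard_normal((h * d_k, d_model)) / np.sqrt(h * d_k)
    return np.concatenate(heads, axis=-1) @ w_o     # project back to d_model

# Example: base-model sizing, d_model = 512 and h = 8 gives d_k = d_v = 64.
out = multi_head_self_attention(np.ones((5, 512)), h=8)
```

With d_model = 512 and h = 8 this reproduces the base-model head size of 64; setting h = 1 gives the single-head variant that Table 3 reports as 0.9 BLEU worse than the best setting.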