diff --git a/notebooks/Unstructured_Partition_Endpoint_Quickstart.ipynb b/notebooks/Unstructured_Partition_Endpoint_Quickstart.ipynb index 8b33609..30b9551 100644 --- a/notebooks/Unstructured_Partition_Endpoint_Quickstart.ipynb +++ b/notebooks/Unstructured_Partition_Endpoint_Quickstart.ipynb @@ -1,31 +1,22 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, "cells": [ { "cell_type": "markdown", - "source": [ - "# Unstructured Partition Endpoint Quickstart" - ], "metadata": { "id": "1qjCgw6HhhQb" - } + }, + "source": [ + "# Unstructured Partition Endpoint Quickstart" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "HeGh7LCch9OY" + }, "source": [ + "> ⚠️ **Legacy**: This notebook uses the Partition Endpoint, which is now legacy. For new projects, use the [Unstructured API](https://docs.unstructured.io/api-reference/overview) instead.\n", + "\n", "This notebook shows how to use the [Unstructured Python SDK](https://docs.unstructured.io/api-reference/partition/sdk-python) to have Unstructured process a local file by using the [Unstructured Partition Endpoint](https://docs.unstructured.io/api-reference/partition/overview).\n", "\n", "---\n", @@ -50,22 +41,22 @@ "> Take your code to the next level by switching over to the [Unstructured Workflow Endpoint](https://docs.unstructured.io/api-reference/workflow/overview) for production-level scenarios, file processing in batches, files and data in remote locations, full support for [chunking](https://docs.unstructured.io/ui/chunking), generating [embeddings](https://docs.unstructured.io/ui/embedding), applying post-transform [enrichments](https://docs.unstructured.io/ui/enriching/overview), using the latest and highest-performing models, and much more. [Get started](https://docs.unstructured.io/api-reference/workflow/overview). 
\n", ">\n", "---" - ], - "metadata": { - "id": "HeGh7LCch9OY" - } + ] }, { "cell_type": "markdown", - "source": [ - "## Requirements" - ], "metadata": { "id": "UKeAmreTj_kn" - } + }, + "source": [ + "## Requirements" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "6UC385avkBzL" + }, "source": [ "To run this notebook, you will need:\n", "\n", @@ -99,117 +90,119 @@ "⚠️ **Warning**: Any files that you upload to these `input` or `output` folders will be deleted whenever Google Colab disconnects or resets, for example due to inactivity, manual restart, or session timeout.\n", "\n", "---\n" - ], - "metadata": { - "id": "6UC385avkBzL" - } + ] }, { "cell_type": "markdown", - "source": [ - "## Step 1: Install the Unstructured Python SDK and other dependencies" - ], "metadata": { "id": "acyxak5tn90N" - } + }, + "source": [ + "## Step 1: Install the Unstructured Python SDK and other dependencies" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "4m7807Y3oBgV" + }, "source": [ "Run the following cell to install the Unstructured Python SDK on a virtual machine (VM) in Google's cloud. This VM is associated with this notebook.\n", "\n", "This cell also installs the `nest-asyncio` Python package, which supports _nested event loops_, a code calling pattern that is used later in this notebook." 
- ], - "metadata": { - "id": "4m7807Y3oBgV" - } + ] }, { "cell_type": "code", - "source": [ - "!pip install unstructured-client nest-asyncio" - ], + "execution_count": null, "metadata": { "id": "Ep1ZQ7Zrhmrr" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!pip install unstructured-client nest-asyncio" + ] }, { "cell_type": "markdown", - "source": [ - "## Step 2: Set your Unstructured API key" - ], "metadata": { "id": "JfiooLJGoZYh" - } + }, + "source": [ + "## Step 2: Set your Unstructured API key" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "HCbYCXwEpKVr" + }, "source": [ "In the following cell, replace `` with the value of your API key, and then run the cell.\n", "\n", "As a security best practice, you would typically set this key elsewhere (for example, as an environment variable or stored in a secure key vault) and then access it programmatically here. But to keep things simple here for demonstration purposes, just specify your API key in plaintext in the following cell." - ], - "metadata": { - "id": "HCbYCXwEpKVr" - } + ] }, { "cell_type": "code", - "source": [ - "UNSTRUCTURED_API_KEY = \"\"" - ], + "execution_count": null, "metadata": { "id": "vi8a-JYzqGdA" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "UNSTRUCTURED_API_KEY = \"\"" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "UEQwI3dHkgGC" + }, "source": [ "## Step 3: Enable nested event loops\n", "\n", "Run the following cell to enable nested event loops, a code calling pattern that will be used later in Step 4." 
- ], - "metadata": { - "id": "UEQwI3dHkgGC" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "G4LnkAj0kss0" + }, + "outputs": [], "source": [ "import nest_asyncio\n", "\n", "nest_asyncio.apply()" - ], - "metadata": { - "id": "G4LnkAj0kss0" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "## Step 4: Call the Unstructured Partition Endpoint to process the files" - ], "metadata": { "id": "S4PSpyH9qsSb" - } + }, + "source": [ + "## Step 4: Call the Unstructured Partition Endpoint to process the files" + ] }, { "cell_type": "markdown", - "source": [ - "Run the following cell. If successful, new files are added to the `output` folder. It could take a few seconds to a minute or more for these new files to appear, depending on the number, size, and complexity of the files that you specified. These new files will have the same names as the filenames in the `input` folder. However, these new files' extension will be `.json`." - ], "metadata": { "id": "kCsyLMWCr32f" - } + }, + "source": [ + "Run the following cell. If successful, new files are added to the `output` folder. It could take a few seconds to a minute or more for these new files to appear, depending on the number, size, and complexity of the files that you specified. These new files will have the same names as the filenames in the `input` folder. However, these new files' extension will be `.json`." 
+ ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b0RnX0Xer805" + }, + "outputs": [], "source": [ "import asyncio\n", "import os\n", @@ -282,51 +275,60 @@ " await asyncio.gather(*tasks)\n", "\n", "await process_files()" - ], - "metadata": { - "id": "b0RnX0Xer805" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "## Step 5: View the results" - ], "metadata": { "id": "9z4vJUMvugME" - } + }, + "source": [ + "## Step 5: View the results" + ] }, { "cell_type": "markdown", - "source": [ - "In the **Files** pane on the left, double-click any of the new files with the extension `.json` that are within the `output` folder. A display pane appears on the right, showing the file's contents." - ], "metadata": { "id": "Ksd2oGzaujGM" - } + }, + "source": [ + "In the **Files** pane on the left, double-click any of the new files with the extension `.json` that are within the `output` folder. A display pane appears on the right, showing the file's contents." 
+ ] }, { "cell_type": "markdown", - "source": [ - "## Learn more" - ], "metadata": { "id": "fXEtdeJiulFO" - } + }, + "source": [ + "## Learn more" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "_SSVvViQumwF" + }, "source": [ "- For a version of this notebook's code that you can run on your own local development machine, see the [Unstructured API Quickstart](https://docs.unstructured.io/api-reference/partition/quickstart).\n", "- [Unstructured Python SDK](https://docs.unstructured.io/api-reference/partition/sdk-python)\n", "- [Unstructured Partition Endpoint](https://docs.unstructured.io/api-reference/partition/overview)\n", "- [Unstructured documentation](https://docs.unstructured.io)" - ], - "metadata": { - "id": "_SSVvViQumwF" - } + ] } - ] -} \ No newline at end of file + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}