langchain-ai
diff --git a/cookbook/mongodb-langchain-cache-memory.ipynb b/cookbook/mongodb-langchain-cache-memory.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/docs/docs/how_to/document_loader_markdown.ipynb b/docs/docs/how_to/document_loader_markdown.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/docs/docs/integrations/document_loaders/hyperbrowser.ipynb b/docs/docs/integrations/document_loaders/hyperbrowser.ipynb
Lines changed: 221 additions & 0 deletions
@@ -156,7 +156,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Ensure you have an HF_TOKEN in your development enviornment:\n",
+    "# Ensure you have an HF_TOKEN in your development environment:\n",
     "# access tokens can be created or copied from the Hugging Face platform (https://huggingface.co/docs/hub/en/security-tokens)\n",
     "\n",
     "# Load MongoDB's embedded_movies dataset from Hugging Face\n",
 
@@ -16,7 +16,7 @@
     "- Basic usage;\n",
     "- Parsing of Markdown into elements such as titles, list items, and text.\n",
     "\n",
-    "LangChain implements an [UnstructuredMarkdownLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html) object which requires the [Unstructured](https://unstructured-io.github.io/unstructured/) package. First we install it:"
+    "LangChain implements an [UnstructuredMarkdownLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html) object which requires the [Unstructured](https://docs.unstructured.io/welcome/) package. First we install it:"
    ]
   },
   {
 
@@ -0,0 +1,221 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# HyperbrowserLoader"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "[Hyperbrowser](https://hyperbrowser.ai) is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy-to-use solutions for web scraping tasks, such as scraping a single page or crawling an entire site.\n",
+    "\n",
+    "Key Features:\n",
+    "- Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches\n",
+    "- Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright\n",
+    "- Powerful APIs - Easy-to-use APIs for scraping or crawling any site, and much more\n",
+    "- Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies\n",
+    "\n",
+    "This notebook provides a quick overview for getting started with the Hyperbrowser [document loader](https://python.langchain.com/docs/concepts/#document-loaders).\n",
+    "\n",
+    "For more information about Hyperbrowser, visit the [Hyperbrowser website](https://hyperbrowser.ai) or the [Hyperbrowser docs](https://docs.hyperbrowser.ai).\n",
+    "\n",
+    "## Overview\n",
+    "### Integration details\n",
+    "\n",
+    "| Class | Package | Local | Serializable | JS support |\n",
+    "| :--- | :--- | :---: | :---: | :---: |\n",
+    "| HyperbrowserLoader | langchain-hyperbrowser | ❌ | ❌ | ❌ |\n",
+    "### Loader features\n",
+    "| Source | Document Lazy Loading | Native Async Support |\n",
+    "| :---: | :---: | :---: |\n",
+    "| HyperbrowserLoader | ✅ | ✅ |\n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "To use the Hyperbrowser document loader, install the `langchain-hyperbrowser` integration package, create a Hyperbrowser account, and get an API key.\n",
+    "\n",
+    "### Credentials\n",
+    "\n",
+    "Head to [Hyperbrowser](https://app.hyperbrowser.ai/) to sign up and generate an API key. Once you've done this, set the `HYPERBROWSER_API_KEY` environment variable.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Installation\n",
+    "\n",
+    "Install **langchain-hyperbrowser**."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -qU langchain-hyperbrowser"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization\n",
+    "\n",
+    "Now we can instantiate our loader object and load documents:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_hyperbrowser import HyperbrowserLoader\n",
+    "\n",
+    "loader = HyperbrowserLoader(\n",
+    "    urls=\"https://example.com\",\n",
+    "    api_key=\"YOUR_API_KEY\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Document(metadata={'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, page_content='Example Domain\\n\\n# Example Domain\\n\\nThis domain is for use in illustrative examples in documents. You may use this\\ndomain in literature without prior coordination or asking for permission.\\n\\n[More information...](https://www.iana.org/domains/example)')"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "docs = loader.load()\n",
+    "docs[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(docs[0].metadata)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Lazy Load"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "page = []\n",
+    "for doc in loader.lazy_load():\n",
+    "    page.append(doc)\n",
+    "    if len(page) >= 10:\n",
+    "        # do some paged operation, e.g.\n",
+    "        # index.upsert(page)\n",
+    "\n",
+    "        page = []"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Advanced Usage\n",
+    "\n",
+    "You can specify the operation to be performed by the loader. The default operation is `scrape`. For `scrape`, you can provide a single URL or a list of URLs to be scraped. For `crawl`, you can provide only a single URL; the loader crawls that page and its subpages and returns a document for each page."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loader = HyperbrowserLoader(\n",
+    "    urls=\"https://hyperbrowser.ai\", api_key=\"YOUR_API_KEY\", operation=\"crawl\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Optional params for the loader can also be provided in the `params` argument. For more information on the supported params, see the Hyperbrowser Python SDK reference for [scrape](https://docs.hyperbrowser.ai/reference/sdks/python/scrape#start-scrape-job-and-wait) and [crawl](https://docs.hyperbrowser.ai/reference/sdks/python/crawl#start-crawl-job-and-wait)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loader = HyperbrowserLoader(\n",
+    "    urls=\"https://example.com\",\n",
+    "    api_key=\"YOUR_API_KEY\",\n",
+    "    operation=\"scrape\",\n",
+    "    params={\"scrape_options\": {\"include_tags\": [\"h1\", \"h2\", \"p\"]}},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## API reference\n",
+    "\n",
+    "- [GitHub](https://github.com/hyperbrowserai/langchain-hyperbrowser/)\n",
+    "- [PyPI](https://pypi.org/project/langchain-hyperbrowser/)\n",
+    "- [Hyperbrowser Docs](https://docs.hyperbrowser.ai/)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.16"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
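The Credentials cell in `hyperbrowser.ipynb` asks the reader to set `HYPERBROWSER_API_KEY`, but the notebook contains no cell that does so. A minimal sketch of such a cell follows. The helper name `ensure_api_key` and its `prompt` and `env` parameters are hypothetical, and whether `HyperbrowserLoader` reads the variable on its own when `api_key` is omitted is an assumption that should be checked against the package.

```python
import getpass
import os


def ensure_api_key(name="HYPERBROWSER_API_KEY", prompt=getpass.getpass, env=os.environ):
    """Return the key stored under `name`, prompting for it only if it is unset.

    `ensure_api_key` is an illustrative helper, not part of langchain-hyperbrowser.
    """
    if not env.get(name):
        # getpass hides the typed key so it does not end up in notebook output.
        env[name] = prompt(f"Enter your {name}: ")
    return env[name]
```

In the notebook, `api_key = ensure_api_key()` could then be passed to the loader as `api_key=api_key` in place of the hard-coded `"YOUR_API_KEY"` string.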
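The Lazy Load cell in `hyperbrowser.ipynb` resets `page` after every 10 documents. As written, any trailing documents (fewer than 10) are still sitting in `page` when the loop ends and are never processed. One possible alternative is sketched below, using a hypothetical `batched` helper and a generator standing in for `loader.lazy_load()`. It yields the final partial batch as well.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, including the final partial batch."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for loader.lazy_load(): any iterable of documents works here.
fake_docs = (f"doc-{i}" for i in range(25))
batch_sizes = [len(batch) for batch in batched(fake_docs, 10)]
print(batch_sizes)  # [10, 10, 5]
```

With a real loader, the loop would read `for page in batched(loader.lazy_load(), 10): ...`, with the paged operation (for example an index upsert) applied to each `page`.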