103 changes: 103 additions & 0 deletions document-loaders/langchain4j-community-document-loader-exa/README.md
@@ -0,0 +1,103 @@
# LangChain4j Exa Document Loader

This module provides integration with [Exa.ai](https://exa.ai/) for loading web content as documents in LangChain4j.

## Overview

Exa.ai is a web search API that supports semantic (meaning-based) queries.
`ExaDocumentLoader` runs a search through Exa's API and converts each result into a LangChain4j `Document`. The result's content becomes the document text, and fields such as title, URL, and relevance score become structured metadata.
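To make the flow concrete, the following sketch builds (without sending) the kind of HTTP request a loader like this issues. It assumes Exa's public REST API: `POST https://api.exa.ai/search`, an `x-api-key` header, and the body fields `query`, `type`, `numResults`, and `contents.text`. The class `ExaRequestSketch` is hypothetical and is not taken from `ExaDocumentLoader`'s source, which may shape its request differently.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds an Exa search request without sending it.
// Endpoint, header, and body fields follow Exa's public REST API (assumption:
// ExaDocumentLoader may build its request differently).
final class ExaRequestSketch {

    static final URI SEARCH_ENDPOINT = URI.create("https://api.exa.ai/search");

    // Builds the JSON body; escapes only backslashes and quotes in the query.
    // A real client would use a JSON library such as Jackson instead.
    static String requestBody(String query, int numResults, String searchType, boolean includeText) {
        String escaped = query.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\":\"" + escaped + "\","
                + "\"type\":\"" + searchType + "\","
                + "\"numResults\":" + numResults + ","
                + "\"contents\":{\"text\":" + includeText + "}}";
    }

    // Wraps the body in a POST request carrying the API key header.
    static HttpRequest searchRequest(String apiKey, String body) {
        return HttpRequest.newBuilder(SEARCH_ENDPOINT)
                .header("x-api-key", apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```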

## Features

- Search the web using Exa.ai's semantic search API
- Optionally retrieve the full text of each search result
- Automatic metadata extraction:
  - Title
  - URL
  - Author
  - Published date
  - Relevance score
- Configurable number of search results
- Support for different search types via `ExaSearchType`
- Fallback to highlights, or to the title, when full text is unavailable
- Rich metadata suitable for downstream processing
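The fallback rule can be read as a simple precedence order: full text first, then highlights, then the title. Below is a minimal sketch of that order, assuming a hypothetical `ExaResult` record and that multiple highlights are joined with newlines; neither detail is confirmed by this module.

```java
import java.util.List;

// Hypothetical stand-in for one Exa search result; field names are assumptions.
record ExaResult(String title, String text, List<String> highlights) {}

final class ContentFallback {

    // Picks document text by precedence: full text, then highlights, then title.
    static String contentFor(ExaResult result) {
        if (isPresent(result.text())) {
            return result.text();
        }
        List<String> highlights = result.highlights();
        if (highlights != null && !highlights.isEmpty()) {
            // Assumption: highlights are joined with newlines into one block of text.
            return String.join("\n", highlights);
        }
        // Assumption: an empty string when neither content nor title is available.
        return result.title() == null ? "" : result.title();
    }

    private static boolean isPresent(String s) {
        return s != null && !s.isBlank();
    }
}
```

Returning an empty string when nothing is available is an assumption of the sketch; this README does not describe the loader's behavior in that case.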

## Installation

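Add the following dependency to your `pom.xml`, where `langchain4j-community.version` is the LangChain4j Community release you are using:

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-community-document-loader-exa</artifactId>
    <version>${langchain4j-community.version}</version>
</dependency>
```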

## Usage

### Basic Usage

```java
import dev.langchain4j.community.data.document.loader.exa.ExaDocumentLoader;
import dev.langchain4j.data.document.Document;
import java.util.List;

ExaDocumentLoader loader = ExaDocumentLoader.builder()
        .apiKey("your-exa-api-key")
        .numResults(5)
        .build();

List<Document> documents = loader.loadDocuments("artificial intelligence trends");

for (Document document : documents) {
    System.out.println("Title: " + document.metadata().getString("title"));
    System.out.println("URL: " + document.metadata().getString("url"));
    System.out.println("Author: " + document.metadata().getString("author"));
    System.out.println("Published Date: " + document.metadata().getString("published_date"));
    System.out.println("Score: " + document.metadata().getDouble("score"));
    System.out.println("Content: " + document.text());
    System.out.println("---");
}
```

### Advanced Configuration

```java
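import dev.langchain4j.community.data.document.loader.exa.ExaDocumentLoader;
import dev.langchain4j.community.data.document.loader.exa.ExaSearchType;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.http.client.HttpClient;
import dev.langchain4j.http.client.jdk.JdkHttpClientBuilder;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.time.Duration;
import java.util.List;

// A custom HTTP client, e.g. to tune timeouts
HttpClient customHttpClient = new JdkHttpClientBuilder()
        .connectTimeout(Duration.ofSeconds(10))
        .readTimeout(Duration.ofSeconds(60))
        .build();
ObjectMapper customMapper = new ObjectMapper();

ExaDocumentLoader loader = ExaDocumentLoader.builder()
        .apiKey("your-exa-api-key")
        .numResults(5)
        .searchType(ExaSearchType.AUTO)
        .includeText(true) // request full page text from Exa
        .httpClient(customHttpClient)
        .objectMapper(customMapper)
        .build();

List<Document> documents = loader.loadDocuments("latest machine learning papers");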
```

## Metadata

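Each returned `Document` carries the following metadata keys:

- **title**: Title of the web page
- **url**: URL of the web page
- **author**: Author of the content, when Exa returns one
- **published_date**: Publication date (read with `getString`)
- **score**: Relevance score assigned by Exa (read with `getDouble`)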
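Exa's API returns these fields as `title`, `url`, `author`, `publishedDate`, and `score`, so some mapping to the snake_case keys above is needed. The sketch below shows one way to do that mapping into a plain `Map`, assuming a hypothetical `SearchHit` record and a policy of omitting fields Exa leaves empty; the module may handle missing fields differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the metadata-bearing fields of an Exa result.
record SearchHit(String title, String url, String author, String publishedDate, Double score) {}

final class MetadataMapping {

    // Maps result fields to the documented metadata keys, omitting null values.
    static Map<String, Object> toMetadata(SearchHit hit) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        putIfPresent(metadata, "title", hit.title());
        putIfPresent(metadata, "url", hit.url());
        putIfPresent(metadata, "author", hit.author());
        putIfPresent(metadata, "published_date", hit.publishedDate());
        putIfPresent(metadata, "score", hit.score());
        return metadata;
    }

    // Assumption: absent fields are skipped rather than stored as null.
    private static void putIfPresent(Map<String, Object> map, String key, Object value) {
        if (value != null) {
            map.put(key, value);
        }
    }
}
```

In LangChain4j, a map like this can be turned into document metadata with `Metadata.from(map)` and attached via `Document.from(text, metadata)`.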

## Error Handling

Both API failures and response-parsing errors are wrapped in `ExaDocumentLoaderException`, so callers can handle every failure of the loader with a single exception type.
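Wrapping means a caller needs only one `catch`, whether the HTTP call or the parsing step failed. The sketch below illustrates that pattern with JDK types only, keeping the original exception available through `getCause()`. `LoaderFailure` and `WrappingLoader.Fetcher` are hypothetical stand-ins, and whether `ExaDocumentLoaderException` is checked or unchecked is not stated in this README.

```java
import java.io.IOException;

// Hypothetical stand-in for a loader-specific exception (not ExaDocumentLoaderException itself).
class LoaderFailure extends RuntimeException {
    LoaderFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

final class WrappingLoader {

    // Hypothetical transport abstraction so the sketch runs without a network.
    interface Fetcher {
        String fetch(String query) throws IOException;
    }

    // Wraps transport and parsing errors in one exception type, keeping the original as the cause.
    static int parseResultCount(Fetcher fetcher, String query) {
        String raw;
        try {
            raw = fetcher.fetch(query);
        } catch (IOException e) {
            throw new LoaderFailure("Request to the search API failed", e);
        }
        if (raw == null) {
            throw new LoaderFailure("Search API returned an empty response", null);
        }
        try {
            // Integer parsing stands in for real JSON parsing of the response body.
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new LoaderFailure("Could not parse the search API response", e);
        }
    }
}
```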

## API Key

You need an Exa.ai API key to use this loader. You can obtain one by signing up at [https://exa.ai/](https://exa.ai/).
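Rather than hard-coding the key in source, you can read it from the environment and fail fast when it is missing. This sketch assumes a variable named `EXA_API_KEY` (a convention, not something this module reads itself), and the helper class `ExaApiKey` is hypothetical. It takes the environment as a `Map` so it is easy to test.

```java
import java.util.Map;

final class ExaApiKey {

    // Assumed variable name; the loader itself only needs the key passed to apiKey(...).
    static final String ENV_VAR = "EXA_API_KEY";

    // Returns the trimmed key, or fails fast with a message naming the variable.
    static String resolve(Map<String, String> env) {
        String key = env.get(ENV_VAR);
        if (key == null || key.isBlank()) {
            throw new IllegalStateException(ENV_VAR + " is not set; get a key at https://exa.ai/");
        }
        return key.trim();
    }
}
```

In an application, pass `System.getenv()` and hand the result to the builder: `ExaDocumentLoader.builder().apiKey(ExaApiKey.resolve(System.getenv()))`.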

64 changes: 64 additions & 0 deletions document-loaders/langchain4j-community-document-loader-exa/pom.xml
@@ -0,0 +1,64 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-community</artifactId>
        <version>1.13.0-beta23-SNAPSHOT</version>
        <relativePath>../../pom.xml</relativePath>
    </parent>

    <artifactId>langchain4j-community-document-loader-exa</artifactId>
    <packaging>jar</packaging>
    <name>LangChain4j :: Community :: Document loader :: Exa</name>

    <licenses>
        <license>
            <name>Apache-2.0</name>
            <url>https://www.apache.org/licenses/LICENSE-2.0.txt</url>
            <distribution>repo</distribution>
            <comments>A business-friendly OSS license</comments>
        </license>
    </licenses>

    <dependencies>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-core</artifactId>
            <version>${langchain4j.core.version}</version>
        </dependency>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-http-client-jdk</artifactId>
            <version>${langchain4j.http-client-jdk.version}</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
        </dependency>
        <!-- test dependencies -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-http-client</artifactId>
            <version>${langchain4j.http-client-jdk.version}</version>
            <classifier>tests</classifier>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.assertj</groupId>
            <artifactId>assertj-core</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.mockito</groupId>
            <artifactId>mockito-core</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>