Commit 3845620

Outlook v2 source connector (#286)
1 parent 60b9120 commit 3845620

11 files changed: +141 -97 lines changed

api-reference/ingest/source-connectors/outlook.mdx

Lines changed: 14 additions & 12 deletions
@@ -2,22 +2,24 @@
 title: Outlook
 ---
 
-import SharedContentOutlook from '/snippets/sc-shared-text/outlook.mdx';
+import NewDocument from '/snippets/general-shared-text/new-document.mdx';
+
+<NewDocument />
+
+import SharedContentOutlook from '/snippets/sc-shared-text/outlook-cli-api.mdx';
+import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
 
 <SharedContentOutlook/>
+<SharedAPIKeyURL/>
 
-Make sure to set the `--partition-by-api` flag and pass in your API key with `--api-key`:
+Now call the Unstructured CLI or Python SDK. The destination connector can be any of the ones supported. This example uses the local destination connector:
 
-import OutlookAPISh from '/snippets/source_connectors/outlook_api.sh.mdx';
-import OutlookAPIPy from '/snippets/source_connectors/outlook_api.py.mdx';
+import OutlookAPISh from '/snippets/source_connectors/outlook.sh.mdx';
+import OutlookAPIPyV2 from '/snippets/source_connectors/outlook.v2.py.mdx';
+import OutlookAPIPyV1 from '/snippets/source_connectors/outlook.v1.py.mdx';
 
 <CodeGroup>
-
 <OutlookAPISh />
-
-<OutlookAPIPy />
-
-</CodeGroup>
-
-Additionally, if you're using Unstructured Serverless API, your locally deployed Unstructured API, or an Unstructured API
-deployed on Azure or AWS, you also need to specify the API URL via the `--partition-endpoint` argument.
+<OutlookAPIPyV2 />
+<OutlookAPIPyV1 />
+</CodeGroup>

open-source/ingest/source-connectors/outlook.mdx

Lines changed: 16 additions & 6 deletions
@@ -2,19 +2,29 @@
 title: Outlook
 ---
 
-import SharedContentOutlook from '/snippets/sc-shared-text/outlook.mdx';
+import NewDocument from '/snippets/general-shared-text/new-document.mdx';
+
+<NewDocument />
+
+import SharedContentOutlook from '/snippets/sc-shared-text/outlook-cli-api.mdx';
 
 <SharedContentOutlook/>
 
+Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:
+
+This example sends data to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.
+
 import OutlookSh from '/snippets/source_connectors/outlook.sh.mdx';
-import OutlookPy from '/snippets/source_connectors/outlook.py.mdx';
+import OutlookPyV2 from '/snippets/source_connectors/outlook.v2.py.mdx';
+import OutlookPyV1 from '/snippets/source_connectors/outlook.v1.py.mdx';
 
 <CodeGroup>
-
 <OutlookSh />
+<OutlookPyV2 />
+<OutlookPyV1 />
+</CodeGroup>
 
-<OutlookPy />
+import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';
 
-</CodeGroup>
+<SharedPartitionByAPIOSS/>
 
-For a full list of the options the Unstructured Ingest CLI accepts check `unstructured-ingest outlook --help`.
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+The Outlook connector dependencies:
+
+```bash CLI, Python
+pip install "unstructured-ingest[outlook]"
+```
+
+import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';
+
+<AdditionalIngestDependencies />
+
+The following environment variables:
+
+- `OUTLOOK_USER_EMAIL` - The Outlook user's email address, represented by `--user-email` (CLI) or `user_email` (Python).
+- `OUTLOOK_APP_CLIENT_ID` - The application (client) ID of the Microsoft Entra ID app registration that has access to the user's email account, represented by `--client-id` (CLI) or `client_id` (Python).
+- `OUTLOOK_APP_CLIENT_SECRET` - The client secret for the Entra ID app registration, represented by `--client-cred` (CLI) or `client_cred` (Python).
+- `OUTLOOK_APP_TENANT` - The directory (tenant) ID of the Entra ID app registration, represented by `--tenant` (CLI) or `tenant` (Python).
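
Because the connector reads all of its credentials from these environment variables, a quick pre-flight check can catch an unset value before any ingest run starts. The following is a minimal sketch, not part of this commit: `missing_env_vars` and `REQUIRED_OUTLOOK_VARS` are hypothetical names introduced for illustration, and only the four variable names come from the list above.

```python
import os

# Variable names copied from the environment-variable list above.
REQUIRED_OUTLOOK_VARS = (
    "OUTLOOK_USER_EMAIL",
    "OUTLOOK_APP_CLIENT_ID",
    "OUTLOOK_APP_CLIENT_SECRET",
    "OUTLOOK_APP_TENANT",
)


def missing_env_vars(names=REQUIRED_OUTLOOK_VARS, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Example use before calling the CLI or the Python pipeline:
#   missing = missing_env_vars()
#   if missing:
#       raise SystemExit("Missing: " + ", ".join(missing))
```

Passing `environ` explicitly keeps the helper easy to test; by default it inspects the real process environment.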
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+The Outlook prerequisites:
+
+<iframe
+width="560"
+height="315"
+src="https://www.youtube.com/embed/9yESRp9pzv0"
+title="YouTube video player"
+frameborder="0"
+allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+allowfullscreen
+></iframe>
+
+- The Outlook user's email address.
+- A Microsoft Entra ID app registration in the same Azure account as the Outlook account. You will need
+this app registration's client (application) ID, client secret, and directory (tenant) ID. [Learn how](https://learn.microsoft.com/entra/identity-platform/quickstart-register-app).
+- The Entra ID app registration must have the following Graph API permission levels of the application (not delegated) type:
+
+- `Mail.Read`
+- `Mail.ReadBasic`
+- `User.Read.All`
+
+[Learn how](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions).
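
One way to confirm that the client ID, client secret, and tenant ID from the prerequisites belong together is to request an app-only token with the OAuth 2.0 client credentials flow. The sketch below is an illustration and not part of this commit: it only builds the request against the Microsoft identity platform v2.0 token endpoint without sending it, and `build_token_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # ".default" asks for the application permissions that were
            # already granted to the app registration.
            "scope": "https://graph.microsoft.com/.default",
        }
    ).encode("ascii")
    return urllib.request.Request(url, data=form, method="POST")
```

Sending the request with `urllib.request.urlopen` should return a JSON body containing an `access_token` when the three values are valid. For app-only tokens, the granted application permissions normally appear in the token's `roles` claim, which is where `Mail.Read`, `Mail.ReadBasic`, and `User.Read.All` would be expected.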
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+Connect Outlook to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.
+
+You will need:
+
+import SharedOutlook from '/snippets/general-shared-text/outlook.mdx';
+import SharedOutlookCLIAPI from '/snippets/general-shared-text/outlook-cli-api.mdx';
+
+<SharedOutlook />
+<SharedOutlookCLIAPI />

snippets/sc-shared-text/outlook.mdx

Lines changed: 0 additions & 14 deletions
This file was deleted.

snippets/source_connectors/outlook.py.mdx

Lines changed: 0 additions & 31 deletions
This file was deleted.
Lines changed: 13 additions & 10 deletions
@@ -1,16 +1,19 @@
-```bash Shell
+```bash CLI
 #!/usr/bin/env bash
 
+# Embedding is optional.
+
 unstructured-ingest \
   outlook \
-  --client-id $MS_CLIENT_ID \
-  --client-cred $MS_CLIENT_CRED \
-  --tenant $MS_TENANT_ID \
-  --user-email $MS_USER_EMAIL \
-  --outlook-folders Inbox,"Sent Items" \
+  --user-email $OUTLOOK_USER_EMAIL \
+  --outlook-folders Inbox \
   --output-dir $LOCAL_FILE_OUTPUT_DIR \
-  --num-processes 2 \
-  --recursive \
-  --verbose \
-  --strategy hi_res
+  --client-id $OUTLOOK_APP_CLIENT_ID \
+  --client-cred $OUTLOOK_APP_CLIENT_SECRET \
+  --tenant $OUTLOOK_APP_TENANT \
+  --partition-by-api \
+  --api-key $UNSTRUCTURED_API_KEY \
+  --partition-endpoint $UNSTRUCTURED_API_URL \
+  --chunking-strategy by_title \
+  --embedding-provider huggingface
 ```
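
The same CLI invocation can also be assembled from Python as an argument list, which sidesteps shell word-splitting of the unquoted variables in the script above. This is a sketch under stated assumptions, not part of the commit: `build_outlook_command` is a hypothetical helper, and it uses only the flags and environment variable names that appear in the updated script.

```python
import os


def build_outlook_command(environ=None):
    """Return the unstructured-ingest argv for the Outlook example (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [
        "unstructured-ingest",
        "outlook",
        "--user-email", env["OUTLOOK_USER_EMAIL"],
        "--outlook-folders", "Inbox",
        "--output-dir", env["LOCAL_FILE_OUTPUT_DIR"],
        "--client-id", env["OUTLOOK_APP_CLIENT_ID"],
        "--client-cred", env["OUTLOOK_APP_CLIENT_SECRET"],
        "--tenant", env["OUTLOOK_APP_TENANT"],
        "--partition-by-api",
        "--api-key", env["UNSTRUCTURED_API_KEY"],
        "--partition-endpoint", env["UNSTRUCTURED_API_URL"],
        "--chunking-strategy", "by_title",
        "--embedding-provider", "huggingface",  # Embedding is optional.
    ]
```

The list can be handed to `subprocess.run(build_outlook_command(), check=True)`. A missing variable raises `KeyError` immediately instead of passing an empty argument to the CLI.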

snippets/source_connectors/outlook_api.py.mdx renamed to snippets/source_connectors/outlook.v1.py.mdx

Lines changed: 7 additions & 6 deletions
@@ -1,4 +1,4 @@
-```python Python
+```python Python Ingest v1
 import os
 
 from unstructured_ingest.connector.outlook import OutlookAccessConfig, SimpleOutlookConfig
@@ -16,16 +16,17 @@ if __name__ == "__main__":
         partition_config=PartitionConfig(
             partition_by_api=True,
             api_key=os.getenv("UNSTRUCTURED_API_KEY"),
+            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
             strategy="hi_res",
         ),
         connector_config=SimpleOutlookConfig(
             access_config=OutlookAccessConfig(
-                client_credential=os.getenv("MS_CLIENT_CRED"),
+                client_credential=os.getenv("OUTLOOK_APP_CLIENT_SECRET"),
             ),
-            client_id=os.getenv("MS_CLIENT_ID"),
-            tenant=os.getenv("MS_TENANT_ID"),
-            user_email=os.getenv("MS_USER_EMAIL"),
-            outlook_folders=["Inbox", "Sent Items"],
+            client_id=os.getenv("OUTLOOK_APP_CLIENT_ID"),
+            tenant=os.getenv("OUTLOOK_APP_TENANT"),
+            user_email=os.getenv("OUTLOOK_USER_EMAIL"),
+            outlook_folders=["Inbox"],
             recursive=True,
         ),
     )
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+```python Python Ingest v2
+import os
+
+from unstructured_ingest.v2.pipeline.pipeline import Pipeline
+from unstructured_ingest.v2.interfaces import ProcessorConfig
+
+from unstructured_ingest.v2.processes.connectors.outlook import (
+    OutlookIndexerConfig,
+    OutlookDownloaderConfig,
+    OutlookConnectionConfig,
+    OutlookAccessConfig
+)
+
+from unstructured_ingest.v2.processes.connectors.local import LocalUploaderConfig
+from unstructured_ingest.v2.processes.partitioner import PartitionerConfig
+from unstructured_ingest.v2.processes.chunker import ChunkerConfig
+from unstructured_ingest.v2.processes.embedder import EmbedderConfig
+
+# Embedding is optional.
+
+if __name__ == "__main__":
+    Pipeline.from_configs(
+        context=ProcessorConfig(),
+        indexer_config=OutlookIndexerConfig(
+            outlook_folders=["Inbox"],
+            recursive=False,
+            user_email=os.getenv("OUTLOOK_USER_EMAIL")
+        ),
+        downloader_config=OutlookDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
+        source_connection_config=OutlookConnectionConfig(
+            access_config=OutlookAccessConfig(client_cred=os.getenv("OUTLOOK_APP_CLIENT_SECRET")),
+            client_id=os.getenv("OUTLOOK_APP_CLIENT_ID"),
+            tenant=os.getenv("OUTLOOK_APP_TENANT")
+        ),
+        partitioner_config=PartitionerConfig(
+            partition_by_api=True,
+            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
+            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")
+        ),
+        chunker_config=ChunkerConfig(chunking_strategy="by_title"),
+        embedder_config=EmbedderConfig(embedding_provider="huggingface"),
+        uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
+    ).run()
+```
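
After a run with the local destination connector, the results land as files under `LOCAL_FILE_OUTPUT_DIR`. The sketch below, which is not part of this commit, tallies element types across those files. It assumes each output file is a JSON array of element objects with a `type` key, which should be checked against the actual output; `count_element_types` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def count_element_types(output_dir):
    """Count element types across all JSON files under output_dir."""
    counts = Counter()
    for path in sorted(Path(output_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            # Assumption: each file holds a JSON array of element objects.
            elements = json.load(handle)
        for element in elements:
            counts[element.get("type", "unknown")] += 1
    return counts
```

Calling `count_element_types(os.getenv("LOCAL_FILE_OUTPUT_DIR"))` after the pipeline finishes gives a quick sanity check that messages were partitioned and chunked as expected.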
