Commit 4d8f02c

Slack v2 API source connector (#298)
1 parent b7c0e81 commit 4d8f02c

11 files changed: +145 -93 lines changed

api-reference/ingest/source-connectors/slack.mdx

Lines changed: 14 additions & 12 deletions
@@ -2,22 +2,24 @@
 title: Slack
 ---

-import SharedContentSlack from '/snippets/sc-shared-text/slack.mdx';
+import NewDocument from '/snippets/general-shared-text/new-document.mdx';
+
+<NewDocument />
+
+import SharedContentSlack from '/snippets/sc-shared-text/slack-cli-api.mdx';
+import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';

 <SharedContentSlack/>
+<SharedAPIKeyURL/>

-Make sure to set the `--partition-by-api` flag and pass in your API key with `--api-key`:
+Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

-import SlackAPISh from '/snippets/source_connectors/slack_api.sh.mdx';
-import SlackAPIPy from '/snippets/source_connectors/slack_api.py.mdx';
+import SlackAPISh from '/snippets/source_connectors/slack.sh.mdx';
+import SlackAPIPyV2 from '/snippets/source_connectors/slack.v2.py.mdx';
+import SlackAPIPyV1 from '/snippets/source_connectors/slack.v1.py.mdx';

 <CodeGroup>
-
 <SlackAPISh />
-
-<SlackAPIPy />
-
-</CodeGroup>
-
-Additionally, if you're using Unstructured Serverless API, your locally deployed Unstructured API, or an Unstructured API
-deployed on Azure or AWS, you also need to specify the API URL via the `--partition-endpoint` argument.
+<SlackAPIPyV2 />
+<SlackAPIPyV1 />
+</CodeGroup>

open-source/ingest/source-connectors/slack.mdx

Lines changed: 16 additions & 7 deletions
@@ -2,19 +2,28 @@
 title: Slack
 ---

-import SharedContentSlack from '/snippets/sc-shared-text/slack.mdx';
+import NewDocument from '/snippets/general-shared-text/new-document.mdx';
+
+<NewDocument />
+
+import SharedContentSlack from '/snippets/sc-shared-text/slack-cli-api.mdx';

 <SharedContentSlack/>

+Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector.
+
+This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.
+
 import SlackSh from '/snippets/source_connectors/slack.sh.mdx';
-import SlackPy from '/snippets/source_connectors/slack.py.mdx';
+import SlackPyV2 from '/snippets/source_connectors/slack.v2.py.mdx';
+import SlackPyV1 from '/snippets/source_connectors/slack.v1.py.mdx';

 <CodeGroup>
-
 <SlackSh />
-
-<SlackPy />
-
+<SlackPyV2 />
+<SlackPyV1 />
 </CodeGroup>

-For a full list of the options the Unstructured Ingest CLI accepts check `unstructured-ingest slack --help`.
+import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';
+
+<SharedPartitionByAPIOSS/>
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+The Slack connector dependencies:
+
+```bash
+pip install "unstructured-ingest[slack]"
+```
+
+import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';
+
+<AdditionalIngestDependencies />
+
+These environment variables:
+
+- `SLACK_BOT_USER_OAUTH_TOKEN` - The OAuth token for the Slack app, represented by `--token` (CLI) or `token` (Python).
+
+To specify the starting and ending date and time range for the channels to be processed:
+
+- For the CLI, use one of the following supported formats:
+
+  - `YYYY-MM-DD`
+  - `YYYY-MM-DDTHH:MM:SS`
+  - `YYYY-MM-DDTHH:MM:SSZ`
+  - `YYYY-MM-DD+HH:MM:SS`
+  - `YYYY-MM-DD-HH:MM:SS`
+
+- For Python, use the `datetime.datetime` function.
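
As a rough illustration of the Python side, the sketch below converts three of the CLI-style date strings listed above into `datetime.datetime` objects. `parse_slack_date` is a hypothetical helper written for this example, not part of `unstructured-ingest`. It leaves out the two offset formats because their exact parsing rules are not spelled out in this commit.

```python
from datetime import datetime, timezone

# Hypothetical helper for illustration only; not part of unstructured-ingest.
# It covers three of the listed formats: date only, date and time, and
# date and time with a trailing "Z" (UTC).
_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M:%SZ")


def parse_slack_date(value: str) -> datetime:
    """Try each known format in turn and return the first successful parse."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # strptime treats the "Z" as a literal, so attach UTC explicitly.
        if fmt.endswith("Z"):
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed
    raise ValueError(f"unsupported date format: {value!r}")


start_date = parse_slack_date("2024-10-22")
end_date = parse_slack_date("2024-10-23T00:00:00Z")
```

The resulting objects are the kind of value the Python snippets in this commit pass as `start_date` and `end_date`.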
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+The Slack prerequisites:
+
+- A Slack app. Create a Slack app by following [Step 1: Creating an app](https://api.slack.com/quickstart#creating).
+- The app must have the `channels:history` OAuth scope. Give the app this scope by following [Step 2: Requesting scopes](https://api.slack.com/quickstart#scopes).
+- The app must be installed and authorized for the target Slack workspace. Install and authorize the app by following [Step 3: Installing and authorizing the app](https://api.slack.com/quickstart#installing).
+- The app's access token. Get this token by following [Step 3: Installing and authorizing the app](https://api.slack.com/quickstart#installing).
+- Add the app to the target channels in the Slack workspace. To do this from the channel, open the channel's details page, click the **Integrations** tab, click **Add apps**, and follow the on-screen directions to install the app.
+- The channel ID for each target channel. To get this ID, open the channel's details page, and look for the **Channel ID** field on the **About** tab.
+- The starting and ending date and time range for the channels to be processed. Supported formats include:
+
+  - `YYYY-MM-DD`
+  - `YYYY-MM-DDTHH:MM:SS`
+  - `YYYY-MM-DDTHH:MM:SSZ`
+  - `YYYY-MM-DD+HH:MM:SS`
+  - `YYYY-MM-DD-HH:MM:SS`
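
The channel IDs gathered in the prerequisites above appear in two shapes in this commit: one comma-separated value for the CLI `--channels` flag, and a list of strings for the Python `channels` argument. A minimal sketch of converting one into the other follows. `parse_channels` is a hypothetical helper for illustration and is not provided by `unstructured-ingest`.

```python
def parse_channels(raw: str) -> list[str]:
    """Split a comma-separated channel value into a list of channel IDs.

    Surrounding whitespace and empty entries are dropped.
    """
    return [part.strip() for part in raw.split(",") if part.strip()]


# The same two example IDs used by the snippets in this commit.
channels = parse_channels("C03FVNHR70A,C03FVNRG43D")
print(channels)  # ['C03FVNHR70A', 'C03FVNRG43D']
```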
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+Connect Slack to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.
+
+You will need:
+
+import SharedSlack from '/snippets/general-shared-text/slack.mdx';
+import SharedSlackCLIAPI from '/snippets/general-shared-text/slack-cli-api.mdx';
+
+<SharedSlack />
+<SharedSlackCLIAPI />
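
The CLI and Python snippets changed in this commit read their settings from environment variables. One possible preflight check, sketched under the assumption that all five variables named in those snippets are required for a run, is shown here. `missing_env_vars` is a hypothetical helper, not part of `unstructured-ingest`.

```python
import os

# Variable names as they appear in the CLI and Python snippets in this commit.
REQUIRED_ENV_VARS = (
    "SLACK_BOT_USER_OAUTH_TOKEN",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
    "LOCAL_FILE_DOWNLOAD_DIR",
    "LOCAL_FILE_OUTPUT_DIR",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# Demonstrated with an explicit mapping so the result does not depend on the
# caller's real environment; the values are placeholders.
example_env = {
    "SLACK_BOT_USER_OAUTH_TOKEN": "xoxb-example",
    "UNSTRUCTURED_API_KEY": "example-key",
}
print(missing_env_vars(example_env))
# ['UNSTRUCTURED_API_URL', 'LOCAL_FILE_DOWNLOAD_DIR', 'LOCAL_FILE_OUTPUT_DIR']
```

Calling `missing_env_vars()` with no argument checks the real process environment instead.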

snippets/sc-shared-text/slack.mdx

Lines changed: 0 additions & 17 deletions
This file was deleted.
Lines changed: 12 additions & 6 deletions
@@ -1,13 +1,19 @@
-```bash Shell
+```bash CLI
 #!/usr/bin/env bash

+# Chunking and embedding are optional.
+
 unstructured-ingest \
   slack \
-    --channels 12345678 \
-    --token 12345678 \
+    --token $SLACK_BOT_USER_OAUTH_TOKEN \
+    --channels C03FVNHR70A,C03FVNRG43D \
+    --start-date 2024-10-22 \
+    --end-date 2024-10-23 \
     --download-dir $LOCAL_FILE_DOWNLOAD_DIR \
+    --chunking-strategy by_title \
+    --embedding-provider huggingface \
     --output-dir $LOCAL_FILE_OUTPUT_DIR \
-    --start-date 2023-04-01T01:00:00-08:00 \
-    --end-date 2023-04-02 \
-    --strategy hi_res
+    --partition-by-api \
+    --api-key $UNSTRUCTURED_API_KEY \
+    --partition-endpoint $UNSTRUCTURED_API_URL
 ```

snippets/source_connectors/slack.py.mdx renamed to snippets/source_connectors/slack.v1.py.mdx

Lines changed: 6 additions & 5 deletions
@@ -1,5 +1,6 @@
-```python Python
+```python Python Ingest v1
 import os
+from datetime import datetime

 from unstructured_ingest.connector.slack import SimpleSlackConfig, SlackAccessConfig
 from unstructured_ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig
@@ -18,11 +19,11 @@ if __name__ == "__main__":
         ),
         connector_config=SimpleSlackConfig(
             access_config=SlackAccessConfig(
-                token=os.getenv("SLACK_TOKEN"),
+                token=os.getenv("SLACK_BOT_USER_OAUTH_TOKEN"),
             ),
-            channels=["12345678"],
-            start_date="2023-04-01T01:00:00-08:00",
-            end_date="2023-04-02,",
+            channels=["C03FVNHR70A", "C03FVNRG43D"],
+            start_date=datetime(year=2024, month=10, day=22),
+            end_date=datetime(year=2024, month=10, day=23)
         ),
     )
     runner.run()
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+```python Python Ingest v2
+import os
+from datetime import datetime
+
+from unstructured_ingest.v2.pipeline.pipeline import Pipeline
+from unstructured_ingest.v2.interfaces import ProcessorConfig
+
+from unstructured_ingest.v2.processes.connectors.slack import (
+    SlackIndexerConfig,
+    SlackDownloaderConfig,
+    SlackConnectionConfig,
+    SlackAccessConfig
+)
+
+from unstructured_ingest.v2.processes.partitioner import PartitionerConfig
+from unstructured_ingest.v2.processes.chunker import ChunkerConfig
+from unstructured_ingest.v2.processes.embedder import EmbedderConfig
+from unstructured_ingest.v2.processes.connectors.local import LocalUploaderConfig
+
+# Chunking and embedding are optional.
+
+if __name__ == "__main__":
+    Pipeline.from_configs(
+        context=ProcessorConfig(),
+        indexer_config=SlackIndexerConfig(
+            channels=["C03FVNHR70A", "C03FVNRG43D"],
+            start_date=datetime(year=2024, month=10, day=22),
+            end_date=datetime(year=2024, month=10, day=23)
+        ),
+        downloader_config=SlackDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
+        source_connection_config=SlackConnectionConfig(
+            access_config=SlackAccessConfig(token=os.getenv("SLACK_BOT_USER_OAUTH_TOKEN"))
+        ),
+        partitioner_config=PartitionerConfig(
+            partition_by_api=True,
+            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
+            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
+            additional_partition_args={
+                "split_pdf_page": True,
+                "split_pdf_allow_failed": True,
+                "split_pdf_concurrency_level": 15
+            }
+        ),
+        chunker_config=ChunkerConfig(chunking_strategy="by_title"),
+        embedder_config=EmbedderConfig(embedding_provider="huggingface"),
+        uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
+    ).run()
+```

snippets/source_connectors/slack_api.py.mdx

Lines changed: 0 additions & 31 deletions
This file was deleted.
