Commit 2e8fd86

Ingest v2: Jira source connector (#512)
1 parent ab02b84 commit 2e8fd86

File tree

9 files changed

+198
-52
lines changed

ingestion/source-connectors/jira.mdx

Lines changed: 16 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -2,19 +2,28 @@
22
title: Jira
33
---
44

5-
import SharedContentJira from '/snippets/sc-shared-text/jira.mdx';
5+
import NewDocument from '/snippets/general-shared-text/new-document.mdx';
6+
7+
<NewDocument />
8+
9+
import SharedContentJira from '/snippets/sc-shared-text/jira-cli-api.mdx';
610

711
<SharedContentJira/>
812

13+
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. You can use any supported destination connector; this example uses the local destination connector.
14+
15+
This example sends data to Unstructured for processing by default. To process data locally instead, see the instructions at the end of this page.
16+
917
import JiraSh from '/snippets/source_connectors/jira.sh.mdx';
10-
import JiraPy from '/snippets/source_connectors/jira.py.mdx';
18+
import JiraPyV2 from '/snippets/source_connectors/jira.v2.py.mdx';
19+
import JiraPyV1 from '/snippets/source_connectors/jira.v1.py.mdx';
1120

1221
<CodeGroup>
13-
1422
<JiraSh />
15-
16-
<JiraPy />
17-
23+
<JiraPyV2 />
24+
<JiraPyV1 />
1825
</CodeGroup>
1926

20-
For a full list of the options the Unstructured Ingest CLI accepts check `unstructured-ingest jira --help`.
27+
import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';
28+
29+
<SharedPartitionByAPIOSS/>
Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
1+
The Jira connector dependencies:
2+
3+
```bash CLI, Python
4+
pip install "unstructured-ingest[jira]"
5+
```
6+
7+
import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';
8+
9+
<AdditionalIngestDependencies />
10+
11+
The following environment variables:
12+
13+
- `JIRA_URL` - The site URL for your Jira Data Center installation or Jira Cloud account, represented by `--url` (CLI) or `url` (Python).
14+
- One of the following:
15+
16+
- For Jira Cloud or Jira Data Center, the target user's name or email address, and password, as follows:
17+
18+
- `JIRA_USERNAME` - The name or email address of the target user, represented by `--username` (CLI) or `username` (Python).
19+
- `JIRA_PASSWORD_OR_API_TOKEN` - The user's password, represented by `--password` (CLI) or `password` (Python).
20+
21+
- For Jira Cloud only, the target user's name or email address, and API token, as follows:
22+
23+
- `JIRA_USERNAME` - The name or email address of the target user, represented by `--username` (CLI) or `username` (Python).
24+
- `JIRA_PASSWORD_OR_API_TOKEN` - The user's API token, represented by `--password` (CLI) or `password` (Python).
25+
26+
- For Jira Data Center only, the target user's personal access token (PAT), as follows:
27+
28+
- `JIRA_PERSONAL_ACCESS_TOKEN` - The user's personal access token (PAT), represented by `--token` (CLI) or `token` (Python).
29+
30+
Also:
31+
32+
- For Jira Cloud, you must specify `--cloud` (CLI) or set `cloud` to `True` (Python).
33+
- For Jira Data Center, you can specify `--no-cloud` (CLI) or set `cloud` to `False` (Python). This is the default if not otherwise specified.
34+
- To process specific projects, boards, or issues, use:
35+
36+
- `--projects` with a comma-delimited list of target project IDs (CLI) or `projects` with an array of target project IDs (Python).
37+
- `--boards` with a comma-delimited list of target board IDs (CLI) or `boards` with an array of target board IDs (Python).
38+
- `--issues` with a comma-delimited list of target issue IDs (CLI) or `issues` with an array of target issue IDs (Python).
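
The authentication options above can be sketched as a small helper that picks settings from the environment variables. This is a minimal sketch and not part of `unstructured-ingest`: the function name `jira_auth_kwargs` is hypothetical, and only the environment variable names and the `username`, `password`, `token`, and `cloud` parameter names come from the list above.

```python
import os


def jira_auth_kwargs(env=None, cloud=False):
    """Pick Jira authentication settings from environment variables.

    Hypothetical helper for illustration only. It returns a plain dict
    and does not call unstructured-ingest.
    """
    env = os.environ if env is None else env
    token = env.get("JIRA_PERSONAL_ACCESS_TOKEN")
    username = env.get("JIRA_USERNAME")
    password = env.get("JIRA_PASSWORD_OR_API_TOKEN")

    if token:
        # Personal access tokens are listed for Jira Data Center only.
        if cloud:
            raise ValueError("PAT authentication is for Jira Data Center only")
        return {"token": token, "cloud": False}

    if username and password:
        # Password (Cloud or Data Center) or API token (Cloud only).
        return {"username": username, "password": password, "cloud": cloud}

    raise ValueError(
        "Set JIRA_PERSONAL_ACCESS_TOKEN, or both JIRA_USERNAME and "
        "JIRA_PASSWORD_OR_API_TOKEN"
    )
```

In the Python v2 example on this page, `password` and `token` are passed to `JiraAccessConfig`, while `username` and `cloud` are passed to `JiraConnectionConfig`, so a dict like this would need to be split between the two.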
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
- A [Jira Cloud account](https://www.atlassian.com/try/cloud/signup?bundle=jira-software&edition=free) or
2+
[Jira Data Center installation](https://confluence.atlassian.com/adminjiraserver/installing-jira-data-center-938846870.html).
3+
- The site URL for your [Jira Data Center installation](https://confluence.atlassian.com/jirakb/find-your-site-url-to-set-up-the-jira-data-center-and-server-mobile-app-954244798.html) or Jira Cloud account.
4+
For Jira Cloud, open Jira in your web browser and copy the address from the browser's address bar.
5+
If you're unsure, check the dashboard URL. When viewing an issue, project, or board, the site URL is typically everything up to and including `/jira`, such as
6+
`https://<organization>.atlassian.net/jira`.
7+
- To process Jira projects, provide the IDs for the target projects. To get a project's ID, sign in to your Jira Cloud account or Jira Data Center installation, and then go to the following URL: `https://<organization>.atlassian.net/rest/api/latest/project/<project-key>`,
8+
replacing `<organization>` with yours, and replacing `<project-key>` with the target project's key. In the
9+
response, look for the URL `https://<organization>.atlassian.net/rest/api/3/project/<project-id>`, where `<project-id>` is the target project's ID.
10+
- To process Jira boards, provide the IDs for the target boards. To get a board's ID, sign in to your Jira Cloud account or Jira Data Center installation, and then go to the following URL: `https://<organization>.atlassian.net/rest/agile/1.0/board?projectKeyOrId=<project-key-or-id>`,
11+
replacing `<organization>` with yours, and `<project-key-or-id>` with the associated project's key or ID. In the
12+
response, look for the URL `https://<organization>.atlassian.net/rest/agile/1.0/board/<board-id>`, where `<board-id>` is the board's ID.
13+
- To process Jira issues, provide the IDs for the target issues. To get an issue's ID, sign in to your Jira Cloud account or Jira Data Center installation, open the issue, and then look at the URL in your browser's address bar. The issue ID is the string of characters after the final slash in the URL.
14+
- A user in your [Jira Cloud account](https://support.atlassian.com/jira-cloud-administration/docs/manage-users-groups-permissions-and-roles-in-jira-cloud/) or
15+
[Jira Data Center installation](https://confluence.atlassian.com/adminjiraserver/create-edit-or-remove-a-user-938847025.html).
16+
- The user must have the correct permissions in your
17+
[Jira Cloud account](https://support.atlassian.com/jira-cloud-administration/docs/manage-users-groups-permissions-and-roles-in-jira-cloud/) or
18+
[Jira Data Center installation](https://confluence.atlassian.com/jirakb/permissions-made-simple-for-jira-server-717062767.html) to
19+
access the target projects, boards, and issues.
20+
- One of the following:
21+
22+
- For Jira Cloud or Jira Data Center, the target user's name or email address, and password.
23+
[Change a Jira Cloud user's password](https://support.atlassian.com/user-management/docs/change-password-for-portal-only-customers/).
24+
[Change a Jira Data Center user's password](https://confluence.atlassian.com/adminjiraserver/create-edit-or-remove-a-user-938847025.html).
25+
- For Jira Cloud only, the target user's name or email address, and API token.
26+
[Create an API token](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).
27+
- For Jira Data Center only, the target user's personal access token (PAT).
28+
[Create a PAT](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html).
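
The ID lookups described above all come down to reading the last path segment of a URL. The following is a minimal sketch of that step; the helper name `last_path_segment` and the sample URLs (the `example` organization, the ID values, and the `/browse/` issue path) are assumptions made up for illustration.

```python
from urllib.parse import urlparse


def last_path_segment(url):
    """Return the text after the final slash of a URL's path."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


# Hypothetical URLs in the shapes described in the list above.
project_url = "https://example.atlassian.net/rest/api/3/project/10000"
board_url = "https://example.atlassian.net/rest/agile/1.0/board/42"
issue_url = "https://example.atlassian.net/browse/ABC-123"

project_id = last_path_segment(project_url)
board_id = last_path_segment(board_url)
issue_id = last_path_segment(issue_url)
print(project_id, board_id, issue_id)
```

Using `urlparse` rather than splitting the raw string keeps any query string or fragment out of the result.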
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
Connect Jira to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.
2+
3+
The requirements are as follows.
4+
5+
import SharedJira from '/snippets/general-shared-text/jira.mdx';
6+
import SharedJiraCLIAPI from '/snippets/general-shared-text/jira-cli-api.mdx';
7+
8+
<SharedJira />
9+
<SharedJiraCLIAPI />

snippets/sc-shared-text/jira.mdx

Lines changed: 0 additions & 21 deletions
This file was deleted.
Lines changed: 18 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,22 @@
1-
```bash Shell
1+
```bash CLI
22
#!/usr/bin/env bash
33

4+
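# Chunking and embedding are optional. Use only one authentication method: --username with --password (password or API token), or --token (personal access token only). Use only one of --cloud (Jira Cloud) or --no-cloud (Jira Data Center, the default).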
5+
46
unstructured-ingest \
57
jira \
6-
--metadata-exclude filename,file_directory,metadata.data_source.date_processed \
7-
--url https://unstructured-jira-connector-test.atlassian.net \
8-
--user-email [email protected] \
9-
--api-token ABCDE1234ABDE1234ABCDE1234 \
10-
--output-dir $LOCAL_FILE_OUTPUT_DIR \
11-
--num-processes 2 \
12-
--strategy hi_res
13-
```
8+
--url $JIRA_URL \
9+
--username $JIRA_USERNAME \
10+
--password $JIRA_PASSWORD_OR_API_TOKEN \
11+
--token $JIRA_PERSONAL_ACCESS_TOKEN \
12+
--cloud \
13+
--no-cloud \
14+
--output-dir $LOCAL_FILE_OUTPUT_DIR \
15+
--chunking-strategy by_title \
16+
--embedding-provider huggingface \
17+
--partition-by-api \
18+
--api-key $UNSTRUCTURED_API_KEY \
19+
--partition-endpoint $UNSTRUCTURED_API_URL \
20+
--strategy hi_res \
21+
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
22+
```
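
The hand-escaped JSON passed to `--additional-partition-args` in the CLI example is easy to get wrong. As a minimal sketch, the same string can be generated with Python's standard library; the variable names here are illustrative, and the keys and values are copied from the CLI example.

```python
import json
import shlex

# Keys and values copied from the CLI example.
partition_args = {
    "split_pdf_page": "true",
    "split_pdf_allow_failed": "true",
    "split_pdf_concurrency_level": 15,
}

# json.dumps produces valid JSON; shlex.quote makes it safe to paste
# into a POSIX shell command line.
flag = "--additional-partition-args=" + shlex.quote(json.dumps(partition_args))
print(flag)
```

The printed flag can be pasted in place of the escaped version; the shell passes the same JSON text to the CLI either way.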
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
```python Python Ingest v1
2+
import os
3+
4+
from unstructured_ingest.connector.jira import JiraAccessConfig, SimpleJiraConfig
5+
from unstructured_ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig
6+
from unstructured_ingest.runner import JiraRunner
7+
8+
if __name__ == "__main__":
9+
runner = JiraRunner(
10+
processor_config=ProcessorConfig(
11+
verbose=True,
12+
output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"),
13+
num_processes=2,
14+
),
15+
read_config=ReadConfig(),
16+
partition_config=PartitionConfig(
17+
metadata_exclude=["filename", "file_directory", "metadata.data_source.date_processed"],
18+
partition_by_api=True,
19+
api_key=os.getenv("UNSTRUCTURED_API_KEY"),
20+
strategy="hi_res",
21+
),
22+
connector_config=SimpleJiraConfig(
23+
access_config=JiraAccessConfig(api_token=os.getenv("JIRA_PERSONAL_ACCESS_TOKEN")),
24+
url=os.getenv("JIRA_URL"),
25+
user_email=os.getenv("JIRA_USERNAME"),
26+
),
27+
)
28+
runner.run()
29+
```
Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
```python Python Ingest v2
2+
import os
3+
4+
from unstructured_ingest.v2.pipeline.pipeline import Pipeline
5+
from unstructured_ingest.v2.interfaces import ProcessorConfig
6+
7+
from unstructured_ingest.v2.processes.connectors.jira import (
8+
JiraIndexerConfig,
9+
JiraDownloaderConfig,
10+
JiraConnectionConfig,
11+
JiraAccessConfig
12+
)
13+
14+
from unstructured_ingest.v2.processes.partitioner import PartitionerConfig
15+
from unstructured_ingest.v2.processes.chunker import ChunkerConfig
16+
from unstructured_ingest.v2.processes.embedder import EmbedderConfig
17+
from unstructured_ingest.v2.processes.connectors.local import LocalUploaderConfig
18+
19+
if __name__ == "__main__":
20+
Pipeline.from_configs(
21+
context=ProcessorConfig(),
22+
indexer_config=JiraIndexerConfig(
23+
# projects=[
24+
# "project-id",
25+
# "project-id"
26+
# ],
27+
# boards=[
28+
# "board-id",
29+
# "board-id"
30+
# ],
31+
# issues=[
32+
# "issue-id",
33+
# "issue-id"
34+
# ]
35+
),
36+
downloader_config=JiraDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
37+
source_connection_config=JiraConnectionConfig(
38+
access_config=JiraAccessConfig(
39+
password=os.getenv("JIRA_PASSWORD_OR_API_TOKEN"), # Password or API token authentication.
40+
# token=os.getenv("JIRA_PERSONAL_ACCESS_TOKEN") # Personal access token authentication only.
41+
),
42+
url=os.getenv("JIRA_URL"),
43+
username=os.getenv("JIRA_USERNAME"), # For password or API token authentication.
44+
cloud=True # True for Jira Cloud, False (default) for Jira Data Center.
45+
),
46+
partitioner_config=PartitionerConfig(
47+
partition_by_api=True,
48+
api_key=os.getenv("UNSTRUCTURED_API_KEY"),
49+
partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
50+
additional_partition_args={
51+
"split_pdf_page": True,
52+
"split_pdf_allow_failed": True,
53+
"split_pdf_concurrency_level": 15
54+
}
55+
),
56+
chunker_config=ChunkerConfig(chunking_strategy="by_title"),
57+
embedder_config=EmbedderConfig(embedding_provider="huggingface"),
58+
uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
59+
).run()
60+
```
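
The commented-out `projects`, `boards`, and `issues` arguments in the v2 example take lists of IDs, while the CLI takes comma-delimited strings. The following is a minimal sketch of converting one form to the other. The helper names and the `JIRA_PROJECT_IDS`, `JIRA_BOARD_IDS`, and `JIRA_ISSUE_IDS` environment variables are hypothetical and are not defined by `unstructured-ingest`.

```python
import os


def split_ids(value):
    """Turn a comma-delimited string into a list of non-empty IDs."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def indexer_kwargs(env=None):
    """Build keyword arguments for the indexer config, skipping unset filters.

    The environment variable names are assumptions for illustration.
    """
    env = os.environ if env is None else env
    sources = {
        "projects": "JIRA_PROJECT_IDS",
        "boards": "JIRA_BOARD_IDS",
        "issues": "JIRA_ISSUE_IDS",
    }
    kwargs = {}
    for name, variable in sources.items():
        ids = split_ids(env.get(variable))
        if ids:
            # Only pass a filter when at least one ID was given.
            kwargs[name] = ids
    return kwargs
```

Under these assumptions, the result could be passed as `JiraIndexerConfig(**indexer_kwargs())`, leaving out any filter that is not set so that the connector's defaults apply.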

snippets/source_connectors/jira_api.sh.mdx

Lines changed: 0 additions & 15 deletions
This file was deleted.
