Commit bf4efb8

docs: Add documentation for environment variables to control github graphql job collector
1 parent 7c31eb2 commit bf4efb8

2 files changed

+28
-6
lines changed

docs/GettingStarted/Environment.md

Lines changed: 27 additions & 6 deletions
@@ -7,16 +7,19 @@ description: How to set up environment variables for DevLake
77
This document explains how to set environment variables for Apache DevLake and what environment variables can be set.
88

99
## Environment Variables
10+
1011
### ENABLE_SUBTASKS_BY_DEFAULT
12+
1113
This environment variable is used to enable or disable the execution of subtasks.
1214

1315
#### How to set
16+
1417
The format is as follows: plugin_name1:subtask_name1:enabled_value,plugin_name2:subtask_name2:enabled_value,plugin_name3:subtask_name3:enabled_value
15-
18+
1619
Guidance on locating the [plugin_name and subtask_name](https://github.com/apache/incubator-devlake/blob/release-v1.0/backend/plugins/jira/tasks/issue_changelog_collector.go#L41):
1720

1821
- plugin_name: Represents the plugin's name, such as 'jira' for the Jira plugin.
19-
- subtask_name: Denotes the subtask's name, like 'collectIssueChangelogs' for the Jira plugin."
22+
- subtask_name: Denotes the subtask's name, like 'collectIssueChangelogs' for the Jira plugin.
2023

2124
Example 1: Enable some subtasks that are disabled by default
2225

@@ -25,18 +28,36 @@ ENABLE_SUBTASKS_BY_DEFAULT="jira:collectIssueChangelogs:true,jira:extractIssueCh
2528
```
2629

2730
Example 2: Disable some subtasks that run by default
31+
2832
```shell
2933
ENABLE_SUBTASKS_BY_DEFAULT="github_graphql:Collect Job Runs:false,github_graphql:Extract Job Runs:false,github_graphql:Convert Job Runs:false"
3034
```
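
To make the `plugin_name:subtask_name:enabled_value` format concrete, the sketch below splits a value into its entries and fields. It only illustrates the format; it is not how DevLake itself parses the variable, and `parse_subtasks` is a hypothetical helper name introduced here.

```shell
# Hypothetical helper (not part of DevLake): print one line per entry of an
# ENABLE_SUBTASKS_BY_DEFAULT-style value.
parse_subtasks() {
  # Entries are separated by commas; fields within an entry by colons.
  printf '%s\n' "$1" | tr ',' '\n' | while IFS=: read -r plugin subtask enabled; do
    printf 'plugin=%s subtask=%s enabled=%s\n' "$plugin" "$subtask" "$enabled"
  done
}

ENABLE_SUBTASKS_BY_DEFAULT="jira:collectIssueChangelogs:true,github_graphql:Collect Job Runs:false"
parse_subtasks "$ENABLE_SUBTASKS_BY_DEFAULT"
```

Subtask names may contain spaces (as in `Collect Job Runs`), so the whole value should stay inside quotes when it is set.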
3135

32-
#### How to take effect
33-
After setting the environment variable, restart the DevLake service to take effect.
34-
- For Docker Compose, run `docker-compose down` and `docker-compose up -d`.
35-
- For Helm, run `helm upgrade devlake devlake/devlake --recreate-pods`.
36+
### GITHUB_GRAPHQL_JOB\_...
37+
38+
This set of environment variables configures and fine-tunes the behavior of the GitHub GraphQL Job Runs collection process.
39+
40+
| Environment Variable                    | Description                                                                            | Default Value |
| --------------------------------------- | -------------------------------------------------------------------------------------- | ------------- |
| GITHUB_GRAPHQL_JOB_COLLECTION_MODE      | Specifies the mode of job collection. Possible values are `BATCHING` and `PAGINATING`. | `BATCHING`    |
| GITHUB_GRAPHQL_JOB_BATCHING_INPUT_STEP  | Defines the step size for batching mode.                                               | `10`          |
| GITHUB_GRAPHQL_JOB_BATCHING_PAGE_SIZE   | Defines the limit of jobs to collect in a batch for each run.                          | `20`          |
| GITHUB_GRAPHQL_JOB_PAGINATING_PAGE_SIZE | Defines the page size for paginating mode.                                             | `50`          |
3646

47+
#### When to Use
3748

49+
These environment variables are particularly useful for large repositories with a significant number of job runs. Adjusting them lets you tune the data collection process to your workload and infrastructure. They can also help avoid timeouts on the GitHub GraphQL API caused by overly large requests.
3850

51+
- Use `BATCHING` for `GITHUB_GRAPHQL_JOB_COLLECTION_MODE` when your workflow runs typically have fewer than 20 jobs and you want to minimize the number of API calls to GitHub.
52+
- Adjust `GITHUB_GRAPHQL_JOB_BATCHING_INPUT_STEP` and `GITHUB_GRAPHQL_JOB_BATCHING_PAGE_SIZE` to control how many jobs are collected in each batch. **NOTE:** Increasing these values can lead to timeouts if the requests become too large.
53+
- Use `PAGINATING` for `GITHUB_GRAPHQL_JOB_COLLECTION_MODE` when your workflow runs have a large number of jobs (e.g., more than 50). This mode queries only one workflow run at a time and paginates through its jobs, reducing the risk of timeouts.
54+
- Adjust `GITHUB_GRAPHQL_JOB_PAGINATING_PAGE_SIZE` to control how many jobs are fetched per page. A smaller page size can help avoid timeouts but may increase the total number of API calls.
3955

56+
TL;DR: `BATCHING` is more efficient for smaller workflows, while `PAGINATING` guarantees complete collection of jobs for larger workflows.
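
As a minimal sketch of the guidance above, the following switches the collector to paginating mode with a smaller page size. The value `30` is an illustrative assumption, not a recommendation from the DevLake project; the variable names and allowed modes come from the table above. The `case` check is only a local sanity check and is not part of DevLake.

```shell
# Illustrative values only: 30 is an assumed page size, not a tested recommendation.
export GITHUB_GRAPHQL_JOB_COLLECTION_MODE="PAGINATING"
export GITHUB_GRAPHQL_JOB_PAGINATING_PAGE_SIZE="30"

# Local sanity check (not part of DevLake): the documented modes are
# BATCHING and PAGINATING.
case "$GITHUB_GRAPHQL_JOB_COLLECTION_MODE" in
  BATCHING|PAGINATING) mode_ok="yes" ;;
  *) mode_ok="no" ;;
esac
echo "collection mode valid: $mode_ok"
```

If timeouts still occur in paginating mode, lowering the page size further trades more API calls for smaller requests, as described in the list above.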
4057

58+
## How to take effect
4159

60+
After setting an environment variable, restart the DevLake service for the change to take effect.
4261

62+
- For Docker Compose, run `docker-compose down` and `docker-compose up -d`.
63+
- For Helm, run `helm upgrade devlake devlake/devlake --recreate-pods`.

docs/Plugins/github.md

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ Metrics that can be calculated based on the data collected from GitHub:
6262

6363
- Configuring GitHub via [Config UI](/Configuration/GitHub.md)
6464
- Configuring GitHub via Config UI's [advanced mode](/Configuration/AdvancedMode.md#1-github).
65+
- Configuring the GitHub GraphQL job collector via [Environment Variables](/GettingStarted/Environment.md#github_graphql_job_...).
6566

6667
## API Sample Request
6768
