File: user/enterprise/troubleshooting-guide.md
+20 -47 (20 additions, 47 deletions)
@@ -9,30 +9,21 @@ This document provides guidelines and suggestions for troubleshooting your Travi
9
9
10
10
## Travis CI Enterprise 3.x
11
11
12
-
The TCIE 3.x is deployed as Kubernetes cluster. Thus `travis bash` and `travis console` working previously with single Docker installation will not work anymore. They're replaced with specific `kubectl` command line commands.
12
+
TCIE 3.x is deployed as a Kubernetes cluster. Thus, `travis bash` and `travis console`, which worked with the earlier single Docker installation, no longer work. They are replaced by specific `kubectl` commands.
13
13
14
14
The term `Worker machine` still means the worker instance(s) that process and run the builds.
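
As a rough sketch of what a `kubectl` replacement for `travis console` can look like, the helper below looks up a pod by a fragment of its name. The `find_pod` function name, the `travis-api` name fragment, and the reliance on the namespace of the current `kubectl` context are assumptions made for illustration; only the `/app/script/console` path comes from this guide.

```bash
#!/usr/bin/env bash
# Sketch only: find_pod is a hypothetical helper, not part of TCIE.

# Print the first pod (as "pod/<name>") whose name contains the given fragment.
# Uses the namespace of the current kubectl context (an assumption).
find_pod() {
  local fragment="$1"
  kubectl get pods -o name | grep -- "$fragment" | head -n 1
}

# Example (not run here): open the console in the API pod.
#   kubectl exec -it "$(find_pod travis-api)" -- /app/script/console
```

If the function prints nothing, no pod name matched the fragment; listing all pods with `kubectl get pods` shows the names actually used in your cluster.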
15
15
16
-
## Travis CI Enterprise 2.x
17
-
18
-
Throughout this document we'll be using the following terms to refer to the two components of your Travis CI Enterprise 2.x installation:
19
-
20
-
-`Platform machine`: The instance that runs the main Travis components, including the web frontend.
21
-
-`Worker machine`: The worker instance(s) that process and run the builds.
22
-
23
-
> Please note that this guide is geared towards non-High Availability (HA) setups. Please contact us at [[email protected]](mailto:[email protected]) if you require support for your HA setup.
24
-
25
16
## Builds are not Starting
26
17
27
-
In the Travis CI Web UI you see no builds are starting. The builds either have no visible state or have a state of `queued`. Canceling and restarting builds has no effect.
18
+
In the Travis CI Web UI, you see no builds are starting. The builds either have no visible state or have a state of `queued`. Canceling and restarting builds has no effect.
28
19
29
20
### Possible Issues and Workarounds
30
21
31
-
There are a few different potential approaches that may help to get builds running again. Please try each one in order.
22
+
There are a few different approaches that may help get builds running again. Please try each one in order.
32
23
33
24
#### Lost connection to RabbitMQ
34
25
35
-
The Enterprise Platform uses RabbitMQ to communicate with worker machine(s) in order to process builds. In certain circumstances it is possible for the worker machine(s) to lose connection with RabbitMQ and therefore become unable to process builds successfully. This is a known problem and we're working on it to deliver a permanent solution.
26
+
The Enterprise Platform uses RabbitMQ to communicate with worker machine(s) in order to process builds. In certain circumstances, it is possible for the worker machine(s) to lose connection with RabbitMQ and therefore become unable to process builds successfully. This is a known problem and we're working on it to deliver a permanent solution.
36
27
37
28
In the meantime, to return everything back to a normal working state, you can restart the worker machine(s) manually. This can be done by connecting to the worker(s) via `ssh` and running the following command:
38
29
@@ -71,7 +62,7 @@ $ sudo restart travis-worker
71
62
72
63
A source for the problem could be that the worker machine is not able to communicate with the platform machine.
73
64
74
-
Here we're distinguishing between an AWS EC2 installation and an installation running on other hardware. For the former, security groups need to be configured per machine. To do so, please follow our installation instructions [here](/user/enterprise/setting-up-travis-ci-enterprise/#1-setting-up-enterprise-platform-virtual-machine). If you're not using AWS EC2, please make sure that the ports listed [in the docs](/user/enterprise/setting-up-travis-ci-enterprise/#1-setting-up-enterprise-platform-virtual-machine) are open in your firewall.
65
+
Here, we're distinguishing between an AWS EC2 installation and an installation running on other hardware. For the former, security groups need to be configured per machine. To do so, please follow our installation instructions [here](/user/enterprise/setting-up-travis-ci-enterprise/#1-setting-up-enterprise-platform-virtual-machine). If you're not using AWS EC2, please make sure that the ports listed [in the docs](/user/enterprise/setting-up-travis-ci-enterprise/#1-setting-up-enterprise-platform-virtual-machine) are open in your firewall.
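
To check from the worker machine whether a given port on the platform machine is reachable, a small probe like the one below may help. It is a minimal sketch: `check_port` is a hypothetical helper, and the host name and ports in the usage example are placeholders, so take the real values from the installation instructions linked above.

```bash
#!/usr/bin/env bash
# Sketch only: check_port is a hypothetical helper.

# Print "open" if a TCP connection to host:port succeeds within 3 seconds,
# otherwise print "closed" (this includes refused, filtered, and timed-out
# connections). Relies on bash's /dev/tcp redirection and coreutils `timeout`.
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example (not run here): host and ports are placeholders.
#   for port in 443 8800; do
#     echo "platform.example.com:${port} $(check_port platform.example.com "$port")"
#   done
```

A `closed` result only shows that the TCP connection could not be established from this machine; it does not say whether a security group, a firewall, or the service itself is the cause.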
75
66
76
67
#### SSL Verification Issues
77
68
@@ -97,35 +88,25 @@ This issue sometimes occurs after maintenance on workers that were originally in
97
88
98
89
#### Issue when running Enterprise v2.2 or higher
99
90
100
-
By default, the Enterprise Platform v2.2 or higher will attempt to route builds to the `builds.trusty` queue. This could lead to build issues, if you are not running a Trusty worker to process those builds or if you are targeting a different distribution (e.g. `xenial`).
91
+
By default, the Enterprise Platform v2.2 or higher will attempt to route builds to the `builds.trusty` queue. This could lead to build issues if you are not running a Trusty worker to process those builds or if you are targeting a different distribution (e.g., `xenial`).
101
92
102
93
To address this, either:
103
94
104
95
- Ensure that you have installed a Trusty worker on a new virtual machine instance: [Trusty installation guide](/user/enterprise/trusty/)
105
-
- Override the default queuing behavior to specify a new queue. To override the default queue you must access the Admin Dashboard at `https://<your-travis-ci-enterprise-domain>:8800/settings#override_default_dist_enable` and toggle the 'Override Default Build Environment' button. This will allow you to specify the new default based on your needs and the workers that you have available.
96
+
- Override the default queuing behavior to specify a new queue. To override the default queue, you must access the Admin Dashboard at `https://<your-travis-ci-enterprise-domain>:8800/settings#override_default_dist_enable` and toggle the 'Override Default Build Environment' button. This will allow you to specify the new default based on your needs and the workers that you have available.
106
97
107
98
#### Issue when running Enterprise v3.x or higher
108
99
109
-
Verify if default queue configured in Enterprise Platform 3.x routes builds to a matching, existing workers. You may choose to alter the default queue setting by running admin console UI on `http://loclahost:8800`, navigating to Configuration and altering the option 'Set Default Build Environment' by selecting one of available options.
110
-
111
-
112
-
## Enterprise Container Fails to Start due to 'context deadline exceeded' Error
113
-
114
-
> This issue occurs only for TCIE 2.x. The TCIE 3.x is deployed as Kubernetes/microk8s cluster.
100
+
Verify that the default queue configured in Enterprise Platform 3.x routes builds to matching, existing workers. You can change the default queue by opening the admin console UI at `http://localhost:8800`, navigating to Configuration, and selecting one of the available options under 'Set Default Build Environment'.
115
101
116
-
After a fresh installation or configuration change the Enterprise container doesn't start and the following error is visible in the Admin Dashboard found at `https://<your-travis-ci-enterprise-domain>:8800/dashboard`:
117
-
118
-
```
119
-
Ready state command canceled: context deadline exceeded
120
-
```
121
102
122
103
### Possible Issues and Workarounds
123
104
124
-
The following is a possible issue with the GitHub OAuth app configuration and its workaround.
105
+
The following is a possible issue with the GitHub OAuth app configuration and its workaround.
125
106
126
107
#### GitHub OAuth app configuration
127
108
128
-
The abovementioned error can be caused by a configuration mismatch in [the GitHub OAuth Application](/user/enterprise/setting-up-travis-ci-enterprise/#prerequisites). Please check that _both_ website and callback URL contain the Travis CI Enterprise's hostname. If you have discovered a mismatch here, please restart the Travis container from within the Admin Dashboard.
109
+
The above-mentioned error can be caused by a configuration mismatch in [the GitHub OAuth Application](/user/enterprise/setting-up-travis-ci-enterprise/#prerequisites). Please check that _both_ website and callback URL contain the Travis CI Enterprise's hostname. If you have discovered a mismatch here, please restart the Travis container from within the Admin Dashboard.
129
110
130
111
131
112
## The travis-worker does not start on Ubuntu 16.04
@@ -140,7 +121,7 @@ In addition, the command `sudo journalctl -u travis-worker` contains the followi
140
121
141
122
### Workaround
142
123
143
-
One possible reason that travis-worker is not running is that `systemctl` cannot create a temporary directory for environment files. To fix this, please create the directory `/var/tmp/travis-run.d/travis-worker` and assign write permissions via:
124
+
One possible reason travis-worker is not running is that `systemctl` cannot create a temporary directory for environment files. To fix this, please create the directory `/var/tmp/travis-run.d/travis-worker` and assign write permissions via:
144
125
145
126
```sh
146
127
$ mkdir -p /var/tmp/travis-run.d/
@@ -159,11 +140,11 @@ This can have various causes, including an automatic nvm update or a caching err
159
140
160
141
### Workaround
161
142
162
-
This error is most likely caused by a self-signed certificate. During the build, the worker container attempts to fetch different files from the platform machine. If the server was originally provisioned with a self-signed certificate, curl doesn't trust this certificate and therefore fails. While we're working on resolving this in a permanent way, currently the only solution is to install a certificate issued by a trusted Certificate Authority (CA). This can be a free Let's Encrypt certificate or any other trusted CA of your choice. We have a section in our [SSL Certificate Management](/user/enterprise/ssl-certificate-management/#using-a-lets-encrypt-ssl-certificate) page that walks you through the installation process using Let's Encrypt as an example.
143
+
This error is most likely caused by a self-signed certificate. During the build, the worker container attempts to fetch different files from the platform machine. If the server was originally provisioned with a self-signed certificate, curl doesn't trust this certificate and, therefore, fails. While we're working on resolving this in a permanent way, currently the only solution is to install a certificate issued by a trusted Certificate Authority (CA). This can be a free Let's Encrypt certificate or any other trusted CA you choose. We have a section in our [SSL Certificate Management](/user/enterprise/ssl-certificate-management/#using-a-lets-encrypt-ssl-certificate) page that walks you through the installation process using Let's Encrypt as an example.
163
144
164
145
## User Accounts Stuck in Syncing State
165
146
166
-
One or more user accounts are stuck in the `is_syncing = true` state. When you query the database, the number of users which are currently syncing does not decrease over the time. Example:
147
+
One or more user accounts are stuck in the `is_syncing = true` state. When you query the database, the number of currently syncing users does not decrease over time. Example:
167
148
168
149
```sql
169
150
travis_production=> select count(*) from users where is_syncing=true;
@@ -179,32 +160,30 @@ You can reset the `is_syncing` flag for user accounts that are stuck by running:
179
160
180
161
**TCIE 3.x**: Run `$ kubectl exec -it [travis-api-pod] /app/script/console` *on your local machine*
181
162
182
-
**TCIE 2.x**: Log into the platform machine via SSH. Run `$ travis console`
183
-
184
-
Next, regardless of TCIE version, run:
163
+
Next, run:
185
164
186
165
```bash
187
166
>> User.where(is_syncing: true).count
188
167
>> ActiveRecord::Base.connection.execute('set statement_timeout to 60000')
189
168
>> User.update_all(is_syncing: false)
190
169
```
191
170
192
-
It can happen that organizations are also stuck in the syncing state. Since an organization itself does not have a`is_syncing` flag, all users that do belong to it have to be successfully synced.
171
+
Organizations can also get stuck in the syncing state. Since an organization does not have an `is_syncing` flag of its own, all users that belong to it must be successfully synced.
193
172
194
173
## Logs contain GitHub API 422 errors
195
174
196
-
On every commit made when a build runs, a commit status is created for a given SHA. Due to GitHub’s limitations at 1,000 statuses per SHA and context within a repository, if more than 1,000 statuses are created this leads to a validation error.
175
+
Each time a build runs for a commit, a commit status is created for the given SHA. GitHub limits statuses to 1,000 per SHA and context within a repository, so creating more than 1,000 statuses leads to a validation error.
197
176
This issue should no longer be present in GitHub Apps integrations but will be present in Webhooks integrations.
198
177
199
178
### Possible Issues and Workarounds
200
179
201
180
The workaround for this issue is to manually re-sync the user account with GitHub. This will generate a fresh token for the user account that has not reached any GitHub API limits.
202
181
203
-
There are two options listed below to initiate a sync between your Travis CI Enterprise instance and GitHub instance.
182
+
Two options are listed below to initiate a sync between your Travis CI Enterprise instance and GitHub instance.
204
183
205
184
#### Sync account from Travis CI web interface
206
185
207
-
Ask the owner of **the affected account** (usually printed in the logs) to sync it with your GitHub instance. To do so they should:
186
+
Ask the owner of **the affected account** (usually printed in the logs) to sync it with your GitHub instance. To do so, they should:
208
187
209
188
1. Open `https://<your-travis-ci-enterprise-domain>`.
210
189
2. In the upper right corner of the page, hover over the user icon and select 'Profile' from the dropdown menu.
@@ -218,18 +197,12 @@ An administrator can also initiate a sync on behalf of someone else:
218
197
219
198
`kubectl exec -it [travis-github-sync-pod] bundle exec bin/schedule users [login if single user] `
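
The command above can be wrapped so that the pod lookup and the sync happen in one step. The following is a sketch under stated assumptions: `sync_user` is a hypothetical helper, and it assumes the sync pod's name contains `github-sync` and lives in the namespace of the current `kubectl` context. It deliberately refuses to run without a login so that a sync of every user is not triggered by accident.

```bash
#!/usr/bin/env bash
# Sketch only: sync_user is a hypothetical wrapper around the command above.

# Schedule a GitHub sync for a single user login.
sync_user() {
  local login="$1" pod
  if [ -z "$login" ]; then
    echo "usage: sync_user <github-login>" >&2
    return 2
  fi
  # Assumption: the sync pod's name contains "github-sync".
  pod="$(kubectl get pods -o name | grep -- 'github-sync' | head -n 1)"
  if [ -z "$pod" ]; then
    echo "no github-sync pod found" >&2
    return 1
  fi
  kubectl exec -it "$pod" -- bundle exec bin/schedule users "$login"
}

# Example (not run here):
#   sync_user some-github-login
```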
220
199
221
-
**TCIE 2.x**: via the `travis` CLI tool on the platform machine:
222
-
223
-
> If `—logins=<GITHUB-LOGIN>` is not provided then this command will trigger a sync on every user. This could result in long runtimes and may impact production operations if you have a large number of total users on your Travis CI Enterprise instance.
224
-
225
-
1. Open an SSH connection to the platform machine.
226
-
2. Initiate a sync by running `travis sync_users —logins=<GITHUB-LOGIN>`
227
200
228
201
## RabbitMQ AMQPS issue causes build jobs not to enqueue
229
202
230
-
> This issue occurs only in TCIE 3.x. The TCIE 2.x Rabbit does not contain any AMQPS support.
203
+
> This issue occurs only in TCIE 3.x.
231
204
232
-
When using self-signed certificate, the Rabbit MQ AMQPS will not work which will result in jobs queueing forever. Worker logs will indicate security issues when connecting to Rabbit using AMQPS.
205
+
When using a self-signed certificate, RabbitMQ AMQPS will not work, resulting in jobs queueing forever. Worker logs will indicate security issues when connecting to RabbitMQ over AMQPS.
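
To confirm whether a certificate is likely self-signed, you can compare its subject and issuer with `openssl`. This is a heuristic sketch, not a full chain validation: `is_self_signed` is a hypothetical helper, and the host name, the file path, and port 5671 (the conventional AMQPS port) in the usage example are assumptions that may not match your deployment.

```bash
#!/usr/bin/env bash
# Sketch only: is_self_signed is a hypothetical helper.

# Succeed (exit 0) if the PEM certificate's subject equals its issuer,
# which is the usual sign of a self-signed certificate.
is_self_signed() {
  local cert="$1" subject issuer
  subject="$(openssl x509 -in "$cert" -noout -subject | sed 's/^subject= *//')"
  issuer="$(openssl x509 -in "$cert" -noout -issuer | sed 's/^issuer= *//')"
  [ -n "$subject" ] && [ "$subject" = "$issuer" ]
}

# Example (not run here): fetch the certificate presented on the AMQPS port,
# then test it. Host, path, and port are placeholders.
#   openssl s_client -connect rabbitmq.example.com:5671 </dev/null 2>/dev/null \
#     | openssl x509 -out /tmp/amqps.pem
#   is_self_signed /tmp/amqps.pem && echo "certificate looks self-signed"
```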