= ASF Jenkins Setup
:toc: left

The Solr project uses a Jenkins instance provided by the Apache Software Foundation ("ASF") for running tests, validation, etc.

This file aims to document our https://ci-builds.apache.org/job/Solr/[ASF Jenkins] usage and administration, to prevent it from becoming "tribal knowledge" understood by just a few.

== Jobs

We run a number of jobs on Jenkins, each validating an overlapping set of concerns (the equivalent local Gradle commands are sketched after the list):

* `Solr-Artifacts-*` - daily jobs that run `./gradlew assemble` to ensure that build artifacts (except docker images) can be created successfully
* `Solr-check-*` - "hourly" jobs that run all project tests and static analysis (i.e. `test`, `integrationTest`, and `check`)
* `Solr-Docker-Nightly-*` - daily jobs that run `./gradlew testDocker dockerPush` to validate docker image packaging. Snapshot images are pushed to hub.docker.com
* `Solr-reference-guide-*` - hourly jobs that build the Solr reference guide via `./gradlew checkSite` and push the resulting artifact to the staging/preview site `nightlies.apache.org`
* `Solr-Smoketest-*` - daily jobs that produce a snapshot release (via the `assembleRelease` task) and run the release smoketester
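
For reference, the sketch below approximates each job's work with a local Gradle invocation; it is illustrative only, and the exact flags and environment Jenkins uses are not captured here.

[source,bash]
----
# Rough local equivalents of the Jenkins jobs above (illustrative, not the
# literal job configuration).

./gradlew assemble                     # Solr-Artifacts-*: build artifacts (minus docker images)
./gradlew test integrationTest check   # Solr-check-*: tests and static analysis
./gradlew testDocker                   # Solr-Docker-Nightly-*: validate docker packaging
                                       # (the job also runs dockerPush to publish snapshot images)
./gradlew checkSite                    # Solr-reference-guide-*: build the reference guide
./gradlew assembleRelease              # Solr-Smoketest-*: snapshot release for the smoketester
----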

Most jobs that validate particular build artifacts are run "daily", which is sufficient to prevent any large breaks from creeping into the build.

On the other hand, jobs that run tests are triggered "hourly" in order to squeeze as many test runs as possible out of our Jenkins hardware.
This is a necessary consequence of Solr's heavy use of randomization in its test suite.
"Hourly" scheduling ensures that a test run is always either in progress or waiting in the build queue, maximizing the number of data points we get from our hardware.

== Jenkins Agents

All Solr jobs run on Jenkins agents marked with the `solr` label.
Currently, this maps to two Jenkins agents:

* `lucene-solr-1` - available at lucene1-us-west.apache.org
* `lucene-solr-2` - available (confusingly) at lucene-us-west.apache.org

These agents are "project-specific" VMs shared by the Lucene and Solr projects.
That is: they are VMs requested by a project for their exclusive use.
(INFRA policy appears to be that each Apache project may request 1 dedicated VM; it's unclear how Solr ended up with 2.)

Maintenance of these agent VMs falls into a bit of a gray area.
INFRA will still intervene when asked: to reboot nodes, to deploy OS upgrades, etc.
But some of the burden also falls on Lucene and Solr as project teams to monitor the VMs and keep them healthy.

=== Accessing Jenkins Agents

With a few steps, Solr committers can access our project's Jenkins agent VMs via SSH to troubleshoot and resolve issues (a short sketch of the resulting commands follows the steps below).

1. Ensure your account on id.apache.org has an SSH key associated with it.
2. Ask INFRA to give your Apache ID SSH access to these boxes. (See https://issues.apache.org/jira/browse/INFRA-3682[this JIRA ticket] for an example.)
3. SSH into the desired box with: `ssh <apache-id>@$HOSTNAME` (where `$HOSTNAME` is either `lucene1-us-west.apache.org` or `lucene-us-west.apache.org`)
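
Once access has been granted, connecting looks roughly like the following; the key listing is purely illustrative, and `<apache-id>` stands in for your Apache ID.

[source,bash]
----
# Confirm which public key(s) you have locally; one of them should be
# registered with your account at id.apache.org.
ls ~/.ssh/*.pub

# Connect to either agent (after INFRA has granted access):
ssh <apache-id>@lucene1-us-west.apache.org   # lucene-solr-1
ssh <apache-id>@lucene-us-west.apache.org    # lucene-solr-2
----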

Often, SSH access to the boxes is not sufficient, and administrators require "root" access to diagnose and solve problems.
Sudo/su privileges can be obtained via a one-time password ("OTP") challenge, managed by the "Orthrus PAM" module.
Users in need of root access can perform the following steps (a condensed terminal sketch follows the list):

1. Open the ASF's https://selfserve.apache.org/otp-calculator.html[OTP Generator Tool] in your browser of choice.
2. Run `ortpasswd` on the machine. This will print out an OTP "challenge" (e.g. `otp-md5 497 lu6126`) and provide a password prompt. This password prompt should be given an OTP password, generated in steps 3-5 below.
3. Copy the "challenge" from the previous step into the relevant field on the "OTP Generator Tool" form.
4. Choose a password to use for OTP challenges (or recall one you've used in the past), and type this into the relevant field on the "OTP Generator Tool" form.
5. Click "Compute", and copy the first line from the "Response" box into your SSH session's password prompt. You're now established in the "Orthrus PAM" system.
6. Run a command requesting `su` escalation (e.g. `sudo su -`). This should print another "challenge" and password prompt. Repeat steps 3-5.
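
Condensed, the terminal side of that flow looks something like the sketch below; the challenge value is the example from above and will differ on each run.

[source,bash]
----
# Enroll in / respond to the Orthrus OTP system.
ortpasswd
# -> prints a challenge such as: otp-md5 497 lu6126
#    Paste the challenge into the OTP Generator Tool, then enter the first
#    line of the tool's "Response" box at the password prompt.

# Escalate to root; this prints a fresh challenge, answered the same way.
sudo su -
----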

If this fails at any point, open a ticket with INFRA.
You may need to be added to the `sudoers` file for the VM(s) in question.

=== Known Jenkins Issues

One recurring problem with the Jenkins agents is that they periodically run out of disk space.
Usually this happens when enough "workspaces" are orphaned or left behind, consuming all of the agent's disk space.

Solr Jenkins jobs are currently configured to clean up the previous workspace at the *start* of the subsequent run.
This avoids orphans in the common case but leaves workspaces behind any time a job is renamed or deleted (as happens during the Solr release process).

Luckily, this has an easy fix: SSH into the agent VM and delete any workspaces under `/home/jenkins/jenkins-slave/workspace/Solr` that are no longer needed.
Any workspace that doesn't correspond to a https://ci-builds.apache.org/job/Solr/[currently existing job] can be safely deleted.
(It may also be worth comparing the Lucene workspaces in `/home/jenkins/jenkins-slave/workspace/Lucene` to https://ci-builds.apache.org/job/Lucene/[that project's list of jobs].)
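
In practice, the cleanup amounts to something like the sketch below; the workspace name being removed is hypothetical, so check each directory against the current job list before deleting it.

[source,bash]
----
# On the agent VM (lucene1-us-west.apache.org or lucene-us-west.apache.org):
cd /home/jenkins/jenkins-slave/workspace/Solr

df -h .      # how much space is left on the volume?
du -sh -- *  # which workspaces are consuming it?

# Remove a workspace whose job no longer exists on
# https://ci-builds.apache.org/job/Solr/ (example name is hypothetical):
rm -rf "Solr-check-9.4"
----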