|
| 1 | +.. _dev-testing-windows: |
| 2 | + |
| 3 | +================= |
| 4 | +Testing - Windows |
| 5 | +================= |
| 6 | + |
| 7 | +Since Pacific, the Ceph client tools and libraries can be natively used on |
| 8 | +Windows. This allows Windows nodes to consume Ceph without additional layers |
| 9 | +such as iSCSI gateways or SMB shares. |
| 10 | + |
| 11 | +A significant number of unit tests and integration tests were ported in order |
| 12 | +to ensure that these components continue to function properly on Windows. |
| 13 | + |
| 14 | +Windows CI Job |
| 15 | +============== |
| 16 | + |
| 17 | +The `Windows CI job`_ performs the following steps for each GitHub pull request: |
| 18 | + |
| 19 | +* spin up a Linux VM in which to build the server-side (Linux) Ceph binaries |
| 20 | + and cross-compile the Windows (client) binaries |
| 21 | +* recreate the Linux VM and start a Ceph vstart cluster |
| 22 | +* boot a Windows VM and run the Ceph tests there |
| 23 | + |
| 24 | +`A small PowerShell framework`_ parallelizes the tests, aggregates the results |
| 25 | +and isolates or skips certain tests that are known to be flaky. |
| 26 | + |
| 27 | +The console output can contain compilation errors as well as the names of the |
| 28 | +tests that failed. To get the console output of the failing tests as well as |
| 29 | +Ceph and operating system logs, please check the build artifacts from the |
| 30 | +Jenkins "Status" page. |
| 31 | + |
| 32 | +.. image:: ../../images/windows_ci_status_page.png |
| 33 | + :align: center |
| 34 | + |
| 35 | +The Windows CI artifacts can be downloaded as a zip archive or viewed inside |
| 36 | +the browser. Click the "artifacts" button to see the contents of the artifacts |
| 37 | +folder. |
| 38 | + |
| 39 | +.. image:: ../../images/windows_ci_artifacts.png |
| 40 | + :align: center |
| 41 | + |
| 42 | +Artifact contents: |
| 43 | + |
| 44 | +* ``client/`` - Ceph client-side logs (Windows) |
| 45 | + * ``eventlog/`` - Windows system logs |
| 46 | + * ``logs/`` - Ceph logs |
| 47 | + * ``-windows.conf`` - Ceph configuration file |
| 48 | +* ``cluster/`` - Ceph server-side logs (Linux) |
| 49 | + * ``ceph_logs/`` |
| 50 | + * ``journal`` |
| 51 | +* ``test_results/`` |
| 52 | + * ``out/`` - raw and XML test output grouped by the test executable |
| 53 | + * ``test_results.html`` - aggregated test report (HTML) |
| 54 | + * ``test_results.txt`` - aggregated test report (plaintext) |
| 55 | + |
| 56 | +We're using the `subunit`_ format and associated tools to aggregate the test |
| 57 | +results, which is especially handy when running a large number of tests in |
| 58 | +parallel. |
| 59 | + |
| 60 | +The aggregated test report provides a great overview of the failing tests. |
| 61 | +Go to the end of the file to see the actual errors:: |
| 62 | + |
| 63 | + {0} unittest_mempool.mempool.bufferlist_reassign [0.000000s] ... ok |
| 64 | + {0} unittest_mempool.mempool.bufferlist_c_str [0.006000s] ... ok |
| 65 | + {0} unittest_mempool.mempool.btree_map_test [0.000000s] ... ok |
| 66 | + {0} ceph_test_dokan.DokanTests.test_mount [9.203000s] ... FAILED |
| 67 | + |
| 68 | + Captured details: |
| 69 | + ~~~~~~~~~~~~~~~~~ |
| 70 | + b'/home/ubuntu/ceph/src/test/dokan/dokan.cc:136' |
| 71 | + b'Expected equality of these values:' |
| 72 | + b' wait_for_mount(mountpoint)' |
| 73 | + b' Which is: -138' |
| 74 | + b' 0' |
| 75 | + b'' |
| 76 | + b'/home/ubuntu/ceph/src/test/dokan/dokan.cc:208' |
| 77 | + b'Expected equality of these values:' |
| 78 | + b' ret' |
| 79 | + b' Which is: "ceph-dokan: exit status: -22"' |
| 80 | + b' ""' |
| 81 | + b'Failed unmapping: Y:\\' |
| 82 | + {0} ceph_test_dokan.DokanTests.test_mount_read_only [9.140000s] ... FAILED |
| 83 | + |
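For long reports, it can help to pull out only the failing entries. The
following is a minimal sketch, not part of the CI tooling: it assumes that each
result line looks like the sample above (``{worker} test_id [duration] ... status``)
and that failures are marked ``FAILED``. The function name
``failed_tests_by_binary`` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the sample report above:
#   {0} <binary>.<suite>.<test> [<seconds>s] ... <status>
RESULT_LINE = re.compile(
    r"^\s*\{\d+\}\s+(?P<test_id>\S+)\s+\[(?P<secs>[\d.]+)s\]"
    r"\s+\.\.\.\s+(?P<status>\w+)\s*$"
)


def failed_tests_by_binary(report_text):
    """Group the IDs of tests marked FAILED by their test executable."""
    failures = defaultdict(list)
    for line in report_text.splitlines():
        match = RESULT_LINE.match(line)
        if match is None:
            continue  # "Captured details" output, blank lines, etc.
        if match.group("status") == "FAILED":
            # The first dotted component is the test binary name.
            binary = match.group("test_id").split(".", 1)[0]
            failures[binary].append(match.group("test_id"))
    return dict(failures)


# Sample input copied from the report excerpt above (details trimmed).
sample = """\
{0} unittest_mempool.mempool.bufferlist_reassign [0.000000s] ... ok
{0} ceph_test_dokan.DokanTests.test_mount [9.203000s] ... FAILED

Captured details:
~~~~~~~~~~~~~~~~~
b'Expected equality of these values:'
{0} ceph_test_dokan.DokanTests.test_mount_read_only [9.140000s] ... FAILED
"""

for binary, tests in failed_tests_by_binary(sample).items():
    print(binary, tests)
```

To use it on a real run, read the downloaded ``test_results.txt`` into a string
and pass it to the function. Grouping by the first dotted component mirrors how
the report groups results by test binary.
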
| 84 | +The HTML report conveniently groups the test results by test suite (test binary). |
| 85 | +For security reasons, it isn't rendered by default, but it can be downloaded and |
| 86 | +viewed locally: |
| 87 | + |
| 88 | +.. image:: ../../images/windows_ci_html_report.png |
| 89 | + :align: center |
| 90 | + |
| 91 | +Timeouts and missing test results are often an indication that a process crashed. |
| 92 | +Note that the ceph status is printed out on the console before and after |
| 93 | +performing the tests, which can help identify crashed services. |
| 94 | + |
| 95 | +You may also want to check the service logs (both client and server side). In |
| 96 | +addition, the Windows "application" event log records entries for crashed |
| 97 | +Windows processes. |
| 98 | + |
| 99 | +Frequently asked questions |
| 100 | +========================== |
| 101 | + |
| 102 | +1. Why is the Windows CI job the only one that fails on my PR? |
| 103 | + |
| 104 | +Ceph integration tests are normally performed through Teuthology on the Ceph |
| 105 | +Lab infrastructure. These tests are triggered on-demand by the Ceph QA |
| 106 | +team and do not run automatically for every submitted pull request. |
| 107 | + |
| 108 | +Since the Windows CI job focuses only on the client-side Ceph components, |
| 109 | +it can run various integration tests in a timely manner for every pull request |
| 110 | +on GitHub. **In other words, it runs various librados, librbd and libcephfs |
| 111 | +tests that other checks such as "make check" do not.** |
| 112 | + |
| 113 | +For this reason, the Windows CI often catches regressions that are missed by the |
| 114 | +other checks and would otherwise only come up through Teuthology. More often |
| 115 | +than not, these regressions are not platform-specific and affect Linux as well. |
| 116 | + |
| 117 | +In case of Windows CI failures, we strongly suggest checking the test results |
| 118 | +as described above. |
| 119 | + |
| 120 | +Be aware that the `Windows build script`_ may use different compilation flags |
| 121 | +and ``-D`` options passed to CMake. For example, it defaults to ``Release`` mode |
| 122 | +instead of ``Debug`` mode. It also uses a different toolchain |
| 123 | +(``mingw-llvm``) and a separate set of `dependencies`_. Make sure to bump the |
| 124 | +versions if needed. |
| 125 | + |
| 126 | +2. Why is the Windows CI job mandatory? |
| 127 | + |
| 128 | +The test job was initially optional. As a result, regressions were introduced |
| 129 | +very often. |
| 130 | + |
| 131 | +After a time, Windows support became mature enough to make this CI job mandatory. |
| 132 | +This significantly reduces the amount of work required to address regressions |
| 133 | +and assures Ceph users of continued Windows support. |
| 134 | + |
| 135 | +As mentioned above, another great advantage is that it runs integration tests that |
| 136 | +quickly catch regressions which often affect Linux builds as well. This spares |
| 137 | +developers from having to wait for the full Teuthology results. |
| 138 | + |
| 139 | +.. _Windows CI job: https://github.com/ceph/ceph-build/blob/main/ceph-windows-pull-requests/config/definitions/ceph-windows-pull-requests.yml |
| 140 | +.. _A small PowerShell framework: https://github.com/ceph/ceph-win32-tests/ |
| 141 | +.. _Windows build script: https://github.com/ceph/ceph/blob/main/win32_build.sh |
| 142 | +.. _dependencies: https://github.com/ceph/ceph/blob/main/win32_deps_build.sh |
| 143 | +.. _subunit: https://github.com/testing-cabal/subunit |