Commit 5bf154e

josmartin (Jos Martin) and alanzli authored
Merge bz17645 (#7)
* Initial changeset to include BZ17645
* update bz17645 patch for rhel 2.28
* update rhel update-specfile.sh script add bz17654
* add BZ-17645.md
* fixing BZ-17645.md
* Updates to merge in RHEL building of BZ17645
* Resume automatic build on Almalinux for BZ17645

Co-authored-by: Jos Martin <[email protected]>
Co-authored-by: alanli <[email protected]>
Co-authored-by: alanzli <[email protected]>
1 parent aa957fd commit 5bf154e

24 files changed: +2611 -91 lines changed

.github/workflows/release-all-rhel.yaml

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,9 @@ name: "release-all-rhel"
 # ensure all artifacts are up-to-date with security and other patches to these distributions
 on:
   workflow_dispatch:
-  # No need to schedule thie build - we know it no longer works on post glibc-2.28-189.1.el8 releases
-  # schedule:
-  # - cron: '0 2 1 * *'
+  # Resume build on almalinux post glibc-2.28-189.1.el8 releases as we are still fixing new issues
+  schedule:
+    - cron: '0 2 1 * *'
 
 jobs:
   build-alma-8-4:

BZ-17645.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# BZ-17645 Patch
## Summary
This patch resolves a performance issue that slows MATLAB and Simulink shutdown. The patch provides a new sorting algorithm for shared objects in the dynamic loader. The original algorithm in glibc versions prior to glibc 2.35 is slow when the set of dynamic shared objects (DSOs) contains circular dependencies.

## Bug Description
The performance issue increases the MATLAB and Simulink shutdown time. In a Debian 11 environment using glibc 2.31, MATLAB and Simulink take about 300 seconds to shut down on modern hardware. With the same setup and the patch enabled, shutdown takes less than 3 seconds. The performance issue was first reported in November 2014 by Paulo Andrade. For more information, see [RFE: Improve performance of dynamic loader for deeply nested DSO dependencies](https://sourceware.org/bugzilla/show_bug.cgi?id=17645).

## Patch Sources
This patch contains a new implementation of `_dl_sort_maps`, which Paulo Andrade introduced in [RFE: Improve performance of dynamic loader for deeply nested DSO dependencies](https://sourceware.org/bugzilla/show_bug.cgi?id=17645). Chung-Lin Tang <[email protected]> and Adhemerval Zanella <[email protected]> later incorporated the new implementation into the glibc master branch for the 2.35 release in the commit [elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=15a0c5730d1d5aeb95f50c9ec7470640084feae8).

The MathWorks BZ-17645 patch sets the new depth-first search (DFS) sorting algorithm as the default behavior.
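
To give a feel for why a DFS-based sort copes well with circular dependencies, here is a minimal Python sketch. It is an illustration only, not the glibc `_dl_sort_maps` code: the function name `dfs_sort`, the dictionary-based dependency graph and the library names are all hypothetical. Each object is visited exactly once, and an edge that leads back to an already visited object (a cycle) is simply skipped, so the work stays linear in the number of objects plus dependency edges.

```python
from typing import Dict, List


def dfs_sort(deps: Dict[str, List[str]], roots: List[str]) -> List[str]:
    """Return objects in reverse DFS postorder (dependents before dependencies).

    Hypothetical illustration: `deps` maps an object name to the names it
    depends on. Cycles are tolerated because visited objects are never
    entered a second time.
    """
    visited = set()
    postorder: List[str] = []

    for root in roots:
        if root in visited:
            continue
        visited.add(root)
        # Explicit stack of (object, iterator over its dependencies), so that
        # deeply nested graphs do not hit Python's recursion limit.
        stack = [(root, iter(deps.get(root, [])))]
        while stack:
            obj, children = stack[-1]
            for dep in children:
                if dep not in visited:
                    visited.add(dep)
                    stack.append((dep, iter(deps.get(dep, []))))
                    break
            else:
                # All dependencies handled: emit the object.
                stack.pop()
                postorder.append(obj)

    return postorder[::-1]


# Hypothetical dependency graph with a cycle between liba and libc.
example = {
    "app": ["liba", "libb"],
    "liba": ["libc"],
    "libb": ["liba"],
    "libc": ["liba"],
}
order = dfs_sort(example, ["app"])
```

In this sketch `order` places `app` first and `libb` before `liba`, and the `liba`/`libc` cycle does not cause any object to be revisited.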

## Acknowledgements and Thanks
Many thanks to the glibc team and, in particular, to Paulo Andrade for reporting the issue and providing the original implementation, and to Chung-Lin Tang and Adhemerval Zanella for incorporating the new sorting algorithm into glibc 2.35 and providing the original patch.

BZ-19329.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# BZ-19329 Patch
## Summary
This repository provides a method for working around a sporadic issue seen on older Linux distributions: MathWorks&reg; products can trigger an [assert failure at concurrent pthread_create and dlopen (BZ-19329)](https://sourceware.org/bugzilla/show_bug.cgi?id=19329) in the [GNU C Library (glibc)](https://www.gnu.org/software/libc/).

If you are running:
* **Ubuntu-based** systems and can upgrade to **version 22.04 (Jammy Jellyfish)**, upgrading is the safest and easiest way to resolve the issue, because that version contains glibc v2.35, in which the underlying issue is completely fixed.
* **RHEL-based 8.4 or 8.5** systems (*update 27 June 2022*), RHEL appears to have patched the `glibc-2.28` packages in release `189` to fix this issue. Ensure that you have installed at least [`glibc-2.28-189.1.el8`](https://git.almalinux.org/rpms/glibc/commit/385bc0f199bf51199143fe12b857f4983db76e48).
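
As a rough way to check the "at least `glibc-2.28-189.1.el8`" condition, the following Python sketch compares the numeric fields of a package name such as the one printed by `rpm -q glibc`. The helper names `parse_glibc_release` and `has_rhel_fix` are hypothetical, and the comparison is a simplification that is only meaningful for RHEL 8 style names; it does not implement the full RPM version comparison rules.

```python
import re
from typing import Tuple


def parse_glibc_release(package: str) -> Tuple[int, ...]:
    """Extract (major, minor, release, sub-release) from a package name.

    Hypothetical helper: 'glibc-2.28-189.1.el8' -> (2, 28, 189, 1).
    A missing sub-release is treated as 0.
    """
    match = re.search(r"(\d+)\.(\d+)-(\d+)(?:\.(\d+))?", package)
    if match is None:
        raise ValueError(f"unrecognised glibc package name: {package!r}")
    return tuple(int(part) for part in match.groups(default="0"))


# First RHEL 8 package release named in this document as containing the fix.
MINIMUM_FIXED = parse_glibc_release("glibc-2.28-189.1.el8")


def has_rhel_fix(package: str) -> bool:
    """True if the package is at least glibc-2.28-189.1 (simplified check)."""
    return parse_glibc_release(package) >= MINIMUM_FIXED
```

For example, `has_rhel_fix("glibc-2.28-164.el8.x86_64")` is false under this scheme, while `has_rhel_fix("glibc-2.28-189.1.el8.x86_64")` is true.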

If instead you want to work around this issue, you can use this repository. It provides a build procedure (in an isolated Docker&reg; container) to produce patched versions of the glibc libraries for recent AlmaLinux, Ubuntu&reg; and Debian&reg; releases. These patched versions [incorporate an initial fix](https://patchwork.ozlabs.org/project/glibc/patch/[email protected]/) proposed on the [libc-alpha mailing list](https://sourceware.org/mailman/listinfo/libc-alpha) that mitigates the issue. In the release area of this repository you can find the Debian package build artefacts produced by running the build on Ubuntu 18.04 & 20.04 as well as Debian 9, 10 & 11. You can install these artefacts on an appropriate Debian-based machine, virtual machine or Docker container, using `dpkg -i`. For AlmaLinux you can find the appropriate `rpm` packages, which should also work on UBI and CentOS containers.

## Bug Description
The [assert failure at concurrent pthread_create and dlopen](https://sourceware.org/bugzilla/show_bug.cgi?id=19329) glibc bug was first reported in December 2015 and can affect any process on Linux that creates a thread at the same time as opening a dynamic shared object library. Initially the issue was only observable with reasonable frequency on very large scale systems such as high performance computing clusters or cloud scale deployment platforms, and so did not receive significant attention. However, early on there were [proposed patches](https://sourceware.org/bugzilla/show_bug.cgi?id=19329) to the library. Large scale systems applied those patches in-house and saw significant benefit. More recently a [proposed complete fix for this](https://sourceware.org/pipermail/libc-alpha/2021-February/122626.html) and a set of related issues has been reviewed by the glibc team and accepted into version 2.34 of glibc (released in August 2021). The 2.34 version of glibc is available in [RHEL 9 beta](https://developers.redhat.com/articles/2021/11/03/red-hat-enterprise-linux-9-beta-here) and [Ubuntu 21.10 (Impish Indri)](https://launchpad.net/ubuntu/+source/glibc). However, there are no plans to backport the fix into previous glibc versions, and previous versions are expected to remain in production use for a significant number of years (e.g. the current end-of-life date for Ubuntu 20.04 is April 2030).

More recently MathWorks products have made extensive use of a C++ micro-services architecture. This architecture leads to a more dynamic system in which library modules are loaded at the point of use. As a result, the MATLAB&reg; process is more likely to load a library at the same time as creating a thread, and so is more likely to encounter this glibc bug. When this [issue is encountered](https://www.mathworks.com/matlabcentral/answers/1454674-why-does-matlab-crash-on-linux-with-inconsistency-detected-by-ld-so-elf-dl-tls-c-597-_dl_allo), the console from which MATLAB was started shows a message similar to the following:

```
Inconsistency detected by ld.so: ../elf/dl-tls.c: 597: _dl_allocate_tls_init: Assertion 'listp != NULL' failed!
```
or
```
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
```
There might also be a stack trace file called `matlab_crash_dump.${PID}` in the user's home folder or the current working folder. This usually indicates that a segmentation violation has been detected, and the stack trace starts with something similar to the following:

```
Stack Trace (from fault):
[ 0] 0x00002b661142d5a0 /lib64/ld-linux-x86-64.so.2+00075168 _dl_allocate_tls_init+00000080
[ 1] 0x00002b66120c187c /usr/lib64/libpthread.so.0+00034940 pthread_create+00001884
```

If you see these or similar signatures frequently on a system, consider patching glibc on that system, machine or container.
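
To gauge how often these signatures occur, you could scan console logs or crash dump files for them. The Python sketch below is one possible approach; the function name `find_bz19329_signatures` and the two regular expressions are assumptions derived from the sample messages shown above, so they may need adjusting for other glibc builds or message variants.

```python
import re
from typing import Iterable, List

# Patterns derived from the sample messages in this document (assumed, not exhaustive).
SIGNATURES = [
    # ld.so assertion message, with or without a leading path before dl-tls.c
    re.compile(
        r"Inconsistency detected by ld\.so: .*dl-tls\.c: \d+: "
        r"_dl_allocate_tls_init: Assertion"
    ),
    # Stack trace frame inside _dl_allocate_tls_init
    re.compile(r"_dl_allocate_tls_init\+\d+"),
]


def find_bz19329_signatures(lines: Iterable[str]) -> List[str]:
    """Return the lines that match any of the known crash signatures."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in SIGNATURES)
    ]


sample = [
    "Inconsistency detected by ld.so: ../elf/dl-tls.c: 597: "
    "_dl_allocate_tls_init: Assertion 'listp != NULL' failed!",
    "[ 0] 0x00002b661142d5a0 /lib64/ld-linux-x86-64.so.2+00075168 "
    "_dl_allocate_tls_init+00000080",
    "[ 1] 0x00002b66120c187c /usr/lib64/libpthread.so.0+00034940 "
    "pthread_create+00001884",
]
hits = find_bz19329_signatures(sample)
```

To scan a real file, pass an open file object instead of `sample`, for example `find_bz19329_signatures(open(path))`, where `path` is a log or crash dump location of your choosing.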

### RHEL 8.4 & 8.5 Update (*27 June 2022*)

RHEL has just integrated the BZ-19329 patch into [`glibc-2.28-189.1.el8`](https://git.almalinux.org/rpms/glibc/commit/385bc0f199bf51199143fe12b857f4983db76e48). It appears that the change actually went into build [`2.28-175`](https://git.almalinux.org/rpms/glibc/src/commit/385bc0f199bf51199143fe12b857f4983db76e48/SPECS/glibc.spec#L2721) and was released with `2.28-189`.

Unless you need to use a `pre-189` release of the package, you should no longer need to use this repository to patch RHEL and AlmaLinux for BZ-19329.

## Patch sources
These patches all derive from an [original patch](https://sourceware.org/legacy-ml/libc-alpha/2016-01/msg00480.html) put together by Szabolcs Nagy in January 2016. The 2.24 to 2.28 patches in this repo are derived from this original e-mail and can be downloaded directly from the archive of the `[email protected]` mailing list where they were proposed:

* https://sourceware.org/legacy-ml/libc-alpha/2016-11/msg01092.html
* https://sourceware.org/legacy-ml/libc-alpha/2016-11/msg01093.html

These two patches are directly linked in [the original bug report](https://sourceware.org/bugzilla/show_bug.cgi?id=19329) in comment 7 by Pádraig Brady. The bug report also references the original Szabolcs Nagy patch in comment 4 (dated January 2016). The two messages above refer back to that original patch via a [message describing the overall problem in more detail](https://sourceware.org/legacy-ml/libc-alpha/2016-11/msg01026.html).

In addition, in [Sept 2017 Pádraig Brady](https://sourceware.org/bugzilla/show_bug.cgi?id=19329) pointed out an off-by-one error in the original patch. The following correction needs to be included:
``` diff
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 073321c..2c9ad2a 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
         }
 
       total += cnt;
-      if (total >= dtv_slots)
+      if (total > dtv_slots)
         break;
 
       /* Synchronize with dl_add_to_slotinfo. */
```
This is the source for the final `unsubmitted-bz19329-fixup.v2.27.patch`.

In glibc v2.31 the original source code changed significantly, and the patches needed to be slightly adapted to match the new codebase. These adapted patches are included here in the `patches/2.31` folder and soft-linked from 2.32 and 2.33.

## Acknowledgement and thanks
Many thanks to the broader glibc team, and particularly to Szabolcs Nagy for providing the original patches and for fixing these issues in glibc v2.34.

Dockerfile.rhel

Lines changed: 5 additions & 6 deletions
@@ -1,10 +1,8 @@
-# Copyright 2021 The MathWorks, Inc.
+# Copyright 2021 - 2022 The MathWorks, Inc.
 ARG BUILD_ROOT=/root/
 ARG RPM_DIR=${BUILD_ROOT}/rpmbuild/RPMS/x86_64/
 
-# Default to building for glibc 2.31 in ubuntu:20.04 but by specifying
-# --build-arg RELEASE=18.04 in the docker build phase this will build for
-# glibc 2.27
+
 ARG ARCH=
 ARG DIST_BASE=almalinux
 ARG DIST_TAG=8.5
@@ -21,10 +19,11 @@ RUN dnf download --source glibc && \
     dnf builddep -y --nodocs glibc-*.src.rpm && \
     rpm -ivh glibc-*.src.rpm
 
+ARG GLIBC_VERSION=2.28-189
 COPY scripts/update-specfile.sh ${BUILD_ROOT}
-COPY patches/rhel ${BUILD_ROOT}/patches
+COPY patches/rhel/${GLIBC_VERSION} ${BUILD_ROOT}/patches
 
-RUN cp patches/2.28/* rpmbuild/SOURCES && \
+RUN cp patches/* rpmbuild/SOURCES && \
     ./update-specfile.sh
 
 RUN rpmbuild --nocheck -bb rpmbuild/SPECS/glibc.spec
