Commit 1129560 (1 parent: 3706990)

* Update to MQ 9.3.4
* Fix Prometheus label cardinality when reporting disconnect (#245)
* Add hints about monitoring "large" queue managers in TUNING.md
* Make sure missing YAML configuration attributes have reasonable defaults
* Update all vendored dependencies


58 files changed: +11478 −832 lines

.dockerignore

Lines changed: 6 additions & 1 deletion

@@ -1,2 +1,7 @@
-.git/
 .github/
+.git/
+
+# We use a file in the .git/refs directory to try to extract current commit level
+# so exclude that directory from the exclusions
+!.git/refs
+
CHANGELOG.md

Lines changed: 7 additions & 0 deletions

@@ -1,6 +1,13 @@
 # Changelog
 Newest updates are at the top of this file.
 
+### Oct 19 2023 (v5.5.1)
+* Update to MQ 9.3.4
+* Fix Prometheus label cardinality when reporting disconnect (#245)
+* Add hints about monitoring "large" queue managers in TUNING.md
+* Make sure missing YAML configuration attributes have reasonable defaults
+* Update all vendored dependencies
+
 ### Jun 20 2023 (v5.5.0)
 * Update to MQ 9.3.3
 * Update Dockerfile to support platforms without Redist client (#209)

Dockerfile

Lines changed: 44 additions & 21 deletions

@@ -5,7 +5,7 @@
 # material from the build step into the runtime container.
 #
 # It can cope with both platforms where a Redistributable Client is available, and platforms
-# where it is not - copy the .deb install images for such platforms into the MQDEB
+# where it is not - copy the .deb install images for such platforms into the MQINST
 # subdirectory of this repository first.
 
 # Global ARG. To be used in all stages.
@@ -21,7 +21,7 @@ ARG EXPORTER
 ENV EXPORTER=${EXPORTER} \
     ORG="github.com/ibm-messaging" \
     REPO="mq-metric-samples" \
-    VRMF=9.3.3.0 \
+    VRMF=9.3.4.0 \
     CGO_CFLAGS="-I/opt/mqm/inc/" \
     CGO_LDFLAGS_ALLOW="-Wl,-rpath.*" \
     genmqpkg_incnls=1 \
@@ -39,26 +39,31 @@ RUN mkdir -p /go/src /go/bin /go/pkg \
    && chmod -R 777 /go \
    && mkdir -p /go/src/$ORG \
    && mkdir -p /opt/mqm \
-   && mkdir -p /MQDEB \
+   && mkdir -p /MQINST \
    && chmod a+rx /opt/mqm
 
 # Install MQ client and SDK
 # For platforms with a Redistributable client, we can use curl to pull it in and unpack it.
-# For other platforms, we assume that you have the deb files available under the current directory
+# For most other platforms, we assume that you have deb files available under the current directory
 # and we then copy them into the container image. Use dpkg to install from them; these have to be
-# done in the right order.
+# done in the right order.
+#
+# The Linux ARM64 image is a full-function server package that is directly unpacked.
+# We only need a subset of the files so strip the unneeded filesets. The download of the image could
+# be automated via curl in the same way as the Linux/amd64 download, but it's a much bigger image and
+# has a different license. So I'm not going to do that for now.
 #
 # If additional Redistributable Client platforms appear, then this block can be altered, including the MQARCH setting.
 #
 # The copy of the README is so that at least one file always gets copied, even if you don't have the deb files locally.
-# Using a wildcard in the directory name also helps to ensure that this part of the build should always succeed.
-COPY README.md MQDEB*/*deb /MQDEB
-
-# This is a value always set by the "docker build" process
-ARG TARGETPLATFORM
-RUN echo "Target arch is $TARGETPLATFORM"
-# Might need to refer to TARGETPLATFORM a few times in this block, so define something shorter.
-RUN T="$TARGETPLATFORM"; \
+# Using a wildcard in the directory name also helps to ensure that this part of the build always succeeds.
+COPY README.md MQINST*/*deb MQINST*/*tar.gz /MQINST
+
+# These are values always set by the "docker build" process
+ARG TARGETARCH TARGETOS
+RUN echo "Target arch is $TARGETARCH; os is $TARGETOS"
+# Might need to refer to TARGET* vars a few times in this block, so define something shorter.
+RUN T="$TARGETOS/$TARGETARCH"; \
     if [ "$T" = "linux/amd64" ]; \
     then \
       MQARCH=X64;\
@@ -69,35 +74,48 @@ RUN T="$TARGETPLATFORM"; \
    && tar -zxf ./*.tar.gz \
    && rm -f ./*.tar.gz \
    && bin/genmqpkg.sh -b /opt/mqm;\
+    elif [ "$T" = "linux/arm64" ] ;\
+    then \
+      cd /MQINST; \
+      c=`ls *$VRMF*.tar.gz 2>/dev/null| wc -l`; if [ $c -ne 1 ]; then echo "MQ installation file does not exist in MQINST subdirectory";exit 1;fi; \
+      cd /opt/mqm \
+      && tar -zxf /MQINST/*.tar.gz \
+      && export genmqpkg_incserver=0 \
+      && bin/genmqpkg.sh -b /opt/mqm;\
     elif [ "$T" = "linux/ppc64le" -o "$T" = "linux/s390x" ];\
     then \
-      cd /MQDEB; \
-      c=`ls ibmmq-*$VRMF*.deb| wc -l`; if [ $c -lt 4 ]; then echo "MQ installation files do not exist in MQDEB subdirectory";exit 1;fi; \
+      cd /MQINST; \
+      c=`ls ibmmq-*$VRMF*.deb 2>/dev/null| wc -l`; if [ $c -lt 4 ]; then echo "MQ installation files do not exist in MQINST subdirectory";exit 1;fi; \
      for f in ibmmq-runtime_$VRMF*.deb ibmmq-gskit_$VRMF*.deb ibmmq-client_$VRMF*.deb ibmmq-sdk_$VRMF*.deb; do dpkg -i $f;done; \
    else \
      echo "Unsupported platform $T";\
      exit 1;\
    fi
 
-# Build Go application
+# Build the Go application
 WORKDIR /go/src/$ORG/$REPO
 COPY go.mod .
 COPY go.sum .
 COPY --chmod=777 ./cmd/${EXPORTER} .
 COPY --chmod=777 vendor ./vendor
 COPY --chmod=777 pkg ./pkg
-RUN go build -mod=vendor -o /go/bin/${EXPORTER} ./*.go
+# This file holds something like the current commit level if it exists in your tree. It might not be there, so
+# we use wildcards to avoid errors on non-existent files/dirs.
+COPY --chmod=777 ./.git*/refs/heads/master* .
+RUN buildStamp=`date +%Y%m%d-%H%M%S`; \
+    hw=`uname -m`; \
+    os=`uname -s`; \
+    bp="$os/$hw"; \
+    if [ -r master ]; then gitCommit=`cat master`;else gitCommit="Unknown";fi; \
+    BUILD_EXTRA_INJECT="-X \"main.BuildStamp=$buildStamp\" -X \"main.BuildPlatform=$bp\" -X \"main.GitCommit=$gitCommit\""; \
+    go build -mod=vendor -ldflags "$BUILD_EXTRA_INJECT" -o /go/bin/${EXPORTER} ./*.go
 
 # --- --- --- --- --- --- --- --- --- --- --- --- --- --- #
 ### ### ### ### ### ### ### RUN ### ### ### ### ### ### ###
 # --- --- --- --- --- --- --- --- --- --- --- --- --- --- #
 FROM golang:1.19 AS runtime
 
 ARG EXPORTER
-ENV EXPORTER=${EXPORTER} \
-    LD_LIBRARY_PATH="/opt/mqm/lib64:/usr/lib64" \
-    MQ_CONNECT_TYPE=CLIENT \
-    IBMMQ_GLOBAL_CONFIGURATIONFILE=/opt/config/${EXPORTER}.yaml
 
 # Create directory structure
 RUN mkdir -p /opt/bin \
@@ -120,6 +138,11 @@ RUN mkdir -p /IBM/MQ/data/errors \
    && chmod -R 777 /IBM \
    && chmod -R 777 /.mqm
 
+ENV EXPORTER=${EXPORTER} \
+    LD_LIBRARY_PATH="/opt/mqm/lib64:/usr/lib64" \
+    MQ_CONNECT_TYPE=CLIENT \
+    IBMMQ_GLOBAL_CONFIGURATIONFILE=/opt/config/${EXPORTER}.yaml
+
 COPY --chmod=555 --from=builder /go/bin/${EXPORTER} /opt/bin/${EXPORTER}
 COPY --from=builder /opt/mqm/ /opt/mqm/
Dockerfile.build

Lines changed: 8 additions & 13 deletions

@@ -16,7 +16,7 @@ ARG BASE_IMAGE=ubuntu:20.04
 FROM $BASE_IMAGE
 
 ARG GOPATH_ARG="/go"
-ARG GOVERSION=1.17
+ARG GOVERSION=1.19
 ARG GOARCH=amd64
 ARG MQARCH=X64
 
@@ -61,7 +61,7 @@ RUN mkdir -p $GOPATH/src $GOPATH/bin $GOPATH/pkg \
 # Location of the downloadable MQ client package \
 ENV RDURL="https://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/messaging/mqdev/redist" \
     RDTAR="IBM-MQC-Redist-Linux${MQARCH}.tar.gz" \
-    VRMF=9.3.3.0
+    VRMF=9.3.4.0
 
 # Install the MQ client from the Redistributable package. This also contains the
 # header files we need to compile against. Setup the subset of the package
@@ -77,24 +77,19 @@ RUN cd /opt/mqm \
    && bin/genmqpkg.sh -b /opt/mqm
 
 # Insert the script that will do the build
-COPY scripts/buildInDocker.sh $GOPATH
-RUN chmod 777 $GOPATH/buildInDocker.sh
+COPY --chmod=777 scripts/buildInDocker.sh $GOPATH
 
 WORKDIR $GOPATH/src/$ORG/$REPO
-COPY go.mod .
-COPY go.sum .
-RUN chmod 777 go.*
+COPY --chmod=777 go.mod .
+COPY --chmod=777 go.sum .
+COPY --chmod=777 config.common.yaml .
 
-COPY config.common.yaml .
-RUN chmod 777 config.common.yaml
-
-#RUN /usr/lib/go-${GOVERSION}/bin/go mod download
+# RUN /usr/lib/go-${GOVERSION}/bin/go mod download
 
 # Copy the rest of the source tree from this directory into the container and
 # make sure it's readable by the user running the container
 ENV REPO="mq-metric-samples"
-COPY . $GOPATH/src/$ORG/$REPO
-RUN chmod -R a+rwx $GOPATH/src/$ORG/$REPO
+COPY --chmod=777 . $GOPATH/src/$ORG/$REPO
 
 # Set the entrypoint to the script that will do the compilation
 ENTRYPOINT $GOPATH/buildInDocker.sh

Dockerfile.run

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ RUN apt-get update \
 # Location of the downloadable MQ client package \
 ENV RDURL="https://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/messaging/mqdev/redist" \
     RDTAR="IBM-MQC-Redist-Linux${MQARCH}.tar.gz" \
-    VRMF=9.3.3.0
+    VRMF=9.3.4.0
 
 # Install the MQ client from the Redistributable package. This also contains the
 # header files we need to compile against. Setup the subset of the package

MQDEB/README

Lines changed: 0 additions & 5 deletions
This file was deleted.

MQINST/README

Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+If you are using the Dockerfile in the root of this repository
+to build and run an exporter/collector program:
+
+For Linux platforms without a Redistributable Client package, but
+with full install packages, copy the .deb installation files into
+this directory.
+
+For Linux/Arm64, copy the .tar.gz installation file into this
+directory.

README.md

Lines changed: 4 additions & 4 deletions

@@ -31,7 +31,7 @@ file if you wish to reload all of the dependencies by running `go mod vendor`.
 
 You will require the following programs:
 
-* Go compiler - version 1.17 is the minimum defined here
+* Go compiler - version 1.19 is the minimum defined here
 * C compiler
 
 
@@ -88,9 +88,9 @@ containers. You still need to provide the configuration file at runtime, perhaps
 ```
 
 ### Platform support
-This Dockerfile should work for a variety of platforms. For those with a Redistributable client, it uses
-`curl` to automatically download and unpack the required MQ files. For other platforms, it assumes that
-you have an `MQDEB` subdirectory under this root, and then copied the `.deb` files from your
+This Dockerfile should work for a variety of platforms. For those with a Redistributable client, it uses `curl` to
+automatically download and unpack the required MQ files. For other platforms, it assumes that you have an `MQINST`
+subdirectory under this root, and then copied the `.deb` files (or the `.tar.gz` file for Linux/arm64 systems) from your
 real MQ installation tree into it.
 
 ### Additional container scripts

TUNING.md

Lines changed: 121 additions & 0 deletions (new file)

# Tuning hints for monitoring "large" queue managers

If you have a large queue manager - perhaps several thousand queues - then a lot of data could be produced for monitoring those queues. Some default configuration options might need tuning to get acceptable performance. Reducing the frequency of generation and/or collection may be appropriate. There are several places where tuning might be done: in this collector, in the database configuration, and in the queue manager.

The following sections describe different pieces that you might want to look at.

The document is mostly written from the viewpoint of using Prometheus as the database. That is mainly because Prometheus has the unique "pull" model, where the server calls the collector at configured intervals. Other databases and collector technologies supported from this repository have a simpler way of "pushing" data to the various backends. However, much of the document is relevant regardless of where the metrics end up.
## Collector location
It is most efficient to run the collector program as a local bindings application, connecting directly to the queue manager. That removes all the MQ Client flows that would otherwise have to be done for every message.

If you cannot avoid running as a client (for example, you are trying to monitor the MQ Appliance or z/OS), then keep the network latency between the queue manager and collector as low as possible. For z/OS, you might consider running the collector in a zLinux LPAR on the same machine, or perhaps in a zCX container.

Also configure the client to take advantage of readahead when getting publications. This is done by setting `DEFREADA(YES)` on the nominated ReplyQueue(s).
## Collection processing time
The collector reports on how long it takes to collect and process the data on each interval. You can see this in a debug log. The Prometheus collector also has an `ibmmq_qmgr_exporter_collection_time` metric. Note that this time is the value as seen by the main collection thread; the real total time as seen by Prometheus is usually longer, as there is likely still work going on in the background to send metrics to the database and have them successfully ingested.

The first time that the collection time exceeds the Prometheus default `scrape_timeout` value, a warning message is emitted. This can be ignored if you are expecting a scrape to take a longer period, but it can be helpful if you didn't know that you might need to do some tuning.

The true total time taken for a scrape can be seen in Prometheus directly. For example, you can use the administrative interface at `http://<server>:9090/targets?search=` and find the target corresponding to your queue manager.

For other collectors, there is no specific metric. But the timestamps on each collection block allow you to deduce the time taken: the difference between successive iterations is the collection period plus the `interval` configuration value.
## Ensuring collection intervals have enough time to run
The Prometheus `scrape_configs` attributes can be configured for all or some collectors. In particular, you will probably want to change the `scrape_interval` and `scrape_timeout` values for the jobs associated with large queue managers. Use the reported collection processing time as a basis from which to set these values.

For other collector models, the collector-specific `interval` attribute determines the gap between each push of the metrics. There is no "maximum" collection time.
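As an illustrative sketch of that tuning (the job name, target address and interval values here are invented, not taken from this repository), a dedicated Prometheus scrape job for a large queue manager might look like:

```yaml
# prometheus.yml fragment (example values only).
scrape_configs:
  - job_name: "mq_large_qmgr"
    scrape_interval: 60s    # longer than the 15s default
    scrape_timeout: 50s     # must not exceed scrape_interval; keep it above
                            # the observed ibmmq_qmgr_exporter_collection_time
    static_configs:
      - targets: ["mqhost.example.com:9157"]
```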
## Reducing metric publication interval from queue manager
By default, the queue manager publishes resource metrics every 10 seconds. This matches fairly well with the Prometheus default scrape interval of 15s. But if you increase the scrape interval, you might also want to reduce the frequency of publications so that fewer "merges" have to be done when processing the subscription destination queues. Setting the following stanza in the _qm.ini_ file changes that frequency:
```
TuningParameters:
  MonitorPublishHeartBeat = 30
```
The value is given in seconds, and the attribute name is case-sensitive. As increasing the value reduces the frequency of generation, it may cause you to miss shorter-lived transient spikes in some values; that is the tradeoff you have to evaluate. But having a value smaller than the time taken to process the publications might result in a never-ending scrape. The publication-processing portion of the scrape can be seen in a debug log.
## Reducing subscriptions made to queue manager
Reducing the total number of subscriptions made will reduce the data that needs to be processed, but at the cost of missing some metrics that you might find useful. See also the section in the [README](README.md) file about using durable subscriptions.

* You can disable all use of published resource metrics, and rely on the `DISPLAY xxSTATUS` responses. This clearly reduces the data, but you lose out on many useful metrics. It is essentially how we monitor z/OS queue managers, as they do not have the publication model for metrics. If you want this approach, set the `global.usePublications` configuration option to `false`.

* You can reduce the total number of subscriptions made for queue metrics. The `filters.queueSubscriptionSelector` list defines the sets of topics that you might be interested in. The complete set - for now - is [OPENCLOSE, INQSET, PUT, GET, GENERAL]. In many cases, only the last three of these may be of interest. A smaller set reduces the number of publications per queue. Within each set, multiple metrics are created, but there is no way to report on only a subset of the metrics in each set.

* You can choose not to subscribe to any queue metrics, but still subscribe to metrics for other resources such as the queue manager and Native HA, by setting the filter to `NONE`. If you do this, then many queue metrics become unavailable. However, the current queue depth will still be available, as it can also be determined from the `DISPLAY QSTATUS` response.
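A sketch of how these choices might appear in the collector's YAML configuration (the attribute names come from the text above; the selected values are only an example):

```yaml
# Collector configuration fragment (example values only).
global:
  usePublications: true    # set to false to rely solely on DISPLAY xxSTATUS
filters:
  # Subscribe only to the publication sets of interest. Fewer sets mean
  # fewer publications per queue; NONE skips queue metric subscriptions.
  queueSubscriptionSelector:
    - PUT
    - GET
    - GENERAL
```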
## Reducing the number of monitored objects and status requests
Each object type (queues, channels etc) has a block in the collector configuration that names which objects should be monitored. While both positive and negative wildcards can be used in these blocks, it is probably most efficient to use only positive wildcards. That allows the `DISPLAY xxSTATUS` requests to pass the wildcards directly into the queue manager commands; if there are any negative patterns, the collector has to work out which objects match the pattern, and then inquire for the remainder individually.
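For illustration, an object-selection block using only positive wildcards might look like the following sketch (the pattern names are invented; the layout follows the repository's YAML configuration style):

```yaml
# Example object selection (patterns are illustrative).
objects:
  queues:
    - APP.*        # positive wildcards are passed directly to the
    - PAYROLL.*    # queue manager's DISPLAY commands
  # A negative pattern such as "!SYSTEM.*" forces the collector to expand
  # the list itself and inquire on the remaining objects individually.
```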
## Other configuration options
The `global.pollInterval` and `global.rediscoverInterval` options may help to further reduce inquiries.

The first of these controls how frequently the `DISPLAY xxSTATUS` commands are used, assuming `global.useObjectStatus` is `true`. In some circumstances, you might not want all of the responses as regularly as the published metrics are handled.

The second attribute controls how frequently the collector reassesses the list of objects to be monitored, and their more stable attributes - for example, the `DESCRIPTION` or `MAXDEPTH` settings on a queue. If you have a large number of queues that do not change frequently, then you might want to increase the rediscovery attribute. The default is 1 hour. The tradeoff here is that newly-defined queues may not have any metrics reported until this interval expires.
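A sketch of these attributes in the YAML configuration (the durations are arbitrary examples, not recommendations):

```yaml
# Example inquiry-frequency tuning (values are arbitrary examples).
global:
  useObjectStatus: true
  pollInterval: 60s         # how often DISPLAY xxSTATUS commands are issued
  rediscoverInterval: 12h   # how often the object list is rebuilt (default 1h);
                            # newly-defined queues are not seen until then
```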
## Dividing the workload
One further approach that you might like to consider, though I wouldn't usually recommend it, is to have two or more collectors running against the same queue manager, each configured with a different set of queues to be monitored. So a collector listening on port 9157 might manage queues A*-M*, while another collector on port 9158 monitors queues N*-Z*. You would likely need additional configuration to reduce duplication of other components, for example by using the `jobname` or `instance` as a filter element on dashboard queries, but it might be one way to reduce the time taken for a single scrape.
## Very slow queue managers
The collectors wait for a short time for each response to a status request. If the timeout expires with no expected message appearing, then an error is reported. Some queue managers - particularly when hosted in cloud services - have appeared to "stall" for a period: even though they are not especially busy, the response messages have not appeared in time. The default wait of 3 seconds can be tuned using the `connection.waitInterval` option.
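Assuming the attribute takes a value in seconds, raising the timeout might look like this sketch:

```yaml
# Example: give a slow queue manager more time to respond (value illustrative).
connection:
  waitInterval: 10   # seconds to wait for each status response; default is 3
```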
For all collectors _except_ Prometheus, a small number of these timeout errors are permitted consecutively. The failure count is reset after a successful collection. See _pkg/errors/errors.go_ for details. The Prometheus collector has an automatic reconnect option after failures, so does not currently use this strategy.

cmd/mq_json/main.go

Lines changed: 1 addition & 1 deletion

@@ -47,7 +47,7 @@ func printInfo(title string, stamp string, commit string, buildPlatform string)
   if buildPlatform != "" {
     log.Infoln("Build Platform: " + buildPlatform)
   }
-  log.Infoln("MQ Go Version : " + cf.MqGolangVersion)
+  log.Infoln("MQ Go Version : " + cf.MqGolangVersion())
   log.Println("")
 }
 