This repository was archived by the owner on Mar 26, 2020. It is now read-only.

Commit 60e5aa2

Author: Atin Mukherjee
Merge branch 'master' into add-validations-to-brickmux-option
2 parents f977988 + 5f8ec37 commit 60e5aa2


65 files changed: +2562, -204 lines changed

doc/endpoints.md

Lines changed: 1 addition & 1 deletion

doc/profiling.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
Tracking resource usage in Glusterd2
====================================

Tracing and profiling with `pprof` is intended for developers only. It can be
used to debug memory leaks, excessive memory consumption, slow code paths, and
so on. Users will normally not want profiling enabled on their production
systems.

To investigate memory allocations in Glusterd2, the profiling feature needs to
be enabled. This can be done by adding `"profiling": true` to the `--config`
file.
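
The exact name and format of the configuration file depend on how Glusterd2 is
deployed. As an illustration only, assuming the file passed via `--config` uses
JSON (the key above is shown with JSON-style quoting), the relevant fragment
might look like this, with any other keys in the file left untouched:

```
{
    "profiling": true
}
```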

Enabling profiling makes standard Golang pprof endpoints available. For memory
allocations `/debug/pprof/heap` is most useful.
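
Under the hood this is the standard `net/http/pprof` mechanism. As a minimal,
self-contained sketch of how such endpoints are typically exposed in a Go
service (the generic pattern, not necessarily the exact wiring inside
Glusterd2, and the listen address is only an example):

```
package main

import (
	"log"
	"net/http"

	// The blank import registers the /debug/pprof/* handlers on
	// http.DefaultServeMux.
	_ "net/http/pprof"
)

func main() {
	// With the handlers registered, /debug/pprof/heap, /debug/pprof/profile,
	// /debug/pprof/goroutine and friends are served by the default mux.
	log.Fatal(http.ListenAndServe("localhost:24007", nil))
}
```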

Capturing a snapshot of the current allocations in Glusterd2 is pretty simple.
On the node running Glusterd2, the `go tool pprof` command can be used:

```
[root@fedora3 glusterd2]# go tool pprof http://localhost:24007/debug/pprof/heap
File: glusterd2
Build ID: 7a94c2e498445577aaf7f910d6ef1c3adc19d553
Type: inuse_space
Time: Nov 28, 2018 at 2:55pm (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
```

For working with the size of the allocations, it is helpful to report all sizes
in `megabytes`. Otherwise `auto` is used as the `unit`, which adds human-readable
B, kB and MB suffixes that are not convenient for sorting with scripts. So, set
the `unit` to `megabytes` instead:

```
(pprof) unit=megabytes
```

```
(pprof) top
Showing nodes accounting for 330.75MB, 98.52% of 335.73MB total
Dropped 305 nodes (cum <= 1.68MB)
Showing top 10 nodes out of 14
flat flat% sum% cum cum%
326MB 97.10% 97.10% 326MB 97.10% github.com/gluster/glusterd2/pkg/sunrpc.ReadFullRecord /root/work/src/github.com/gluster/glusterd2/pkg/sunrpc/record.go
2.38MB 0.71% 97.81% 2.38MB 0.71% github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/rafthttp.startStreamWriter /root/work/src/github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/rafthttp/stream.go
2.38MB 0.71% 98.52% 4.77MB 1.42% github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/rafthttp.startPeer /root/work/src/github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/rafthttp/peer.go
0 0% 98.52% 326.01MB 97.10% github.com/gluster/glusterd2/pkg/sunrpc.(*serverCodec).ReadRequestHeader /root/work/src/github.com/gluster/glusterd2/pkg/sunrpc/servercodec.go
0 0% 98.52% 4.83MB 1.44% github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).apply /root/work/src/github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver/server.go
0 0% 98.52% 4.83MB 1.44% github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyAll /root/work/src/github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver/server.go
0 0% 98.52% 4.77MB 1.42% github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyConfChange /root/work/src/github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver/server.go
0 0% 98.52% 4.83MB 1.44% github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyEntries /root/work/src/github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver/server.go
0 0% 98.52% 4.83MB 1.44% github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).run.func6 /root/work/src/github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/etcdserver/server.go
0 0% 98.52% 4.83MB 1.44% github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/pkg/schedule.(*fifo).run /root/work/src/github.com/gluster/glusterd2/vendor/github.com/coreos/etcd/pkg/schedule/schedule.go
(pprof)
```

Looking at the above output, we can see that the first entry consumes hundreds
of MB while the remaining entries account for only a fraction of that. This
confirms that `ReadFullRecord` is consuming more memory than required. To narrow
it down further, we can see which line of `ReadFullRecord` performs the
allocation. To view that, use the `list` command:

```
(pprof) list ReadFullRecord
Total: 335.73MB
ROUTINE ======================== github.com/gluster/glusterd2/pkg/sunrpc.ReadFullRecord in /root/work/src/github.com/gluster/glusterd2/pkg/sunrpc/clientcodec.go
8.11kB 8.11kB (flat, cum) 0.0024% of Total
. . 1:package sunrpc
. . 2:
. . 3:import (
8.11kB 8.11kB 4: "bytes"
. . 5: "io"
. . 6: "net"
. . 7: "net/rpc"
. . 8: "sync"
. . 9:
ROUTINE ======================== github.com/gluster/glusterd2/pkg/sunrpc.ReadFullRecord in /root/work/src/github.com/gluster/glusterd2/pkg/sunrpc/record.go
326MB 326MB (flat, cum) 97.10% of Total
. . 96:func ReadFullRecord(conn io.Reader) ([]byte, error) {
. . 97:
. . 98: // In almost all cases, RPC message contain only one fragment which
. . 99: // is not too big in size. But set a cap on buffer size to prevent
. . 100: // rogue clients from filling up memory.
326MB 326MB 101: record := bytes.NewBuffer(make([]byte, 0, maxRecordSize))
. . 102: var fragmentHeader uint32
. . 103: for {
. . 104: // Read record fragment header
. . 105: err := binary.Read(conn, binary.BigEndian, &fragmentHeader)
. . 106: if err != nil {
(pprof)
```

The `record` buffer allocation is the culprit, and we need to look further into
the code to understand why it allocates so much memory, and then fix it.
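
As a sketch of the kind of change this analysis points towards (an illustration
only, not the fix that was actually merged upstream), the up-front reservation
of `maxRecordSize` bytes per record could be replaced with a buffer that grows
as fragments arrive, while still enforcing the same cap. The function name
`readFullRecordSketch` and the value used for `maxRecordSize` below are
placeholders:

```
package sunrpc

import (
	"bytes"
	"encoding/binary"
	"errors"
	"io"
)

// Assumed cap, for illustration only.
const maxRecordSize = 1 * 1024 * 1024

var errRecordTooLarge = errors.New("sunrpc: record exceeds maximum size")

// readFullRecordSketch reads one RPC record (all of its fragments) from conn
// without reserving maxRecordSize bytes up front.
func readFullRecordSketch(conn io.Reader) ([]byte, error) {
	var record bytes.Buffer // grows on demand instead of pre-allocating the cap

	for {
		// Each fragment starts with a 4-byte header: a "last fragment" flag in
		// the top bit and the fragment length in the remaining 31 bits.
		var fragmentHeader uint32
		if err := binary.Read(conn, binary.BigEndian, &fragmentHeader); err != nil {
			return nil, err
		}

		last := fragmentHeader&(1<<31) != 0
		size := int64(fragmentHeader &^ (1 << 31))

		// Enforce the cap across all fragments of the record.
		if int64(record.Len())+size > maxRecordSize {
			return nil, errRecordTooLarge
		}

		// Copy exactly one fragment into the growing buffer.
		if _, err := io.CopyN(&record, conn, size); err != nil {
			return nil, err
		}

		if last {
			return record.Bytes(), nil
		}
	}
}
```

Whether this trade-off (more reallocations per record versus a much smaller
idle footprint) is acceptable depends on the RPC workload, which is exactly the
kind of question the profile above helps answer.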
