Commit 2f9ca1a

Merge pull request #1861 from jasonrandrews/review
Review MongoDB Learning Path
2 parents 6c27700 + ff9efbc commit 2f9ca1a

6 files changed: +61 -59 lines changed

content/learning-paths/servers-and-cloud-computing/mongodb/_index.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -11,7 +11,7 @@ learning_objectives:
1111
- Measure and compare the performance of MongoDB on Arm versus other architectures with Yahoo Cloud Serving Benchmark (YCSB).
1212

1313
prerequisites:
14-
- An Arm based instance from a cloud service provider.
14+
- An [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider.
1515

1616
armips:
1717
- Neoverse

content/learning-paths/servers-and-cloud-computing/mongodb/automate_setup_pulumi.md

Lines changed: 7 additions & 7 deletions
@@ -20,7 +20,7 @@ Install the python dependencies on your Ubuntu 22.04 machine:
2020
sudo apt update
2121
sudo apt install python-is-python3 -y
2222
sudo apt install python3-pip -y
23-
sudo apt install python3.10-venv
23+
sudo apt install python3.10-venv -y
2424
```
2525

2626
## Install Pulumi
@@ -41,7 +41,7 @@ git clone https://github.com/pbk8s/pulumi-ec2.git
4141
```
4242

4343
## Build gatord
44-
You would also need the gatord binary for performance analysis. [gator](https://github.com/ARM-software/gator) is a target agent (daemon), part of Arm Streamline, a set of performance analysis tools. Use the following commands to build it from source.
44+
You will also need the gatord binary for performance analysis. [gator](https://github.com/ARM-software/gator) is a target agent (daemon), part of Arm Streamline, a set of performance analysis tools. Use the following commands to build it from source.
4545

4646
```bash
4747
git clone https://github.com/ARM-software/gator.git
@@ -65,14 +65,14 @@ cp build-native-gcc-rel/gatord ~/pulumi-ec2/
6565
## Install awscli and set environment variables
6666
Use the [awscli](https://learn.arm.com/install-guides/aws-cli/) learning path to install the awscli.
6767

68-
Set the following environment variables on your local computer to connect to your AWS account
69-
```console
68+
Set the following environment variables on your local computer to connect to your AWS account:
69+
```bash
7070
export AWS_ACCESS_KEY_ID=<access-key-id>
7171
export AWS_SECRET_ACCESS_KEY=<secret-access-key>
7272
export AWS_SESSION_TOKEN=<session-token>
7373
```
74-
Execute the following command to validate the credentials
75-
```console
74+
Execute the following command to validate the credentials:
75+
```bash
7676
aws sts get-caller-identity
7777
```
7878

@@ -134,7 +134,7 @@ subnet = aws.ec2.Subnet("p1-subnet",
134134
})
135135
```
136136

137-
Note: The security groups created by this script are lot less restrictive, to simplify the deployment process and to remove additional complexities. Please modify the ingress/egress rules as per your organizations' policy.
137+
Note: The security groups created by this script are a lot less restrictive, to simplify the deployment process and to remove additional complexities. Please modify the ingress/egress rules as per your organization's policy.
138138

139139
```python
140140
group = aws.ec2.SecurityGroup('p1-security-grouup',

content/learning-paths/servers-and-cloud-computing/mongodb/benchmark_mongodb-8.0.md

Lines changed: 7 additions & 6 deletions
@@ -8,7 +8,7 @@ weight: 4 # (intro is 1), 2 is first, 3 is second, etc.
88
layout: "learningpathall"
99
---
1010

11-
To further measure the performance of MongoDB, you can run the [Yahoo Cloud Serving Benchmark](http://github.com/brianfrankcooper/YCSB).
11+
To further measure the performance of MongoDB, you can run the [Yahoo Cloud Serving Benchmark](https://github.com/brianfrankcooper/YCSB).
1212

1313
YCSB is an open source project which provides the framework and common set of workloads to evaluate the performance of different "key-value" and "cloud" serving stores. Use the steps below to run YCSB to evaluate the performance of MongoDB running on 64-bit Arm machine.
1414

@@ -22,9 +22,10 @@ Install the additional software:
2222
{{< tab header="Ubuntu" >}}
2323
sudo apt install -y maven make gcc
2424
{{< /tab >}}
25-
{{< tab header="RHE/Amazon" >}}
25+
{{< tab header="RHEL / Amazon Linux" >}}
2626
sudo yum check-update
27-
sudo yum install python2
27+
# Python 2 may not be available via yum on recent RHEL/Amazon Linux versions.
28+
# If needed, follow the manual installation steps below.
2829
{{< /tab >}}
2930
{{< /tabpane >}}
3031

@@ -38,7 +39,7 @@ wget https://www.python.org/ftp/python/2.7.18/Python-2.7.18.tgz
3839
tar xvf Python-2.7.18.tgz
3940
cd Python-2.7.18
4041
./configure --enable-optimizations
41-
make -j $nproc
42+
make -j $(nproc)
4243
sudo make altinstall
4344
sudo ln -s /usr/local/bin/python2.7 /usr/local/bin/python
4445
```
@@ -68,14 +69,14 @@ To load and test the performance of loading data(INSERT) into default database `
6869
```console
6970
./bin/ycsb load mongodb -s -P workloads/workloada -p mongodb.url=mongodb://localhost:27017/ycsb?w=0 -threads 10
7071
```
71-
The "-P" parameter is used to load property files. In this example, you used it load the workloada parameter file which sets the recordcount to 1000 in addition to other parameters. The "-threads" parameter indicates the number of threads and is set to 1 by default.
72+
The "-P" parameter is used to load property files. In this example, you used it load the workloada parameter file which sets the recordcount to 1000 in addition to other parameters. The "-threads" parameter indicates the number of client threads (default is 1); this example uses 10 threads.
7273

7374
## A simple Update/Read/Read Modify Write Test on MongoDB
7475

7576
To test the performance of executing a workload which includes running UPDATE, Read Modify Write(RMW) and/or READ operations on the data using 10 threads for example, use the following command:
7677

7778
```console
78-
./bin/ycsb load mongodb -s -P workloads/workloada -p mongodb.url=mongodb://localhost:27017/ycsb?w=0
79+
./bin/ycsb run mongodb -s -P workloads/workloada -p mongodb.url=mongodb://localhost:27017/ycsb?w=0 -threads 10
7980
```
8081

8182
The workloads/workloada file in this example sets the following values `readproportion=0.5` and `updateproportion=0.5` which means there is an even split between the number of READ and UPDATE operations performed. You can change the type of operations and the splits by providing your own workload parameter file.
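
As a rough sketch of providing your own workload parameter file, the helper below copies an existing workload file and rewrites only the `readproportion` and `updateproportion` lines. The function name `make_workload` and the output file name `workload_95_5` are assumptions made for illustration; they are not part of YCSB.

```bash
# make_workload SRC DST READ UPDATE
# Copies a YCSB workload file and rewrites its read/update proportions.
# Hypothetical helper: the name and arguments are illustrative only.
make_workload() {
  local src="$1" dst="$2" read_prop="$3" update_prop="$4"
  # Replace only the two proportion lines; all other properties are kept as-is.
  sed -e "s/^readproportion=.*/readproportion=${read_prop}/" \
      -e "s/^updateproportion=.*/updateproportion=${update_prop}/" \
      "$src" > "$dst"
}

# Example (run from the YCSB directory; workload_95_5 is an assumed file name):
#   make_workload workloads/workloada workloads/workload_95_5 0.95 0.05
#   ./bin/ycsb run mongodb -s -P workloads/workload_95_5 \
#     -p mongodb.url=mongodb://localhost:27017/ycsb?w=0 -threads 10
```

Individual values can also be overridden on the command line with `-p`, for example `-p readproportion=0.95 -p updateproportion=0.05`, which avoids creating a new file for a one-off run.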

content/learning-paths/servers-and-cloud-computing/mongodb/create_replica_set.md

Lines changed: 23 additions & 23 deletions
@@ -8,33 +8,33 @@ weight: 3 # (intro is 1), 2 is first, 3 is second, etc.
88
layout: "learningpathall"
99
---
1010
## MongoDB test scenarios
11-
To test Mongodb you need two parts. A instance running the testing software([YCSB](/learning-paths/servers-and-cloud-computing/mongodb/benchmark_mongodb-8.0)). one or more instances running MongoDB in some configuration. The recommended MongoDB test setup is a three node relica set. These three nodes are of equal size with one instance being desigated as the primary node( the target for test traffic ) and the others as secondary nodes.
11+
To test MongoDB you need two parts. An instance running the testing software ([YCSB](/learning-paths/servers-and-cloud-computing/mongodb/benchmark_mongodb-8.0)). One or more instances running MongoDB in some configuration. The recommended MongoDB test setup is a three node replica set. These three nodes are of equal size with one instance being designated as the primary node (the target for test traffic) and the others as secondary nodes.
1212

1313
## What is a replica set?
1414

1515
A replica set is a group of instances that maintain the same dataset. A replica set contains many nodes, but three nodes are the most common for testing.
1616

1717
## What node size should I use?
1818

19-
The most common size for testing MongoDB is an 8 vCPU instance. You can test with any sized instance, but if you are looking for ideal testing conditions, 8 vCPUs is enough. Each node should have atleast 32GB of RAM.
19+
The most common size for testing MongoDB is an 8 vCPU instance. You can test with any sized instance, but if you are looking for ideal testing conditions, 8 vCPUs is enough. Each node should have at least 32GB of RAM.
2020

21-
To achieve the best results, its recommended to keep the complete data set in memory. If you see disk access when running tests, increase the RAM size of your instances. Additional details about the recommended configuration are provided below.
21+
To achieve the best results, it's recommended to keep the complete data set in memory. If you see disk access when running tests, increase the RAM size of your instances. Additional details about the recommended configuration are provided below.
2222

2323
## Creating replica sets
2424

25-
You can create replica sets of any size(two is the minimum). Three is recemmended but you can add as many as you like.
25+
You can create replica sets of any size (two is the minimum). Three is recommended but you can add as many as you like.
2626

2727
## Three node replica sets
2828

29-
To creating a three node replica set, start by launching three [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) of equal size.
29+
To create a three node replica set, start by launching three Arm-based instances of equal size.
3030

31-
[install](/learning-paths/servers-and-cloud-computing/mongodb/run_mongodb) Mongodb on all three instances.
31+
[Install](/learning-paths/servers-and-cloud-computing/mongodb/run_mongodb) MongoDB on all three instances.
3232

33-
Once all three instances are up and running. Modify the service and configuration file for all instances.
33+
Once all three instances are up and running, modify the service and configuration file for all instances.
3434

3535
## Modify the MongoDB configuration
3636

37-
Use a text editor to edit the file `/etc/mongodb.conf` and replace the contents of the file with the text below.
37+
Use a text editor to edit the file `/etc/mongod.conf` and replace the contents of the file with the text below.
3838

3939
```console
4040
# Configuration Options: https://docs.mongodb.org/manual/reference/configuration-options/
@@ -50,10 +50,10 @@ storage:
5050
engine: wiredTiger
5151
wiredTiger:
5252
engineConfig:
53-
configString: "cache_size=16484MB" # 50% of your ram is recommened. Adding more helps depending on dataset.
53+
configString: "cache_size=16484MB" # 50% of your ram is recommended. Adding more helps depending on dataset.
5454

5555
replication:
56-
replSetName: "rs0" # Name of your replicaset
56+
replSetName: "rs0" # Name of your replica set
5757
oplogSizeMB: 5000
5858

5959
# network interfaces
@@ -70,14 +70,14 @@ setParameter:
7070
tlsWithholdClientCertificate: true
7171
```
7272

73-
**Details of what all these mean is below:**
73+
**Details of what these mean are below:**
7474

7575
**systemLog:** Contains locations and details of where logging should be contained.
7676
- **path:** Location for logging
7777

78-
**storage:** Its recommended to run test within memory to get achieve the best performance. This contains details on the engine used and location of storage.
78+
**storage:** It's recommended to run test within memory to achieve the best performance. This contains details on the engine used and location of storage.
7979
- **engine:** Wiredtiger is used in this case. Using a disk will add latency.
80-
- **cache_size:** The minimum if using the recommend instance size is 50% of 32(16gb). But in testing using 18gb produced better results.
80+
- **cache_size:** The minimum if using the recommended instance size is 50% of 32(16gb). However, testing showed that using 18GB produced better results.
8181

8282
**replication:** This is used for replica set setup.
8383
- **replSetName:** This is the name of the replica set.
@@ -91,15 +91,15 @@ setParameter:
9191
- **diagnosticDataCollectionDirectorySizeMB:** 400 is based on the docs.
9292
- **honorSystemUmask:** Sets read and write permissions only to the owner of new files
9393
- **lockCodeSegmentsInMemory:** Locks code into memory and prevents it from being swapped.
94-
- **suppressNoTLSPeerCertificateWarning:** allows clients to connect without a certificate. (Only for testing purposes)
95-
- **tlsWithholdClientCertificate:** Will not send the certification during communication. (Only for testing purposes)
94+
- **suppressNoTLSPeerCertificateWarning:** Allows clients to connect without a certificate. (Only for testing purposes)
95+
- **tlsWithholdClientCertificate:** Will not send the certificate during communication. (Only for testing purposes)
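
To make the 50% sizing guideline for `cache_size` concrete, the following sketch halves a memory total given in kB and prints the result in MB. The helper name `half_ram_mb` is hypothetical, and reading `MemTotal` from `/proc/meminfo` assumes a Linux host.

```bash
# half_ram_mb TOTAL_KB
# Prints 50% of the given memory size (in kB) as a whole number of MB.
# Hypothetical helper: the name is illustrative only.
half_ram_mb() {
  local total_kb="$1"
  echo $(( total_kb / 1024 / 2 ))
}

# Example on Linux, where MemTotal in /proc/meminfo is reported in kB:
#   half_ram_mb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

An instance may report slightly less than its nominal RAM in `MemTotal`, so treat the printed value as a starting point and adjust it based on your dataset.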
9696

9797
If you want to use encryption you will need to add the security and keyFile to your configuration. As well as change some of the parameters in the `mongod.conf` file.
9898

99-
Run this command to reload the new configurtion.
99+
Run this command to reload the new configuration.
100100

101-
```
102-
sudo service mongod restart
101+
```bash
102+
sudo systemctl restart mongod
103103
```
104104

105105
## Modify the MongoDB service
@@ -136,15 +136,15 @@ LimitNPROC=64000
136136
WantedBy=multi-user.target
137137
```
138138

139-
details on these can be found here: https://docs.mongodb.com/manual/reference/ulimit/#recommended-ulimit-settings
139+
Details on these can be found in the [documentation](https://docs.mongodb.com/manual/reference/ulimit/#recommended-ulimit-settings).
140140

141141
Run this command to reload the service.
142142

143-
```
144-
sudo ystemctl daemon-reload
143+
```bash
144+
sudo systemctl daemon-reload
145145
```
146146

147-
**Once all three instances are created and have mongodb installed, select one to be your primary node. The remaining instances will be secondary nodes.**
147+
**Once all three instances are created and have MongoDB installed, select one to be your primary node. The remaining instances will be secondary nodes.**
148148

149149
## Initialize the replica set
150150

@@ -160,7 +160,7 @@ Connect to the primary node and run the following commands below.
160160

161161
2. Initialize the replica set with the following command:
162162

163-
```
163+
```bash
164164
mongosh --host $PRIMARY_NODE_IP:27017 <<EOF
165165
rs.initiate({
166166
_id: "rs0",

content/learning-paths/servers-and-cloud-computing/mongodb/perf_mongodb.md

Lines changed: 20 additions & 16 deletions
@@ -18,7 +18,7 @@ Install the appropriate run-time environment to be able to use the performance t
1818
{{< tab header="Ubuntu" >}}
1919
sudo apt install default-jre default-jdk -y
2020
{{< /tab >}}
21-
{{< tab header="RHE/Amazon" >}}
21+
{{< tab header="RHEL/Amazon Linux" >}}
2222
sudo yum install java-17-openjdk
2323
{{< /tab >}}
2424
{{< /tabpane >}}
@@ -33,37 +33,41 @@ On your instance running MongoDB (you may need to start a new terminal), clone t
3333
git clone https://github.com/idealo/mongodb-performance-test.git
3434
```
3535

36-
Now `cd` into the project folder and execute the `jar` file:
36+
Now change into the project folder and execute the JAR file to see its usage instructions:
3737

3838
```bash { ret_code="1" }
3939
cd mongodb-performance-test
4040
java -jar ./latest-version/mongodb-performance-test.jar
4141
```
42-
This will print a description of how to use the java application
42+
This will print a description of how to use the Java application.
4343

4444

4545
## Run Insert test
4646

4747
Run a test that inserts documents on `localhost:27017` (default).
4848

49-
Use the following options:
50-
* `-m` defines the test
51-
* `-o` defines the number of iterations
52-
* Alternatively, use `-d` to specify a time limit (in seconds)
53-
* `-t` defines the number of threads
54-
* `-db` defines the database to use
55-
* `-c` defines how the data is collected.
56-
57-
For example:
49+
First, set an environment variable for the JAR file path for convenience:
5850
```bash { cwd="./mongodb-performance-test" }
5951
export jarfile=./latest-version/mongodb-performance-test.jar
52+
```
53+
54+
Use the following options:
55+
* `-m` defines the test mode (e.g., `insert`, `update_one`).
56+
* `-o` defines the number of operations (iterations).
57+
* Alternatively, use `-d` to specify a duration limit (in seconds).
58+
* `-t` defines the number of threads.
59+
* `-db` defines the database to use.
60+
* `-c` defines the collection to use.
61+
62+
For example, run an insert test for 1 million operations using 10 threads:
63+
```bash { cwd="./mongodb-performance-test" }
6064
java -jar $jarfile -m insert -o 1000000 -t 10 -db test -c perf
6165
```
62-
As the test runs, the count will be printed periodically. It will increase until it reaches 1 million and then the test will end.
66+
As the test runs, the progress count will be printed periodically. It will increase until it reaches 1 million, and then the test will end.
6367

6468
## Run Update-one test
6569

66-
Similarly, to run this test, updating one document per query using 10, 20 and finally 30 threads for 1 hour each run (3 hours in total) run the following command:
70+
Similarly, to run an update test (updating one document per query) using 10, 20, and finally 30 threads for 1 hour each (3 hours total), run the following command:
6771

6872
```console
6973
java -jar $jarfile -m update_one -d 3600 -t 10 20 30 -db test -c perf
@@ -73,7 +77,7 @@ For instructions on running any other tests or more details on the metrics repor
7377

7478
## View the results
7579

76-
During each test, statistics over the last second are printed every second in the console. The following is the output from the end of running Insert test:
80+
During each test, statistics over the last second are printed to the console every second. After the test completes, final summary statistics are displayed. The following is example output from the end of the Insert test run:
7781

7882
``` output
7983
-- Timers ----------------------------------------------------------------------
@@ -96,4 +100,4 @@ stats-per-run-INSERT
96100
99.9% <= 15.59 milliseconds
97101
```
98102

99-
The metrics are also output to the `stats-per-second-[mode].csv` which is located in the same folder as the jar file. `[mode]` is the executed mode(s), i.e. either `INSERT`, `UPDATE_ONE`, `UPDATE_MANY`, `COUNT_ONE`, `COUNT_MANY`, `ITERATE_ONE`, `ITERATE_MANY`, `DELETE_ONE` or `DELETE_MANY`.
103+
Detailed per-second metrics are also output to a CSV file named `stats-per-second-[mode].csv` (e.g., `stats-per-second-INSERT.csv`), located in the same folder as the JAR file. `[mode]` corresponds to the executed mode(s), such as `INSERT`, `UPDATE_ONE`, `DELETE_ONE`, etc.

content/learning-paths/servers-and-cloud-computing/mongodb/replica_set_testing.md

Lines changed: 3 additions & 6 deletions
@@ -2,20 +2,17 @@
22
# User change
33
title: "Three node replica set testing with YCSB"
44

5-
65
weight: 5 # (intro is 1), 2 is first, 3 is second, etc.
76

87
# Do not modify these elements
98
layout: "learningpathall"
109
---
1110

12-
talk about which one to get on and how to run tests and see output
13-
1411
## Recommended Tests on MongoDB
1512

16-
The most common three tests are **95/5** (95% read and 5% update), **100/0** (100% read and 0% update) and **90/10** (90% read and 10% update). In real world testing its recommended to run a **95/5** test.
13+
The most common three tests are **95/5** (95% read and 5% update), **100/0** (100% read and 0% update) and **90/10** (90% read and 10% update). In real world testing it's recommended to run a **95/5** test.
1714

18-
Once you have loaded the dataset, run the selected test for approximately **five** minutes. This will allow the system to warm up before you start collecting performance data. The goal is to reach a high cpu utilization( 90+% ). Adjusting the number of threads, operationscount and recordscount can help you achieve this. Examples below maybe need to be adjusted based on the instance type you selected.
15+
Once you have loaded the dataset, run the selected test for approximately **five** minutes. This will allow the system to warm up before you start collecting performance data. The goal is to reach a high CPU utilization (90+%). Adjusting the number of threads, operationscount and recordscount can help you achieve this. Examples below may need to be adjusted based on the instance type you selected.
1916

2017
## Load dataset
2118

@@ -32,7 +29,7 @@ Once you have loaded the dataset, run the selected test for approximately **five
3229
## Run 100/0 test:
3330

3431
```console
35-
./bin/ycsb run mongodb -s -P workloads/workloadc -p mongodb.url=mongodb://Localhost:27017 -p minfieldlength=50 -p compressibility=2 -p maxexecutiontime=120 -threads 64 -p operationcount=40000000 -p recordcount=20000000 -p requestdistribution=zipfian -p readproportion=1.0 -p updateproportion=0.0
32+
./bin/ycsb run mongodb -s -P workloads/workloadc -p mongodb.url=mongodb://localhost:27017 -p minfieldlength=50 -p compressibility=2 -p maxexecutiontime=120 -threads 64 -p operationcount=40000000 -p recordcount=20000000 -p requestdistribution=zipfian -p readproportion=1.0 -p updateproportion=0.0
3633
```
3734

3835
## Run 90/10 test:
