
Commit 1fb817c

Benchmarks docs, config and core reorganisation (#9)
* Cleanup benchmarks list in README

* Move mixed simulation parameter under the proper section

* Remove the updates generation parameters

  This commit changes the way update requests are generated. They used to be generated up to a configured number; now as many updates as the user wants can be generated, using infinite streams. This not only simplifies the configuration of update-related benchmarks but also reduces the amount of boilerplate code. Typically, the `BufferedRandomIterator` class was needed to shuffle updates, as otherwise multiple updates against the same entity would be sent simultaneously. Now, before an entity is updated again, all the other entities of the dataset receive an update as well, which removes the need for the shuffling and buffering operations. As a result, no random operations are performed in the simulations anymore, and the seed parameter has been removed as dead code.

* Fix misleading throughput parameters

  The throughput of the 100% write and 100% read benchmarks cannot be configured; only the number of concurrent users can. Those benchmarks are designed with a closed model [1], so that the request injection rate depends on the response rate and requests do not queue up on the server side. This commit renames the parameters and fixes the documentation accordingly, for clarity.
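The infinite-stream pattern described above can be sketched in isolation. A minimal, hypothetical example (not the repository's code; `InfiniteFeederSketch` and `entityFeeder` are made up for illustration):

```scala
// Hypothetical standalone sketch of the round-robin update generation
// described in the commit message; not the repository's code.
object InfiniteFeederSketch extends App {
  // One finite pass over every entity in the dataset.
  def entityFeeder(): Iterator[Map[String, Any]] =
    Iterator.tabulate(3)(i => Map("entityId" -> i))

  // Update 0 visits every entity, then update 1, and so on: an entity is
  // only updated again once all other entities have received an update,
  // so no shuffling, buffering, or seed is needed.
  val updateFeeder: Iterator[Map[String, Any]] =
    Iterator.from(0).flatMap(updateId => entityFeeder().map(_ + ("updateId" -> updateId)))

  updateFeeder.take(7).foreach(println)
}
```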
1 parent 22449d1 commit 1fb817c

File tree

11 files changed: +91 −143 lines changed

benchmarks/README.md

Lines changed: 9 additions & 26 deletions
@@ -23,30 +23,9 @@ Benchmarks for the Polaris service using Gatling.
 
 ## Available Benchmarks
 
-### Dataset Creation Benchmark
-
-The CreateTreeDataset benchmark creates a test dataset with a specific structure:
-
-- `org.apache.polaris.benchmarks.simulations.CreateTreeDataset`: Creates up to 50 entities simultaneously
-
-This is a write-only workload designed to populate the system for subsequent benchmarks.
-
-### Read/Update Benchmark
-
-The ReadUpdateTreeDataset benchmark tests read and update operations on an existing dataset:
-
-- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDataset`: Performs up to 20 read/update operations simultaneously
-
-This benchmark can only be run after using CreateTreeDataset to populate the system.
-
-### Read-Only Benchmark
-
-The ReadTreeDataset benchmark is a 100% read workload that fetches a tree dataset in Polaris:
-
-- `org.apache.polaris.benchmarks.simulations.ReadTreeDataset`: Performs read-only operations to verify namespaces, tables, and views
-
-This benchmark is intended to be used against a Polaris instance with a pre-existing tree dataset. It has no side effects on the dataset and can be executed multiple times without any issues.
-
+- `org.apache.polaris.benchmarks.simulations.CreateTreeDataset`: Creates a test dataset with a specific structure. It is a write-only workload designed to populate the system for subsequent benchmarks.
+- `org.apache.polaris.benchmarks.simulations.ReadTreeDataset`: Performs read-only operations to fetch namespaces, tables, and views. Some attributes of the objects are also fetched. This benchmark is intended to be used against a Polaris instance with a pre-existing tree dataset. It has no side effects on the dataset and can be executed multiple times without any issues.
+- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDataset`: Performs read and update operations against a Polaris instance populated with a test dataset. It is a read/write workload that can be used to test the system's ability to handle concurrent read and update operations. It is not destructive and does not prevent subsequent executions of `ReadTreeDataset` or `ReadUpdateTreeDataset`.
 
 ## Parameters
 
@@ -95,7 +74,9 @@ Workload settings are configured under `workload`:
 
 ```hocon
 workload {
-  read-write-ratio = 0.8 # Ratio of reads (0.0-1.0)
+  read-update-tree-dataset {
+    read-write-ratio = 0.8 # Ratio of reads (0.0-1.0)
+  }
 }
 ```
 
@@ -117,7 +98,9 @@ http {
 }
 
 workload {
-  read-write-ratio = 0.8
+  read-update-tree-dataset {
+    read-write-ratio = 0.8 # Ratio of reads (0.0-1.0)
+  }
 }
 ```
benchmarks/src/gatling/resources/benchmark-defaults.conf

Lines changed: 20 additions & 33 deletions
@@ -117,55 +117,42 @@ dataset.tree {
 
 # Workload configuration
 workload {
-  # Ratio of read operations to write operations
-  # Range: 0.0 to 1.0 where:
-  # - 0.0 means 100% writes
-  # - 1.0 means 100% reads
-  # Example: 0.8 means 80% reads and 20% writes
-  # Required: Must be provided through environment variable READ_WRITE_RATIO
-  read-write-ratio = 0.5
-
-  # Seed used for random number generation
-  # Default: 1
-  seed = 1
-
-  # Number of property updates to perform per individual namespace
-  # Default: 5
-  updates-per-namespace = 5
-
-  # Number of property updates to perform per individual table
-  # Default: 10
-  updates-per-table = 10
-
-  # Number of property updates to perform per individual view
-  # Default: 10
-  updates-per-view = 10
-
-
   # Configuration for the ReadTreeDataset simulation
   read-tree-dataset {
-    # Number of table operations to perform per second
+    # Number of table operations to perform simultaneously
+    # This controls the concurrency level for table operations
     # Default: 20
-    table-throughput = 20
+    table-concurrency = 20
 
-    # Number of view operations to perform per second
+    # Number of view operations to perform simultaneously
+    # This controls the concurrency level for view operations
     # Default: 10
-    view-throughput = 10
+    view-concurrency = 10
  }
 
   # Configuration for the CreateTreeDataset simulation
   create-tree-dataset {
-    # Number of table operations to perform per second
+    # Number of table operations to perform simultaneously
+    # This controls the concurrency level for table operations
     # Default: 20
-    table-throughput = 20
+    table-concurrency = 20
 
-    # Number of view operations to perform per second
+    # Number of view operations to perform simultaneously
+    # This controls the concurrency level for view operations
     # Default: 10
-    view-throughput = 10
+    view-concurrency = 10
   }
 
   # Configuration for the ReadUpdateTreeDataset simulation
   read-update-tree-dataset {
+    # Ratio of read operations to write operations
+    # Range: 0.0 to 1.0 where:
+    # - 0.0 means 100% writes
+    # - 1.0 means 100% reads
+    # Example: 0.8 means 80% reads and 20% writes
+    # Default: 0.5
+    read-write-ratio = 0.5
+
     # Number of operations to perform per second
     # Default: 100
     throughput = 100
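The rename reflects Gatling's two injection models. A rough, hypothetical sketch of the distinction (scenario names, endpoints, and durations are placeholders, not the actual simulations' setup):

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Hypothetical sketch contrasting the two injection models behind the rename.
class InjectionModelSketch extends Simulation {
  private val httpProtocol = http.baseUrl("http://localhost:8181") // placeholder URL

  private val tables = scenario("tables").exec(http("list tables").get("/v1/tables"))
  private val ops = scenario("ops").exec(http("get config").get("/v1/config"))

  setUp(
    // Closed model (table-concurrency / view-concurrency): a fixed pool of
    // concurrent users, so the injection rate follows the response rate and
    // requests do not queue up on the server side.
    tables.inject(constantConcurrentUsers(20).during(5.minutes)),
    // Open model (throughput): a fixed arrival rate of new users per second,
    // independent of how fast the server responds.
    ops.inject(constantUsersPerSec(100).during(5.minutes))
  ).protocols(httpProtocol)
}
```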

benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/NamespaceActions.scala

Lines changed: 18 additions & 9 deletions
@@ -115,15 +115,24 @@ case class NamespaceActions(
     )
   }
 
-  def namespacePropertiesUpdateFeeder(): Feeder[Any] = namespaceIdentityFeeder()
-    .flatMap { row =>
-      (0 until wp.updatesPerNamespace).map { updateId =>
-        val updates = Map(s"UpdatedAttribute_$updateId" -> s"$updateId")
-        row ++ Map(
-          "jsonPropertyUpdates" -> Json.toJson(updates).toString()
-        )
-      }
-    }
+  /**
+   * Creates a Gatling Feeder that generates namespace property updates. Each row contains a single
+   * property update targeting a specific namespace. The feeder is infinite, in that it will
+   * generate a new property update every time.
+   *
+   * @return An iterator providing namespace property update details
+   */
+  def namespacePropertiesUpdateFeeder(): Feeder[Any] = Iterator
+    .from(0)
+    .flatMap(updateId =>
+      namespaceIdentityFeeder()
+        .map { row =>
+          val updates = Map(s"UpdatedAttribute_$updateId" -> s"$updateId")
+          row ++ Map(
+            "jsonPropertyUpdates" -> Json.toJson(updates).toString()
+          )
+        }
+    )
 
   /**
    * Creates a new namespace in a specified catalog. The namespace is created with a full path and

benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/TableActions.scala

Lines changed: 9 additions & 5 deletions
@@ -107,14 +107,18 @@ case class TableActions(
 
   /**
    * Creates a Gatling Feeder that generates table property updates. Each row contains a single
-   * property update targeting a specific table.
+   * property update targeting a specific table. The feeder is infinite, in that it will generate a
+   * new property update every time.
    *
    * @return An iterator providing table property update details
    */
-  def propertyUpdateFeeder(): Feeder[Any] = tableIdentityFeeder()
-    .flatMap(row =>
-      Range(0, wp.updatesPerTable)
-        .map(k => row + ("newProperty" -> s"""{"NewAttribute_$k": "NewValue_$k"}"""))
+  def propertyUpdateFeeder(): Feeder[Any] = Iterator
+    .from(0)
+    .flatMap(updateId =>
+      tableIdentityFeeder()
+        .map { row =>
+          row ++ Map("newProperty" -> s"""{"NewAttribute_$updateId": "NewValue_$updateId"}""")
+        }
     )
 
   /**

benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/ViewActions.scala

Lines changed: 9 additions & 5 deletions
@@ -99,14 +99,18 @@ case class ViewActions(
 
   /**
    * Creates a Gatling Feeder that generates view property updates. Each row contains a single
-   * property update targeting a specific view.
+   * property update targeting a specific view. The feeder is infinite, in that it will generate a
+   * new property update every time.
    *
    * @return An iterator providing view property update details
    */
-  def propertyUpdateFeeder(): Feeder[Any] = viewIdentityFeeder()
-    .flatMap(row =>
-      Range(0, wp.updatesPerView)
-        .map(k => row + ("newProperty" -> s"""{"NewAttribute_$k": "NewValue_$k"}"""))
+  def propertyUpdateFeeder(): Feeder[Any] = Iterator
+    .from(0)
+    .flatMap(updateId =>
+      viewIdentityFeeder()
+        .map { row =>
+          row ++ Map("newProperty" -> s"""{"NewAttribute_$updateId": "NewValue_$updateId"}""")
+        }
     )
 
   /**

benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/BenchmarkConfig.scala

Lines changed: 5 additions & 9 deletions
@@ -44,20 +44,16 @@ object BenchmarkConfig {
     val rutdConfig = workload.getConfig("read-update-tree-dataset")
 
     WorkloadParameters(
-      workload.getDouble("read-write-ratio"),
-      workload.getInt("updates-per-namespace"),
-      workload.getInt("updates-per-table"),
-      workload.getInt("updates-per-view"),
-      workload.getLong("seed"),
       ReadTreeDatasetParameters(
-        rtdConfig.getInt("table-throughput"),
-        rtdConfig.getInt("view-throughput")
+        rtdConfig.getInt("table-concurrency"),
+        rtdConfig.getInt("view-concurrency")
       ),
       CreateTreeDatasetParameters(
-        ctdConfig.getInt("table-throughput"),
-        ctdConfig.getInt("view-throughput")
+        ctdConfig.getInt("table-concurrency"),
+        ctdConfig.getInt("view-concurrency")
       ),
       ReadUpdateTreeDatasetParameters(
+        rutdConfig.getDouble("read-write-ratio"),
         rutdConfig.getInt("throughput"),
         rutdConfig.getInt("duration-in-minutes")
       )

benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/ReadUpdateTreeDatasetParameters.scala

Lines changed: 9 additions & 0 deletions
@@ -22,13 +22,22 @@ package org.apache.polaris.benchmarks.parameters
 /**
  * Case class to hold the parameters for the ReadUpdateTreeDataset simulation.
  *
+ * @param readWriteRatio The ratio of read operations to write operations (0.0-1.0).
  * @param throughput The number of operations to perform per second.
  * @param durationInMinutes The duration of the simulation in minutes.
  */
 case class ReadUpdateTreeDatasetParameters(
+    readWriteRatio: Double,
     throughput: Int,
     durationInMinutes: Int
 ) {
+  require(
+    readWriteRatio >= 0.0 && readWriteRatio <= 1.0,
+    "Read/write ratio must be between 0.0 and 1.0 inclusive"
+  )
   require(throughput >= 0, "Throughput cannot be negative")
   require(durationInMinutes > 0, "Duration in minutes must be positive")
+
+  val gatlingReadRatio: Double = readWriteRatio * 100
+  val gatlingWriteRatio: Double = (1 - readWriteRatio) * 100
 }
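The new `gatlingReadRatio` and `gatlingWriteRatio` values convert the 0.0-1.0 ratio into the out-of-100 weights that Gatling's `randomSwitch` expects (used in `ReadUpdateTreeDataset` below). A quick standalone check, assuming a hypothetical object name:

```scala
// Hypothetical standalone check of the ratio-to-percentage conversion.
object RatioSketch extends App {
  val readWriteRatio = 0.75 // 75% reads, 25% writes
  val gatlingReadRatio: Double = readWriteRatio * 100        // weight of the "Read" branch
  val gatlingWriteRatio: Double = (1 - readWriteRatio) * 100 // weight of the "Write" branch
  assert(math.abs(gatlingReadRatio + gatlingWriteRatio - 100.0) < 1e-9)
  println(s"read = $gatlingReadRatio%, write = $gatlingWriteRatio%") // read = 75.0%, write = 25.0%
}
```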

benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/WorkloadParameters.scala

Lines changed: 1 addition & 29 deletions
@@ -20,35 +20,7 @@
 package org.apache.polaris.benchmarks.parameters
 
 case class WorkloadParameters(
-    readWriteRatio: Double,
-    updatesPerNamespace: Int,
-    updatesPerTable: Int,
-    updatesPerView: Int,
-    seed: Long,
     readTreeDataset: ReadTreeDatasetParameters,
     createTreeDataset: CreateTreeDatasetParameters,
     readUpdateTreeDataset: ReadUpdateTreeDatasetParameters
-) {
-  require(
-    readWriteRatio >= 0.0 && readWriteRatio <= 1.0,
-    "Read/write ratio must be between 0.0 and 1.0 inclusive"
-  )
-
-  require(
-    updatesPerNamespace >= 0,
-    "Updates per namespace must be non-negative"
-  )
-
-  require(
-    updatesPerTable >= 0,
-    "Updates per table must be non-negative"
-  )
-
-  require(
-    updatesPerView >= 0,
-    "Updates per view must be non-negative"
-  )
-
-  val gatlingReadRatio: Double = readWriteRatio * 100
-  val gatlingWriteRatio: Double = (1 - readWriteRatio) * 100
-}
+) {}

benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadTreeDataset.scala

Lines changed: 2 additions & 1 deletion
@@ -33,7 +33,8 @@ import scala.concurrent.duration.DurationInt
 /**
  * This simulation is a 100% read workload that fetches a tree dataset in Polaris. It is intended to
  * be used against a Polaris instance with a pre-existing tree dataset. It has no side effect on the
- * dataset and therefore can be executed multiple times without any issue.
+ * dataset and therefore can be executed multiple times without any issue. It fetches each entity
+ * exactly once.
  */
 class ReadTreeDataset extends Simulation {
   private val logger = LoggerFactory.getLogger(getClass)

benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadUpdateTreeDataset.scala

Lines changed: 8 additions & 5 deletions
@@ -37,6 +37,9 @@ import scala.concurrent.duration._
 
 /**
  * This simulation tests read and update operations on an existing dataset.
+ *
+ * The ratio of read operations to write operations is controlled by the readWriteRatio parameter in
+ * the ReadUpdateTreeDatasetParameters.
  */
 class ReadUpdateTreeDataset extends Simulation {
   private val logger = LoggerFactory.getLogger(getClass)
@@ -91,17 +94,17 @@ class ReadUpdateTreeDataset extends Simulation {
   private val nsListFeeder = new CircularIterator(nsActions.namespaceIdentityFeeder)
   private val nsExistsFeeder = new CircularIterator(nsActions.namespaceIdentityFeeder)
   private val nsFetchFeeder = new CircularIterator(nsActions.namespaceFetchFeeder)
-  private val nsUpdateFeeder = new CircularIterator(nsActions.namespacePropertiesUpdateFeeder)
+  private val nsUpdateFeeder = nsActions.namespacePropertiesUpdateFeeder()
 
   private val tblListFeeder = new CircularIterator(tblActions.tableIdentityFeeder)
   private val tblExistsFeeder = new CircularIterator(tblActions.tableIdentityFeeder)
   private val tblFetchFeeder = new CircularIterator(tblActions.tableFetchFeeder)
-  private val tblUpdateFeeder = new CircularIterator(tblActions.propertyUpdateFeeder)
+  private val tblUpdateFeeder = tblActions.propertyUpdateFeeder()
 
   private val viewListFeeder = new CircularIterator(viewActions.viewIdentityFeeder)
   private val viewExistsFeeder = new CircularIterator(viewActions.viewIdentityFeeder)
   private val viewFetchFeeder = new CircularIterator(viewActions.viewFetchFeeder)
-  private val viewUpdateFeeder = new CircularIterator(viewActions.propertyUpdateFeeder)
+  private val viewUpdateFeeder = viewActions.propertyUpdateFeeder()
 
   // --------------------------------------------------------------------------------
   // Workload: Randomly read and write entities
@@ -110,7 +113,7 @@ class ReadUpdateTreeDataset extends Simulation {
     scenario("Read and write entities using the Iceberg REST API")
       .exec(authActions.restoreAccessTokenInSession)
       .randomSwitch(
-        wp.gatlingReadRatio -> group("Read")(
+        wp.readUpdateTreeDataset.gatlingReadRatio -> group("Read")(
          uniformRandomSwitch(
            exec(feed(nsListFeeder).exec(nsActions.fetchAllChildrenNamespaces)),
            exec(feed(nsExistsFeeder).exec(nsActions.checkNamespaceExists)),
@@ -123,7 +126,7 @@ class ReadUpdateTreeDataset extends Simulation {
             exec(feed(viewFetchFeeder).exec(viewActions.fetchView))
           )
         ),
-        wp.gatlingWriteRatio -> group("Write")(
+        wp.readUpdateTreeDataset.gatlingWriteRatio -> group("Write")(
           uniformRandomSwitch(
             exec(feed(nsUpdateFeeder).exec(nsActions.updateNamespaceProperties)),
            exec(feed(tblUpdateFeeder).exec(tblActions.updateTable)),
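Why the update feeders no longer need wrapping while the read feeders still do: the identity and fetch feeders make one finite pass, so `CircularIterator` must restart them, whereas `Iterator.from(0).flatMap(...)` never exhausts. A minimal sketch of such a wrapper (hypothetical; the repository's `CircularIterator` may differ in detail):

```scala
// Hypothetical sketch of a circular wrapper over a finite feeder factory.
class CircularIteratorSketch[T](factory: () => Iterator[T]) extends Iterator[T] {
  private var it: Iterator[T] = factory()
  override def hasNext: Boolean = true // restarts instead of exhausting
  override def next(): T = {
    if (!it.hasNext) it = factory() // wrap around to a fresh pass
    it.next()
  }
}

// Usage: new CircularIteratorSketch(() => Iterator(1, 2, 3)) cycles 1, 2, 3, 1, 2, ...
```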

0 commit comments