
Commit 321a1cc

Support for CBOR update queries (#1148)
1 parent b485763 commit 321a1cc

28 files changed, with 1743 additions and 41 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
66

77
## [Unreleased]
88
### Added
9+
- CBOR formatted update requests
910

1011
### Changed
1112

composer.json

Lines changed: 5 additions & 0 deletions
@@ -24,6 +24,7 @@
2424
"require-dev": {
2525
"ext-curl": "*",
2626
"ext-iconv": "*",
27+
"2tvenom/cborencode": "^1.0",
2728
"escapestudios/symfony2-coding-standard": "^3.11",
2829
"nyholm/psr7": "^1.8",
2930
"php-http/guzzle7-adapter": "^1.0",
@@ -34,8 +35,12 @@
3435
"phpunit/phpunit": "^10.5",
3536
"rawr/phpunit-data-provider": "^3.3",
3637
"roave/security-advisories": "dev-master",
38+
"spomky-labs/cbor-php": "^3.1",
3739
"symfony/event-dispatcher": "^5.0 || ^6.0 || ^7.0"
3840
},
41+
"suggest": {
42+
"spomky-labs/cbor-php": "Needed to use CBOR formatted requests with Solr 9.3+"
43+
},
3944
"prefer-stable": true,
4045
"config": {
4146
"sort-packages": true,

docs/plugins.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ This can be done very easily with this plugin, you can simply keep feeding docum
1414
### Some notes
1515

1616
- Solarium issues JSON formatted update requests by default. If you require XML specific functionality, you can set the request format to XML on the plugin instance. XML requests are slower than JSON.
17+
- Solr 9.3 and higher also supports CBOR formatted update requests. You can set the request format to CBOR on the plugin instance if your documents adhere to [current limitations](queries/update-query/best-practices-for-updates.md#known-cbor-limitations).
1718
- You can set a custom buffer size. The default is 100 documents, a safe value. By increasing this you can get even better performance, but depending on your document size at some level you will run into memory or request limits. A value of 1000 has been successfully used for indexing 200k documents.
1819
- You can use the createDocument method with array input, but you can also manually create a document instance and use the addDocument(s) method.
1920
- With buffer size X an update request will be sent to Solr for each X docs. You can just keep feeding docs. These buffer flushes don’t include a commit. This is done on purpose. You can add a commit when you’re done, or you can use the Solr auto commit feature.

docs/queries/update-query/best-practices-for-updates.md

Lines changed: 26 additions & 1 deletion
@@ -40,6 +40,31 @@ $update = $client->createUpdate();
4040
$update->setRequestFormat($update::REQUEST_FORMAT_XML);
4141
```
4242

43-
### Raw XML update commands
43+
#### Raw XML update commands
4444

4545
Solarium makes it easy to build update commands without having to know the underlying XML structure. If you already have XML formatted update commands, you can add them directly to an update query. Make sure they are valid as Solarium will not check this, and set the [XML request format](#xml-vs-json-formatted-update-requests) on the update query.
46+
47+
### CBOR formatted update requests
48+
49+
Since Solr 9.3, Solr also supports the [CBOR format for indexing](https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-cbor.html). While CBOR requests might be faster for Solr to handle, they are significantly slower to build in Solarium and require more memory than JSON or XML requests. Benchmark your own use cases to determine if this is the right choice for you.
50+
51+
In order to use CBOR with Solarium, you need to install the `spomky-labs/cbor-php` library separately.
52+
53+
```sh
54+
composer require spomky-labs/cbor-php
55+
```
56+
57+
```php
58+
// get an update query instance
59+
$update = $client->createUpdate();
60+
61+
// set CBOR request format
62+
$update->setRequestFormat($update::REQUEST_FORMAT_CBOR);
63+
```
64+
65+
#### Known CBOR limitations
66+
67+
As outlined in [SOLR-17510](https://issues.apache.org/jira/browse/SOLR-17510?focusedCommentId=17892000#comment-17892000), CBOR formatted updates currently have some limitations.
68+
69+
- You can only add documents, other commands such as delete and commit aren't supported yet.
70+
- There is no support for atomic updates.
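
Because delete and commit commands aren't available in CBOR, one way to combine the formats is to send the documents in a CBOR request and the commit in a separate JSON request. The following is a minimal sketch and not part of this commit: the function name `addWithCborThenCommit` and its `$rows` parameter are hypothetical, and `$client` is assumed to be a configured `Solarium\Client` with `spomky-labs/cbor-php` installed.

```php
<?php

/**
 * Hypothetical helper: add documents with CBOR, then commit with JSON.
 *
 * @param object $client assumed to be a configured Solarium\Client
 * @param array  $rows   list of field arrays, e.g. [['id' => 'doc-1'], ...]
 */
function addWithCborThenCommit($client, array $rows): void
{
    // first request: only add commands, which CBOR supports
    $add = $client->createUpdate();
    $add->setRequestFormat($add::REQUEST_FORMAT_CBOR);

    $docs = [];
    foreach ($rows as $row) {
        $docs[] = $add->createDocument($row);
    }

    $add->addDocuments($docs);
    $client->update($add);

    // second request: the commit isn't supported in CBOR, so send it as JSON
    $commit = $client->createUpdate();
    $commit->setRequestFormat($commit::REQUEST_FORMAT_JSON);
    $commit->addCommit();
    $client->update($commit);
}
```

If Solr's auto commit feature is enabled, the second request can probably be dropped altogether.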

docs/queries/update-query/building-an-update-query/building-an-update-query.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
11
An update query has options and commands. These commands and options are instructions for the client classes to build and execute a request and return the correct result. In the following sections both the options and commands will be discussed in detail.
2-
You can also take a look at the [XML](https://solr.apache.org/guide/uploading-data-with-index-handlers.html#xml-formatted-index-updates) or [JSON](https://solr.apache.org/guide/uploading-data-with-index-handlers.html#json-formatted-index-updates) request formats for more information about the underlying Solr update handler.
2+
You can also take a look at the [XML](https://solr.apache.org/guide/uploading-data-with-index-handlers.html#xml-formatted-index-updates), [JSON](https://solr.apache.org/guide/uploading-data-with-index-handlers.html#json-formatted-index-updates), or [CBOR](https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-cbor.html) request formats for more information about the underlying Solr update handler.
33

44
Options
55
-------
@@ -10,7 +10,7 @@ However, if you do need to customize them for a special case, you can.
1010

1111
### RequestFormat
1212

13-
Solarium issues JSON formatted update requests by default. Set this to XML if you require XML specific functionality.
13+
Solarium issues JSON formatted update requests by default. Set this to XML if you require XML specific functionality. You can also set this to CBOR if you use Solr 9.3 or higher and your use case falls within [current limitations](../best-practices-for-updates.md#known-cbor-limitations).
1414

1515
### ResultClass
1616

docs/queries/update-query/update-query.md

Lines changed: 1 addition & 0 deletions
@@ -5,3 +5,4 @@ Update queries allow you to add, delete, commit, optimize and rollback commands.
55
- Always use a database or other persistent storage as the source for building documents to add. Don't be tempted to emulate an update command by selecting a document, altering it and adding it. Almost all schemas will have fields that are indexed and not stored. You will lose the data in those fields.
66
- The best way to use update queries is also related to your Solr config. If you are for instance using the autocommit feature of Solr you probably don't want to use a commit command in your update queries. Make sure you know the configuration details of the Solr core you use.
77
- Some functionality is only available with XML formatted or JSON formatted update queries, but not both. Set the appropriate request format if necessary.
8+
- Solr 9.3 and higher also supports CBOR formatted update queries. Be aware that there are some [current limitations](best-practices-for-updates.md#known-cbor-limitations) with this request format.

examples/7.5.3-plugin-bufferedupdate-benchmarks.php

Lines changed: 21 additions & 8 deletions
@@ -2,27 +2,31 @@
22

33
require_once(__DIR__.'/init.php');
44

5+
use Composer\InstalledVersions;
56
use Solarium\Core\Client\Adapter\TimeoutAwareInterface;
67
use Solarium\Core\Client\Request;
8+
use Solarium\QueryType\Update\Query\Query;
79

810
set_time_limit(0);
9-
ini_set('memory_limit', ini_get('suhosin.memory_limit') ?: '-1');
11+
ini_set('memory_limit', -1);
1012
ob_implicit_flush(true);
1113
@ob_end_flush();
1214

1315
htmlHeader();
1416

15-
if (!isset($weight) || !isset($requestFormat)) {
17+
if (!isset($weight) || !isset($addRequestFormat) || !isset($delRequestFormat)) {
1618
echo <<<'EOT'
1719
<h1>Usage</h1>
1820
19-
<p>This file is intended to be included by a script that sets two variables:</p>
21+
<p>This file is intended to be included by a script that sets three variables:</p>
2022
2123
<dl>
2224
<dt><code>$weight</code></dt>
2325
<dd>Either <code>''</code> for the regular plugins or <code>'lite'</code> for the lite versions.</dd>
24-
<dt><code>$requestFormat</code></dt>
26+
<dt><code>$addRequestFormat</code></dt>
2527
<dd>Any of the <code>Solarium\QueryType\Update\Query\Query::REQUEST_FORMAT_*</code> constants.</dd>
28+
<dt><code>$delRequestFormat</code></dt>
29+
<dd><code>Solarium\QueryType\Update\Query\Query::REQUEST_FORMAT_JSON</code> or <code>REQUEST_FORMAT_XML</code>.</dd>
2630
</dl>
2731
2832
<h2>Example</h2>
@@ -35,7 +39,8 @@
3539
use Solarium\QueryType\Update\Query\Query;
3640
3741
$weight = '';
38-
$requestFormat = Query::REQUEST_FORMAT_JSON;
42+
$addRequestFormat = Query::REQUEST_FORMAT_JSON;
43+
$delRequestFormat = Query::REQUEST_FORMAT_JSON;
3944
4045
require(__DIR__.'/7.5.3-plugin-bufferedupdate-benchmarks.php');
4146
</pre>
@@ -46,6 +51,14 @@
4651
exit;
4752
}
4853

54+
if (in_array(Query::REQUEST_FORMAT_CBOR, [$addRequestFormat, $delRequestFormat]) && !InstalledVersions::isInstalled('spomky-labs/cbor-php')) {
55+
echo '<h2>Note: The CBOR benchmark requires spomky-labs/cbor-php</h2>';
56+
57+
htmlFooter();
58+
59+
exit;
60+
}
61+
4962
echo '<h2>Note: These benchmarks can take some time to run!</h2>';
5063

5164
// create a client instance and don't let the adapter timeout
@@ -76,10 +89,10 @@
7689
$addBuffer = $client->getPlugin($addPlugin = 'bufferedadd'.$weight);
7790
$delBuffer = $client->getPlugin($delPlugin = 'buffereddelete'.$weight);
7891

79-
$addBuffer->setRequestFormat($requestFormat);
80-
$delBuffer->setRequestFormat($requestFormat);
92+
$addBuffer->setRequestFormat($addRequestFormat);
93+
$delBuffer->setRequestFormat($delRequestFormat);
8194

82-
echo '<h3>'.$addPlugin.' / '.$delPlugin.' ('.strtoupper($requestFormat).')</h3>';
95+
echo '<h3>'.$addPlugin.' ('.strtoupper($addRequestFormat).') / '.$delPlugin.' ('.strtoupper($delRequestFormat).')</h3>';
8396
echo '<table><thead>';
8497
echo '<tr><th>add buffer size</th><th>add time</th>';
8598
echo '<th>delete buffer size</th><th>delete time</th>';
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
1+
<?php
2+
3+
require_once(__DIR__.'/init.php');
4+
5+
use Composer\InstalledVersions;
6+
use Solarium\Core\Client\Response;
7+
use Solarium\Core\Event\Events;
8+
use Solarium\Core\Event\PreExecuteRequest;
9+
use Solarium\QueryType\Update\Query\Query;
10+
11+
set_time_limit(0);
12+
ini_set('memory_limit', -1);
13+
ob_implicit_flush(true);
14+
@ob_end_flush();
15+
16+
htmlHeader();
17+
18+
echo '<h2>Note: These benchmarks build the requests but don\'t execute them</h2>';
19+
20+
$requestFormats = [
21+
Query::REQUEST_FORMAT_XML,
22+
Query::REQUEST_FORMAT_JSON,
23+
];
24+
25+
if (InstalledVersions::isInstalled('spomky-labs/cbor-php')) {
26+
$requestFormats[] = Query::REQUEST_FORMAT_CBOR;
27+
} else {
28+
echo '<h2>Note: The CBOR benchmark requires spomky-labs/cbor-php</h2>';
29+
}
30+
31+
// memory usage is only useful with PHP 8.2+, earlier versions don't allow resetting between benchmarks
32+
$withMemoryUsage = function_exists('memory_reset_peak_usage');
33+
34+
// create a client instance
35+
$client = new Solarium\Client($adapter, $eventDispatcher, $config);
36+
37+
// autoload the buffered add plugin
38+
$addBuffer = $client->getPlugin('bufferedaddlite');
39+
40+
// return a dummy response instead of executing the query
41+
$response = new Response('', ['HTTP/1.0 200 OK']);
42+
43+
// keep measures for individual build times
44+
$start = 0;
45+
$buildTimes = [];
46+
47+
$client->getEventDispatcher()->addListener(
48+
Events::PRE_EXECUTE_REQUEST,
49+
function (PreExecuteRequest $event) use ($response, &$start, &$buildTimes) {
50+
$buildTimes[] = (hrtime(true) - $start) / 1000000;
51+
$event->setResponse($response);
52+
$start = hrtime(true);
53+
}
54+
);
55+
56+
$docs = 1200000;
57+
58+
foreach ($requestFormats as $requestFormat) {
59+
$addBuffer->setRequestFormat($requestFormat);
60+
61+
echo '<h3>'.strtoupper($requestFormat).'</h3>';
62+
echo '<table><thead>';
63+
echo '<tr><th rowspan="2">buffer size</th><th colspan="5">build time</th>';
64+
if ($withMemoryUsage) {
65+
echo '<th rowspan="2">mem peak usage</th>';
66+
}
67+
echo '</tr>';
68+
echo '<tr><th>min</th><th>max</th><th>mean</th><th>median</th><th>total</th></tr>';
69+
echo '</thead><tbody style="text-align:right">';
70+
71+
foreach ([2000, 200, 20, 2] as $flushes) {
72+
$bufferSize = $docs / $flushes;
73+
74+
$addBuffer->setBufferSize($bufferSize);
75+
76+
echo '<tr><td>'.$bufferSize.'</td>';
77+
78+
$buildTimes = [];
79+
80+
if ($withMemoryUsage) {
81+
memory_reset_peak_usage();
82+
}
83+
84+
$start = hrtime(true);
85+
86+
for ($i = 0; $i < $docs; ++$i) {
87+
$data = [
88+
'id' => sprintf('test-%08d', $i),
89+
'name' => 'test for buffered add',
90+
'cat' => ['solarium-test', 'solarium-test-bufferedadd'],
91+
];
92+
$addBuffer->createDocument($data);
93+
}
94+
95+
sort($buildTimes);
96+
$halfway = $flushes / 2;
97+
$total = array_sum($buildTimes);
98+
$min = reset($buildTimes);
99+
$max = end($buildTimes);
100+
$mean = $total / $flushes;
101+
$median = ($buildTimes[$halfway - 1] + $buildTimes[$halfway]) / 2;
102+
echo '<td>'.(int) $min.' ms</td>';
103+
echo '<td>'.(int) $max.' ms</td>';
104+
echo '<td>'.(int) $mean.' ms</td>';
105+
echo '<td>'.(int) $median.' ms</td>';
106+
echo '<td>'.(int) $total.' ms</td>';
107+
108+
if ($withMemoryUsage) {
109+
$memoryPeakUsage = memory_get_peak_usage() / 1024;
110+
echo '<td>'.(int) $memoryPeakUsage.' KiB</td>';
111+
}
112+
113+
echo '</tr>';
114+
}
115+
116+
echo '</tbody></table>';
117+
}
118+
119+
htmlFooter();
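
The median calculation in this benchmark relies on every `$flushes` value being even and on exactly `$flushes` build times having been recorded. A more general summary could look like the sketch below. The helper name `summarizeBuildTimes` is hypothetical and not part of this commit. It takes any non-empty list of timings in milliseconds and handles odd counts as well.

```php
<?php

/**
 * Hypothetical helper: summarize a list of build times (in ms).
 *
 * @param array $times non-empty list of int|float timings
 *
 * @return array with keys min, max, mean, median and total
 */
function summarizeBuildTimes(array $times): array
{
    if ([] === $times) {
        throw new InvalidArgumentException('At least one build time is required.');
    }

    sort($times);

    $count = count($times);
    $total = array_sum($times);
    $middle = intdiv($count, 2);

    // even count: average the two middle values; odd count: take the middle value
    $median = (0 === $count % 2)
        ? ($times[$middle - 1] + $times[$middle]) / 2
        : $times[$middle];

    return [
        'min' => $times[0],
        'max' => $times[$count - 1],
        'mean' => $total / $count,
        'median' => $median,
        'total' => $total,
    ];
}
```

Dividing by the number of recorded timings rather than by `$flushes` keeps the mean correct even if a flush didn't happen as expected.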

examples/7.5.3.1-plugin-bufferedupdate-benchmarks-xml.php

Lines changed: 2 additions & 1 deletion
@@ -5,6 +5,7 @@
55
use Solarium\QueryType\Update\Query\Query;
66

77
$weight = '';
8-
$requestFormat = Query::REQUEST_FORMAT_XML;
8+
$addRequestFormat = Query::REQUEST_FORMAT_XML;
9+
$delRequestFormat = Query::REQUEST_FORMAT_XML;
910

1011
require(__DIR__.'/7.5.3-plugin-bufferedupdate-benchmarks.php');

examples/7.5.3.2-plugin-bufferedupdate-lite-benchmarks-xml.php

Lines changed: 2 additions & 1 deletion
@@ -5,6 +5,7 @@
55
use Solarium\QueryType\Update\Query\Query;
66

77
$weight = 'lite';
8-
$requestFormat = Query::REQUEST_FORMAT_XML;
8+
$addRequestFormat = Query::REQUEST_FORMAT_XML;
9+
$delRequestFormat = Query::REQUEST_FORMAT_XML;
910

1011
require(__DIR__.'/7.5.3-plugin-bufferedupdate-benchmarks.php');
