Commit afd394b

DOC-606 | SEARCH parallelism (#340)
* SEARCH parallelism * Mention new default parallelism startup option * Add metric
1 parent d892af7 commit afd394b

File tree

5 files changed

+108
-32
lines changed

site/content/3.12/aql/high-level-operations/for.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -93,7 +93,7 @@ Also see [Combining queries with subqueries](../fundamentals/subqueries.md).
9393
## Options
9494

9595
For collections and Views, the `FOR` construct supports an optional `OPTIONS`
96-
clause to modify behavior. The general syntax is:
96+
clause to modify the behavior. The general syntax is as follows:
9797

9898
<pre><code>FOR <em>variableName</em> IN <em>expression</em> OPTIONS { <em>option</em>: <em>value</em>, <em>...</em> }</code></pre>
9999

site/content/3.12/aql/high-level-operations/search.md

Lines changed: 74 additions & 29 deletions
@@ -237,7 +237,7 @@ You can use the special `includeAllFields`
237237
[`arangosearch` View property](../../index-and-search/arangosearch/arangosearch-views-reference.md#link-properties)
238238
to index all (sub-)attributes of the source documents if desired.
239239

240-
## SEARCH with SORT
240+
## `SEARCH` with `SORT`
241241

242242
The documents emitted from a View can be sorted by attribute values with the
243243
standard [SORT() operation](sort.md), using one or multiple
@@ -283,38 +283,83 @@ a score of `0` will be returned for all documents.
283283

284284
## Search Options
285285

286-
The `SEARCH` operation accepts an options object with the following attributes:
287-
288-
- `collections` (array, _optional_): array of strings with collection names to
289-
restrict the search to certain source collections
290-
- `conditionOptimization` (string, _optional_): controls how search criteria
291-
get optimized. Possible values:
292-
- `"auto"` (default): convert conditions to disjunctive normal form (DNF) and
293-
apply optimizations. Removes redundant or overlapping conditions, but can
294-
take quite some time even for a low number of nested conditions.
295-
- `"none"`: search the index without optimizing the conditions.
296-
<!-- Internal only: nodnf, noneg -->
297-
- `countApproximate` (string, _optional_): controls how the total count of rows
298-
is calculated if the `fullCount` option is enabled for a query or when
299-
a `COLLECT WITH COUNT` clause is executed
300-
- `"exact"` (default): rows are actually enumerated for a precise count.
301-
- `"cost"`: a cost-based approximation is used. Does not enumerate rows and
302-
returns an approximate result with O(1) complexity. Gives a precise result
303-
if the `SEARCH` condition is empty or if it contains a single term query
304-
only (e.g. `SEARCH doc.field == "value"`), the usual eventual consistency
305-
of Views aside.
306-
307-
**Examples**
308-
309-
Given a View with three linked collections `coll1`, `coll2` and `coll3` it is
310-
possible to return documents from the first two collections only and ignore the
311-
third using the `collections` option:
286+
The `SEARCH` operation supports an optional `OPTIONS` clause to modify the
287+
behavior. The general syntax is as follows:
288+
289+
<pre><code>SEARCH <em>expression</em> OPTIONS { <em>option</em>: <em>value</em>, <em>...</em> }</code></pre>
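As a minimal sketch of the syntax above, the following query sets two options in a single `OPTIONS` clause. The View name `viewName`, the collection name `coll1`, and the attribute `category` are hypothetical and assume a View that indexes this attribute:

```aql
// Hypothetical names: viewName, coll1, and doc.category are placeholders
FOR doc IN viewName
  SEARCH doc.category == "news"
  OPTIONS { collections: ["coll1"], conditionOptimization: "none" }
  RETURN doc
```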
290+
291+
### `collections`
292+
293+
You can specify an array of strings with collection names to restrict the search
294+
to certain source collections.
295+
296+
Given a View with three linked collections `coll1`, `coll2`, and `coll3`, you
297+
can return documents from the first two collections only and ignore the third
298+
collection by setting the `collections` option to `["coll1", "coll2"]`:
312299

313300
```aql
314301
FOR doc IN viewName
315302
SEARCH true OPTIONS { collections: ["coll1", "coll2"] }
316303
RETURN doc
317304
```
318305

319-
The search expression `true` matches all View documents. You can use any valid
320-
expression here while limiting the scope to the chosen source collections.
306+
The search expression `true` in the above example matches all View documents.
307+
You can use any valid expression here while limiting the scope to the chosen
308+
source collections.
309+
310+
### `conditionOptimization`
311+
312+
You can specify one of the following values for this option to control how
313+
search criteria get optimized:
314+
315+
- `"auto"` (default): convert conditions to disjunctive normal form (DNF) and
316+
apply optimizations. Removes redundant or overlapping conditions, but can
317+
take quite some time even for a low number of nested conditions.
318+
- `"none"`: search the index without optimizing the conditions.
319+
<!-- Internal only: nodnf, noneg -->
320+
321+
See [Optimizing View and inverted index query performance](../../index-and-search/arangosearch/performance.md#condition-optimization-options)
322+
for an example.
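A minimal sketch of disabling condition optimization, assuming a View called `viewName` that indexes the hypothetical attributes `someAttr` and `anotherAttr`:

```aql
// Hypothetical View and attribute names, for illustration only
FOR doc IN viewName
  SEARCH doc.someAttr != "one" AND doc.anotherAttr != "three"
  OPTIONS { conditionOptimization: "none" }
  RETURN doc
```

Skipping the DNF conversion may be worth trying when a query has many nested conditions and the optimization step itself becomes noticeable.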
323+
324+
### `countApproximate`
325+
326+
This option controls how the total count of rows is calculated if the `fullCount`
327+
option is enabled for a query or when a `COLLECT WITH COUNT` clause is executed.
328+
You can set it to one of the following values:
329+
330+
- `"exact"` (default): rows are actually enumerated for a precise count.
331+
- `"cost"`: a cost-based approximation is used. Does not enumerate rows and
332+
returns an approximate result with O(1) complexity. Gives a precise result
333+
if the `SEARCH` condition is empty or if it contains a single term query
334+
only (e.g. `SEARCH doc.field == "value"`), the usual eventual consistency
335+
of Views aside.
336+
337+
See [Optimizing View and inverted index query performance](../../index-and-search/arangosearch/performance.md#count-approximation)
338+
for an example.
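A minimal sketch of a cost-based count, assuming a View called `viewName` that indexes the hypothetical attribute `field`. Because the condition is a single term query, the approximation is expected to be precise here, as described above:

```aql
// Hypothetical View and attribute names, for illustration only
FOR doc IN viewName
  SEARCH doc.field == "value"
  OPTIONS { countApproximate: "cost" }
  COLLECT WITH COUNT INTO count
  RETURN count
```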
339+
340+
### `parallelism`
341+
342+
A `SEARCH` operation can optionally process index segments in parallel using
343+
multiple threads. This can speed up search queries but increases CPU and memory
344+
utilization.
345+
346+
If you omit the `parallelism` option, then the default parallelism as defined by
347+
the [`--arangosearch.default-parallelism` startup option](../../components/arangodb-server/options.md#--arangosearchdefault-parallelism)
348+
is used. If you set it to a value of `1`, the search execution is not
349+
parallelized. If the value is greater than `1`, then up to that many worker
350+
threads can be used for concurrently processing index segments. The maximum
351+
number of total parallel execution threads is defined by the
352+
[`--arangosearch.execution-threads-limit` startup option](../../components/arangodb-server/options.md#--arangosearchexecution-threads-limit)
353+
that defaults to twice the number of CPU cores.
354+
355+
The `parallelism` option should be considered a hint. Not all search queries are
356+
eligible. Queries also don't wait for the specified number of threads to be
357+
available. They start immediately, even if only with a single thread, and may acquire
358+
more threads later.
359+
360+
```aql
361+
FOR doc IN restaurantsView
362+
SEARCH ANALYZER(GEO_INTERSECTS(rect, doc.geometry), "geojson")
363+
OPTIONS { parallelism: 16 }
364+
RETURN doc.geometry
365+
```

site/content/3.12/index-and-search/arangosearch/performance.md

Lines changed: 10 additions & 0 deletions
@@ -675,3 +675,13 @@ db._createView("articlesView", "search-alias", { indexes: [
675675
{ collection: "articles", index: "inv-idx" }
676676
] });
677677
```
678+
679+
## Parallel index segment processing
680+
681+
<small>Introduced in: v3.12.0</small>
682+
683+
You can speed up `SEARCH` queries against Views using the `parallelism` option
684+
to process index segments using multiple threads.
685+
686+
See [`SEARCH` operation in AQL](../../aql/high-level-operations/search.md#parallelism)
687+
for details.

site/content/3.12/release-notes/version-3.12/api-changes-in-3-12.md

Lines changed: 3 additions & 2 deletions
@@ -142,13 +142,14 @@ produced no warnings.
142142

143143
#### Metrics API
144144

145-
The metrics endpoint includes the following new metrics about AQL queries and
146-
ongoing dumps:
145+
The metrics endpoint includes the following new metrics about AQL queries,
146+
ongoing dumps, and ArangoSearch execution threads:
147147

148148
- `arangodb_aql_cursors_active`
149149
- `arangodb_dump_memory_usage`
150150
- `arangodb_dump_ongoing`
151151
- `arangodb_dump_threads_blocked_total`
152+
- `arangodb_search_execution_threads_demand`
152153

153154
---
154155

site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md

Lines changed: 20 additions & 0 deletions
@@ -31,6 +31,26 @@ for examples.
3131

3232
This feature is only available in the Enterprise Edition.
3333

34+
### `SEARCH` parallelization
35+
36+
In search queries against Views, you can set the new `parallelism` option for
37+
`SEARCH` operations to optionally process index segments in parallel using
38+
multiple threads. This can speed up search queries.
39+
40+
The default value for the `parallelism` option is defined by the new
41+
`--arangosearch.default-parallelism` startup option that defaults to `1`.
42+
43+
The new `--arangosearch.execution-threads-limit` startup option controls how
44+
many threads can be used in total for search queries. The new
45+
`arangodb_search_execution_threads_demand` metric reports the number of threads
46+
that queries request. If it is below the configured thread limit, it coincides
47+
with the number of active threads. If it exceeds the limit, some queries cannot
48+
currently get the threads as requested and may have to use a single thread until
49+
more become available.
50+
51+
See [`SEARCH` operation in AQL](../../aql/high-level-operations/search.md#parallelism)
52+
for details.
53+
3454
## Analyzers
3555

3656
