Commit 35f8359

Author: Justin Lee

documentation for the new pipeline stages

1 parent ebc67bc, commit 35f8359

1 file changed

+182
-0
lines changed

docs/reference/content/builders/aggregation.md

Lines changed: 182 additions & 0 deletions
@@ -167,6 +167,188 @@ This example writes the pipeline to the `authors` collection:
out("authors")
```

### GraphLookup

The [`$graphLookup`]({{< docsref "reference/operator/aggregation/graphLookup/" >}}) pipeline stage performs a recursive search on a specified collection, matching field A of one document to field B of other documents. For each matching document, the stage repeats the search, matching its field A against field B of the remaining documents, until no new documents are encountered or a specified depth is reached. To each output document, `$graphLookup` adds a new array field that contains the traversal results of the search for that document.

The following example computes the social network graph for users in the `contacts` collection, recursively matching the value in the `friends` field to the `name` field, up to a recursion depth of 1.

```java
graphLookup("contacts", "$friends", "friends", "name", "socialNetwork",
        new GraphLookupOptions().maxDepth(1))
```

Using `GraphLookupOptions`, the output can be tailored to restrict the depth of the recursion as well as to inject a field containing the depth of the recursion at which a document was included.
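To make the traversal and the effect of `maxDepth` more concrete, the following is a minimal plain-Java sketch of the same breadth-first idea over an in-memory map. It is not how the driver or the server implements the stage: `GraphLookupSketch` and `traverse` are hypothetical names, and the map merely stands in for the `contacts` collection (key is `name`, value is `friends`). Depth 0 is the initial lookup, so a `maxDepth` of 1 allows one further hop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the MongoDB Java driver.
class GraphLookupSketch {
    /**
     * Breadth-first traversal over an in-memory stand-in for a collection.
     * Returns each reached name mapped to the depth at which it was first found.
     */
    static Map<String, Integer> traverse(Map<String, List<String>> contacts,
                                         List<String> startWith, int maxDepth) {
        Map<String, Integer> found = new LinkedHashMap<>();
        List<String> frontier = new ArrayList<>(startWith);
        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String name : frontier) {
                // Skip names with no matching document and documents already visited.
                if (!contacts.containsKey(name) || found.containsKey(name)) {
                    continue;
                }
                found.put(name, depth);
                // The friends of this contact are the candidates for the next depth.
                next.addAll(contacts.get(name));
            }
            frontier = next;
        }
        return found;
    }
}
```

With contacts `alice -> [bob]`, `bob -> [carol]`, `carol -> [dave]`, starting from alice's friends with a maximum depth of 1 reaches `bob` at depth 0 and `carol` at depth 1, and stops before `dave`. The recorded depth plays the role of the depth field that `GraphLookupOptions` can inject.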

### SortByCount

The [`$sortByCount`]({{< docsref "reference/operator/aggregation/sortByCount/" >}}) stage groups documents by a given expression and then sorts these groups by count in descending order. The `$sortByCount` stage outputs documents that contain an `_id` field, which holds a distinct value of the given expression, and a `count` field, which holds the number of documents that fall into that group.

The following example groups documents by the value of the field `x` rounded down to the nearest integer and computes the count for each distinct resulting value.

```java
sortByCount(new Document("$floor", "$x"))
```
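As a rough illustration of what the stage computes, the sketch below groups an in-memory list by `floor(x)` and sorts the groups by count, highest first. This is a plain-Java approximation under the assumption that `x` is a double; `SortByCountSketch` and `sortByFloorCount` are hypothetical names and not part of the driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the MongoDB Java driver.
class SortByCountSketch {
    /** Returns (floor(x), count) pairs ordered by count, highest first. */
    static List<Map.Entry<Long, Integer>> sortByFloorCount(List<Double> xs) {
        // Group: the key corresponds to _id, the value to count.
        Map<Long, Integer> counts = new LinkedHashMap<>();
        for (double x : xs) {
            counts.merge((long) Math.floor(x), 1, Integer::sum);
        }
        // Sort: descending by count. The sort is stable, so ties keep first-seen order.
        List<Map.Entry<Long, Integer>> groups = new ArrayList<>(counts.entrySet());
        groups.sort(Map.Entry.<Long, Integer>comparingByValue().reversed());
        return groups;
    }
}
```

For the inputs `1.2, 2.7, 1.9, -0.5, 2.1, 1.0`, the groups are `1` (3 documents), `2` (2 documents), and `-1` (1 document), in that order. Note that `-0.5` lands in group `-1`, because flooring rounds toward negative infinity rather than truncating.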

### ReplaceRoot

The [`$replaceRoot`]({{< docsref "reference/operator/aggregation/replaceRoot/" >}}) pipeline stage replaces each input document to the stage with the specified document. All existing fields, including the `_id` field, are replaced.

If each input document to the `replaceRoot` stage has a field `a1` that contains a field `b` whose value is a document, the following operation replaces each input document with the document in the `b` field.

```java
replaceRoot("$a1.b")
```

### AddFields

The [`$addFields`]({{< docsref "reference/operator/aggregation/addFields/" >}}) pipeline stage adds new fields to documents. The stage outputs documents that contain all existing fields from the input documents and the newly added fields.

This example adds two new fields, `myNewField` and `z`, to the input documents; `myNewField` has the value `{c: 3, d: 4}` and `z` has the value 5.

```java
addFields(new Field<>("myNewField", new Document("c", 3).append("d", 4)),
        new Field<>("z", 5))
```

These new fields do not need to be statically defined. The following example shows how to add a new field that is a function of the current document's values. In this case, a new field `alt3` is added with a value of `true` if the current value of the field `a` is less than 3. Otherwise, `alt3` is `false`.

```java
addFields(new Field<>("alt3", new Document("$lt", asList("$a", 3))))
```

### Count

The [`$count`]({{< docsref "reference/operator/aggregation/count/" >}}) pipeline stage specifies the name of the field that will contain the number of documents that enter this stage. The `$count` stage is syntactic sugar for `{$group:{_id:null, count:{$sum:1}}}` followed by a `$project` that excludes the `_id` field.

There are two ways to invoke this stage. The first is to name the resulting field explicitly, as in the following two examples:

```java
count("count")
```

```java
count("total")
```

These two invocations put the count in the `count` and `total` fields, respectively. If `count` is the desired field name, the call can be shortened with the following convenience method:

```java
count()
```

This invocation defaults the field name to `count`.

### Bucket

The [`$bucket`]({{< docsref "reference/operator/aggregation/bucket/" >}}) pipeline stage automates the bucketing of data around predefined boundary values.

The following example shows a basic `$bucket` stage:

```java
bucket("$screenSize", asList(0, 24, 32, 50, 70, 200))
```

This will result in output that looks like this:

```json
{"_id": 0, "count": 1}
{"_id": 24, "count": 2}
{"_id": 32, "count": 1}
{"_id": 50, "count": 1}
{"_id": 70, "count": 2}
```

By default, each output document has the bucket's inclusive lower bound as its `_id` and a single `count` field containing the number of documents in that bucket. This output can be modified using the `BucketOptions` class. The above example can be expanded to look like this:

```java
bucket("$screenSize", asList(0, 24, 32, 50, 70), new BucketOptions()
        .defaultBucket("monster")
        .output(sum("count", 1), push("matches", "$screenSize")))
```

The optional value `defaultBucket` defines the name of the bucket for values that fall outside the defined bucket boundaries. If `defaultBucket` is undefined and values exist outside of the defined bucket boundaries, the stage will produce an error. The other value is the `output` field, which defines the shape of the document output for each bucket. The output of this stage looks something like this:

```json
[{"_id": 0, "count": 1, "matches": [22]},
 {"_id": 24, "count": 2, "matches": [24, 30]},
 {"_id": 32, "count": 1, "matches": [42]},
 {"_id": 50, "count": 1, "matches": [55]},
 {"_id": "monster", "count": 2, "matches": [75, 155]}]
```

This output contains not only the size of each bucket but also the values in it. Note that the largest screen sizes, which fall outside the defined boundaries, end up in the default bucket named `monster`.
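The boundary and default-bucket rules described above can be sketched in plain Java as follows. This is only an in-memory approximation of the default `count` output, not the driver's or the server's implementation; `BucketSketch` and its `bucket` method are hypothetical names. Each bucket includes its lower bound and excludes its upper bound, and a value outside every bucket either goes to the default bucket or raises an error when none is given.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the MongoDB Java driver.
class BucketSketch {
    /** Counts values per bucket; pass null as defaultBucket to fail on out-of-range values. */
    static Map<Object, Integer> bucket(List<Integer> values, int[] boundaries, String defaultBucket) {
        TreeMap<Integer, Integer> inRange = new TreeMap<>(); // keyed by inclusive lower bound
        int outside = 0;
        for (int value : values) {
            Integer lower = null;
            for (int i = 0; i + 1 < boundaries.length; i++) {
                // Lower bound is inclusive, upper bound is exclusive.
                if (value >= boundaries[i] && value < boundaries[i + 1]) {
                    lower = boundaries[i];
                    break;
                }
            }
            if (lower != null) {
                inRange.merge(lower, 1, Integer::sum);
            } else if (defaultBucket != null) {
                outside++;
            } else {
                throw new IllegalArgumentException(
                        "value outside boundaries and no default bucket: " + value);
            }
        }
        // Buckets in boundary order, followed by the default bucket if it is non-empty.
        Map<Object, Integer> result = new LinkedHashMap<>();
        result.putAll(inRange);
        if (outside > 0) {
            result.put(defaultBucket, outside);
        }
        return result;
    }
}
```

Using the screen sizes from the example output (22, 24, 30, 42, 55, 75, 155) with boundaries 0, 24, 32, 50, 70 and a default bucket of `monster`, the sketch yields counts of 1, 2, 1, 1 for the four bounded buckets and 2 for `monster`, matching the output shown above. Only non-empty buckets appear in the result.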

### BucketAuto

The [`$bucketAuto`]({{< docsref "reference/operator/aggregation/bucketAuto/" >}}) pipeline stage automatically determines the boundaries of each bucket in its attempt to distribute the documents evenly into a specified number of buckets. Depending on the input documents, the number of buckets may be less than the specified number of buckets.

For example, this stage creates 10 buckets:

```java
bucketAuto("$price", 10)
```

This results in output that looks something like this:

```json
[{"_id": {"min": 2, "max": 30}, "count": 14},
 {"_id": {"min": 30, "max": 58}, "count": 14},
 {"_id": {"min": 58, "max": 86}, "count": 14},
 {"_id": {"min": 86, "max": 114}, "count": 14},
 {"_id": {"min": 114, "max": 142}, "count": 14},
 {"_id": {"min": 142, "max": 170}, "count": 14},
 {"_id": {"min": 170, "max": 198}, "count": 14},
 {"_id": {"min": 198, "max": 226}, "count": 14},
 {"_id": {"min": 226, "max": 254}, "count": 14},
 {"_id": {"min": 254, "max": 274}, "count": 11}]
```

Note the uniformity of bucket sizes except for the last bucket. For finer control over how the boundaries are chosen, the `BucketAutoOptions` class supports a [preferred number](https://en.wikipedia.org/wiki/Preferred_number) based scheme for determining the boundary values. As with `BucketOptions`, the output document shape can be defined using the `output` value on `BucketAutoOptions`. An example of these options is shown below:

```java
bucketAuto("$price", 10, new BucketAutoOptions()
        .granularity(BucketGranularity.POWERSOF2)
        .output(sum("count", 1), avg("avgPrice", "$price")))
```

### Facet

The [`$facet`]({{< docsref "reference/operator/aggregation/facet/" >}}) pipeline stage allows for the definition of a faceted search. The stage is defined with a set of names, each paired with a nested aggregation pipeline that defines that facet. Returning to the television screen size example, the following `$facet` stage returns a document that groups televisions by size and by manufacturer:

```java
facet(
    new Facet("Screen Sizes",
        unwind("$attributes"),
        bucketAuto("$attributes.screen_size", 5, new BucketAutoOptions()
            .output(sum("count", 1)))),
    new Facet("Manufacturer",
        sortByCount("$attributes.manufacturer"),
        limit(5))
)
```

This stage returns a document that looks like this:

```json
{
  "Manufacturer": [
    {"_id": "Vizio", "count": 17},
    {"_id": "Samsung", "count": 17},
    {"_id": "Sony", "count": 17}
  ],
  "Screen Sizes": [
    {"_id": {"min": 35, "max": 45}, "count": 10},
    {"_id": {"min": 45, "max": 55}, "count": 10},
    {"_id": {"min": 55, "max": 65}, "count": 10},
    {"_id": {"min": 65, "max": 75}, "count": 10},
    {"_id": {"min": 75, "max": 85}, "count": 11}
  ]
}
```
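One way to picture the stage: every facet's pipeline receives the same input documents, and the results are collected under the facet's name in a single output document. The sketch below models that shape in plain Java, with each nested pipeline represented as a function over the input list. `FacetSketch` and `facet` are hypothetical names used for illustration; this is not the driver's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not part of the MongoDB Java driver.
class FacetSketch {
    /** Runs every named "pipeline" over the same input and collects the results by name. */
    static <T> Map<String, Object> facet(List<T> input,
                                         Map<String, Function<List<T>, ?>> facets) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Function<List<T>, ?>> entry : facets.entrySet()) {
            // Each facet sees the full, unmodified input; facets do not affect one another.
            result.put(entry.getKey(), entry.getValue().apply(input));
        }
        return result;
    }
}
```

The result always has one entry per facet name, which mirrors the single document with the `Manufacturer` and `Screen Sizes` fields shown above.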

### Creating a Pipeline

The above pipeline operators are typically combined into a list and passed to the `aggregate` method of a `MongoCollection`. For instance:
