 The aggregation framework is a powerful tool in your MongoDB toolbox. It allows
 you to run complex queries on your data, shaping and modifying documents to suit
 your needs. This power comes through a lot of different pipeline stages and
-operators, which comes with a certain learning challenge. MongoDB Compass comes
-with an aggregation pipeline builder that allows you to see results in real-time
+operators, which in turn brings a certain learning challenge. MongoDB Compass
+includes an aggregation pipeline builder that allows you to see results in real-time
 for each stage and fix mistakes early on. Once your pipeline is complete, you
 can export the pipeline to your language and use it in your code. In the PHP
-driver, from now on your pipeline lives as an array, completely untyped, and
-sometimes a relatively complex structure of stages and operators. As an example,
-let's take this pipeline from one of my projects:
+driver, that pipeline would live on as an array, completely untyped, and
+sometimes with a relatively complex structure of stages and operators. Let's
+take this pipeline from one of my projects as an example:

 ```php
 $pipeline = [
@@ -78,7 +78,7 @@ $pipeline = [
 ];
 ```

-Phew, that's a lot of logic. To better understand what this pipeline does, let's
+That's a lot of logic! To better understand what this pipeline does, let's
 look at a single source document:

 ```json
@@ -93,7 +93,7 @@ look at a single source document:
 ```

 I've left out some fields that we're not using right now. The aggregation
-pipeline aggregates all of these documents, producing a document for each day:
+pipeline aggregates all of these documents, producing a document for each month:

 ```json
 {
@@ -122,17 +122,17 @@ Without going into more details on this, even if we were to comment on parts of
 the aggregation pipeline to explain what it does, there will still be a high
 cognitive load when going through the aggregation pipeline. One reason for this
 is that any PHP editor will not know that this is an aggregation pipeline, and
-thus can't provide any better syntax highlighting other than "this is a string
-in an array". Couple that with a few levels of nesting, and you've got yourself
+thus can't provide much help beyond syntax highlighting (e.g. "this is a string
+in an array"). Couple that with a few levels of nesting, and you've got yourself
 this magical kind of code that you can write, but not read. We can of course
 refactor this code, but before we get into that, we want to move away from these
 array structures.

 ## Introducing the Aggregation Pipeline Builder

-Previously released as a standalone package, version 1.21 of the MongoDB Driver
-for PHP now comes with a fully grown aggregation pipeline builder. Instead of
-writing complex arrays, you now get factory methods to generate pipeline stages
+Previously released as a standalone package, version 1.21 of the MongoDB PHP
+driver now includes a comprehensive aggregation pipeline builder. Instead of
+writing complex arrays, you can use factory methods to generate pipeline stages
 and operators. Here is that same pipeline as we had before, this time written
 with the aggregation pipeline builder:

@@ -199,9 +199,10 @@ $pipeline = new Pipeline(
 );
 ```

-Ok, this is still a complex pipeline, and we'll be working on this, but it now
-becomes significantly easier to look at and differentiate operators from field
-names, etc.
+This is still a complex pipeline, but compared to the original array example
+it is now much easier to infer the context of each pipeline component. Operators
+are clearly differentiated from field names, and this typing can enable code
+editors and tooling to better assist the developer.

 To run an aggregation pipeline, you can pass a `Pipeline` instance to any method
 that can receive an aggregation pipeline, such as `Collection::aggregate` or
@@ -217,24 +218,23 @@ to represent the somewhat flexible type system and give better guidance to users
 when writing aggregation pipelines. That's why you will see expressions like
 `dateFieldPath`, `doubleFieldPath`, or `arrayFieldPath`. Each expression
 resolves to a certain type when it's evaluated. For example, we know that the
-`$year` operator expression resolves to an integer. The argument is an
+`$year` operator expression resolves to an integer, and its argument is an
 expression that resolves to a date, timestamp, or ObjectId. While we could use
-`$reportDate` to use the `reportDate` field from the document being evaluated,
+`$reportDate` to reference the `reportDate` field of the document being evaluated,
 `dateFieldPath` is more expressive and shows intent of receiving a date field.
 This also allows IDEs like PhpStorm to make better suggestions when offering
 code completion.

 For all expressions, there are factory classes with methods to create the
 expression objects. The use of static methods makes the code a little more
-verbose, but using functions was impossible due to aggregation pipeline using
-operator names that are reserved keywords in PHP (such as `and`, `if`, and
-`switch`). I'll show alternatives to using these static methods later in this
-blog post.
+verbose, but using functions was impossible due to conflicts between aggregation
+operator names and reserved keywords in PHP (e.g. `and`, `if`, `switch`). I'll
+show alternatives to using these static methods later in this blog post.
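
The keyword conflict mentioned above can be illustrated with a minimal, self-contained sketch. `ExampleExpression` is a hypothetical stand-in written only for this illustration; it is not the driver's factory class and its return values are simplified arrays, not the builder's value holder objects. It only shows that PHP accepts reserved words as method names, while the equivalent plain functions would not parse:

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a factory class; this is NOT the driver's real
// implementation, only an illustration of the naming constraint.
final class ExampleExpression
{
    // Since PHP 7.0, reserved words such as "and" are valid method names.
    public static function and(bool ...$operands): array
    {
        return ['$and' => $operands];
    }

    // "switch" is accepted as a method name for the same reason.
    public static function switch(array $branches, string $default): array
    {
        return ['$switch' => ['branches' => $branches, 'default' => $default]];
    }
}

// By contrast, declaring a plain function with one of these names is a
// parse error, so it is only shown here as a comment:
//     function and(bool ...$operands): array { /* ... */ }

$expression = ExampleExpression::and(true, false);
var_dump($expression === ['$and' => [true, false]]);
```

This is presumably why a static call such as `Expression::and(...)` works where a free function named `and` cannot exist.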

 ## Bonus Feature: Query Objects

 As a side effect of building the aggregation pipeline builder, there's now also
-a builder for query objects. This is because the `$match` stage takes a query
+a builder for query filters. This is because the `$match` stage takes a query
 object, and to avoid falling back to query arrays like you would pass them to
 `Collection::find`, we also built a builder for query objects. Here you see an
 example of a `find` call, along with the same query specified using the builder:
@@ -253,7 +253,8 @@ $collection->find(
 ```

 While this is a little more verbose, it provides a more expressive API than PHP
-array structures do. It's up to you to decide which option you like better.
+array structures and brings the same improvements for IDEs and tooling. It's
+up to you to decide which option you like better.

 ## Refactoring For Better Maintainability

@@ -262,10 +263,10 @@ array structures do. It's up to you to decide which option you like better.
 With the basic builder details explained, there's still one problem: the builder
 helps you write a pipeline, but it doesn't really make existing pipelines more
 maintainable. Yes, it makes them easier to read, but a complex pipeline will
-remain just as complex. So, let's discuss some refactorings we can make to make
-the aggregation pipeline easier to read, but also to make parts of the pipeline
-reusable. Note that all of these example apply the same way to pipelines written
-as PHP arrays, but I'll use the aggregation builder in the example.
+remain just as complex. So, let's discuss some refactorings we can make to both
+improve the pipeline's readability and make parts of the pipeline more reusable.
+Note that although the following example uses the aggregation builder, the same
+suggestions can also be applied to pipelines written as PHP arrays.

 Let's look at the first `$group` stage in the original example:

@@ -284,8 +285,8 @@ Stage::group(
 );
 ```

-As you can see, we use the `reportDate` and `price` fields multiple times. An
-obvious refactoring would be to extract a variable for this:
+As you can see, we reference the `reportDate` and `price` fields multiple times.
+A quick refactoring would be to extract those to variables:

 ```php
 $reportDate = Expression::dateFieldPath('reportDate');
@@ -305,9 +306,8 @@ Stage::group(
 );
 ```

-The `fuelType` and `station.brand` fields could be extracted as well. Since
-these are only used once, I didn't do that, but you may want to do so to favour
-consistency.
+The `fuelType` and `station.brand` fields could be extracted as well. I opted not
+to since they are only used once, but you may want to do so in favor of consistency.

 ### Comments Or Methods

@@ -361,8 +361,8 @@ fuel types with their prices, which is then converted to an object in
 `$addFields`. Ideally, we want to hide this implementation detail and extract
 both stages together.

-To do so, we once again extract a factory method, except that this time we'll be
-returning a `Pipeline` instance:
+To do so, we once again extract a factory method, except that this time we'll
+return a `Pipeline` instance:

 ```php
 public static function groupAndAssembleFuelTypePriceObject(
@@ -401,7 +401,7 @@ public static function groupAndAssembleFuelTypePriceObject(
 }
 ```

-By once again keeping fields as parameters, we keep the method flexible and
+By once again keeping fields as parameters, the method remains flexible and we
 allow using it in a pipeline that produces slightly different documents up to
 this point. Since the method works independently of how we group documents, we
 also keep the identifier as a parameter. Using this method further simplifies
@@ -468,7 +468,7 @@ $pipeline = new Pipeline(

 So far, we've only extracted entire pipeline stages that contain relatively
 simple expressions. Sometimes your aggregation pipeline will contain a more
-complex expression. From the same project that I took the previous example from,
+complex expression. From the same project that yielded the previous example,
 there's also this gem that is part of a pipeline that computes the weighted
 average price for each day:

@@ -519,8 +519,7 @@ $pipeline = [
 ];
 ```

-Once again, the builder can make this a little more concise, but the complexity
-remains:
+The builder can make this a little more concise, but the complexity remains:

 ```php
 $prices = Expression::arrayFieldPath('prices');
@@ -603,7 +602,7 @@ public static function computeDurationBetweenDates(
 }
 ```

-Again, this reduces the complexity of the pipeline stage tremendously:
+This reduces the complexity of the pipeline stage tremendously:

 ```php
 $prices = Expression::arrayFieldPath('prices');
@@ -667,7 +666,7 @@ returns a date, e.g. `$dateFromString`.
 Now that we know about these value holder objects, we still need to make sure
 the server knows what we're talking about. When you call `Collection::aggregate`
 with the pipeline you built, what happens internally to it? Here, a series of
-encoders springs into action. We use a single entry point, the `BuilderEncoder`
+encoders spring into action. We use a single entry point, the `BuilderEncoder`
 class. This class contains multiple encoders that are able to handle all
 pipeline stages, operators, and accumulators and transform them into their BSON
 representations.
@@ -682,24 +681,24 @@ accordingly.
 When creating a `MongoDB\Client` instance, you can now pass an additional
 `builderEncoder` option in the `$driverOptions` argument. This specifies the
 encoder used to encode aggregation pipelines, but also query objects. All
-`Database` and `Client` instances inherit this value from the client, but you
-can override it through the options when fetching such an instance. This allows
+`Database` and `Collection` instances inherit this value from the client, but you
+can override it through the options when selecting those objects. This allows
 you to have your custom logic applied whenever pipelines or queries are encoded
 for the server.

 With factories, value holders, and encoders, we wanted to ensure that creating
 the builder does not turn into a repetitive chore. As you can imagine, many
 operators will mostly consist of the same logic, resulting in tons of code
-duplication. To make matters worse, every new server version adds some new
-operators or even stages, so we wanted to make sure that we can easily expand
+duplication. To make matters worse, every new server version may introduce new
+operators and stages, so we wanted to make sure that we can easily expand
 the builder.

 We could try to rely on generative AI to help us with this, but this only goes
 so far. Instead, we leverage code generation to make the task easier. All
 factories, value holders, and encoders are generated from a configuration. When
 a new operator is introduced, we create a config file with all of its details:
 input types, what the operator resolves to, documentation for parameters, and
-even the examples from the documentation are included. We then run the
+even examples from the MongoDB documentation. We then run the
 generator, and are given all code necessary to use the operator.

 As if that wasn't good enough, the generator also takes the examples we added