|
| 1 | +:man_page: mongoc_aggregate |
| 2 | + |
| 3 | +Aggregation Framework Examples |
| 4 | +============================== |
| 5 | + |
| 6 | +This document provides practical examples that demonstrate the capabilities of the aggregation framework. |
| 7 | + |
| 8 | +The `Aggregations using the Zip Codes Data Set <https://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/>`_ examples use a publicly available data set of all zip codes and populations in the United States. The data is available at: `zips.json <http://media.mongodb.org/zips.json>`_. |
| 9 | + |
| 10 | +Requirements |
| 11 | +------------ |
| 12 | + |
| 13 | +`MongoDB <https://www.mongodb.org>`_, version 2.2.0 or later. `MongoDB C driver <https://github.com/mongodb/mongo-c-driver>`_, version 0.96.0 or later. |
| 14 | + |
| 15 | +Let's check if everything is installed. |
| 16 | + |
| 17 | +Use the following command to load the zips.json data set into a mongod instance: |
| 18 | + |
| 19 | +.. code-block:: none |
| 20 | +
|
| 21 | + $ mongoimport --drop -d test -c zipcodes zips.json |
| 22 | +
|
| 23 | +Let's use the MongoDB shell to verify that everything was imported successfully. |
| 24 | + |
| 25 | +.. code-block:: none |
| 26 | +
|
| 27 | + $ mongo test |
|    | + MongoDB shell version: 2.6.1 |
| 28 | + connecting to: test |
|    | + > db.zipcodes.count() |
|    | + 29467 |
|    | + > db.zipcodes.findOne() |
|    | + { |
| 29 | + "_id" : "35004", |
| 30 | + "city" : "ACMAR", |
| 31 | + "loc" : [ |
| 32 | + -86.51557, |
| 33 | + 33.584132 |
| 34 | + ], |
| 35 | + "pop" : 6055, |
| 36 | + "state" : "AL" |
| 37 | + } |
| 38 | +
|
| 39 | +Aggregations using the Zip Codes Data Set |
| 40 | +----------------------------------------- |
| 41 | + |
| 42 | +Each document in this collection has the following form: |
| 43 | + |
| 44 | +.. code-block:: none |
| 45 | +
|
| 46 | + { |
| 47 | + "_id" : "35004", |
| 48 | + "city" : "ACMAR", |
| 49 | + "state" : "AL", |
| 50 | + "pop" : 6055, |
| 51 | + "loc" : [-86.51557, 33.584132] |
| 52 | + } |
| 53 | +
|
| 54 | +In these documents: |
| 55 | + |
| 56 | +* The ``_id`` field holds the zipcode as a string. |
| 57 | +* The ``city`` field holds the city name. |
| 58 | +* The ``state`` field holds the two-letter state abbreviation. |
| 59 | +* The ``pop`` field holds the population. |
| 60 | +* The ``loc`` field holds the location as a ``[longitude, latitude]`` array. |
| 61 | + |
| 62 | +States with Populations Over 10 Million |
| 63 | +--------------------------------------- |
| 64 | + |
| 65 | +To get all states with a population of 10 million or more, use the following aggregation pipeline: |
| 66 | + |
| 67 | +.. literalinclude:: ../examples/aggregation/aggregation1.c |
| 68 | + :language: c |
| 69 | + :caption: aggregation1.c |
| 70 | + |
| 71 | +You should see a result like the following: |
| 72 | + |
| 73 | +.. code-block:: none |
| 74 | +
|
| 75 | + { "_id" : "PA", "total_pop" : 11881643 } |
| 76 | + { "_id" : "OH", "total_pop" : 10847115 } |
| 77 | + { "_id" : "NY", "total_pop" : 17990455 } |
| 78 | + { "_id" : "FL", "total_pop" : 12937284 } |
| 79 | + { "_id" : "TX", "total_pop" : 16986510 } |
| 80 | + { "_id" : "IL", "total_pop" : 11430472 } |
| 81 | + { "_id" : "CA", "total_pop" : 29760021 } |
| 82 | +
|
| 83 | +The above aggregation pipeline is built from two pipeline operators: ``$group`` and ``$match``. |
| 84 | + |
| 85 | +The ``$group`` pipeline operator requires an ``_id`` field, which specifies the grouping key. The remaining fields specify how to compute each aggregated value and must use one of the group aggregation functions: ``$addToSet``, ``$first``, ``$last``, ``$max``, ``$min``, ``$avg``, ``$push``, ``$sum``. The ``$match`` pipeline operator uses the same syntax as a read operation query. |
| 86 | + |
| 87 | +The ``$group`` stage reads all documents and creates a separate document for each state, for example: |
| 88 | + |
| 89 | +.. code-block:: none |
| 90 | +
|
| 91 | + { "_id" : "WA", "total_pop" : 4866692 } |
| 92 | +
|
| 93 | +The ``total_pop`` field uses the ``$sum`` aggregation function to sum the values of the ``pop`` field across all source documents in the group. |
| 94 | + |
| 95 | +Documents created by ``$group`` are piped to the ``$match`` pipeline operator, which returns only the documents whose ``total_pop`` value is greater than or equal to 10 million. |
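
The two stages described above can be mimicked in plain C to make their semantics concrete. This is only a minimal sketch of what the server computes, not driver code: the types ``zip_doc_t`` and ``state_total_t`` and the functions ``group_by_state`` and ``match_min_pop`` are hypothetical names introduced for illustration, and it assumes the caller supplies an output array large enough for every distinct state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input document, reduced to the fields the pipeline reads. */
typedef struct {
   const char *state;
   long pop;
} zip_doc_t;

/* Hypothetical $group output: { "_id" : state, "total_pop" : sum }. */
typedef struct {
   const char *id;
   long total_pop;
} state_total_t;

/* Mimics $group with _id = "$state" and total_pop = { $sum : "$pop" }.
 * Writes one entry per distinct state and returns the number of groups. */
static size_t
group_by_state (const zip_doc_t *docs, size_t n, state_total_t *out, size_t cap)
{
   size_t n_groups = 0;

   for (size_t i = 0; i < n; i++) {
      size_t g = 0;

      /* Linear search for an existing group with the same key. */
      while (g < n_groups && strcmp (out[g].id, docs[i].state) != 0) {
         g++;
      }
      if (g == n_groups) {
         assert (n_groups < cap); /* assumption: caller sized out correctly */
         out[g].id = docs[i].state;
         out[g].total_pop = 0;
         n_groups++;
      }
      out[g].total_pop += docs[i].pop;
   }
   return n_groups;
}

/* Mimics $match with { total_pop : { $gte : min_pop } }.
 * Compacts the array in place and returns the number of entries kept. */
static size_t
match_min_pop (state_total_t *groups, size_t n, long min_pop)
{
   size_t kept = 0;

   for (size_t i = 0; i < n; i++) {
      if (groups[i].total_pop >= min_pop) {
         groups[kept++] = groups[i];
      }
   }
   return kept;
}
```

Calling ``group_by_state`` and then ``match_min_pop`` with a threshold of ``10000000`` corresponds to the order of the stages in the pipeline: the filter sees the per-state totals, not the individual zip code documents.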
| 96 | + |
| 97 | +Average City Population by State |
| 98 | +-------------------------------- |
| 99 | + |
| 100 | +To get the three states with the greatest average population per city, use the following aggregation: |
| 101 | + |
| 102 | +.. code-block:: c |
| 103 | +
|
| 104 | + pipeline = BCON_NEW ("pipeline", "[", |
| 105 | + "{", "$group", "{", "_id", "{", "state", "$state", "city", "$city", "}", "pop", "{", "$sum", "$pop", "}", "}", "}", |
| 106 | + "{", "$group", "{", "_id", "$_id.state", "avg_city_pop", "{", "$avg", "$pop", "}", "}", "}", |
| 107 | + "{", "$sort", "{", "avg_city_pop", BCON_INT32 (-1), "}", "}", |
| 108 | + "{", "$limit", BCON_INT32 (3), "}", |
| 109 | + "]"); |
| 110 | +
|
| 111 | +This aggregation pipeline produces: |
| 112 | + |
| 113 | +.. code-block:: none |
| 114 | +
|
| 115 | + { "_id" : "DC", "avg_city_pop" : 303450.0 } |
| 116 | + { "_id" : "FL", "avg_city_pop" : 27942.29805615551 } |
| 117 | + { "_id" : "CA", "avg_city_pop" : 27735.341099720412 } |
| 118 | +
|
| 119 | +The above aggregation pipeline is built from three pipeline operators: ``$group``, ``$sort`` and ``$limit``. |
| 120 | + |
| 121 | +The first ``$group`` operator creates the following documents: |
| 122 | + |
| 123 | +.. code-block:: none |
| 124 | +
|
| 125 | + { "_id" : { "state" : "WY", "city" : "Smoot" }, "pop" : 414 } |
| 126 | +
|
| 127 | +Note that the ``$group`` operator allows nested documents only in the ``_id`` field. |
| 128 | + |
| 129 | +The second ``$group`` uses these documents to create the following documents: |
| 130 | + |
| 131 | +.. code-block:: none |
| 132 | +
|
| 133 | + { "_id" : "FL", "avg_city_pop" : 27942.29805615551 } |
| 134 | +
|
| 135 | +These documents are sorted by the ``avg_city_pop`` field in descending order. Finally, the ``$limit`` pipeline operator returns the first 3 documents from the sorted set. |
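
The four stages can likewise be mimicked in plain C. Again this is a hedged sketch of the semantics rather than driver code: every type and function name below (``zip_doc_t``, ``city_total_t``, ``state_avg_t``, ``group_by_state_city``, ``avg_by_state``, ``sort_and_limit``) is hypothetical, and the output arrays are assumed to be large enough for all groups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types: input document, first $group output, second $group output. */
typedef struct { const char *state; const char *city; long pop; } zip_doc_t;
typedef struct { const char *state; const char *city; long pop; } city_total_t;
typedef struct { const char *state; double avg_city_pop; size_t n_cities; } state_avg_t;

/* First $group: key is the (state, city) pair, pop = { $sum : "$pop" }. */
static size_t
group_by_state_city (const zip_doc_t *docs, size_t n, city_total_t *out, size_t cap)
{
   size_t n_groups = 0;

   for (size_t i = 0; i < n; i++) {
      size_t g = 0;

      while (g < n_groups && (strcmp (out[g].state, docs[i].state) != 0 ||
                              strcmp (out[g].city, docs[i].city) != 0)) {
         g++;
      }
      if (g == n_groups) {
         assert (n_groups < cap); /* assumption: caller sized out correctly */
         out[g].state = docs[i].state;
         out[g].city = docs[i].city;
         out[g].pop = 0;
         n_groups++;
      }
      out[g].pop += docs[i].pop;
   }
   return n_groups;
}

/* Second $group: key is "$_id.state", avg_city_pop = { $avg : "$pop" }. */
static size_t
avg_by_state (const city_total_t *cities, size_t n, state_avg_t *out, size_t cap)
{
   size_t n_groups = 0;

   for (size_t i = 0; i < n; i++) {
      size_t g = 0;

      while (g < n_groups && strcmp (out[g].state, cities[i].state) != 0) {
         g++;
      }
      if (g == n_groups) {
         assert (n_groups < cap);
         out[g].state = cities[i].state;
         out[g].avg_city_pop = 0.0; /* holds the running sum until the end */
         out[g].n_cities = 0;
         n_groups++;
      }
      out[g].avg_city_pop += (double) cities[i].pop;
      out[g].n_cities++;
   }
   for (size_t g = 0; g < n_groups; g++) {
      out[g].avg_city_pop /= (double) out[g].n_cities;
   }
   return n_groups;
}

/* Comparator for $sort with { avg_city_pop : -1 } (descending). */
static int
cmp_avg_desc (const void *a, const void *b)
{
   double x = ((const state_avg_t *) a)->avg_city_pop;
   double y = ((const state_avg_t *) b)->avg_city_pop;

   return (x < y) - (x > y);
}

/* $sort followed by $limit: returns how many leading entries to keep. */
static size_t
sort_and_limit (state_avg_t *states, size_t n, size_t limit)
{
   qsort (states, n, sizeof *states, cmp_avg_desc);
   return n < limit ? n : limit;
}
```

The first grouping step matters because a city can span several zip codes: averaging the raw documents by state would give the average population per zip code, not per city.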
| 136 | + |