Commit a6374de

author
Christian Hergert
committed
doc: bring in aggregation pipeline examples.
1 parent 6109399 commit a6374de

File tree

2 files changed

+154
-0
lines changed

doc/aggregate.page

Lines changed: 150 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,150 @@
1+
<page xmlns="http://projectmallard.org/1.0/"
2+
type="topic"
3+
id="aggregation">
4+
<info><link xref="index#aggregation" type="guide"/></info>
5+
<title>Aggregation Framework Examples</title>
6+
7+
<p>This document provides a number of practical examples that display the capabilities of the aggregation framework.</p>
8+
9+
<p>The <link href="http://docs.mongodb.org/manual/tutorial/aggregation-examples/#aggregations-using-the-zip-code-data-set">Aggregations using the Zip Codes Data Set</link> examples use a publicly available data set of all zip codes and populations in the United States. These data are available at: <link href="http://media.mongodb.org/zips.json">zips.json</link>.</p>
10+
11+
<section id="requirements">
12+
<title>Requirements</title>
13+
14+
<p><link href="https://mongodb.org">MongoDB</link>, version 2.2.0 or later. <link href="https://github.com/mongodb/mongo-c-driver">MongoDB C driver</link>, version 0.94.4 or later.</p>
15+
<p>Let's check if everything is installed.</p>
16+
<p>Use the following command to load the zips.json data set into a running mongod instance:</p>
17+
18+
<screen><input style="prompt">$ </input><input>mongoimport --drop -d test -c zipcodes zips.json</input></screen>
19+
20+
<p>Let's use the MongoDB shell to verify that everything was imported successfully.</p>
21+
22+
<screen><input style="prompt">$ </input><input>mongo test</input>
23+
<output>MongoDB shell version: 2.6.1
24+
connecting to: test</output>
25+
<input style="prompt">&gt; </input><input>db.zipcodes.count()</input>
26+
<output>29467</output>
27+
<input style="prompt">&gt; </input><input>db.zipcodes.findOne()</input>
28+
<output><![CDATA[{
29+
"_id" : "35004",
30+
"city" : "ACMAR",
31+
"loc" : [
32+
-86.51557,
33+
33.584132
34+
],
35+
"pop" : 6055,
36+
"state" : "AL"
37+
}]]></output></screen>
38+
</section>
39+
40+
<section>
41+
<title>Aggregations using the Zip Codes Data Set</title>
42+
<p>Each document in this collection has the following form:</p>
43+
<synopsis><code mime="text/x-json"><![CDATA[{
44+
"_id" : "35004",
45+
"city" : "ACMAR",
46+
"state" : "AL",
47+
"pop" : 6055,
48+
"loc" : [-86.51557, 33.584132]
49+
}]]></code></synopsis>
50+
51+
<p>In these documents:</p>
52+
53+
<list>
54+
<item><p>The <code>_id</code> field holds the zip code as a string.</p></item>
55+
<item><p>The <code>city</code> field holds the city name.</p></item>
56+
<item><p>The <code>state</code> field holds the two-letter state abbreviation.</p></item>
57+
<item><p>The <code>pop</code> field holds the population.</p></item>
58+
<item><p>The <code>loc</code> field holds the location as a <code>[longitude, latitude]</code> array.</p></item>
59+
</list>
60+
</section>
61+
62+
<section>
63+
<title>States with Populations Over 10 Million</title>
64+
<p>To get all states with a population of at least 10 million, use the following aggregation pipeline:</p>
<synopsis><code mime="text/x-csrc"><![CDATA[#include <mongoc.h>
#include <bcon.h>
#include <stdio.h>

static void
print_pipeline (mongoc_collection_t *collection)
{
   bson_t *pipeline;
   mongoc_cursor_t *cursor;
   const bson_t *doc;
   bson_error_t error;

   /* Stage 1 ($group): sum "pop" per state.
    * Stage 2 ($match): keep states whose total is >= 10 million. */
   pipeline = BCON_NEW ("pipeline", "[",
      "{", "$group", "{", "_id", "$state", "total_pop", "{", "$sum", "$pop", "}", "}", "}",
      "{", "$match", "{", "total_pop", "{", "$gte", BCON_INT32 (10000000), "}", "}", "}",
   "]");

   cursor = mongoc_collection_aggregate (collection, MONGOC_QUERY_NONE, pipeline, NULL, NULL);

   /* Print each result document as JSON. */
   while (mongoc_cursor_next (cursor, &doc)) {
      char *str;

      str = bson_as_json (doc, NULL);
      printf ("%s\n", str);
      bson_free (str);
   }

   /* Report an error if iteration stopped because of a failure. */
   if (mongoc_cursor_error (cursor, &error)) {
      fprintf (stderr, "Aggregation failed: %s\n", error.message);
   }

   mongoc_cursor_destroy (cursor);
   bson_destroy (pipeline);
}]]></code></synopsis>
94+
95+
<p>You should see a result like the following:</p>
96+
97+
<synopsis><code mime="text/x-json"><![CDATA[{ "_id" : "PA", "total_pop" : 11881643 }
98+
{ "_id" : "OH", "total_pop" : 10847115 }
99+
{ "_id" : "NY", "total_pop" : 17990455 }
100+
{ "_id" : "FL", "total_pop" : 12937284 }
101+
{ "_id" : "TX", "total_pop" : 16986510 }
102+
{ "_id" : "IL", "total_pop" : 11430472 }
103+
{ "_id" : "CA", "total_pop" : 29760021 }]]></code></synopsis>
104+
105+
<p>The above aggregation pipeline is built from two pipeline operators: <code>$group</code> and <code>$match</code>.</p>
106+
107+
<p>The <code>$group</code> pipeline operator requires an <code>_id</code> field that specifies the grouping key. Each remaining field defines a computed value and must use one of the group aggregation functions: <code>$addToSet</code>, <code>$first</code>, <code>$last</code>, <code>$max</code>, <code>$min</code>, <code>$avg</code>, <code>$push</code>, <code>$sum</code>. The <code>$match</code> pipeline operator uses the same syntax as a read operation query.</p>
108+
109+
<p>The <code>$group</code> stage reads all documents and creates one output document per state, for example:</p>
110+
111+
<synopsis><code mime="text/x-json">{ "_id" : "WA", "total_pop" : 4866692 }</code></synopsis>
112+
113+
<p>The <code>total_pop</code> field uses the <code>$sum</code> aggregation function to sum the values of the <code>pop</code> field across all source documents in the group.</p>
114+
<p>Documents created by <code>$group</code> are piped to the <code>$match</code> pipeline operator, which returns only the documents whose <code>total_pop</code> value is greater than or equal to 10 million.</p>
115+
116+
</section>
117+
118+
<section>
119+
<title>Average City Population by State</title>
120+
<p>To get the three states with the highest average city population, use the following aggregation:</p>
121+
122+
<synopsis><code mime="text/x-csrc"><![CDATA[pipeline = BCON_NEW ("pipeline", "[",
123+
"{", "$group", "{", "_id", "{", "state", "$state", "city", "$city", "}", "pop", "{", "$sum", "$pop", "}", "}", "}",
124+
"{", "$group", "{", "_id", "$_id.state", "avg_city_pop", "{", "$avg", "$pop", "}", "}", "}",
125+
"{", "$sort", "{", "avg_city_pop", BCON_INT32 (-1), "}", "}",
126+
"{", "$limit", BCON_INT32 (3), "}",
127+
"]");]]></code></synopsis>
128+
129+
<p>This aggregate pipeline produces:</p>
130+
131+
<synopsis><code mime="text/x-json"><![CDATA[{ "_id" : "DC", "avg_city_pop" : 303450.0 }
132+
{ "_id" : "FL", "avg_city_pop" : 27942.29805615551 }
133+
{ "_id" : "CA", "avg_city_pop" : 27735.341099720412 }]]></code></synopsis>
134+
135+
<p>The above aggregation pipeline is built from three pipeline operators: <code>$group</code> (used twice), <code>$sort</code> and <code>$limit</code>.</p>
136+
137+
<p>The first <code>$group</code> operator creates the following documents:</p>
138+
139+
<synopsis><code mime="text/x-json"><![CDATA[{ "_id" : { "state" : "WY", "city" : "Smoot" }, "pop" : 414 }]]></code></synopsis>
140+
141+
<p>Note that the <code>$group</code> operator cannot use nested documents anywhere except in the <code>_id</code> field.</p>
142+
143+
<p>The second <code>$group</code> uses these documents to create the following documents:</p>
144+
145+
<synopsis><code mime="text/x-json"><![CDATA[{ "_id" : "FL", "avg_city_pop" : 27942.29805615551 }]]></code></synopsis>
146+
147+
<p>These documents are sorted by the <code>avg_city_pop</code> field in descending order. Finally, the <code>$limit</code> pipeline operator returns the first 3 documents from the sorted set.</p>
148+
</section>
149+
150+
</page>

doc/index.page

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,10 @@
23 23
<title>Cursors</title>
24 24
</section>
25 25

26+
<section id="aggregation" style="2column">
27+
<title>Aggregation Framework</title>
28+
</section>
29+
26 30
<section id="matching" style="2column">
27 31
<title>Client Side Document Matching</title>
28 32
</section>
</section>
