---
title: Getting Started with Aggregation Pipeline
description: Learn how to get started with Cosmos DB for MongoDB aggregation pipeline for advanced data analysis and manipulation.
author: gahl-levy
ms.author: gahllevy
ms.service: cosmos-db
ms.subservice: mongodb
ms.topic: tutorial
ms.date: 01/24/2023
ms.reviewer: mjbrown
---

# Getting Started with Aggregation Pipeline

The aggregation pipeline is a powerful tool that allows developers to perform advanced data analysis and manipulation on their collections. The pipeline is a sequence of data processing operations, which are performed on the input documents to produce a computed output. The pipeline stages process the input documents and pass the result to the next stage. Each stage performs a specific operation on the data, such as filtering, grouping, sorting, and transforming.

## Basic Syntax

The basic syntax for an aggregation pipeline is as follows:

```javascript
db.collection.aggregate([ { stage1 }, { stage2 }, ... { stageN }])
```

Where `db.collection` is the MongoDB collection you want to perform the aggregation on, and `stage1`, `stage2`, ..., `stageN` are the pipeline stages you want to apply.

## Sample Stages

Azure Cosmos DB for MongoDB provides a wide range of stages that you can use in your pipeline, including:

* `$match`: Filters the documents to pass only the documents that match the specified condition.
* `$project`: Transforms the documents to a new form by adding, removing, or updating fields.
* `$group`: Groups documents by one or more fields and performs various aggregate functions on the grouped data.
* `$sort`: Sorts the documents based on the specified fields.
* `$skip`: Skips the specified number of documents (see the paging sketch below).
* `$limit`: Limits the number of documents passed to the next stage.
* `$unwind`: Deconstructs an array field from the input documents to output a document for each element.

To view all available stages, see [supported features](feature-support-42.md).
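
As an illustration of how two of these stages compose, here's a minimal paging sketch that combines `$skip` and `$limit` (the collection name and page size are placeholders, not part of the documented samples):

```javascript
// Skip the first 10 documents, then return the next 5 (page 3 with a page size of 5)
db.collection.aggregate([
  { $skip: 10 },
  { $limit: 5 }
])
```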

## Examples

Here are some examples of how you can use the aggregation pipeline to perform various operations on your data:
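
The examples use generic field names such as "quantity", "category", "price", and "tags". They would apply to documents shaped roughly like this hypothetical document (the shape is assumed for illustration, not taken from a specific sample dataset):

```javascript
// Hypothetical document shape assumed by the examples below
{
  item: "laptop",
  category: "electronics",
  price: 1200,
  quantity: 25,
  tags: ["sale", "new"]
}
```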

Filtering: To filter documents that have a "quantity" field greater than 20, you can use the following pipeline:

```javascript
db.collection.aggregate([
  { $match: { quantity: { $gt: 20 } } }
])
```

Grouping: To group documents by the "category" field and calculate the total "quantity" for each group, you can use the following pipeline:

```javascript
db.collection.aggregate([
  { $group: { _id: "$category", totalQuantity: { $sum: "$quantity" } } }
])
```
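
With documents like the hypothetical one above, the grouped result might look like this (values are illustrative only):

```javascript
[
  { _id: "electronics", totalQuantity: 40 },
  { _id: "apparel", totalQuantity: 25 }
]
```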

Sorting: To sort documents by the "price" field in descending order, you can use the following pipeline:

```javascript
db.collection.aggregate([
  { $sort: { price: -1 } }
])
```

Transforming: To add a computed "discount" field that is 10 for documents with a "price" greater than 100 and 0 otherwise, you can use the following pipeline:

```javascript
db.collection.aggregate([
  { $project: { item: 1, price: 1, discount: { $cond: [{ $gt: ["$price", 100] }, 10, 0 ] } } }
])
```
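
For a hypothetical input document, the stage would produce output along these lines (illustrative only):

```javascript
// Input:  { _id: 1, item: "laptop", price: 1200, quantity: 25 }
// Output: { _id: 1, item: "laptop", price: 1200, discount: 10 }
```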

Unwinding: To deconstruct the array field "tags" and output a separate document for each element, you can use the following pipeline:

```javascript
db.collection.aggregate([
  { $unwind: "$tags" }
])
```
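
For example, a hypothetical document with `tags: ["sale", "new"]` would be unwound into two documents (illustrative only):

```javascript
// Input:  { _id: 1, item: "laptop", tags: ["sale", "new"] }
// Output: { _id: 1, item: "laptop", tags: "sale" }
//         { _id: 1, item: "laptop", tags: "new" }
```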

## Example with multiple stages

```javascript
db.sales.aggregate([
  { $match: { date: { $gte: "2021-01-01", $lt: "2021-03-01" } } },
  { $group: { _id: "$category", totalSales: { $sum: "$sales" } } },
  { $sort: { totalSales: -1 } },
  { $limit: 5 }
])
```

In this example, we are using a sample collection called "sales", which has documents with the following fields: "date", "category", and "sales".

The first stage, `{ $match: { date: { $gte: "2021-01-01", $lt: "2021-03-01" } } }`, filters the documents by the "date" field, passing only documents with a date between January 1st, 2021 and February 28th, 2021. The dates are stored as strings in the "YYYY-MM-DD" format.

The second stage, `{ $group: { _id: "$category", totalSales: { $sum: "$sales" } } }`, groups the documents by the "category" field and calculates the total sales for each group.

The third stage, `{ $sort: { totalSales: -1 } }`, sorts the documents by the "totalSales" field in descending order.

The fourth stage, `{ $limit: 5 }`, limits the output to the top 5 documents.

As a result, the pipeline returns the top 5 categories by total sales for the specified date range.
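
The output might look like this (category names and totals are hypothetical):

```javascript
[
  { _id: "electronics", totalSales: 125000 },
  { _id: "apparel", totalSales: 98000 },
  { _id: "home", totalSales: 76500 },
  { _id: "toys", totalSales: 54200 },
  { _id: "books", totalSales: 31800 }
]
```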

## Next steps

- Learn how to [use Studio 3T](connect-using-mongochef.md) with Azure Cosmos DB for MongoDB.
- Learn how to [use Robo 3T](connect-using-robomongo.md) with Azure Cosmos DB for MongoDB.
- Explore MongoDB [samples](nodejs-console-app.md) with Azure Cosmos DB for MongoDB.
- Trying to do capacity planning for a migration to Azure Cosmos DB? You can use information about your existing database cluster for capacity planning.
  - If all you know is the number of vCores and servers in your existing database cluster, read about [estimating request units using vCores or vCPUs](../convert-vcore-to-request-unit.md).
  - If you know typical request rates for your current database workload, read about [estimating request units using Azure Cosmos DB capacity planner](estimate-ru-capacity-planner.md).