Aggregation operations process data records and return computed results. They allow you to perform a variety of data analysis and computation, such as summing values, calculating averages, grouping data, and many other potential operations.
Introduction to Aggregation
Aggregation is a way of batching processing data and returning computed results, similar to SQL’s GROUP BY
and aggregate functions. Aggregation operations can use stages to filter and manipulate the data before returning a result.
Aggregation can be performed using:
- The Aggregation Pipeline
- Single Purpose Aggregation Methods
Aggregation Pipeline
The Aggregation Pipeline is a framework for data aggregation, modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result. The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document.
Here is an example of an aggregation pipeline that groups documents by the customerId
field and calculates the total quantity for each customerId
:
db.orders.aggregate([
{ $match: { status: "A" } },
{ $group: { _id: "$customerId", total: { $sum: "$quantity" } } }
])
The $match
stage filters the documents by the status
field, and the $group
stage groups the documents by the customerId
field and calculates the total quantity for each group.
Stages
The MongoDB Query Language supports different possible pipeline stages. In the following, we describe all stages supported by Polypheny.
$addFields
: Adds a new field to an existing record.$count
: Counts the number of records from the previous stage.$group
: Groups the record by a needed _id field, and additionally allows specifying other aggregations.$limit
: Limits the retrieved records to the given amount of records.$match
: Filters the previous stage with the provided condition.$project
: Allows to apply an inclusive or exclusive projection to the records.$replaceRoot
: This is an alias for $replaceWith that allows to replace a document with one of its subfields. Attention: This stage is only possible when querying a document namespace.$set
: This is an alias for $addFields, which allows adding additional entries to an existing record during retrieval.$skip
: Skips the specified amount of records.$sort
: Allows sorting by the given entities.$unset
: This is an alias for multiple exclusive projections.$unwind
: Allows to deconstruct an array into multiple records for each entry. Attention: This stage is only possible when querying a document namespace.
Single Purpose Aggregation Methods
While the aggregation pipeline can perform a super-set of operations provided by SQL’s GROUP BY
and HAVENING
clauses, there are also methods that aggregate data without using the aggregation pipeline. These include:
db.collection.count()
: Counts and returns the number of results that match a query.db.collection.distinct()
: Returns a list of distinct values for the given key across a collection.db.collection.group()
: Group documents in a collection by a specified key.
For example, to get a count of documents with a status of “A”:
db.orders.count({ status: "A" })
To get a list of distinct statuses in the orders collection:
db.orders.distinct("status")
To group orders by status and calculate the total quantity for each group:
db.orders.group({
key: { status: 1 },
reduce: function(curr, result) {
result.total += curr.quantity;
},
initial: { total: 0 }
})