Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data by Byron Ellis
Author: Byron Ellis
Language: eng
Tags: Computers, Data Warehousing, Information Technology, Databases
ISBN: 9781118837917
Google: I-XRAwAAQBAJ
Publisher: John Wiley & Sons
Published: 2014-07-21T00:00:00+00:00
Like SQL databases, MongoDB offers facilities for grouping and aggregating data in queries. The original facilities for aggregation were the group() and mapReduce() commands, but MongoDB 2.2 and later also support an optimized aggregate() command.
Unlike SQL, the aggregate() command uses a pipeline approach to compute its results: it takes an array of filtering and grouping stages that are applied in order to reach a final result. This is easiest to understand in action, so first build a collection with some example data:
> abc = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'];
> db.createCollection("aggtest");
> for(var i=0;i<1000;i++) {
...   db.aggtest.insert({
...     first:abc[Math.floor(Math.random()*abc.length)],
...     second:abc[Math.floor(Math.random()*abc.length)],
...     count:Math.floor(1000*Math.random())
...   });
... }
> db.aggtest.find({})
{ "_id" : ObjectId("53213bc8ae5fcad63d0563e9"), "first" : "S", "second" : "W", "count" : 762 }
{ "_id" : ObjectId("53213bc8ae5fcad63d0563ea"), "first" : "E", "second" : "V", "count" : 381 }
{ "_id" : ObjectId("53213bc8ae5fcad63d0563eb"), "first" : "Q", "second" : "O", "count" : 143 }
{ "_id" : ObjectId("53213bc8ae5fcad63d0563ec"), "first" : "C", "second" : "I", "count" : 601 }
{ "_id" : ObjectId("53213bc8ae5fcad63d0563ed"), "first" : "B", "second" : "C", "count" : 413 }
{ "_id" : ObjectId("53213bc8ae5fcad63d0563ee"), "first" : "M", "second" : "D", "count" : 790 }
{ "_id" : ObjectId("53213bc8ae5fcad63d0563ef"), "first" : "S", "second" : "Q", "count" : 699 }
{ "_id" : ObjectId("53213bc8ae5fcad63d0563f0"), "first" : "A", "second" : "M", "count" : 615 }
... other output omitted
Type "it" for more
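The pipeline idea described above can be sketched in plain JavaScript, without a MongoDB server. This is only an illustrative in-memory model, not how MongoDB implements aggregation: the names runPipeline and stageHandlers are hypothetical, only equality matching is supported, and the sample documents are copied from the shell output shown in this section. The stage names ($match, $skip, $limit) are the ones covered in the paragraphs that follow.

```javascript
// Hypothetical in-memory model of an aggregation pipeline.
// Each stage is an object with a single key, mirroring the shell syntax.
const stageHandlers = {
  // Keep documents whose fields equal every value in the spec (equality only).
  $match: (docs, spec) =>
    docs.filter((doc) => Object.keys(spec).every((key) => doc[key] === spec[key])),
  // Drop the first n documents.
  $skip: (docs, n) => docs.slice(n),
  // Keep at most the first n documents.
  $limit: (docs, n) => docs.slice(0, n),
};

// Apply the stages in array order; the output of one stage feeds the next.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => {
    const [name] = Object.keys(stage);
    const handler = stageHandlers[name];
    if (!handler) throw new Error("Unsupported stage: " + name);
    return handler(current, stage[name]);
  }, docs);
}

const sample = [
  { first: "S", second: "W", count: 762 },
  { first: "A", second: "M", count: 615 },
  { first: "A", second: "F", count: 806 },
  { first: "A", second: "Q", count: 377 },
];

const matched = runPipeline(sample, [
  { $match: { first: "A" } },
  { $skip: 1 },
  { $limit: 1 },
]);
console.log(matched); // [ { first: 'A', second: 'F', count: 806 } ]
```

Because each stage only sees what the previous stage produced, reordering the array generally changes the result.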
The first stage of an aggregation pipeline is usually a filtering step that acts like the WHERE clause of a SQL statement. It is identified by a $match statement, as in this example, which selects all of the documents whose "first" field has the value "A":
> db.aggtest.aggregate([{$match:{first:"A"}}]);
{
  "result" : [
    { "_id" : ObjectId("53213bc8ae5fcad63d0563f0"), "first" : "A", "second" : "M", "count" : 615 },
    { "_id" : ObjectId("53213bc8ae5fcad63d0563f4"), "first" : "A", "second" : "F", "count" : 806 },
    { "_id" : ObjectId("53213bc8ae5fcad63d056402"), "first" : "A", "second" : "Q", "count" : 377 },
    ...more content omitted...
    { "_id" : ObjectId("53213bc9ae5fcad63d0567c5"), "first" : "A", "second" : "G", "count" : 769 }
  ],
  "ok" : 1
}
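To make the WHERE-clause analogy concrete, here is a rough plain-JavaScript sketch of how a $match specification selects documents. It is an assumption-laden model rather than MongoDB's own evaluator: the matches function and the comparisons table are hypothetical names, and only equality plus the $gt, $gte, $lt, and $lte query operators are handled. The documents are taken from the output shown in this section.

```javascript
// Hypothetical in-memory model of $match; MongoDB evaluates this on the server.
const comparisons = {
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
};

// Return true when the document satisfies every condition in the spec.
function matches(doc, spec) {
  return Object.entries(spec).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { count: { $gte: 500 } }: every operator must hold.
      return Object.entries(condition).every(([op, bound]) => {
        const compare = comparisons[op];
        if (!compare) throw new Error("Unsupported operator: " + op);
        return compare(doc[field], bound);
      });
    }
    // Plain form, e.g. { first: "A" }: simple equality.
    return doc[field] === condition;
  });
}

const docs = [
  { first: "A", second: "M", count: 615 },
  { first: "A", second: "F", count: 806 },
  { first: "A", second: "Q", count: 377 },
  { first: "S", second: "W", count: 762 },
];

// Roughly: SELECT * FROM aggtest WHERE first = 'A' AND count >= 500
const spec = { first: "A", count: { $gte: 500 } };
const selected = docs.filter((doc) => matches(doc, spec));
console.log(selected.map((doc) => doc.second)); // [ 'M', 'F' ]
```

Multiple fields in one specification combine like AND in SQL: a document is kept only if every listed condition holds.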
Other filtering options are $limit and $skip. As an initial stage, which is mostly useful for testing, $limit restricts the number of documents entering the aggregation, as in this example:
> db.aggtest.aggregate([{$limit:1}]);
{
  "result" : [
    { "_id" : ObjectId("53213bc8ae5fcad63d0563e9"), "first" : "S", "second" : "W", "count" : 762 }
  ],
  "ok" : 1
}
The $limit command is more typically used after a grouping and sorting operation to limit the output returned to the user. Similarly, the $skip command discards a given number of documents from the start of its input. Combined with $limit, it is often used after grouping, as well as to implement pagination:
> db.aggtest.aggregate([{$skip:10},{$limit:1}]);
{
  "result" : [
    { "_id" : ObjectId("53213bc8ae5fcad63d0563f3"), "first" : "M", "second" : "E", "count" : 437 }
  ],
  "ok" : 1
}
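The pagination use of $skip and $limit mentioned above can be sketched as a small helper that builds the two stages for a given page. The function name pageStages and its 1-based page numbering are assumptions made for this illustration, not part of MongoDB. The helper leaves out $skip on the first page, so it never emits a zero offset.

```javascript
// Hypothetical helper: builds the $skip/$limit stages for one page of results.
// pageNumber is 1-based; pageSize is the number of documents per page.
function pageStages(pageNumber, pageSize) {
  if (!Number.isInteger(pageNumber) || pageNumber < 1) {
    throw new RangeError("pageNumber must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  const offset = (pageNumber - 1) * pageSize;
  const stages = [];
  // Only skip when there is something to skip (pages after the first).
  if (offset > 0) stages.push({ $skip: offset });
  stages.push({ $limit: pageSize });
  return stages;
}

console.log(JSON.stringify(pageStages(1, 5))); // [{"$limit":5}]
console.log(JSON.stringify(pageStages(3, 5))); // [{"$skip":10},{"$limit":5}]
```

In the shell, the returned array could be passed directly, as in db.aggtest.aggregate(pageStages(3, 5)), or appended to earlier stages. The order matters: $skip must come before $limit here, otherwise the limit would be applied first and the skip would discard the page itself.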
After filtering commands are applied in the pipeline, group management commands are applied. The most commonly used command is the $group operator, which specifies an identifier field and some number of accumulators. For example, to sum
