Big Data for Chimps: A Guide to Massive-Scale Data Processing in Practice by Philip (flip) Kromer & Russell Jurney

Author: Philip (flip) Kromer & Russell Jurney
Language: eng
Format: epub
Publisher: O'Reilly Media
Published: 2015-09-27T21:00:00+00:00


Pattern in use

Where You’ll Use It

Anywhere you’re summarizing counts.

Standard Snippet

FOREACH (GROUP recs BY mykey) GENERATE group AS mykey, COUNT_STAR(recs) AS ct;

Hello, SQL Users

SELECT key, COUNT(*) AS ct FROM recs GROUP BY key; Remember: in Pig, write COUNT_STAR(recs), not COUNT(*).

Important to Know

See “Pattern in use”.

Output Count

One record per distinct value of the grouping key (i.e., as many records as the key's cardinality).

Records

Output is mykey, ct:long.

Dataflow

Map, combiner, and reduce; combiners are very effective unless cardinality is extremely high.
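
To make the dataflow concrete, here is a minimal Python sketch of the map, combiner, and reduce steps behind a group-and-count. It is only an illustration, not how Pig or Hadoop implement it: the function names (map_phase, combine, reduce_phase), the two input splits, and the in-memory "shuffle" are all assumptions made up for this example.

```python
from collections import Counter, defaultdict

def map_phase(split):
    # Map: emit a (key, 1) pair for every record in this input split.
    return [(rec["mykey"], 1) for rec in split]

def combine(pairs):
    # Combiner: pre-sum the counts per key on the map side,
    # so at most one pair per distinct key leaves each mapper.
    partial = Counter()
    for key, n in pairs:
        partial[key] += n
    return list(partial.items())

def reduce_phase(shuffled):
    # Reduce: add up the partial counts that arrive for each key.
    totals = defaultdict(int)
    for key, n in shuffled:
        totals[key] += n
    return dict(totals)

# Hypothetical input: two splits of records carrying a 'mykey' field.
splits = [
    [{"mykey": "a"}, {"mykey": "b"}, {"mykey": "a"}],
    [{"mykey": "a"}, {"mykey": "c"}, {"mykey": "c"}],
]

mapped = [map_phase(s) for s in splits]
combined = [combine(m) for m in mapped]

# Number of pairs that would cross the network without and with a combiner.
shuffled_without = sum(len(m) for m in mapped)
shuffled_with = sum(len(c) for c in combined)

# Final result: one (mykey, ct) entry per distinct key.
counts = reduce_phase(pair for part in combined for pair in part)
print(counts, shuffled_without, shuffled_with)
```

In this toy input the combiner cuts the shuffled pairs from six to four. If nearly every record in a split had its own key, the combiner's output would be almost as large as its input, which is why it helps less when cardinality is extremely high.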


