
21 Recipes for Mining Twitter by Russell Matthew A

Author: Russell, Matthew A. [Matthew A. Russell], Date: February 14, 2016

Language: eng
Format: epub
Tags: COMPUTERS / Data Modeling & Design
ISBN: 9781449303853
Publisher: O'Reilly Media, Inc.
Published: 2011-01-30T16:00:00+00:00

Note

The following discussion is somewhat advanced. It explains how the summing_reducer function works depending on whether the value of its rereduce parameter is True or False. Feel free to skip this section if you're not interested in homing in on those details just yet.

In short, a mapper takes a tweet and emits normalized entities such as #hashtags and @mentions. The output from multiple mappers is then passed into a reducer, which aggregates the emitted values by counting them. The important subtlety in the way the reducer is invoked is that the keys passed to any single invocation are guaranteed to match, so every value in that invocation's values parameter belongs to the same key. This turns out to be a very convenient characteristic: for the problem of tabulating frequencies, it means that when the rereduce parameter is False, you only need to count the number of values to know the frequency for the key. In other words, if the keys were ['@user', '@user', '@user'], you'd only need to compute the length of that list to get the frequency of @user for that particular invocation of the reduction function.
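The map-then-count idea above can be sketched in plain Python, outside CouchDB. This is only an illustration under stated assumptions: entity_mapper, count_by_key, and the regular expression are hypothetical names and choices made for this sketch, not the book's listing, and the grouping that CouchDB performs internally is simulated with a dictionary.

```python
import re
from collections import defaultdict

# Assumption: entities are pulled from raw tweet text with a simple regex.
ENTITY_PATTERN = re.compile(r"[#@]\w+")


def entity_mapper(tweet_text):
    """Yield (entity, 1) for each #hashtag or @mention, normalized to lowercase."""
    for match in ENTITY_PATTERN.findall(tweet_text):
        yield match.lower(), 1


def count_by_key(pairs):
    """Group emitted values by key, then count the values for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Every value in a group shares one key, so the frequency is the list length.
    return {key: len(values) for key, values in grouped.items()}


tweets = [
    "Thanks @User for the #Python tip",
    "@user have you seen this? #python #couchdb",
    "cc @user",
]
pairs = [pair for text in tweets for pair in entity_mapper(text)]
frequencies = count_by_key(pairs)
print(frequencies)
```

Because normalization happens in the mapper, "@User" and "@user" land in the same group before any counting takes place.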

The actual number of keys and values that are passed into each invocation of a reduction function depends on the underlying B-tree used in CouchDB, and here, the illustration used a tiny size of 3 for simplicity. The subtlety to note is that multiple calls to the reducer could occur with the same keys, which conceptually means that you wouldn't yet have a final aggregated answer. Instead you'd end up with something like [("@user", 3), ("@user", 3), ("@user", 3), ...], which represents an intermediate result. When this happens, it's necessary for the output of these reductions to be rereduced, in which case the rereduce flag will be set to True. The value for the keys is of no consequence, since we are already operating on output that's guaranteed to have been produced from the same keys. In the working example, all that needs to happen is a sum of the values, 3 + 3 + 3 + ... + 3, in order to come to a final aggregate value. A discussion of rereduce is inherently a slightly advanced topic, but it is fundamental to an understanding of the map/reduce paradigm. It may bend your brain just a little bit, but manually working through some examples is very conducive to getting the hang of it.
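To work through the two cases by hand, here is a minimal sketch of a reducer that behaves as described: it counts values when rereduce is False and sums intermediate counts when rereduce is True. The body of summing_reducer is an assumption consistent with the description above rather than a copy of the book's code, and the chunks helper is a hypothetical stand-in for the way CouchDB splits work across B-tree nodes.

```python
def summing_reducer(keys, values, rereduce):
    """Count values on a first-pass reduce; sum partial counts on a rereduce."""
    if rereduce:
        # values holds intermediate counts such as [3, 3, 3]; keys is ignored.
        return sum(values)
    # values holds one entry per emitted key, so its length is the frequency.
    return len(values)


def chunks(items, size):
    """Hypothetical helper: split items into consecutive slices of a given size."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Nine emissions for the same key, reduced three at a time.
keys = ["@user"] * 9
values = [1] * 9

partials = [
    summing_reducer(k, v, False)
    for k, v in zip(chunks(keys, 3), chunks(values, 3))
]
total = summing_reducer(None, partials, True)
print(partials, total)
```

Taking len(partials) in the rereduce step would report 3 instead of 9, which is why the two branches have to differ.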

Once the frequency maps are computed, visualizing the entities in a tag cloud amounts to little more than scaling the size of each entity and writing out the JSON data structure that the WP-Cumulus tag cloud expects. The HTML_TEMPLATE in the example contains the necessary SCRIPT tag references to pull in the JavaScript libraries and other necessary artifacts. Only the data needs to be written to a %s placeholder in the template.
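The scaling and templating step might look roughly like the following sketch. Everything specific here is an assumption for illustration: the linear scaling rule, the size range, the [term, size] output shape, and the one-line HTML_TEMPLATE are placeholders, and the exact JSON structure that WP-Cumulus expects is not reproduced.

```python
import json

# Hypothetical stand-in for the example's HTML_TEMPLATE; only the %s slot matters.
HTML_TEMPLATE = "<script>var tags = %s;</script>"


def scale_sizes(frequencies, min_size=10, max_size=40):
    """Map each frequency linearly onto a font-size range (assumed scaling rule)."""
    if not frequencies:
        return []
    lo = min(frequencies.values())
    hi = max(frequencies.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return [
        [term, min_size + (count - lo) * (max_size - min_size) // span]
        for term, count in sorted(frequencies.items())
    ]


frequencies = {"@user": 9, "#python": 5, "#couchdb": 1}
weighted = scale_sizes(frequencies)

# Serialize the data and write it into the template's single %s placeholder.
html = HTML_TEMPLATE % json.dumps(weighted)
print(html)
```

Keeping the markup in a template and substituting only the serialized data means the scaling rule can change without touching the HTML.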
