HBase: The Definitive Guide by Lars George


Author: Lars George
Language: eng
Format: epub, pdf
Tags: COMPUTERS / Data Modeling & Design
ISBN: 9781449396138
Publisher: O'Reilly Media
Published: 2011-08-29T16:00:00+00:00


Fixed column mapping

The row key must be the first field and cannot be placed anywhere else. This can be overcome, though, with a subsequent FOREACH...GENERATE statement, reordering the relation layout.

Check with the Pig project site to see if these features have since been added.

Cascading

Cascading is an alternative API to MapReduce. Under the covers, it uses MapReduce during execution, but during development, users don’t have to think in MapReduce to create solutions for execution on Hadoop.

The model used is similar to a real-world pipe assembly, where data sources are taps and outputs are sinks. These are piped together to form the processing flow, where data passes through the pipe and is transformed in the process. Pipes can be combined into larger pipe assemblies, building more complex processing pipelines from existing pipes.
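To make the tap, pipe, and sink vocabulary concrete, here is a toy model in plain Java. It is not Cascading's API: the names `PipeAssemblyDemo`, `Pipe`, `connect`, `run`, and `exampleAssembly` are hypothetical. A list of lines stands in for the source tap, each pipe is a function from one tuple stream to another, two pipes connect into a larger assembly, and the collected result plays the role of the sink.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model of a pipe assembly. Hypothetical names; this is NOT the Cascading API.
public class PipeAssemblyDemo {
    // A "pipe" turns one stream of tuples (each tuple is a List<String>) into another.
    interface Pipe extends Function<Stream<List<String>>, Stream<List<String>>> {}

    // Connecting two pipes yields a larger pipe assembly.
    static Pipe connect(Pipe first, Pipe second) {
        return tuples -> second.apply(first.apply(tuples));
    }

    // The "source tap" emits one single-field tuple per line; the returned list acts as the "sink".
    static List<List<String>> run(List<String> sourceLines, Pipe assembly) {
        Stream<List<String>> tuples = sourceLines.stream().map(line -> List.of(line));
        return assembly.apply(tuples).collect(Collectors.toList());
    }

    // Example assembly: split each line on spaces, then drop tuples without three fields.
    static Pipe exampleAssembly() {
        Pipe split = tuples -> tuples.map(t -> Arrays.asList(t.get(0).split(" ")));
        Pipe keepThreeFields = tuples -> tuples.filter(t -> t.size() == 3);
        return connect(split, keepThreeFields);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1 a A", "broken", "2 b B");
        System.out.println(run(lines, exampleAssembly()));  // [[1, a, A], [2, b, B]]
    }
}
```

Because each pipe is just a function over a stream, an existing assembly can itself be passed to `connect`, which mirrors how Cascading lets you build bigger assemblies out of smaller ones.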

Data then streams through the pipeline and can be split, merged, grouped, or joined. The data is represented as tuples, forming a tuple stream through the assembly. This highly visual model makes building MapReduce jobs feel more like construction work, while hiding the complexity of the underlying processing.
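As a rough illustration of merging and grouping a tuple stream, the following plain-Java sketch concatenates two tuple streams and then groups them by key, summing a value field. It does not use Cascading's own merge or grouping pipes; the class `TupleStreamDemo`, the two-field `Tuple` record, and `mergeAndGroup` are hypothetical stand-ins chosen for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of merging and grouping tuple streams in plain Java;
// Cascading's own pipes are not used here.
public class TupleStreamDemo {
    // A minimal two-field tuple: a grouping key and a numeric value.
    record Tuple(String key, int value) {}

    // Merge two tuple streams into one, then group by key and sum the values.
    static Map<String, Integer> mergeAndGroup(List<Tuple> left, List<Tuple> right) {
        return Stream.concat(left.stream(), right.stream())
                .collect(Collectors.groupingBy(
                        Tuple::key,
                        TreeMap::new,                          // keep groups sorted by key
                        Collectors.summingInt(Tuple::value))); // aggregate each group
    }

    public static void main(String[] args) {
        List<Tuple> a = List.of(new Tuple("x", 1), new Tuple("y", 2));
        List<Tuple> b = List.of(new Tuple("x", 3), new Tuple("z", 4));
        System.out.println(mergeAndGroup(a, b));  // {x=4, y=2, z=4}
    }
}
```

On a cluster, the grouping step is where a MapReduce shuffle would occur; here it all happens in memory, which is the complexity Cascading's model is meant to hide.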

Cascading (as of version 1.0.1) has support for reading and writing data to and from an HBase cluster. Detailed information and access to the source code can be found on the Cascading Modules page (http://www.cascading.org/modules.html).

Example 6-2 shows how to sink data into an HBase cluster. See the GitHub repository, linked from the modules page, for more up-to-date API information.

Example 6-2. Using Cascading to insert data into HBase

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.hbase.HBaseScheme;
import cascading.hbase.HBaseTap;
import cascading.operation.regex.RegexSplitter;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.scheme.TextLine;
import cascading.tap.Hfs;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tuple.Fields;
import cascading.tuple.TupleEntryIterator;

// inputFileLhs (the input path) and properties (the job configuration)
// are assumed to be defined by the surrounding code

// read data from the default filesystem
// emits two fields: "offset" and "line"
Tap source = new Hfs(new TextLine(), inputFileLhs);

// store data in an HBase cluster, accepts fields "num", "lower", and "upper"
// will automatically scope incoming fields to their proper family name,
// "left" or "right"
Fields keyFields = new Fields("num");
String[] familyNames = {"left", "right"};
Fields[] valueFields = new Fields[] {new Fields("lower"), new Fields("upper")};
Tap hBaseTap = new HBaseTap("multitable",
    new HBaseScheme(keyFields, familyNames, valueFields), SinkMode.REPLACE);

// a simple pipe assembly to parse the input into fields
// a real app would likely chain multiple Pipes together for more complex
// processing
Pipe parsePipe = new Each("insert", new Fields("line"),
    new RegexSplitter(new Fields("num", "lower", "upper"), " "));

// "plan" a cluster executable Flow
// this connects the source Tap and hBaseTap (the sink Tap) to the parsePipe
Flow parseFlow = new FlowConnector(properties).connect(source, hBaseTap, parsePipe);

// start the flow, and block until complete
parseFlow.complete();

// open an iterator on the HBase table we stuffed data into
TupleEntryIterator iterator = parseFlow.openSink();
while (iterator.hasNext()) {
  // print out each tuple from HBase
  System.out.println("iterator.next() = " + iterator.next());
}
iterator.close();
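To see what the example does to a single input line, the sketch below reproduces the mapping in plain Java without calling Cascading or HBase: a line is split on a single space into `num`, `lower`, and `upper` (approximating the `RegexSplitter` step), `num` becomes the row key, `lower` goes to family `left`, and `upper` goes to family `right`. The class `FieldToFamilyMapping` and method `toCells` are hypothetical, and using the field name as the column qualifier is an assumption about `HBaseScheme`, not something the example states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of the per-line mapping configured in Example 6-2.
// Hypothetical names; no Cascading or HBase classes are used.
// ASSUMPTION: the field name is used as the column qualifier.
public class FieldToFamilyMapping {
    static final String[] FIELDS = {"num", "lower", "upper"};
    // Mirrors familyNames/valueFields: "lower" -> family "left", "upper" -> family "right".
    static final Map<String, String> FAMILY_OF = Map.of("lower", "left", "upper", "right");

    // Splits a line on a single space and returns the row key followed by
    // "family:qualifier" -> value entries, in insertion order.
    static Map<String, String> toCells(String line) {
        String[] values = line.split(" ");
        if (values.length != FIELDS.length) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("rowkey", values[0]);  // "num" is the key field
        for (int i = 1; i < FIELDS.length; i++) {
            cells.put(FAMILY_OF.get(FIELDS[i]) + ":" + FIELDS[i], values[i]);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(toCells("1 a A"));  // {rowkey=1, left:lower=a, right:upper=A}
    }
}
```

Splitting the value fields across two families this way lets later readers fetch only `left` or only `right` columns without touching the other family.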
