Big Data For Dummies by Judith Hurwitz

Author:Judith Hurwitz [Hurwitz, Judith]
Language: eng
Format: epub
Published: 2013-03-21T16:23:17+00:00


Chapter 12: Defining Big Data Analysis

to turn to Chapter 4 for more details on infrastructure issues. Suffice it to say that if you’re looking for a platform, it needs to achieve the following:

Integrate technologies: The infrastructure needs to integrate new big data technologies with traditional technologies to be able to process all kinds of big data and make it consumable by traditional analytics.

Store large amounts of disparate data: An enterprise-hardened Hadoop system may be needed that can process/store/manage large amounts of data at rest, whether it is structured, semi-structured, or unstructured.

Process data in motion: A stream-computing capability may be needed to process data in motion that is continuously generated by sensors, smart devices, video, audio, and logs to support real-time decision making.

Warehouse data: You may need a solution optimized for operational or deep analytical workloads to store and manage the growing amounts of trusted data.

And of course, you need the capability to integrate the data you already have in place along with the results of the big data analysis.
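
To make the "process data in motion" requirement a little more concrete, here is a minimal Python sketch of one common stream-computing idea: keeping a running average over only the most recent readings instead of storing everything first. The function name `rolling_average`, the window size, and the sample readings are illustrative assumptions, not part of any particular stream-computing product.

```python
from collections import deque


def rolling_average(readings, window_size):
    """Yield the mean of the most recent `window_size` readings as each one arrives.

    `readings` can be any iterable, including a generator fed by a live source,
    so values are processed one at a time rather than loaded all at once.
    """
    window = deque(maxlen=window_size)  # holds only the latest readings
    total = 0.0
    for value in readings:
        if len(window) == window.maxlen:
            total -= window[0]  # the oldest reading is about to be evicted
        window.append(value)
        total += value
        yield total / len(window)


# Hypothetical sensor readings arriving in order.
sample_readings = [10, 20, 30, 40]
averages = list(rolling_average(sample_readings, window_size=2))
print(averages)
```

Because the window holds a fixed number of readings, memory use stays constant no matter how long the stream runs, which is the property that matters for data that is generated continuously.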

Studying Big Data Analytics Examples

Big data analytics has many different use cases. We mention examples throughout this book, but here we look at a few more, from Internet companies and other businesses.

Orbitz

If you’ve ever looked for deals on travel, you’ve probably been to sites like Orbitz (www.orbitz.com). The company was established in 1999, and its website went live in 2001. Users of Orbitz perform over a million searches a day, and the company collects hundreds of gigabytes of raw data each day from these searches. Orbitz realized that the web log files collected by its web analytics software, which recorded how consumers interacted with its site, might contain useful information.

In particular, it wanted to see whether it could identify consumer preferences to determine the best-performing hotels to display to users so that it could increase conversions (bookings). It had not been using this data in the past because it was too expensive to store all of it. It implemented Hadoop and Hive running on commodity hardware to help. Hadoop provided the distributed file system and Hive provided an SQL-type interface. It took a series of steps to put the data into Hive. After the data was in Hive, the company used machine learning, a data-driven approach to unearthing patterns in data (related to data mining; see the sidebar earlier in this chapter), to help analyze the data. For more details about Hadoop and Hive, turn to Chapters 9 and 10.
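
As a rough illustration of what "best-performing hotels" by conversions could mean, the following Python sketch counts views and bookings per hotel and ranks hotels by conversion rate. This is the kind of grouping an SQL-type query would express. It is not Orbitz's actual method: the record layout, the event names `view` and `booking`, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical log records: (hotel_id, event), where event is "view" or "booking".
SAMPLE_LOG = [
    ("hotel_a", "view"), ("hotel_a", "view"), ("hotel_a", "booking"),
    ("hotel_b", "view"), ("hotel_b", "view"), ("hotel_b", "view"),
    ("hotel_b", "view"), ("hotel_b", "booking"),
    ("hotel_c", "view"),
]


def conversion_rates(records):
    """Return {hotel_id: bookings / views} for every hotel with at least one view."""
    views = defaultdict(int)
    bookings = defaultdict(int)
    for hotel_id, event in records:
        if event == "view":
            views[hotel_id] += 1
        elif event == "booking":
            bookings[hotel_id] += 1
    return {hotel: bookings[hotel] / count for hotel, count in views.items()}


def rank_hotels(records):
    """Return hotel ids ordered from highest to lowest conversion rate."""
    rates = conversion_rates(records)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(rates, key=lambda hotel: (-rates[hotel], hotel))


print(conversion_rates(SAMPLE_LOG))
print(rank_hotels(SAMPLE_LOG))
```

In a real deployment the counting would run across many machines over far larger logs, but the underlying question (bookings divided by views, per hotel) stays the same.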

Nokia

Nokia provides wireless communication devices and services. The company believes that its data is a strategic asset. Its big data analytics service includes a multipetabyte platform that executes tens of thousands of jobs each day. This includes utilizing advanced analytics over terabytes of streaming data. For example, the company wants to understand how people interact with the different applications on its phones. Nokia wants to understand what features customers use, how they use a feature, how they move from feature to feature, and whether they get lost in the application as they are using it. This level


