Big Data and Analytics by Samiya Khan

Author: Samiya Khan
Language: eng
Format: epub
ISBN: 9798885304887
Publisher: Notion Press
Published: 2021-11-15T00:00:00+00:00


7.3 CLASSIFICATION OF NOSQL DATABASES

What NoSQL databases have in common is that they do not rely on the relational model with SQL as their primary interface, although some of them offer SQL-like query languages. The four categories of NoSQL databases are given below.

7.3.1 Key-Value Stores

In such databases, all operations are performed using a key. For example, in a database that stores user sessions, each session is stored and retrieved via its SessionID, which serves as the key.
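The key-based access pattern described above can be sketched in a few lines of Python. This is only an illustrative in-memory toy, not the API of any database listed in this section; the class name `SessionStore`, its methods and the sample session ID are assumptions made for the example.

```python
from typing import Any, Dict, Optional


class SessionStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, session_id: str, value: Any) -> None:
        # Store or overwrite the value held under the key.
        self._data[session_id] = value

    def get(self, session_id: str) -> Optional[Any]:
        # Look the value up by key; return None if the key is unknown.
        return self._data.get(session_id)

    def delete(self, session_id: str) -> bool:
        # Remove the entry by key and report whether it existed.
        if session_id in self._data:
            del self._data[session_id]
            return True
        return False


store = SessionStore()
store.put("sess-42", {"user": "alice", "cart": ["book"]})
session = store.get("sess-42")   # retrieval is by SessionID only
missing = store.get("sess-99")   # unknown key
```

Note that the store offers no way to search by the contents of a value; every read, write and delete names the key explicitly.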

Databases:

•Dynamo (Amazon)

Dynamo is Amazon’s offering. Amazon provides it in the form of Database-as-a-Service.

•MemBase

•CitrusLeaf

•Voldemort (LinkedIn)

•Riak

7.3.2 BigTable Clones

This class of NoSQL databases is based on the Google whitepaper that introduced the BigTable concept. Databases that fall under this category include:

•HBase

It is an Apache project that is implemented in Java. We will discuss this database in detail later in the chapter.

•BigTable (Google)

This is the original system, described in the whitepaper published by Google.

•Hypertable

It is a C++ implementation of the design described in Google’s whitepaper.

•Cassandra

This database system was originally developed at Facebook. Part of the team that developed Cassandra had also worked on Amazon’s Dynamo. Cassandra is therefore largely a combination of ideas from BigTable and Dynamo.
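To give a rough feel for the BigTable-style data model that these systems share, the sketch below models a table as a nested map from row key to column to timestamped values. It is a simplified single-process illustration under that assumption; the function names `put` and `get_latest`, the row key and the column name are hypothetical and do not correspond to the client API of HBase, Hypertable or Cassandra.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# row key -> "family:qualifier" column -> list of (timestamp, value) cells
Table = Dict[str, Dict[str, List[Tuple[int, str]]]]


def new_table() -> Table:
    return defaultdict(lambda: defaultdict(list))


def put(table: Table, row: str, column: str, timestamp: int, value: str) -> None:
    # Each write adds a new version; cells are kept newest first.
    cells = table[row][column]
    cells.append((timestamp, value))
    cells.sort(key=lambda cell: cell[0], reverse=True)


def get_latest(table: Table, row: str, column: str) -> Optional[str]:
    # Reads return the most recent version, or None if the cell is absent.
    cells = table.get(row, {}).get(column, [])
    return cells[0][1] if cells else None


table = new_table()
put(table, "user#1", "profile:name", 1, "Alice")
put(table, "user#1", "profile:name", 2, "Alice K.")
latest = get_latest(table, "user#1", "profile:name")
```

Rows need not share the same columns, which is why such tables are often described as sparse.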

7.3.3 Document Databases

This class of databases is well suited to storing data as documents in XML or JSON format. It includes the following:

•MongoDB

•CouchOne

•OrientDB

•TerraStore
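As a minimal sketch of the document idea, the example below stores each record as a JSON document and queries by a field inside the documents. It uses only Python's standard `json` module; the class `DocumentCollection`, its methods and the sample data are hypothetical and are not the API of MongoDB or any other product listed above.

```python
import json
from typing import Any, Dict, List


class DocumentCollection:
    """Toy document collection (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}  # document id -> JSON text

    def insert(self, doc_id: str, document: Dict[str, Any]) -> None:
        # Documents are stored as self-describing JSON; no fixed schema.
        self._docs[doc_id] = json.dumps(document)

    def find(self, field: str, value: Any) -> List[Dict[str, Any]]:
        # Scan all documents and keep those whose field equals the value.
        matches = []
        for raw in self._docs.values():
            doc = json.loads(raw)
            if doc.get(field) == value:
                matches.append(doc)
        return matches


users = DocumentCollection()
users.insert("u1", {"name": "Asha", "city": "Delhi", "tags": ["admin"]})
users.insert("u2", {"name": "Ravi", "city": "Pune"})  # different fields
users.insert("u3", {"name": "Meera", "city": "Delhi"})
in_delhi = [doc["name"] for doc in users.find("city", "Delhi")]
```

Unlike a key-value store, the query here looks inside the stored value, and documents in the same collection may carry different fields.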

7.3.4 Graph Databases

These databases are based on the graph concept. The best use case for such databases is a social media network, which can be modeled as a graph with users as vertices and the connections between them as edges. The graph concept is an efficient way of managing social media data. Solutions in this category include:

•FlockDB

•Neo4J

•InfoGrid

•Sones
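The social network example can be sketched with users as vertices and connections as undirected edges. This is an in-memory illustration only; the class `SocialGraph`, its methods and the user names are assumptions for the example and do not reflect the query language or API of any graph database listed above.

```python
from collections import defaultdict
from typing import Dict, Set


class SocialGraph:
    """Toy undirected graph of users (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._edges: Dict[str, Set[str]] = defaultdict(set)

    def connect(self, a: str, b: str) -> None:
        # A connection is an undirected edge between two user vertices.
        self._edges[a].add(b)
        self._edges[b].add(a)

    def friends(self, user: str) -> Set[str]:
        # Return a copy so callers cannot modify the graph by accident.
        return set(self._edges.get(user, set()))

    def friends_of_friends(self, user: str) -> Set[str]:
        # Users two hops away, excluding the user and direct friends.
        direct = self.friends(user)
        two_hops: Set[str] = set()
        for friend in direct:
            two_hops |= self.friends(friend)
        return two_hops - direct - {user}


graph = SocialGraph()
graph.connect("alice", "bob")
graph.connect("bob", "carol")
graph.connect("carol", "dave")
suggestions = graph.friends_of_friends("alice")
```

Traversals such as friends-of-friends follow edges directly from vertex to vertex, which is the kind of query graph databases are designed around.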

7.3.5 CAP Theorem

The choice of database from the categories above depends largely on your performance and system requirements. The CAP theorem is commonly used to guide such decisions. It considers three properties, namely consistency, availability and partition tolerance, and states that a distributed data store can guarantee only two of them at the same time. The choice of database should therefore be based on which properties matter most for your application. The three properties are as follows:

1.Consistency

Commits are atomic across the whole database, so every node returns the same, most recently committed data.

2.Availability

The database is available and accessible at all times.

3.Partition Tolerance

The system continues to respond correctly even when network failures prevent some nodes from communicating with each other. Only a total network failure stops it from doing so.

The distribution of databases across these characteristics is illustrated in Fig. 7.1.
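The trade-off stated by the CAP theorem can be illustrated with a toy simulation of two replicas. This is a single-process sketch under simplifying assumptions, not a real distributed system; the classes `Replica` and `TwoNodeStore` and the mode labels "CP" and "AP" are hypothetical names chosen for the example.

```python
from typing import Optional


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Optional[str] = None


class TwoNodeStore:
    """Two replicas; clients write via replica a and may read from b."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, value: str) -> bool:
        if self.partitioned:
            if self.mode == "CP":
                # Favor consistency: refuse writes that cannot be replicated.
                return False
            # Favor availability: accept locally, so the replicas diverge.
            self.a.value = value
            return True
        # No partition: the write reaches both replicas.
        self.a.value = value
        self.b.value = value
        return True

    def read_from_b(self) -> Optional[str]:
        return self.b.value


cp = TwoNodeStore("CP")
cp.write("v1")
cp.partitioned = True
cp_accepted = cp.write("v2")   # rejected: unavailable but still consistent

ap = TwoNodeStore("AP")
ap.write("v1")
ap.partitioned = True
ap_accepted = ap.write("v2")   # accepted: available but b is now stale
```

During the partition, the "CP" store gives up availability for writes, while the "AP" store keeps accepting writes at the cost of replica b returning older data.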


