A JOURNEY THROUGH NOSQL DATABASE:
EXPLORING ARCHITECTURE EN ROUTE
Arun S Jacob

Abstract:

Data is growing in huge volumes beyond our perceptions, assuming several forms. A digital-first mechanism is not the only solution to efficiently store and process this data. Today's connected world needs a futuristic vision, and we need to adopt a 'future-first' database system. Putting aside the arguments on NoSQL vs. other data store types, we have analyzed the curious case of NoSQL databases to understand why it's becoming the most appealing option for enterprises. Explore NoSQL further through our latest thinking.

Demystifying the curious case of NoSQL databases

As we move further into the 21st century, you will almost certainly have come across a story about how "DATA" is changing the face of our world. Whichever industry you work in, its efficient working and advancement depend on the amount and quality of the data it possesses.

With the development of the internet and connected living, more information is created, distributed and harnessed to make business decisions than ever before. The rise in the quantity of data we process created the need for highly efficient storage systems, which in turn spurred the invention of different database technologies.

In this white paper, we will be discussing two areas:

  • the reasons that led to the invention of NoSQL databases, the need for them and their advantages, and
  • a voyage through the architecture of NoSQL databases

A Brief History of the Relational Database

People have tried to store digital data in many ways since the first computers. Early storage was basic and flat, and could be described as non-relational, but it did not acquire the "NoSQL" tag until the term's popularity mounted in the early twenty-first century. The idea of structured data in a relational database emerged around 1970, when Edgar F. "Ted" Codd of IBM proposed the relational model. After learning about the model from Codd in the early 1970s, Donald D. Chamberlin and Raymond F. Boyce of IBM developed SQL (Structured Query Language), which later became the de facto language for querying and maintaining relational databases. When key industry players such as Microsoft, IBM and Oracle entered the market in the 1980s, the relational database became widely accepted and standardized, and rose to dominance.

The need for NoSQL database

The RDBMS was sufficient to store and manipulate data through a very long period of technological advancement. However, the relational database has several drawbacks, a major one being the "impedance mismatch".

To quote Telerik:

"The object-relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is being served by an application program (or multiple application programs) written in an object-oriented programming language or style, particularly because objects or class definitions must be mapped to database tables defined by a relational schema".

Take the example of an organization: an RDBMS stores the details of an employee by splitting them across multiple tables such as Address, Salary, Department and Reporting Hierarchy. When the need arises to capture additional information about all or even a few employees, the schema has to be changed for the whole organization. Most developers are aware that a schema change can sometimes lead to a massive amount of work in a relational database.
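
The mismatch can be illustrated with a minimal Python sketch using only the standard-library sqlite3 and json modules. The table layout, column names and sample employee values are hypothetical and chosen purely for illustration; the point is only that one application object is split across several tables and must be re-assembled with joins, while a document-style store keeps it as one unit.

```python
import json
import sqlite3

# Hypothetical employee object as the application sees it.
employee = {
    "id": "EMP1048",
    "name": "Sam",
    "address": {"city": "Kochi", "state": "Kerala"},
    "salary": {"amount": 50000, "currency": "INR"},
}

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE address  (emp_id TEXT, city TEXT, state TEXT);
    CREATE TABLE salary   (emp_id TEXT, amount INTEGER, currency TEXT);
    """
)

# Relational storage: the single object is split across three tables.
conn.execute("INSERT INTO employee VALUES (?, ?)", (employee["id"], employee["name"]))
conn.execute(
    "INSERT INTO address VALUES (?, ?, ?)",
    (employee["id"], employee["address"]["city"], employee["address"]["state"]),
)
conn.execute(
    "INSERT INTO salary VALUES (?, ?, ?)",
    (employee["id"], employee["salary"]["amount"], employee["salary"]["currency"]),
)

# Reading the object back needs joins plus manual re-assembly.
row = conn.execute(
    """
    SELECT e.id, e.name, a.city, a.state, s.amount, s.currency
    FROM employee e
    JOIN address a ON a.emp_id = e.id
    JOIN salary  s ON s.emp_id = e.id
    WHERE e.id = ?
    """,
    (employee["id"],),
).fetchone()
rebuilt = {
    "id": row[0],
    "name": row[1],
    "address": {"city": row[2], "state": row[3]},
    "salary": {"amount": row[4], "currency": row[5]},
}

# Document-style storage: the same object is saved and loaded as one unit.
stored_document = json.dumps(employee)
loaded = json.loads(stored_document)
```

Adding a new attribute to the relational version would mean altering a table (or adding another one), whereas the document version would simply carry an extra key.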

The 1990s saw the rise of object databases, which store data in the form of objects as used in object-oriented programming, writing them directly into the database rather than splitting them across different tables as a relational database does. But the idea failed to gain traction, owing to the dominance and acceptance of the RDBMS and the lack of a strong object database product.

With the rapid growth of the Internet and of data, a non-structured (or not very structured), distributed, cost-effective and scalable database system became an urgent need of the 21st century. Contributing to this boom is the ubiquity of social media sites such as Facebook, LinkedIn and Instagram, where unpredictable amounts of unstructured data are added every second.

IBM states that, "2.5 Exabyte - that's 2.5 billion gigabytes (GB) - of data was generated every day in 2012. That's big by anyone's standards. About 75% of data is unstructured, coming from sources such as text, voice and video. And as mobile phone penetration is forecast to grow from about 61% of the global population in 2013 to nearly 70% by 2017, those figures can only grow".

Driven by the desire to overcome the fundamental limitations of relational DBs, such as limited horizontal scalability, flexibility, availability and findability, as well as high cost, demand for NoSQL databases flourished in the 2000s.

Google described its ambitious Bigtable system in a paper published in 2006, and Amazon presented the Dynamo storage system in 2007 (which many consider the first large-scale, or web-scale, production NoSQL database), followed by the DynamoDB service in 2012. These systems triggered the NoSQL movement.

To quote author Joe Brockmeier of Red Hat,

"Amazon's Dynamo paper is the paper that launched a thousand NoSQL databases."

Brockmeier also notes:

"The paper inspired, at least in part, Apache Cassandra, Voldemort, Riak and other projects."

The graph shows the growth curve of data from 2006-2020
Source: Patrick Cheesman

Advantages of NoSQL over RDBMS

NoSQL databases offer many important advantages over traditional RDBMS, including but not limited to:

High scalability: NoSQL uses a horizontal scale-out approach, which avoids the enormous cost and complexity of manually sharding an RDBMS.

Lower cost: Many NoSQL databases are open source, which makes them an attractive solution for smaller organizations with budget constraints.

High Availability: NoSQL databases are designed to ensure high availability. Many distributed NoSQL databases have a "masterless" architecture that automatically distributes data evenly among multiple nodes. This ensures that the application remains available for both read and write operations, even if one node fails.

Flexible Data Modeling: NoSQL supports the implementation of flexible and fluid data models. Using NoSQL, developers can leverage the data types and query options most suited to a specific use case, instead of using those that fit the database schema. This simplifies the interaction between the application and the database, and fosters a faster, agile-first development.

Less Need for ETL: NoSQL databases support storage of data "as it is." Key value stores offer the ability to store simple data structures, whereas document NoSQL databases provide the flexibility to handle a wide range of flat or nested structures like JSON, XML etc.

Performance: NoSQL supports distributed computing and is very cluster friendly. As more nodes are added to process requests, throughput can grow with them.

Database administration: NoSQL databases do not need extensive hands-on management, thanks to distributed data, auto-repair capabilities, simplified data models and lower tuning and administration requirements. However, some minimal monitoring of the performance and availability of the databases is still needed.
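
The scale-out and automatic data distribution described above are often built on some form of hash partitioning. The following is a minimal sketch of one such technique, consistent hashing, written in Python with only the standard library. The class name HashRing, the node names and the number of virtual points per node are assumptions made for illustration, and the sketch is not a description of how any particular product is implemented.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical helper, illustration only)."""

    def __init__(self, nodes, points_per_node=100):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns many points so that keys spread fairly evenly.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        # A key belongs to the first point at or after its own position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"EMP{n}" for n in range(1000, 1100)]

before = {k: ring.node_for(k) for k in keys}
ring.remove_node("node-b")  # simulate one node failing
after = {k: ring.node_for(k) for k in keys}

# Only the keys that lived on the failed node have to move.
moved = [k for k in keys if before[k] != after[k]]
```

When a node leaves the ring, only the keys it owned are reassigned; every other key stays where it was, which is what keeps rebalancing cheap as nodes come and go.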

The Data Store of NoSQL

The data store refers to the way information is stored in a database. In an RDBMS, information is stored as rows and columns of tables, which are referenced and related through keys and organized using a clustered index. This enforces the use of SQL to read and write the data.

But this is different in the case of NoSQL. The four basic types of NoSQL databases are listed below; the querying mechanism used for each differs from one database to another, depending on its developers:

Key-Value Store: A big hash table of keys and values. Example: Amazon Dynamo

Document-based Store: Stores documents made up of tagged elements. Example: CouchDB

Column-based Store: Each storage block contains data from only one column. Example: Cassandra

Graph-based Store: A network database that uses edges and nodes to represent and store data. Example: Neo4j

http://cdn.ttgtmedia.com/rms/onlineImages/data_management-nosql.png

Exploring the NoSQL database architecture

In this section, we will look in detail at:

  1. Data Models/Data Store of NoSQL
  2. CAP Theorem
  3. ACID vs. BASE

The big FOUR data stores of NoSQL

The arena of enterprise IT is very dynamic, and making a choice can be perplexing. With the good old RDBMS, data is stored in tables, with relations set across the tables. It is pretty straightforward and has remained almost the same since the introduction of the relational database in the 1970s. However, things are a bit different in the case of NoSQL databases.

NoSQL provides a variety of data stores, of which the four most prominent are explained below.

The infographic below gives a summary of the four data store architectures.

https://image.slidesharecdn.com/nosqldatabases-140607004845-phpapp01/95/hbase-vs-cassandra-vs-mongodb-choosing-the-right-nosql-database-12-638.jpg?cb=1405298190

Key-Value Databases: This is one of the simplest types of NoSQL database and also the easiest to implement. The principal idea is to use a hash table, similar to a C# dictionary: each unique key points to a specific item of data, stored as a blob.

The key-value data store is not very efficient when the normal cycle of operation includes querying or updating part of the stored data. Nor is it designed for complex queries that connect multiple pieces of data. Key-value stores perform poorly if there are many-to-many relationships in the data.

Examples of key-value databases are Memcached, Redis and Oracle Berkeley DB.
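
The hash-table idea and its limitations can be sketched in a few lines of Python, with a plain dictionary standing in for the store. The put and get helpers are hypothetical names, and the job title "Senior IT Engineer" is an invented value used only to show a partial update.

```python
# A plain dictionary stands in for the key-value store: the store only
# understands keys, and treats each value as an opaque blob.
store = {}


def put(key, value):
    store[key] = value


def get(key):
    return store.get(key)


put("EMP1048", "Sam,Male,Engineer,01/05/1990")
put("EMP1049", "Samantha,Female,IT Engineer,21/03/1981,Married")

# Lookup by key is a single direct access.
record = get("EMP1049")

# Updating part of a value: the client must read the whole blob,
# change it, and write the whole blob back.
fields = record.split(",")
fields[2] = "Senior IT Engineer"  # hypothetical new job title
put("EMP1049", ",".join(fields))

# Querying by anything other than the key means scanning and parsing
# every value in the store.
engineers = sorted(
    key for key, value in store.items()
    if value.split(",")[2].endswith("Engineer")
)
```

The direct lookup is what makes the model fast and simple; the read-modify-write update and the full scan are the costs mentioned above.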

https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcR386oF8qLvpdPI-rahx7vflyBanITxHmOJJKZHBe0qfyyQvSC0

Let us consider the example of an organization. The following table shows a subset of the organization's employee details stored in a key-value store. Here the employee ID is the key, while the value holds the details of the employee.

Key        Value
"EMP1048"  {"Sam,Male,Engineer,01/05/1990"}
"EMP1049"  {"Samantha,Female,IT Engineer,21/03/1981,Married"}
"EMP1050"  {"John Doe,Male,Administrator,25/12/1972,Married,185 cm,+90-8978965465"}

Document Store: Document databases store data as documents. Generally, a document is a collection of key-value pairs in a format such as JSON or XML. Document databases are usually considered the next level up from key-value stores, as they allow nested values to be linked with each key.

A document store supports more efficient querying of the data. These databases have a highly flexible schema and are designed for the internet.

MongoDB, CouchDB, RavenDB and IBM Cloudant are the leading document store databases.

http://blogs.avalonconsult.com/files/2014/10/Figure5.png

In the example of an organization, the following section shows data values collected as "documents" describing specific offices. Note that while all three examples represent locations, their representative models are different.

{officeName: "Infopark",
  location: {Street: "TJ", City: "Kochi", State: "Kerala", Postalcode: "682030"}
}

{officeName: "ElectronicCity",
  location: {Boulevard: "Sunset Boulevard", Block: "H7", City: "NY", Postalcode: "TXC896-98"}
}

{officeName: "Techcity",
  location: {Latitude: "40.005896", Longitude: "-52.12487"}
}
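
To show how a flexible schema can still be queried, here is a minimal Python sketch that holds the three office documents as dictionaries. The key name location for the nested block and the helper find are assumptions introduced for illustration; real document databases provide their own, far richer query languages.

```python
# The three office documents; each nests a differently shaped location.
# The key "location" is an assumed name for the nested block.
offices = [
    {"officeName": "Infopark",
     "location": {"Street": "TJ", "City": "Kochi",
                  "State": "Kerala", "Postalcode": "682030"}},
    {"officeName": "ElectronicCity",
     "location": {"Boulevard": "Sunset Boulevard", "Block": "H7",
                  "City": "NY", "Postalcode": "TXC896-98"}},
    {"officeName": "Techcity",
     "location": {"Latitude": "40.005896", "Longitude": "-52.12487"}},
]


def find(collection, path, value):
    """Return documents whose nested field at dotted `path` equals `value`.

    Documents that lack the field are skipped rather than treated as
    errors, which is what lets documents of different shapes coexist.
    """
    matches = []
    for doc in collection:
        node = doc
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if node is not None and node == value:
            matches.append(doc)
    return matches


in_kochi = find(offices, "location.City", "Kochi")

# Fields are optional per document: only two offices carry a postal code.
with_postcode = [d["officeName"] for d in offices
                 if "Postalcode" in d["location"]]
```

Unlike the key-value case, the store can look inside the value, so a query on a nested field does not require the client to parse a blob.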

Column Oriented: Derived from the Google Bigtable architecture, these databases focus on columns and groups of columns for data storage. That is, data is stored in cells grouped as columns rather than as rows. Each item of data is referenced using a row key. Columns are logically grouped into column families. Reads and writes are performed using columns rather than rows.

Column-oriented databases are created to store and process vast amounts of data, which can be better compressed and distributed across several machines.

Examples of column-oriented databases are Bigtable, Cassandra and HBase.

http://database.guide/wp-content/uploads/2016/06/wide_column_store_database_example_column_family-1.png

In the example of the organization, the various offices located in different places are represented as a two-dimensional table in an RDBMS.

Office           Postal Code   Division   Products
Infopark         682030        20         250
Electronic City  TXC896-98     15         200
Techcity         PST 698 021   10         150

For the above RDBMS table, a BigTable map can be visualized as shown below.

{
  OrgInfopark: {
    address: {
      office: Infopark,
      postalcode: 682030
    },
    details: {
      division: 20,
      products: 250
    }
  },
  OrgECity: {
    address: {
      office: Electronic City,
      postalcode: TXC896-98
    },
    details: {
      division: 15,
      products: 200
    }
  },
  OrgTechcity: {
    address: {
      office: Techcity,
      postalcode: PST 698 021
    },
    details: {
      division: 10,
      products: 150
    }
  }
}

  • The outermost keys OrgInfopark, OrgECity and OrgTechcity are analogous to rows.
  • 'address' and 'details' are called column families.
  • The column family 'address' includes two columns: 'office' and 'postalcode'.
  • The column family 'details' includes two columns: 'division' and 'products'.
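
The row key, column family and column addressing described above can be sketched with nested Python dictionaries. The values are taken from the office table; the helpers read_cell and read_column are hypothetical names, and the sketch models only the logical map, not the on-disk layout of a real column store.

```python
# row key -> column family -> column -> value (values from the office table)
table = {
    "OrgInfopark": {
        "address": {"office": "Infopark", "postalcode": "682030"},
        "details": {"division": 20, "products": 250},
    },
    "OrgECity": {
        "address": {"office": "Electronic City", "postalcode": "TXC896-98"},
        "details": {"division": 15, "products": 200},
    },
    "OrgTechcity": {
        "address": {"office": "Techcity", "postalcode": "PST 698 021"},
        "details": {"division": 10, "products": 150},
    },
}


def read_cell(row_key, family, column):
    """Address one cell by row key, column family and column name."""
    return table[row_key][family][column]


def read_column(family, column):
    """Read one column across all rows, skipping rows that lack it."""
    return {
        row_key: families[family][column]
        for row_key, families in table.items()
        if column in families.get(family, {})
    }


products = read_column("details", "products")
total_products = sum(products.values())
postcode = read_cell("OrgTechcity", "address", "postalcode")
```

Reading a single column across all rows, as read_column does, is the access pattern that column-oriented stores are laid out to serve efficiently.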

Graph Based: A graph-based database brings in the flexibility of relations without sticking to the rigid structure of SQL or the tables-and-columns interpretation used in the RDBMS. A graph database uses a graph representation, which is highly flexible and can help address scalability concerns.

Graph structures use edges, nodes and properties that offer index-free adjacency. With a graph-based NoSQL database, data can be easily transformed from one model to another.

Examples of graph databases are Neo4j, OrientDB and FlockDB.

https://cdn-images-1.medium.com/max/1600/1*NfooWkME54jJ--wOkLIA4w.jpeg

The example of the organization will be stored in Graph database as given below. Every node is connected to some other node and assigned a relation.

https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/organization_graph.png
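
Since the figure itself is not reproduced here, the following Python sketch shows the general idea of nodes connected by labeled relations. The node names, relation labels and helper functions are all hypothetical and are not taken from the figure.

```python
from collections import deque

# Hypothetical nodes, each with a label and properties.
nodes = {
    "sam": {"label": "Employee", "name": "Sam"},
    "samantha": {"label": "Employee", "name": "Samantha"},
    "engineering": {"label": "Department", "name": "Engineering"},
    "infopark": {"label": "Office", "name": "Infopark"},
}

# Hypothetical directed edges: (source, relation, target).
edges = [
    ("sam", "REPORTS_TO", "samantha"),
    ("sam", "WORKS_IN", "engineering"),
    ("samantha", "WORKS_IN", "engineering"),
    ("engineering", "LOCATED_AT", "infopark"),
]

# Each node keeps its own list of outgoing edges, so following a relation
# never needs a lookup in a global index or a join table.
adjacency = {node_id: [] for node_id in nodes}
for source, relation, target in edges:
    adjacency[source].append((relation, target))


def neighbours(node_id, relation=None):
    """Nodes directly connected from `node_id`, optionally by relation."""
    return [target for rel, target in adjacency[node_id]
            if relation is None or rel == relation]


def reachable(start):
    """Breadth-first traversal: every node reachable from `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for _, target in adjacency[current]:
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


manager = neighbours("sam", "REPORTS_TO")
connected_to_sam = reachable("sam")
```

A question such as "which office does Sam's department sit in" becomes a short walk along edges rather than a multi-table join.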

The CAP theorem

In 2000, Eric Brewer presented the CAP theorem for distributed network applications, which states that "a distributed computer system cannot guarantee all of the following three properties at the same time:

  • Consistency – once data is written, all future read requests will contain that data
  • Availability – the database is always available and responsive
  • Partition tolerance – if one part of the database is unavailable, other parts are unaffected"

In May 2012, Brewer clarified some of his views on the oft-used "two out of three" concept, leaving three feasible design options: CP, AP and CA. The three combinations can be defined as:

  • CA – consistent data is available between all nodes. If all the nodes are online, users can read/write from any node and ensure that the data is the same on all nodes.
  • CP – consistent data is available between all nodes and maintains partition tolerance by becoming unavailable when a node goes down.
  • AP – nodes remain online even if they can't communicate with each other and will re-sync data once the partition is resolved, but there is no guarantee that all nodes will have the same data (either during or after the partition)

http://www.youritgoeslinux.com/sites/www.youritgoeslinux.com/files/nosql/capt_2.png

As the image above shows, NoSQL databases are typically placed under AP, whereas an RDBMS is placed under CA. Why is this important? It is a business conundrum. The organization has to decide what to compromise on when choosing the database for its business model. The choice between consistency and partition tolerance moves the slider towards an RDBMS or a NoSQL database.
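
The difference between the CP and AP options during a network partition can be illustrated with a toy two-node cluster in Python. The class TwoNodeCluster, the Unavailable exception and the "winner overwrites the other node" healing rule are simplifying assumptions for illustration only; real systems use quorums and more careful conflict resolution.

```python
class Unavailable(Exception):
    """Raised when a CP cluster refuses a write during a partition."""


class TwoNodeCluster:
    """Toy cluster with two replicas (hypothetical, illustration only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = {"n1": {}, "n2": {}}
        self.partitioned = False

    def write(self, node, key, value):
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for data in self.nodes.values():
                data[key] = value
            return
        if self.mode == "CP":
            # Consistency first: refuse rather than let replicas diverge.
            raise Unavailable("write rejected: replicas cannot be kept in sync")
        # Availability first: accept locally and let replicas diverge.
        self.nodes[node][key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)

    def heal(self, winner):
        """End the partition; the winner's data overwrites the other node."""
        self.partitioned = False
        for data in self.nodes.values():
            if data is not self.nodes[winner]:
                data.update(self.nodes[winner])


cp = TwoNodeCluster("CP")
cp.write("n1", "office", "Infopark")
cp.partitioned = True
try:
    cp.write("n1", "office", "Techcity")
    cp_write_accepted = True
except Unavailable:
    cp_write_accepted = False

ap = TwoNodeCluster("AP")
ap.write("n1", "office", "Infopark")
ap.partitioned = True
ap.write("n1", "office", "Techcity")
diverged = ap.read("n1", "office") != ap.read("n2", "office")
ap.heal(winner="n1")
```

The CP cluster stays consistent but turns the write away, while the AP cluster accepts the write and serves different answers from its two nodes until the partition is healed.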

The Chemistry of ACID and BASE

RDBMSs were designed to manage "structured" data in manageable fields, rows and columns, such as dates, social security numbers, addresses and transaction amounts. ACID (Atomicity, Consistency, Isolation and Durability) is a set of properties that guarantee that database transactions are processed reliably. ACID is a necessity for financial transactions and other applications where precision holds the key.

Atomic: Atomicity requires that each transaction is "all or nothing." If one part of the transaction fails, the entire transaction fails and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation.

Consistency: The consistency property ensures that the database remains in a consistent state before the transaction commences and after the transaction is finished (whether successful or not).

Isolation: Modifications of data performed by one transaction must be independent of those made by any other concurrent transaction.

Durability: Durability is the guarantee that once the user has been notified of success, the committed transaction will persist and will not be undone, even in the event of a system failure.
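
Atomicity and consistency can be seen in a small Python sketch using the standard-library sqlite3 module. The account table, the names alice and bob, the amounts and the transfer helper are hypothetical; the CHECK constraint stands in for a business rule that balances may never go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts as one all-or-nothing transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises an exception.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit made just
        # before it has been rolled back as well.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


failed = transfer(conn, "alice", "bob", 500)  # alice cannot cover this
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
```

In the failed transfer, the credit to the target account is applied first and then undone when the debit violates the constraint, so the database is never left half-updated.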

Conversely, most NoSQL DBs allow unstructured data to be absorbed as is, without following an RDBMS format or structure. This works especially well for documents and metadata associated with a variety of unstructured data types, as managing text-based objects is not considered a transaction in the traditional sense. ACID therefore cannot be strictly enforced for NoSQL, which follows BASE instead. BASE (basically available, soft state, eventually consistent) indicates that the DB will, at some point, classify and index the content to improve the findability of the data or information contained in the text or object.

A BASE system compromises on consistency in order to gain greater availability and partition tolerance. BASE can be defined as follows:

Basically Available: implies that the system guarantees availability.

Soft state: implies that the state of the system may change over time, even without input. This is because of the eventual consistency model.

Eventual consistency: implies that the system will become consistent over time, provided it doesn't receive input during that time.
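
The three BASE properties can be illustrated with a toy Python store in which writes are acknowledged by one replica and copied to the others later. The class EventuallyConsistentStore, the single shared logical clock and the last-write-wins rule are simplifying assumptions for illustration, not a description of any specific product.

```python
class EventuallyConsistentStore:
    """Toy replicated store with asynchronous replication (hypothetical)."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # replication messages not yet delivered
        self.clock = 0     # shared logical clock (a simplification)

    def write(self, key, value, replica=0):
        # Basically available: the write is acknowledged as soon as one
        # replica has it; the others are updated later.
        self.clock += 1
        record = (self.clock, value)
        self.replicas[replica][key] = record
        for index in range(len(self.replicas)):
            if index != replica:
                self.pending.append((index, key, record))

    def read(self, key, replica):
        record = self.replicas[replica].get(key)
        return None if record is None else record[1]

    def deliver_pending(self):
        # Soft state: replica contents change here without any new client
        # input. The newest version wins (last-write-wins).
        for index, key, record in self.pending:
            current = self.replicas[index].get(key)
            if current is None or record[0] > current[0]:
                self.replicas[index][key] = record
        self.pending.clear()


db = EventuallyConsistentStore()
db.write("status", "active", replica=0)

fresh_read = db.read("status", replica=0)
stale_read = db.read("status", replica=1)  # replication has not run yet

db.deliver_pending()  # no new input arrives; the replicas converge
converged = [db.read("status", replica=i) for i in range(3)]
```

Between the write and the delivery of pending messages, different replicas can return different answers; once the input stops and the messages are delivered, all replicas agree.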

That said, some NoSQL databases do support ACID properties; graph databases such as Neo4j are a notable example.

The Choice

The choice of RDBMS vs. NoSQL depends on many factors. It depends on business decisions, amount of data, type of data, availability and so on.

The quick and general suggestion is to:

Go for NoSQL, if

  1. Data is unstructured and huge
  2. Expected data growth rate is large
  3. You prefer a less rigid schema
  4. You value performance and availability over eliminating redundancy
  5. Scale-out is a definite requirement in the long run
  6. The business prefers cost effectiveness, since NoSQL uses clusters of low-priced commodity servers to manage data explosion and transaction volumes

Go for RDBMS, if

  1. A rigid schema is sufficient for all the business needs
  2. Data is primarily used for analytics, BI or reporting
  3. The business wants to reap the benefits of ACID
  4. You want to get rid of redundancy
  5. A distributed cluster system is not required for the business
  6. The need for scale-out is limited (and can be met with sharding) and scale-up is acceptable
  7. The business is ready to spend on DB management and rely on expensive proprietary servers and storage systems

The Future of NoSQL Databases

With more than 250 database options to choose from, NoSQL databases have become an inevitable part of the database landscape today. With their unique advantages, including lower cost and scalability, they are becoming game changers in modern-day enterprises.

NoSQL's features make it an appealing option for many companies. However, the technology is relatively young and lacks a specific set of standards like those of existing SQL databases, which often creates some friction on the way to wide acceptance.

The four types of NoSQL database classifications explained above are only guides, and real products often cross them. For instance, Cassandra combines key-value elements with a wide-column store. At times, NoSQL elements are mixed with SQL elements to create multi-model databases. While some suggest that NoSQL is the future path of the database landscape, many are concerned about its lack of standardization and ACID compliance. The choice between NoSQL and SQL depends purely on the organization's needs, the volume of data consumed and the varieties of data. It would not be a big surprise if a hybrid database technology arose from the base of RDBMS and NoSQL to make up for the disadvantages of both technologies.

The possibility of vendors producing a hybrid NoSQL DB (combining many database models) cannot be ignored in the foreseeable future.

The Conclusion

All the commotion over NoSQL DBs does not mean the demise of the good old RDBMS. The computational and storage requirements of applications involving Big Data analytics, Business Intelligence and social networking over petabyte datasets forced the move from SQL to NoSQL DBs. The choice between SQL and NoSQL will become more difficult in the future, as RDBMS vendors are also innovating to meet rising enterprise demands with the help of advances in disk and RAM technologies. Microsoft has already integrated graph database capabilities into SQL Server 2017.

Say YES to NoSQL, but not everything old is bad and everything new is the future.

Authored by Arun S Jacob

Arun on his passion for Database technologies, "I always wonder how the world works this way and that keeps me busy looking into the deep internals of every system and problem which I come across. As a hard-core database fan, I love understanding and solving problems in SQL Server. You'll see one happy man, if we are talking about a poor-performing query or system."

About Us

Zerone Consulting is a leading agile software development company that delivers innovative technology and business solutions to customers across the world. Our focus is to accelerate our customers' journey towards digital transformation by ensuring them rapid delivery, transparency, and cost advantages in the best possible way. We help forward-thinking businesses to leverage transformation through Artificial Intelligence, Cognitive Computing, Internet of Things (IoT), Robotic Process Automation, Data Analytics, Face Detection and Recognition, Natural Language Processing, and Blockchain technology.

Zerone Consulting started its operations way back in 2003 and is located in Kochi, India. We have a success rate of 99+%. We are ISO 27001 certified with an exceptional track record of completing and delivering 500+ successful projects.

© 2005-2018 Zerone Consulting Private Limited. All Rights Reserved