NoSQL in the cloud: A scalable alternative to Relational Databases

With the current move to cloud computing, the need to scale applications presents itself as a challenge for storing data. If you are using a traditional relational database you may find yourself working on a complex policy for distributing your database load across multiple database instances. Such a solution often brings a lot of problems of its own and is unlikely to scale elastically.

As an alternative you could consider a cloud-based NoSQL database. Over the past few weeks I have been analysing a few such offerings, each of which promises to scale as your application grows, without requiring you to think about how you might distribute the data and load.

Specifically I have been looking at Amazon’s DynamoDB, Google’s Cloud Datastore and Cloud BigTable. I chose to take a look into these 3 databases because we have existing applications running in Google and Amazon’s clouds and I can see the advantage these databases can offer. In this post I’ll report on what I’ve learnt.

Consistency, Availability & Partition Tolerance

Firstly — and most importantly — it’s necessary to understand that distributed NoSQL databases achieve high scalability in comparison to a traditional RDBMS by making some important tradeoffs.

A good starting-place for thinking about this is the CAP Theorem, which states that a distributed database can — at most — provide two of the following: Consistency, Availability and Partition Tolerance. We define each of these as follows:

  • Consistency: All nodes see the same data at the same time
  • Availability: Every request receives a response
  • Partition Tolerance: The system continues to operate despite network partitions between nodes

Eventually Consistent Operations

All three NoSQL databases I looked at provide Availability and Partition Tolerance for eventually-consistent operations. In most cases these two properties will suffice.

For example, if a user posts to a social media website and it takes a second or two for everyone’s request to pick up the change, then it’s not usually an issue.

This is possible because a write is acknowledged once it has reached a subset of nodes; the data is then replicated across all of the nodes, which usually occurs within a second. A read, however, is served by a single node, which may not yet have received the latest write.
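
To make this concrete, here is a toy sketch (not any vendor's actual protocol) of a write that is acknowledged before replication completes, so that a read from a lagging replica can return stale data:

```python
# Toy model: three replicas, each a dict of key -> value.
replicas = [{}, {}, {}]

def write(key, value):
    # The write is acknowledged after reaching a subset of replicas...
    replicas[0][key] = value

def replicate():
    # ...and copied to the remaining replicas some time later.
    for node in replicas[1:]:
        node.update(replicas[0])

def read(key, node_index):
    # A read is served by one (arbitrary) replica.
    return replicas[node_index].get(key)

write("post:1", "Hello!")
stale = read("post:1", 2)   # replica 2 has not caught up yet
replicate()
fresh = read("post:1", 2)   # after replication every replica agrees
```

Between the write and the replication step, readers hitting replica 2 simply see the previous state, which is exactly the social-media scenario described above.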

Strongly Consistent Operations


All three databases also provide strongly consistent operations which guarantee that the latest version of the data will always be returned.

DynamoDB achieves this by ensuring that writes are written to a majority of nodes before a success result is returned. Reads work in a similar way: a result is not returned until the record has been read from more than half of the nodes, which ensures the result will be the latest copy of the record.

All this comes at the expense of availability: if a node becomes inaccessible shortly after a write, the database may be unable to verify the data's consistency. Google achieves the same behaviour in a slightly different way, using a locking mechanism in which a read cannot complete on a node until that node has the latest copy of the data. This model is required when you need to guarantee the consistency of your data; for example, you would not want a financial transaction calculated on an old version of the data.
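
The majority idea can be sketched in a few lines (a simplification of the real protocol): because every write majority and every read majority overlap in at least one replica, a quorum read always sees the latest version.

```python
# Each replica holds a (version, value) pair; 3 replicas, majority = 2.
replicas = [(0, None), (0, None), (0, None)]
MAJORITY = len(replicas) // 2 + 1

def quorum_write(version, value):
    # Success is reported once a majority of replicas accepted the write.
    for i in range(MAJORITY):
        replicas[i] = (version, value)

def quorum_read(indices):
    # Read any majority of replicas and keep the highest version seen.
    assert len(indices) >= MAJORITY
    return max((replicas[i] for i in indices), key=lambda pair: pair[0])[1]

quorum_write(1, "balance=100")
# Replica 2 is still stale, but any majority includes an up-to-date replica:
latest = quorum_read([1, 2])
```

If one of the replicas in the chosen majority is unreachable, the read cannot complete, which is the availability cost described above.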

OK, now that we’ve got the hard stuff out of the way, let’s move onto some of the more practical questions that might come up when using a cloud-based database.

Local Development

Having a database in the cloud is cool, but how does it work if you’ve got a team of developers, each of whom needs to run their own copy of the database locally? Fortunately, DynamoDB, BigTable and Cloud Datastore all have the option of downloading and running a local development server. All three local development environments are really easy to download and get started with. They are designed to provide you with an interface that matches the production environment.

Java Object Mapping

If you are going to be using Java to develop your application, you might be used to using frameworks like Hibernate or JPA to automatically map RDBMS rows to objects. How does this work with NoSQL databases?
DynamoDB provides an intuitive way of mapping Java objects to items in DynamoDB tables. You simply annotate the class as a DynamoDB table and then annotate your instance variable getters with the appropriate attribute annotations.

@DynamoDBTable(tableName = "User")
public class User {

    private String username;
    private String email;

    @DynamoDBHashKey(attributeName = "username")
    public String getUsername() {
        return username;
    }

    public void setUsername(String username) {
        this.username = username;
    }

    @DynamoDBAttribute(attributeName = "email")
    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        this.email = email;
    }
}

An important thing to understand about all of these NoSQL databases is that they don’t provide a full-blown query language.

Instead, you need to use their APIs and SDKs to access the database. By using simple query and scan operations you can retrieve zero or more records from a given table. Since each of the three databases I looked at provide a slightly different way of indexing the tables, the range of features in this space varies.

DynamoDB for example provides multiple secondary indexes, meaning there is the ability to efficiently scan any indexed column. This is not a feature in either of Google’s NoSQL offerings.

Furthermore, unlike SQL databases, none of these NoSQL databases give you a means of doing table joins, or even having foreign keys. Instead, this is something that your application has to manage itself.

That said, one of the main advantages of NoSQL, in my opinion, is that there is no fixed schema. As your needs change you can dynamically add new attributes to records in your table.

For example, using Java and DynamoDB, you can do the following, which will return a list of users that have the same username as a given user:

User user = new User(username);
DynamoDBQueryExpression<User> queryExpression =
new DynamoDBQueryExpression<User>().withHashKeyValues(user);
List<User> itemList =
Properties.getMapper().query(User.class, queryExpression);

Distributed Database Design

The main benefit of NoSQL databases is their ability to scale, and to do so in an almost seamless way. But, just like a SQL database, a poorly-designed NoSQL database can give you slow query response times. This is why you need to consider your database design carefully.

To balance load across multiple nodes, distributed databases must also spread the stored data across those nodes. The flip-side of this is that if frequently-accessed data lives on a small subset of nodes, you will not be making full use of the available capacity.

Consequently, you need to be careful of which columns you select as indexes. Ideally you want to spread your load across the whole table as opposed to accessing only a portion of your data.

A good design can be achieved by picking a hash key that is likely to be accessed uniformly. For example, if you have a users table and choose the username as the hash key, load is likely to be distributed across all of the nodes, because individual users tend to be accessed at random.

In contrast, it would be a poor design to use the date as the hash key for a table that contains forum posts: most requests will be for records from the current day, so the node or nodes containing those records will be a small subset of all the nodes. This scenario can cause your requests to be throttled or to hang.
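
A quick sketch of why this matters, using a made-up hash partitioning scheme over four nodes:

```python
import hashlib

def node_for(hash_key, num_nodes=4):
    # Toy hash partitioning: the hash of the key picks the node.
    digest = hashlib.md5(str(hash_key).encode()).hexdigest()
    return int(digest, 16) % num_nodes

# Looking up users by username touches whichever node each hash points at,
# so requests tend to spread out...
user_nodes = [node_for(u) for u in ["alice", "bob", "carol", "dave"]]

# ...whereas keying forum posts by today's date sends every request,
# no matter how many, to the same single node.
post_nodes = [node_for("2016-03-04") for _ in range(100)]
hot_nodes = set(post_nodes)
```

All one hundred "today's posts" requests land on one node while the other three sit idle, which is precisely the hot-spot scenario described above.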


Pricing

Since Google does not have a data centre in Australia, I will only be looking at pricing in the US.

DynamoDB is priced on storage and provisioned read/write capacity. In the Oregon region, storage is charged at $0.25 per GB/month, and capacity at $0.0065 per hour for every 10 units of write capacity, with the same price covering every 50 units of read capacity.

Google Cloud Datastore has a similar pricing model, with storage priced at $0.18 per GB of data per month and $0.06 per 100,000 read operations; write operations are charged at the same rate. Datastore also has a free quota of 50,000 read and 50,000 write operations per day. Since Datastore is a beta product it currently has a limit of 100 million operations per day, though you can request an increase.

The pricing model for Google Bigtable is significantly different. With Bigtable you are charged at a rate of $0.65 per instance/hour. With a minimum of 3 instances required, some basic arithmetic (3 instances at $0.65/hour for roughly 730 hours) gives us a starting price for Bigtable of about $1,423.50 per month. You are then charged $0.17 per GB/month for SSD-backed storage. A cheaper HDD-backed option, priced at $0.026 per GB/month, is yet to be released.

Finally you are charged for external network usage. This ranges between 8 and 23 cents per GB of traffic depending on the location and amount of data transferred. Traffic to other Google Cloud Platform services in the same region/zone is free.


So which database should you use?

To be honest, I was not able to differentiate between them much from a performance perspective. In both local and cloud-based tests I found that, for queries returning a single record, response time was in the 10–30ms range, regardless of which platform I used.

Consequently, which you choose will probably depend on what cloud stack you are currently using and/or which billing model suits you best.
Since Bigtable has a minimum number of instances, you may want to avoid it if you don't foresee much traffic, as you will likely be paying for capacity that you don't need. However, Bigtable can be cheaper if your load is significant enough to keep 3 or more instances busy.

In contrast, Cloud Datastore will work well if your application has a small amount of traffic, yet will continue to work well if it scales up to have a large number of requests. Since you only pay for the requests that are made there will be no wasted capacity.

If you are on the Amazon stack, the only managed NoSQL option is DynamoDB. It is more expensive than Datastore and requires that you provision capacity in advance, meaning that you will be charged for that capacity even if you don't use it. Amazon has announced that they will be releasing an auto-scaling feature some time in the future; if you can't wait for that, you can write your own scripts to handle the scaling of provisioned capacity. It is important to get provisioned capacity right: if you use more capacity than is provisioned, you will start to deplete accumulated credits, and once your credits are depleted your requests will start to be throttled and may even fail.

Whichever option you go with, the prospect of infinitely-scalable, always-available databases is alluring. By understanding the tradeoffs involved and choosing a pricing model that suits you, you may be able to free yourself from the constraints of a traditional RDBMS.
Originally published at on March 4, 2016.


An Introduction to Big Data: NoSQL

Note: The content of this blog post originally comes from teaching materials developed by Professor Michael Mior and Professor Carlos Rivero at Rochester Institute of Technology. I want to give them full credit for educating me on these fundamental database concepts!

This semester, I'm taking a graduate course called Introduction to Big Data. It provides a broad introduction to the exploration and management of the large datasets being generated and used in the modern world. In an effort to open-source this knowledge to the wider data science community, I will recap the materials I learn in class here on Medium. Having a solid understanding of the basic concepts, policies, and mechanisms for big data exploration and data mining is crucial if you want to build end-to-end data science projects.

If you haven’t read my previous 3 posts about relational database, data querying, and data normalization, please do so. Here’s the roadmap for this fourth post on NoSQL database:

  1. Introduction to NoSQL
  2. Document Databases
  3. Key-Value Databases
  4. Graph Databases

1 — Introduction to NoSQL

NoSQL is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. NoSQL, which stands for “not only SQL,” is an alternative to traditional relational databases in which data is placed in tables and data schema is carefully designed before the database is built. NoSQL databases are especially useful for working with large sets of distributed data.

The NoSQL term can be applied to some databases that predated the relational database management system (RDBMS), but it more commonly refers to the databases built in the early 2000s for the purpose of large-scale database clustering in cloud and web applications. In these applications, requirements for performance and scalability outweighed the need for the immediate, rigid data consistency that the RDBMS provided to transactional enterprise applications.

NoSQL helps deal with the volume, variety, and velocity requirements of big data:

  • Volume: Maintaining the ACID properties (Atomicity, Consistency, Isolation, Durability) is expensive and not always necessary. Sometimes, we can deal with minor inconsistencies in our results. We thus want to be able to partition our data across multiple sites.
  • Variety: One single fixed data model makes it harder to incorporate varying data. Sometimes, when we pull from external sources, we don’t know the schema! Furthermore, changing a schema in a relational database can be expensive.
  • Velocity: Storing everything durably to disk all the time can be prohibitively expensive. Sometimes it's okay if we have a low probability of losing data. Memory is much cheaper now, and much faster than always going to disk.

There is no single accepted definition of NoSQL, but here are its main characteristics:

  • It has quite a flexible schema, unlike the relational model. Different rows may have different attributes or structure. The database often has no understanding of the schema. It is up to the applications to maintain consistency in the schema including any denormalization.
  • It also is often better at handling really big data tasks. This is because NoSQL databases follow the BASE (Basically Available, Soft state, Eventual consistency) approach instead of ACID.
  • In NoSQL, consistency is only guaranteed after some period of time when writes stop. This means it is possible that queries will not see the latest data. This is commonly implemented by storing data in memory and then lazily sending it to other machines.
  • Finally, there is this notion known as the CAP theorem — pick 2 out of 3 things: Consistency, Availability, and Partition tolerance. ACID databases are usually CP systems, while BASE databases are usually AP. This distinction is blurry and often systems can be reconfigured to change these tradeoffs.

We’ll discuss different categories of NoSQL, including document databases, key-value databases, and graph databases.

2 — Document Databases

There are many different document database systems, such as MongoDB, FoundationDB, RethinkDB, MarkLogic, and ArangoDB. There is no standard system; however, they all deal with a data format known as JSON, which is taken from JavaScript and represents objects, strings, numbers, arrays, booleans, and null in nested structures.

The most popular document database is MongoDB, which is open-source and stores data in flexible, JSON-like documents, meaning fields can vary from document to document and data structure can be changed over time. The MongoDB hierarchy starts out with the database, then a collection, then a document.

The query below creates a new collection or view. Because MongoDB creates a collection implicitly when the collection is first referenced in a command, this method is used primarily for creating new collections that use specific options. For example, you use db.createCollection() to create a capped collection or to create a new collection that uses document validation.
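
For example, creating a capped collection might look like the call in the comment below (the collection name and size limits are made up); the options are passed as a document alongside the name:

```python
# db.createCollection("bills_log", {capped: true, size: 1048576, max: 1000})
# creates a capped collection; the options document has this shape:
collection_name = "bills_log"
options = {"capped": True, "size": 1048576, "max": 1000}
```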

MongoDB also uses document types known as BSON, which is a binary serialization format used to store documents and make remote procedure calls. Each BSON type has both integer and string identifiers as listed in the following table:

You can use these values with the $type operator to query documents by their BSON type. The $type aggregation operator returns the type of an operator expression using one of the listed BSON type strings.

The query below inserts a single document into a collection.

The query below inserts multiple documents into a collection.
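
In pymongo-style syntax (which mirrors the shell) the two insert operations might look like the comments below; here a plain Python list stands in for the collection so the sketch runs without a server, and the field values are made up:

```python
# Stand-in for db.bills: a list of documents (dicts).
bills = []

def insert_one(collection, document):
    # db.bills.insertOne({patient: "376-97-9845", amount: 800})
    collection.append(dict(document))

def insert_many(collection, documents):
    # db.bills.insertMany([{...}, {...}])
    for doc in documents:
        collection.append(dict(doc))

insert_one(bills, {"patient": "376-97-9845", "amount": 800})
insert_many(bills, [{"patient": "201-33-0002", "amount": 120},
                    {"patient": "410-65-8822", "amount": 950}])
```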

MongoDB provides the capability to perform schema validation during updates and insertions. Validation rules are on a per-collection basis. To specify validation rules when creating a new collection, use db.createCollection() with the validator option.
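
A sketch of the idea: the validator is a $jsonSchema-style document passed to db.createCollection, and MongoDB checks each insert or update against it. The tiny checker below simulates only a subset of that behaviour, and the field names are illustrative:

```python
# A validator document as it could be passed to db.createCollection:
validator = {
    "$jsonSchema": {
        "required": ["patient", "amount"],
        "properties": {"amount": {"minimum": 0}},
    }
}

def validates(document, validator):
    # Tiny subset of the validation MongoDB would perform on insert/update.
    schema = validator["$jsonSchema"]
    if any(field not in document for field in schema["required"]):
        return False
    rules = schema["properties"].get("amount", {})
    return document.get("amount", 0) >= rules.get("minimum", float("-inf"))

ok = validates({"patient": "376-97-9845", "amount": 750}, validator)
bad = validates({"amount": -5}, validator)   # missing required "patient"
```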

Let’s walk through a real scenario. As seen below, we use MongoDB to retrieve the data from the bills database under the following conditions:

  • The patient is 376–97–9845.
  • Address’s zip code is 14534.
  • The amount is greater than or equal to 750.
  • The amount is greater than or equal to 750 AND address’s zip code is 14534.
  • The amount is greater than or equal to 750 OR address’s zip code is 14534.
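
The query documents for the conditions above can be sketched as follows. A tiny matcher simulates how MongoDB evaluates equality, $gte, $and and $or against documents, so the sketch runs without a server (the sample documents are made up):

```python
bills = [
    {"patient": "376-97-9845", "amount": 800, "address": {"zip": "14534"}},
    {"patient": "201-33-0002", "amount": 500, "address": {"zip": "14623"}},
]

def matches(doc, cond):
    # Supports just the operators used below: equality, $gte, $and, $or.
    if "$and" in cond:
        return all(matches(doc, c) for c in cond["$and"])
    if "$or" in cond:
        return any(matches(doc, c) for c in cond["$or"])
    for path, expected in cond.items():
        value = doc
        for part in path.split("."):   # dotted paths like "address.zip"
            value = value[part]
        if isinstance(expected, dict) and "$gte" in expected:
            if not value >= expected["$gte"]:
                return False
        elif value != expected:
            return False
    return True

def find(collection, cond):
    return [d for d in collection if matches(d, cond)]

by_patient = find(bills, {"patient": "376-97-9845"})
by_zip     = find(bills, {"address.zip": "14534"})
expensive  = find(bills, {"amount": {"$gte": 750}})
both       = find(bills, {"$and": [{"amount": {"$gte": 750}},
                                   {"address.zip": "14534"}]})
either     = find(bills, {"$or": [{"amount": {"$gte": 750}},
                                  {"address.zip": "14534"}]})
```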

You can find a bunch of different MongoDB query operators below. Let’s start with comparison operators: eq, gt, gte, in, lt, lte, ne, and nin.

Here we have boolean operators: and, not, nor, and or.

Now we have element and evaluation operators: exists, type, mod, regex, text, and where.

Finally we have array and bitwise operators: all, elemMatch, size, bitsAllClear, bitsAllSet, bitsAnyClear, and bitsAnySet.

Here’s how to do projection in MongoDB:

It says that we want to find the patient with the particular value of “376–97–9845”, and we only want the patient and id attributes from it. A corresponding SQL query is shown below:
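
A sketch of how that might look, with the SQL equivalent as a comment (the sample documents and amounts are made up):

```python
# db.bills.find({patient: "376-97-9845"}, {patient: 1, _id: 1})
# SQL: SELECT patient, _id FROM bills WHERE patient = '376-97-9845';
bills = [{"_id": 1, "patient": "376-97-9845", "amount": 800},
         {"_id": 2, "patient": "201-33-0002", "amount": 500}]

def find_projected(collection, condition, projection):
    kept = [d for d in collection
            if all(d.get(k) == v for k, v in condition.items())]
    # Keep only the projected attributes of each matching document.
    return [{k: d[k] for k in projection if k in d} for d in kept]

result = find_projected(bills, {"patient": "376-97-9845"}, ["patient", "_id"])
```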

Here’s how to do sorting in MongoDB:

It says that we want to sort the bills table, ordered by patient ascending and zip code descending. A corresponding SQL query is shown below:
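
A sketch of that ordering, with the corresponding calls as comments (the sample rows are made up). Python's stable sort lets us apply the secondary key first, then the primary key:

```python
# db.bills.find().sort({patient: 1, zip: -1})
# SQL: SELECT * FROM bills ORDER BY patient ASC, zip DESC;
bills = [
    {"patient": "B", "zip": "14534"},
    {"patient": "A", "zip": "14534"},
    {"patient": "A", "zip": "14623"},
]

# Sort by the secondary key (zip, descending) first, then stably by the
# primary key (patient, ascending).
ordered = sorted(bills, key=lambda d: d["zip"], reverse=True)
ordered = sorted(ordered, key=lambda d: d["patient"])
```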

Here’s how to update documents in MongoDB:

It says that we want to update the bills table with new values for zip code and city, where the patient has value “376–97–9845”. A corresponding SQL query is shown below:
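
A sketch of the update, with the MongoDB and SQL forms as comments (the new zip and city values are illustrative):

```python
# db.bills.updateMany({patient: "376-97-9845"},
#                     {$set: {zip: "14623", city: "Rochester"}})
# SQL: UPDATE bills SET zip = '14623', city = 'Rochester'
#      WHERE patient = '376-97-9845';
bills = [{"patient": "376-97-9845", "zip": "14534", "city": "Pittsford"},
         {"patient": "201-33-0002", "zip": "14534", "city": "Pittsford"}]

def update_many(collection, condition, new_values):
    for doc in collection:
        if all(doc.get(k) == v for k, v in condition.items()):
            doc.update(new_values)   # the $set part

update_many(bills, {"patient": "376-97-9845"},
            {"zip": "14623", "city": "Rochester"})
```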

Here’s how to remove documents in MongoDB:

It says that we want to remove from the bills table the instance where the patient has value “376–97–9845”. A corresponding SQL query is shown below:
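
A sketch of the removal, with the MongoDB and SQL forms as comments:

```python
# db.bills.deleteMany({patient: "376-97-9845"})
# SQL: DELETE FROM bills WHERE patient = '376-97-9845';
bills = [{"patient": "376-97-9845"}, {"patient": "201-33-0002"}]
bills = [d for d in bills if d["patient"] != "376-97-9845"]
```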

Let's discuss how to do aggregation in MongoDB. Below is MongoDB's code structure for a projection stage, which passes along only the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.

The $project takes a document that can specify the inclusion of fields, the suppression of the _id field, the addition of new fields, and the resetting of the values of existing fields. Alternatively, you may specify the exclusion of fields.

Below is MongoDB's code structure for matching values, when we want to filter the documents and pass only those that match the specified condition(s) to the next pipeline stage.

The $match takes a document that specifies the query conditions. The query syntax is identical to the read operation query syntax, meaning that $match does not accept raw aggregation expressions. Instead, we can use a $expr query expression to include aggregation expression in $match.

Below is MongoDB's code structure for grouping values, when we want to group documents by some specified expression and output one document per distinct group to the next stage. The output documents contain an _id field holding the distinct group-by key. They can also contain computed fields that hold the values of some accumulator expression grouped by the $group's _id field. Note that $group does not order its output documents.

The _id field is mandatory; however, you can specify an _id value of null to calculate accumulated values for all the input documents as a whole. The remaining computed fields are optional and computed using the <accumulator> operators.

Let’s walk through an example of how MongoDB does grouping:

From the orders table, we match all the orders with “A” status. Then, we group the remaining results by cust_id. Finally, we return the cust_id and sum of the amount as a result.
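
A sketch of that two-stage pipeline, with the MongoDB and SQL forms as comments and the stages simulated in plain Python (sample orders are made up):

```python
# db.orders.aggregate([
#   {$match: {status: "A"}},
#   {$group: {_id: "$cust_id", total: {$sum: "$amount"}}}
# ])
# SQL: SELECT cust_id, SUM(amount) FROM orders WHERE status = 'A'
#      GROUP BY cust_id;
orders = [
    {"cust_id": "c1", "status": "A", "amount": 50},
    {"cust_id": "c1", "status": "A", "amount": 25},
    {"cust_id": "c2", "status": "A", "amount": 10},
    {"cust_id": "c2", "status": "B", "amount": 99},   # dropped by $match
]

matched = [o for o in orders if o["status"] == "A"]   # $match stage
totals = {}                                           # $group stage ($sum)
for o in matched:
    totals[o["cust_id"]] = totals.get(o["cust_id"], 0) + o["amount"]
```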

For your information, here are different accumulators that can go to a field:

  • $sum, $avg, $max, $min (sum, average, maximum, minimum).
  • $first, $last (first and last value in a group).
  • $push (array of all values in a group).
  • $addToSet (distinct array of group values).
  • $stdDevPop, $stdDevSamp (population and sample standard deviation).

We have one special case when it comes to the group stage, which is when the _id is null, as seen below. The equivalent SQL query is SELECT AVG(amount) FROM bills.

Let's move on to the lookup stage. It performs a left outer join to an unsharded collection in the same database to filter in documents from the "joined" collection for processing. To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the "joined" collection. The $lookup stage passes these reshaped documents to the next stage.

The $lookup takes a document with the following fields:

  • from: Specifies the collection in the same database to perform the join with. The from collection cannot be sharded.
  • localField: Specifies the field from the documents input to the $lookup stage. $lookup performs an equality match on the localField to the foreignField from the documents of the from collection. If an input document does not contain the localField, the $lookup treats the field as having a value of null for matching purposes.
  • foreignField: Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.
  • as: Specifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection. If the specified name already exists in the input document, the existing field is overwritten.

Here’s how to do look up in MongoDB:

It specifies that we want to look up values from the bills table, joining with the patients table on the local field 'patient' and foreign field '_id'. A corresponding SQL query is shown below:
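
A sketch of the join semantics, with the $lookup stage as a comment and the left outer join simulated in plain Python (sample documents are made up):

```python
# db.bills.aggregate([{$lookup: {from: "patients", localField: "patient",
#                                foreignField: "_id", as: "patient_docs"}}])
# SQL (roughly): SELECT * FROM bills LEFT OUTER JOIN patients
#                ON bills.patient = patients._id;
bills = [{"_id": 1, "patient": "376-97-9845"},
         {"_id": 2, "patient": "999-99-9999"}]      # no matching patient
patients = [{"_id": "376-97-9845", "last": "Smith"}]

def lookup(inputs, from_coll, local_field, foreign_field, as_field):
    out = []
    for doc in inputs:
        joined = dict(doc)
        # Left outer join: non-matching inputs keep an empty array.
        joined[as_field] = [f for f in from_coll
                            if f.get(foreign_field) == doc.get(local_field)]
        out.append(joined)
    return out

result = lookup(bills, patients, "patient", "_id", "patient_docs")
```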

The last aggregation function we’ll discuss is the unwind stage. It deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
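
A sketch of that behaviour (the "items" array field is made up):

```python
# db.bills.aggregate([{$unwind: "$items"}])
bills = [{"_id": 1, "items": ["x-ray", "consult"]}]

unwound = []
for doc in bills:
    for element in doc["items"]:
        copy = dict(doc)
        copy["items"] = element   # array value replaced by the element
        unwound.append(copy)
```

One input document with a two-element array becomes two output documents, one per element.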

3 — Key-value databases

A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash table. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record, and is used to quickly find the data within the database. Some popular key-value databases in use these days are Redis, Amazon DynamoDB, Aerospike, Riak KV, and ArangoDB.

In key-value databases, updates to the value for a single key are usually atomic. Furthermore, many key-value databases allow for transactions which use multiple keys. Also, values have limited structure.

The upsides about key-value databases are:

  • Key-value databases are generally easier to run in a distributed fashion.
  • Queries and updates are usually very fast.
  • Any type of data in any structure can be stored as a value.

However, the downsides about key-value databases are:

  • Very simple queries (usually just get a value given a key, sometimes a range).
  • No referential integrity.
  • Limited transactional capabilities.
  • No schema to understand the data.

Let’s briefly look at the most popular key-value database: Redis. Redis is basically a huge distributed hash table with little structure to values. All values are identified by a key which is a simple string. If we want more structure in our keys, it has to be defined by our application (e.g., user 3 could have the key “user:3”).

As in most key-value stores, Redis values are commonly just an arbitrary blob of data. However, Redis (and some other key-value stores) also allows some structure in values: lists, sets, and hashes.

Below are examples of how to insert data in Redis:

Below are examples of how to retrieve data in Redis:
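
As a sketch of both operations, each Redis command is shown as a comment above a plain-Python stand-in for the store (the keys and values are made up, reusing the "user:3" convention from above):

```python
# A Redis instance sketched as one big dict keyed by strings.
store = {}

# --- inserting data ---
# SET user:3:name "Sandra"           (simple string value)
store["user:3:name"] = "Sandra"
# LPUSH user:3:visits "2015-08-23"   (list value, pushed on the left)
store.setdefault("user:3:visits", []).insert(0, "2015-08-23")
# HSET user:3 age 42                 (hash value)
store.setdefault("user:3", {})["age"] = 42

# --- retrieving data ---
# GET user:3:name
name = store["user:3:name"]
# LRANGE user:3:visits 0 -1
visits = store["user:3:visits"]
# HGET user:3 age
age = store["user:3"]["age"]
```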

So when should you use a key-value database?

  • When you need something really fast.
  • When your data does not have a lot of structure/relationships.
  • For simple caching of data which is pulled from another source.

4 — Graph databases

Lastly, let’s talk about graph databases. They rely on unbounded queries, which is where the search criteria are not particularly specific and are thus likely to return a very large result set. A query without a WHERE clause would certainly fall into this category, but let’s consider for a moment some other possibilities.

Let's walk through some examples using Neo4j, one of the world's leading graph databases. Below you can see a node in Neo4j with information about a person who is a patient:

In order to create a node, we use the code below which is similar to Java. We insert node attributes with Sandra as the first name.

We can also give our node internal IDs.

Two nodes can have a relationship. For example, the two nodes Patient and Doctor have a primary relationship which started on 2015-08-23.

To represent such a relationship, we can create edges between nodes, as seen here.

Let’s say we have sample data shown here. We have a relational model with 3 tables (Patient, Doctor, Visit) and different relationships (Attends, Primary, Attended By, Supervises).

The query below is for a single node. The SQL equivalent is "SELECT last FROM Patient WHERE age > 30 ORDER BY last ASC".

The query below does aggregation. The SQL equivalent is "SELECT AVG(salary) FROM Doctor WHERE age > 30".
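
The Cypher snippets themselves are not shown here, but their stated SQL equivalents can be sketched over made-up sample rows:

```python
patients = [{"last": "Smith", "age": 45}, {"last": "Adams", "age": 31},
            {"last": "Young", "age": 22}]
doctors = [{"salary": 100, "age": 40}, {"salary": 200, "age": 50},
           {"salary": 300, "age": 25}]

# SELECT last FROM Patient WHERE age > 30 ORDER BY last ASC
lasts = sorted(p["last"] for p in patients if p["age"] > 30)

# SELECT AVG(salary) FROM Doctor WHERE age > 30
eligible = [d["salary"] for d in doctors if d["age"] > 30]
avg_salary = sum(eligible) / len(eligible)
```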

And that's the end of this post on NoSQL! I hope you found it helpful and got a good grasp of the basics of document-based, key-value, and graph-based databases. If you're interested in this material, follow the Cracking Data Science Interview publication to receive my subsequent articles on how to crack the data science interview process.


The SQL vs NoSQL Difference: MySQL vs MongoDB

When it comes to choosing a database, one of the biggest decisions is picking a relational (SQL) or non-relational (NoSQL) data structure. While both are viable options, there are certain key differences between the two that users must keep in mind when making a decision.

Here, we break down the most important distinctions and discuss two of the key players in the relational vs non-relational debate: MySQL and MongoDB. Read this article and many more on Xplenty’s blog:


NoSQL Databases: a Survey and Decision Guidance

(At the bottom of this page, you find a BibTeX reference to cite this article.)

Together with our colleagues at the University of Hamburg, we — that is Felix Gessert, Wolfram Wingerath, Steffen Friedrich and Norbert Ritter — presented an overview of the NoSQL landscape at SummerSOC'16 last month. Here is the written gist. We do our best to convey the condensed NoSQL knowledge we gathered building Baqend.


Today, data is generated and consumed at unprecedented scale. This has led to novel approaches for scalable data management subsumed under the term "NoSQL" database systems to handle the ever-increasing data volume and request loads. However, the heterogeneity and diversity of the numerous existing systems impede the well-informed selection of a data store appropriate for a given application context. Therefore, this article gives a top-down overview of the field: Instead of contrasting the implementation specifics of individual representatives, we propose a comparative classification model that relates functional and non-functional requirements to techniques and algorithms employed in NoSQL databases. This NoSQL Toolbox allows us to derive a simple decision tree to help practitioners and researchers filter potential system candidates based on central application requirements.

1. Introduction

Traditional relational database management systems (RDBMSs) provide powerful mechanisms to store and query structured data under strong consistency and transaction guarantees and have reached an unmatched level of reliability, stability and support through decades of development. In recent years, however, the amount of useful data in some application areas has become so vast that it cannot be stored or processed by traditional database solutions. User-generated content in social networks or data retrieved from large sensor networks are only two examples of this phenomenon commonly referred to as Big Data. A class of novel data storage systems able to cope with Big Data are subsumed under the term NoSQL databases, many of which offer horizontal scalability and higher availability than relational databases by sacrificing querying capabilities and consistency guarantees. These trade-offs are pivotal for service-oriented computing and as-a-service models, since any stateful service can only be as scalable and fault-tolerant as its underlying data store.

There are dozens of NoSQL database systems and it is hard to keep track of where they excel, where they fail or even where they differ, as implementation details change quickly and feature sets evolve over time. In this article, we therefore aim to provide an overview of the NoSQL landscape by discussing employed concepts rather than system specificities and explore the requirements typically posed to NoSQL database systems, the techniques used to fulfil these requirements and the trade-offs that have to be made in the process. Our focus lies on key-value, document and wide-column stores, since these NoSQL categories cover the most relevant techniques and design decisions in the space of scalable data management.

In Section 2, we describe the most common high-level approaches towards categorizing NoSQL database systems either by their data model into key-value stores, document stores and wide-column stores or by the safety-liveness trade-offs in their design (CAP and PACELC). We then survey commonly used techniques in more detail and discuss our model of how requirements and techniques are related in Section 3, before we give a broad overview of prominent database systems by applying our model to them in Section 4. A simple and abstract decision model for restricting the choice of appropriate NoSQL systems based on application requirements concludes the paper in Section 5.

2. High-Level System Classification

In order to abstract from implementation details of individual NoSQL systems, high-level classification criteria can be used to group similar data stores into categories. In this section, we introduce the two most prominent approaches: data models and CAP theorem classes.

2.1 Different Data Models

The most commonly employed distinction between NoSQL databases is the way they store and allow access to data. Each system covered in this paper can be categorised as either key-value store, document store or wide-column store.

2.1.1 Key-Value Stores. A key-value store consists of a set of key-value pairs with unique keys. Due to this simple structure, it only supports get and put operations. As the nature of the stored value is transparent to the database, pure key-value stores do not support operations beyond simple CRUD (Create, Read, Update, Delete). Key-value stores are therefore often referred to as schemaless: Any assumptions about the structure of stored data are implicitly encoded in the application logic (schema-on-read) and not explicitly defined through a data definition language (schema-on-write).

The obvious advantages of this data model lie in its simplicity. The very simple abstraction makes it easy to partition and query the data, so that the database system can achieve low latency as well as high throughput. However, if an application demands more complex operations, e.g. range queries, this data model is not powerful enough. Figure 1 illustrates how user account data and settings might be stored in a key-value store. Since queries more complex than simple lookups are not supported, data has to be analyzed inefficiently in application code to extract information like whether cookies are supported or not (cookies: false).

2.1.2 Document Stores. A document store is a key-value store that restricts values to semi-structured formats such as JSON documents. This restriction in comparison to key-value stores brings great flexibility in accessing the data. It is not only possible to fetch an entire document by its ID, but also to retrieve only parts of a document, e.g. the age of a customer, and to execute queries like aggregation, query-by-example or even full-text search.
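The contrast to a pure key-value store can be sketched with a toy document store (hypothetical API): because values are structured, the store itself can project individual fields and evaluate predicates.

```python
# Toy document store: values are dicts, so the store itself can
# project fields and filter by predicate (hypothetical API).
class DocumentStore:
    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = document

    def find_by_id(self, doc_id, projection=None):
        doc = self._docs.get(doc_id)
        if doc is None or projection is None:
            return doc
        # Return only the requested fields instead of the whole document.
        return {f: doc[f] for f in projection if f in doc}

    def query(self, predicate):
        # Query-by-example style filtering over document contents.
        return [d for d in self._docs.values() if predicate(d)]

db = DocumentStore()
db.insert("c1", {"name": "Alice", "age": 34, "city": "Hamburg"})
db.insert("c2", {"name": "Bob", "age": 27, "city": "Berlin"})
```

For example, `db.find_by_id("c1", projection=["age"])` retrieves only the age of a customer without transferring the entire document.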

2.1.3 Wide-Column Stores. Wide-column stores inherit their name from the image that is often used to explain the underlying data model: a relational table with many sparse columns. Technically, however, a wide-column store is closer to a distributed multi-level sorted map: The first-level keys identify rows which themselves consist of key-value pairs. The first-level keys are called row keys, the second-level keys are called column keys. This storage scheme makes tables with arbitrarily many columns feasible, because there is no column key without a corresponding value. Hence, null values can be stored without any space overhead. The set of all columns is partitioned into so-called column families to colocate columns on disk that are usually accessed together. On disk, wide-column stores do not colocate all data from each row, but instead values of the same column family and from the same row. Hence, an entity (a row) cannot be retrieved by one single lookup as in a document store, but has to be joined together from the columns of all column families. However, this storage layout usually enables highly efficient data compression and makes retrieving only a portion of an entity very efficient. The data are stored in lexicographic order of their keys, so that data that are accessed together are physically co-located, given a careful key design. As all rows are distributed into contiguous ranges (so-called tablets) among different tablet servers, row scans only involve few servers and thus are very efficient.
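The multi-level sorted map can be sketched as a toy model (a real wide-column store keeps this map distributed, persistent and versioned):

```python
from collections import defaultdict

# Toy two-level sorted map: row key -> "family:qualifier" -> value.
class WideColumnTable:
    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value):
        # Absent columns simply have no entry: nulls cost no space.
        self._rows[row_key][family + ":" + qualifier] = value

    def get_row(self, row_key):
        # A full row has to be assembled from all of its column families.
        return dict(sorted(self._rows[row_key].items()))

    def scan(self, start_row, end_row):
        # Lexicographic row order makes contiguous range scans cheap.
        for key in sorted(self._rows):
            if start_row <= key < end_row:
                yield key, self.get_row(key)

table = WideColumnTable()
table.put("com.cnn.www", "contents", "html", "<html>…</html>")
table.put("com.cnn.www", "anchor", "cnnsi.com", "CNN")
```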

BigTable, which pioneered the wide-column model, was specifically developed to store a large collection of webpages as illustrated in Figure 3. Every row in the webpages table corresponds to a single webpage. The row key is a concatenation of the URL components in reversed order and every column key is composed of the column family name and a column qualifier, separated by a colon. There are two column families: the “contents” column family with only one column holding the actual webpage and the “anchor” column family holding links to each webpage, each in a separate column. Every cell in the table (i.e. every value accessible by the combination of row and column key) can be versioned by timestamps or version numbers. It is important to note that much of the information of an entity lies in the keys and not only in the values.
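The reversed-URL row-key design can be illustrated with a small sketch (the helper name is hypothetical): reversing the hostname components makes pages of the same domain adjacent in the lexicographic row order, so a single prefix scan retrieves them together.

```python
# Hypothetical helper: build a BigTable-style row key by reversing
# the hostname components so that pages of one domain sort together.
def row_key(url_host):
    return ".".join(reversed(url_host.split(".")))

keys = sorted(row_key(h) for h in
              ["maps.google.com", "www.google.com", "www.example.org"])
# Both google.com pages are now adjacent under the prefix "com.google."
```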

2.2 Consistency-Availability Trade-Offs: CAP and PACELC

Another defining property of a database apart from how the data are stored and how they can be accessed is the level of consistency that is provided. Some databases are built to guarantee strong consistency and serializability (ACID), while others favour availability (BASE). This trade-off is inherent to every distributed database system and the huge number of different NoSQL systems shows that there is a wide spectrum between the two paradigms. In the following, we explain the two theorems CAP and PACELC according to which database systems can be categorised by their respective positions in this spectrum.

CAP. Like the famous FLP Theorem, the CAP Theorem, presented by Eric Brewer at PODC 2000 and later proven by Gilbert and Lynch, is one of the truly influential impossibility results in the field of distributed computing, because it places an ultimate upper bound on what can possibly be accomplished by a distributed system. It states that a sequentially consistent read/write register that eventually responds to every request cannot be realised in an asynchronous system that is prone to network partitions. In other words, it can guarantee at most two of the following three properties at the same time:

  • Consistency (C): Reads and writes are always executed atomically and are strictly consistent (linearizable). Put differently, all clients have the same view on the data at all times.
  • Availability (A): Every non-failing node in the system can always accept read and write requests by clients and will eventually return with a meaningful response, i.e. not with an error message.
  • Partition-tolerance (P): The system upholds the previously displayed consistency guarantees and availability in the presence of message loss between the nodes or partial system failure.

Brewer argues that a system can be both available and consistent in normal operation, but in the presence of a system partition, this is not possible: If the system continues to work in spite of the partition, there is some non-failing node that has lost contact to the other nodes and thus has to decide to either continue processing client requests to preserve availability (AP, eventually consistent systems) or to reject client requests in order to uphold consistency guarantees (CP). The first option violates consistency, because it might lead to stale reads and conflicting writes, while the second option obviously sacrifices availability. There are also systems that usually are available and consistent, but fail completely when there is a partition (CA), for example single-node systems. It has been shown that the CAP-theorem holds for any consistency property that is at least as strong as causal consistency, which also includes any recency bounds on the permissible staleness of data (Δ-atomicity). Serializability as the correctness criterion of transactional isolation does not require strong consistency. However, similar to consistency, serializability can also not be achieved under network partitions.

The classification of NoSQL systems as either AP, CP or CA vaguely reflects the individual systems’ capabilities and hence is widely accepted as a means for high-level comparisons. However, it is important to note that the CAP Theorem actually does not state anything on normal operation; it merely tells us whether a system favors availability or consistency in the face of a network partition. In contrast to the FLP-Theorem, the CAP theorem assumes a failure model that allows arbitrary messages to be dropped, reordered or delayed indefinitely. Under the weaker assumption of reliable communication channels (i.e. messages always arrive but asynchronously and possibly reordered) a CAP-system is in fact possible using the Attiya, Bar-Noy, Dolev algorithm, as long as a majority of nodes are up. (Therefore, consensus as used for coordination in many NoSQL systems either natively (e.g. in Megastore) or through coordination services like Chubby and Zookeeper is even harder to achieve with high availability than strong consistency, see FLP Theorem.)

PACELC. This blind spot of the CAP Theorem is addressed in an article by Daniel Abadi in which he points out that the CAP Theorem fails to capture the trade-off between latency and consistency during normal operation, even though it has proven to be much more influential on the design of distributed systems than the availability-consistency trade-off in failure scenarios. He formulates PACELC which unifies both trade-offs and thus portrays the design space of distributed systems more accurately. From PACELC, we learn that in case of a Partition, there is an Availability-Consistency trade-off; Else, i.e. in normal operation, there is a Latency-Consistency trade-off.

This classification basically offers two possible choices for the partition scenario (A/C) and also two for normal operation (L/C) and thus appears more fine-grained than the CAP classification. However, many systems cannot be assigned exclusively to one single PACELC class and one of the four PACELC classes, namely PC/EL, can hardly be assigned to any system.

3. Techniques

Every significantly successful database is designed for a particular class of applications, or to achieve a specific combination of desirable system properties. The simple reason why there are so many different database systems is that it is not possible for any system to achieve all desirable properties at once. Traditional SQL databases such as PostgreSQL have been built to provide the full functional package: a very flexible data model, sophisticated querying capabilities including joins, global integrity constraints and transactional guarantees. On the other end of the design spectrum, there are key-value stores like Dynamo that scale with data and request volume and offer high read and write throughput as well as low latency, but barely any functionality apart from simple lookups.

In this section, we highlight the design space of distributed database systems, concentrating on sharding, replication, storage management and query processing. We survey the available techniques and discuss how they are related to different functional and non-functional properties (goals) of data management systems. In order to illustrate what techniques are suitable to achieve which system properties, we provide the NoSQL Toolbox (Figure 4) where each technique is connected to the functional and non-functional properties it enables (positive edges only).

3.1 Sharding

Several distributed relational database systems such as Oracle RAC or IBM DB2 pureScale rely on a shared-disk architecture where all database nodes access the same central data repository (e.g. a NAS or SAN). Thus, these systems provide consistent data at all times, but are also inherently difficult to scale. In contrast, the (NoSQL) database systems focused on in this paper are built upon a shared-nothing architecture, meaning each system consists of many servers with private memory and private disks that are connected through a network. Thus, high scalability in throughput and data volume is achieved by sharding (partitioning) data across different nodes (shards) in the system. There are three basic distribution techniques: range-sharding, hash-sharding and entity-group sharding. To make efficient scans possible, the data can be partitioned into ordered and contiguous value ranges by range-sharding. However, this approach requires some coordination through a master that manages assignments. To ensure elasticity, the system has to be able to detect and resolve hotspots automatically by further splitting an overburdened shard.
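A toy sketch of range-sharding with master-managed split points (hypothetical names; a real master additionally tracks shard load and server assignment):

```python
import bisect

# Toy master for range-sharding: an ordered list of split points
# defines contiguous key ranges.
class RangeShardingMaster:
    def __init__(self, split_points):
        self._splits = sorted(split_points)

    def shard_for(self, key):
        # Binary search over the boundaries locates the shard.
        return bisect.bisect_right(self._splits, key)

    def split_shard(self, new_boundary):
        # Hotspot resolution: split an overburdened range in two.
        bisect.insort(self._splits, new_boundary)

# Shard 0 holds keys < "g", shard 1 keys in ["g", "p"), shard 2 the rest.
master = RangeShardingMaster(["g", "p"])
```

A call to `split_shard("k")` would split the middle range into `["g", "k")` and `["k", "p")` without touching the other shards.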

Range sharding is supported by wide-column stores like BigTable, HBase or Hypertable and document stores, e.g. MongoDB, RethinkDB, Espresso and DocumentDB. Another way to partition data over several machines is hash-sharding where every data item is assigned to a shard server according to some hash value built from the primary key. This approach does not require a coordinator and also guarantees the data to be evenly distributed across the shards, as long as the used hash function produces an even distribution. The obvious disadvantage, though, is that it only allows lookups and makes scans infeasible. Hash sharding is used in key-value stores and is also available in some wide-column stores like Cassandra or Azure Tables.

The shard server that is responsible for a record can be determined as serverid = hash(id)%servers, for example. However, this hashing scheme requires all records to be reassigned every time a new server joins or leaves, because it changes with the number of shard servers (servers). Consequently, it is infeasible to use in elastic systems like Dynamo, Riak or Cassandra, which allow additional resources to be added on-demand and again be removed when dispensable. For increased flexibility, elastic systems typically use consistent hashing where records are not directly assigned to servers, but instead to logical partitions which are then distributed across all shard servers. Thus, only a fraction of the data have to be reassigned upon changes in the system topology. For example, an elastic system can be downsized by offloading all logical partitions residing on a particular server to other servers and then shutting down the now idle machine. For details on how consistent hashing is used in NoSQL systems, see the Dynamo paper.
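The difference between mod-based hashing and consistent hashing via logical partitions can be sketched as follows (MD5 merely stands in for whatever stable hash function a real system uses):

```python
import hashlib

# MD5 stands in for a real system's hash function; any stable,
# well-distributed hash works here.
def stable_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = ["record-%d" % i for i in range(1000)]

# (a) Mod-based sharding: serverid = hash(id) % servers. Growing the
# cluster from 4 to 5 servers remaps the vast majority of keys.
before = {k: stable_hash(k) % 4 for k in keys}
after = {k: stable_hash(k) % 5 for k in keys}
moved = sum(before[k] != after[k] for k in keys)

# (b) Consistent hashing via fixed logical partitions: a key's
# partition never changes; adding a server merely moves some
# partitions (and only the keys inside them) to the new machine.
PARTITIONS = 64
partition_of = {k: stable_hash(k) % PARTITIONS for k in keys}
```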

Entity-group sharding is a data partitioning scheme with the goal of enabling single-partition transactions on co-located data. The partitions are called entity-groups and either explicitly declared by the application (e.g. in G-Store and Megastore) or derived from transactions’ access patterns (e.g. in Relational Cloud and Cloud SQL Server). If a transaction accesses data that spans more than one group, data ownership can be transferred between entity-groups or the transaction manager has to fall back to more expensive multi-node transaction protocols.

3.2 Replication

In terms of CAP, conventional RDBMSs are often CA systems run in single-server mode: The entire system becomes unavailable on machine failure. And so system operators secure data integrity and availability through expensive, but reliable high-end hardware. In contrast, NoSQL systems like Dynamo, BigTable or Cassandra are designed for data and request volumes that cannot possibly be handled by one single machine, and therefore they run on clusters consisting of thousands of servers. (Low-end hardware is used, because it is substantially more cost-efficient than high-end hardware.) Since failures are inevitable and will occur frequently in any large-scale distributed system, the software has to cope with them on a daily basis. In 2009, Google fellow Jeff Dean stated that a typical new cluster at Google encounters thousands of hard drive failures, 1,000 single-machine failures, 20 rack failures and several network partitions due to expected and unexpected circumstances in its first year alone. Many more recent cases of network partitions and outages in large cloud data centers have been reported. Replication allows the system to maintain availability and durability in the face of such errors. But storing the same records on different machines (replica servers) in the cluster introduces the problem of synchronization between them and thus a trade-off between consistency on the one hand and latency and availability on the other.

Gray et al. propose a two-tier classification of different replication strategies according to when updates are propagated to replicas and where updates are accepted. There are two possible choices on tier one (“when”): Eager (synchronous) replication propagates incoming changes synchronously to all replicas before a commit can be returned to the client, whereas lazy (asynchronous) replication applies changes only at the receiving replica and passes them on asynchronously. The great advantage of eager replication is consistency among replicas, but it comes at the cost of higher write latency due to the need to wait for other replicas and impaired availability. Lazy replication is faster, because it allows replicas to diverge; as a consequence, stale data might be served. On the second tier (“where”), again, two different approaches are possible: Either a master-slave (primary copy) scheme is pursued where changes can only be accepted by one replica (the master) or, in an update-anywhere (multi-master) approach, every replica can accept writes. In master-slave protocols, concurrency control is not more complex than in a distributed system without replicas, but the entire replica set becomes unavailable, as soon as the master fails. Multi-master protocols require complex mechanisms for prevention or detection and reconciliation of conflicting changes. Techniques typically used for these purposes are versioning, vector clocks, gossiping and read repair (e.g. in Dynamo) and convergent or commutative datatypes (e.g. in Riak).

Basically, all four combinations of the two-tier classification are possible. Distributed relational systems usually perform eager master-slave replication to maintain strong consistency. Eager update anywhere replication as for example featured in Google’s Megastore suffers from a heavy communication overhead generated by synchronisation and can cause distributed deadlocks which are expensive to detect. NoSQL database systems typically rely on lazy replication, either in combination with the master-slave (CP systems, e.g. HBase and MongoDB) or the update anywhere approach (AP systems, e.g. Dynamo and Cassandra). Many NoSQL systems leave the choice between latency and consistency to the client, i.e. for every request, the client decides whether to wait for a response from any replica to achieve minimal latency or for a certainly consistent response (by a majority of the replicas or the master) to prevent stale data.
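This per-request choice between latency and consistency can be sketched as a toy read path (hypothetical API; a real system contacts replicas in parallel and tracks versions with timestamps or vector clocks):

```python
# Toy read path: each replica reports a (version, value) pair; a
# higher version is newer. Sequential for clarity.
def read(replicas, consistency="one"):
    if consistency == "one":
        # Return the first answer: minimal latency, possibly stale.
        return replicas[0]
    # Wait for a majority of answers and keep the newest version seen.
    quorum = len(replicas) // 2 + 1
    return max(replicas[:quorum], key=lambda r: r[0])

replicas = [(3, "old"), (5, "new"), (5, "new")]
fast = read(replicas, "one")       # may be the stale (3, "old")
safe = read(replicas, "quorum")    # a majority includes version 5 here
```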

An aspect of replication that is not covered by the two-tier scheme is the distance between replicas. The obvious advantage of placing replicas near one another is low latency, but close proximity of replicas might also reduce the positive effects on availability; for example, if two replicas of the same data item are placed in the same rack, the data item is not available on rack failure in spite of replication. But more than the possibility of mere temporary unavailability, placing replicas nearby also bears the peril of losing all copies at once in a disaster scenario. An alternative technique for latency reduction is used in Orestes, where data is cached close to applications using web caching infrastructure and cache coherence protocols.

Geo-replication can protect the system against complete data loss and improve read latency for distributed access from clients. Eager geo-replication, as implemented in Megastore, Spanner, MDCC and Mencius, achieves strong consistency at the cost of higher write latencies (typically 100ms to 600ms). With lazy geo-replication as in Dynamo, PNUTS, Walter, COPS, Cassandra and BigTable, recent changes may be lost, but the system performs better and remains available during partitions. Charron-Bost et al. (Chapter 12) and Özsu and Valduriez (Chapter 13) provide a comprehensive discussion of database replication.

3.3 Storage Management

For best performance, database systems need to be optimized for the storage media they employ to serve and persist data. These are typically main memory (RAM), solid-state drives (SSDs) and spinning disk drives (HDDs) that can be used in any combination. Unlike RDBMSs in enterprise setups, distributed NoSQL databases avoid specialized shared-disk architectures in favor of shared-nothing clusters based on commodity servers (employing commodity storage media). Storage devices are typically visualized as a “storage pyramid” (see Figure 5 or Hellerstein et al.). There is also a set of transparent caches (e.g. L1-L3 CPU caches and disk buffers, not shown in the Figure), that are only implicitly leveraged through well-engineered database algorithms that promote data locality. The very different cost and performance characteristics of RAM, SSD and HDD storage and the different strategies to leverage their strengths (storage management) are one reason for the diversity of NoSQL databases. Storage management has a spatial dimension (where to store data) and a temporal dimension (when to store data). Update-in-place and append-only-IO are two complementary spatial techniques of organizing data; in-memory prescribes RAM as the location of data, whereas logging is a temporal technique that decouples main memory and persistent storage and thus provides control over when data is actually persisted.

In their seminal paper “The End of an Architectural Era”, Stonebraker et al. found that in typical RDBMSs, only 6.8% of the execution time is spent on “useful work”, while the rest is spent on:

  • buffer management (34.6%), i.e. caching to mitigate slower disk access
  • latching (14.2%), to protect shared data structures from race conditions caused by multi-threading
  • locking (16.3%), to guarantee logical isolation of transactions
  • logging (1.9%), to ensure durability in the face of failures
  • hand-coded optimizations (16.2%)

This motivates that large performance improvements can be expected if RAM is used as primary storage (in-memory databases). The downsides are high storage costs and a lack of durability — a small power outage can destroy the database state. This can be solved in two ways: The state can be replicated over n in-memory server nodes protecting against n-1 single-node failures (e.g. HStore, VoltDB) or by logging to durable storage (e.g. Redis or SAP Hana). Through logging, a random write access pattern can be transformed to a sequential one comprised of received operations and their associated properties (e.g. redo information). In most NoSQL systems, the commit rule for logging is respected, which demands every write operation that is confirmed as successful to be logged and the log to be flushed to persistent storage. In order to avoid the rotational latency of HDDs incurred by logging each operation individually, log flushes can be batched together (group commit) which slightly increases the latency of individual writes, but drastically improves throughput.
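Group commit can be sketched as follows; the batch-size trigger stands in for the time- or size-based flush policies real systems use:

```python
# Toy group-commit log: writes are collected into a batch and flushed
# together, trading a little per-write latency for far fewer flushes.
class GroupCommitLog:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._buffer = []
        self.flushes = 0
        self.durable = []

    def append(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self._flush()

    def _flush(self):
        # One sequential write (and one fsync) for the whole batch.
        self.durable.extend(self._buffer)
        self._buffer.clear()
        self.flushes += 1

log = GroupCommitLog(batch_size=10)
for i in range(100):
    log.append("op-%d" % i)
# 100 confirmed writes caused only 10 log flushes.
```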

SSDs and more generally all storage devices based on NAND flash memory differ substantially from HDDs in various aspects: “(1) asymmetric speed of read and write operations, (2) no in-place overwrite — the whole block must be erased before overwriting any page in that block, and (3) limited program/erase cycles” (Min et al., 2012). Thus, a database system’s storage management must not treat SSDs and HDDs as slightly slower, persistent RAM, since random writes to an SSD are roughly an order of magnitude slower than sequential writes. Random reads, on the other hand, can be performed without any performance penalties. There are some database systems (e.g. Oracle Exadata, Aerospike) that are explicitly engineered for these performance characteristics of SSDs. In HDDs, both random reads and writes are 10–100 times slower than sequential access. Logging hence suits the strengths of SSDs and HDDs which both offer a significantly higher throughput for sequential writes.

For in-memory databases, an update-in-place access pattern is ideal: It simplifies the implementation and random writes to RAM are essentially equally fast as sequential ones, with small differences being hidden by pipelining and the CPU-cache hierarchy. However, RDBMSs and many NoSQL systems (e.g. MongoDB) employ an update-in-place pattern for persistent storage, too. To mitigate the slow random access to persistent storage, main memory is usually used as a cache and complemented by logging to guarantee durability. In RDBMSs, this is achieved through a complex buffer pool which not only employs cache replacement algorithms appropriate for typical SQL-based access patterns, but also ensures ACID semantics. NoSQL databases have simpler buffer pools that profit from simpler queries and the lack of ACID transactions. The alternative to the buffer pool model is to leave caching to the OS through virtual memory (e.g. employed in MongoDB’s MMAP storage engine). This simplifies the database architecture, but has the downside of giving less control over which data items or pages reside in memory and when they get evicted. Also read-ahead (speculative reads) and write-behind (write buffering) transparently performed with OS buffering lack sophistication as they are based on file system logics instead of database queries.

Append-only storage (also referred to as log-structuring) tries to maximize throughput by writing sequentially. Although log-structured file systems have a long research history, append-only I/O has only recently been popularized for databases by BigTable’s use of Log-Structured Merge (LSM) trees consisting of an in-memory cache, a persistent log and immutable, periodically written storage files. LSM trees and variants like Sorted Array Merge Trees (SAMT) and Cache-Oblivious Look-ahead Arrays (COLA) have been applied in many NoSQL systems (Cassandra, CouchDB, LevelDB, Bitcask, RethinkDB, WiredTiger, RocksDB, InfluxDB, TokuDB). Designing a database to achieve maximum write performance by always writing to a log is rather simple; the difficulty lies in providing fast random and sequential reads. This requires an appropriate index structure that is either permanently updated as a copy-on-write (COW) data structure (e.g. CouchDB’s COW B-trees) or only periodically persisted as an immutable data structure (e.g. in BigTable-style systems). An issue of all log-structured storage approaches is costly garbage collection (compaction) to reclaim space of updated or deleted items.
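A drastically simplified LSM sketch showing the write and read paths (persistence, compaction and Bloom filters are omitted):

```python
# Toy LSM: writes go to a sorted in-memory table (memtable); when it
# fills up, it is written out as an immutable run. Reads check the
# memtable first, then the runs from newest to oldest.
class SimpleLSM:
    def __init__(self, memtable_limit):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []      # immutable flushed runs, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: persist an immutable, sorted run (append-only I/O).
            self.runs.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):     # the newest run wins
            if key in run:
                return run[key]
        return None

lsm = SimpleLSM(memtable_limit=2)
lsm.put("a", 1)
lsm.put("b", 2)      # memtable full -> flushed as an immutable run
lsm.put("a", 9)      # newer version lives only in the memtable
```

Compaction, omitted here, would merge the runs and discard the obsolete version of "a".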

In virtualized environments like Infrastructure-as-a-Service clouds many of the discussed characteristics of the underlying storage layer are hidden.

3.4 Query Processing

The querying capabilities of a NoSQL database mainly follow from its distribution model, consistency guarantees and data model. Primary key lookup, i.e. retrieving data items by a unique ID, is supported by every NoSQL system, since it is compatible with both range- and hash-partitioning. Filter queries return all items (or projections) that meet a predicate specified over the properties of data items from a single table. In their simplest form, they can be performed as filtered full-table scans. For hash-partitioned databases this implies a scatter-gather pattern where each partition performs the predicated scan and results are merged. For range-partitioned systems, any conditions on the range attribute can be exploited to select partitions.
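The contrast between scatter-gather over hash partitions and partition pruning over range partitions can be sketched with a toy model (integer records, lists as partitions):

```python
def scatter_gather(partitions, predicate):
    # Hash partitioning gives no hint where matches live, so every
    # partition runs the filtered scan (in practice, in parallel).
    results = []
    for part in partitions:
        results.extend(r for r in part if predicate(r))
    return results

def range_pruned(partitions, bounds, low, high):
    # Range partitioning on the queried attribute: only partitions
    # whose range [lo, hi) overlaps [low, high) are contacted.
    results = []
    for part, (lo, hi) in zip(partitions, bounds):
        if lo < high and low < hi:
            results.extend(r for r in part if low <= r < high)
    return results

parts = [[1, 2, 3], [10, 12], [20, 25]]
bounds = [(0, 10), (10, 20), (20, 30)]
```

For the predicate `10 <= r < 15`, scatter-gather touches all three partitions while the pruned variant contacts only the middle one, yet both return the same result.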

To circumvent the inefficiencies of O(n) scans, secondary indexes can be employed. These can either be local secondary indexes that are managed in each partition or global secondary indexes that index data over all partitions. As the global index itself has to be distributed over partitions, consistent secondary index maintenance would necessitate slow and potentially unavailable commit protocols. Therefore in practice, most systems only offer eventual consistency for these indexes (e.g. Megastore, Google AppEngine Datastore, DynamoDB) or do not support them at all (e.g. HBase, Azure Tables). When executing global queries over local secondary indexes the query can only be targeted to a subset of partitions if the query predicate and the partitioning rules intersect. Otherwise, results have to be assembled through scatter-gather. For example, a user table with range-partitioning over an age field can service queries that have an equality condition on age from one partition whereas queries over names need to be evaluated at each partition. A special case of global secondary indexing is full-text search, where selected fields or complete data items are fed into either a database-internal inverted index (e.g. MongoDB) or to an external search platform such as ElasticSearch or Solr (Riak Search, DataStax Cassandra).
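A toy sketch of local secondary indexing: each partition indexes only its own rows, so a global query on the indexed attribute still fans out to all partitions unless the predicate also constrains the partitioning key.

```python
from collections import defaultdict

class Partition:
    def __init__(self):
        self.rows = {}                      # primary key -> record
        self.by_name = defaultdict(list)    # local secondary index

    def insert(self, pk, record):
        self.rows[pk] = record
        self.by_name[record["name"]].append(pk)

def query_by_name(partitions, name):
    # The index is local to each partition, so the query is sent to
    # all of them and the results are merged (scatter-gather).
    return [p.rows[pk] for p in partitions
            for pk in p.by_name.get(name, [])]

p1, p2 = Partition(), Partition()
p1.insert(1, {"name": "Alice", "age": 34})
p2.insert(2, {"name": "Alice", "age": 27})
p2.insert(3, {"name": "Bob", "age": 41})
```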

Query planning is the task of optimizing a query plan to minimize execution costs. For aggregations and joins, query planning is essential as these queries are very inefficient and hard to implement in application code. The wealth of literature and results on relational query processing is largely disregarded in current NoSQL systems for two reasons. First, the key-value and wide-column model are centered around CRUD and scan operations on primary keys which leave little room for query optimization. Second, most work on distributed query processing focuses on OLAP (online analytical processing) workloads that favor throughput over latency whereas single-node query optimization is not easily applicable for partitioned and replicated databases. However, it remains an open research challenge to generalize the large body of applicable query optimization techniques especially in the context of document databases. (Currently only RethinkDB can perform general Θ-joins. MongoDB supports left-outer equi-joins in its aggregation framework and CouchDB allows joins for pre-declared map-reduce views.)

In-database analytics can be performed either natively (e.g. in MongoDB, Riak, CouchDB) or through external analytics platforms such as Hadoop, Spark and Flink (e.g. in Cassandra and HBase). The prevalent native batch analytics abstraction exposed by NoSQL systems is MapReduce. (An alternative to MapReduce are generalized data processing pipelines, where the database tries to optimize the flow of data and locality of computation based on a more declarative query language, e.g. MongoDB’s aggregation framework.) Due to I/O, communication overhead and limited execution plan optimization, these batch- and micro-batch-oriented approaches have high response times. Materialized views are an alternative with lower query response times. They are declared at design time and continuously updated on change operations (e.g. in CouchDB and Cassandra). However, similar to global secondary indexing, view consistency is usually relaxed in favor of fast, highly-available writes, when the system is distributed. As only few database systems come with built-in support for ingesting and querying unbounded streams of data, near-real-time analytics pipelines commonly implement either the Lambda Architecture or the Kappa Architecture: The former complements a batch processing framework like Hadoop MapReduce with a stream processor such as Storm (see for example Summingbird) and the latter exclusively relies on stream processing and forgoes batch processing altogether.

4. System Case Studies

In this section, we provide a qualitative comparison of some of the most prominent key-value, document and wide-column stores. We present the results in strongly condensed comparisons and refer to the documentations of the individual systems for in-detail information. The proposed NoSQL Toolbox (see Figure 4) is a means of abstraction that can be used to classify database systems along three dimensions: functional requirements, non-functional requirements and the techniques used to implement them. We argue that this classification characterizes many database systems well and thus can be used to meaningfully contrast different database systems: Table 1 shows a direct comparison of MongoDB, Redis, HBase, Riak, Cassandra and MySQL in their respective default configurations. A more verbose comparison of central system properties is presented in the large comparison Table 2 at the end of this article.

The methodology used to identify the specific system properties consists of an in-depth analysis of publicly available documentation and literature on the systems. Furthermore, some properties had to be evaluated by researching the open-source code bases, personal communication with the developers as well as a meta-analysis of reports and benchmarks by practitioners.

For detailed descriptions, see the slides from our ICDE 2016 Tutorial, which cover many details of the different NoSQL systems.

The comparison elucidates how SQL and NoSQL databases are designed to fulfill very different needs: RDBMSs provide an unmatched level of functionality whereas NoSQL databases excel on the non-functional side through scalability, availability, low latency and/or high throughput. However, there are also large differences among the NoSQL databases. Riak and Cassandra, for example, can be configured to fulfill many non-functional requirements, but are only eventually consistent and do not feature many functional capabilities apart from data analytics and, in case of Cassandra, conditional updates. MongoDB and HBase, on the other hand, offer stronger consistency and more sophisticated functional capabilities such as scan queries and, in the case of MongoDB, filter queries, but do not maintain read and write availability during partitions and tend to display higher read latencies. Redis, as the only non-partitioned system in this comparison apart from MySQL, shows a special set of trade-offs centered around the ability to maintain extremely high throughput at low latency using in-memory data structures and asynchronous master-slave replication.

5. Conclusions

Choosing a database system always means choosing one set of desirable properties over another. To break down the complexity of this choice, we present a binary decision tree in Figure 6 that maps trade-off decisions to example applications and potentially suitable database systems. The leaf nodes cover applications ranging from simple caching (left) to Big Data analytics (right). Naturally, this view on the problem space is not complete, but it points towards candidate solutions for particular data management problems.

The first split in the tree is along the access pattern of applications: they either rely on fast lookups only (left half) or require more complex querying capabilities (right half). The fast-lookup applications can be distinguished further by the data volume they process: if the main memory of a single machine can hold all the data, a single-node system like Redis or Memcache is probably the best choice, depending on whether functionality (Redis) or simplicity (Memcache) is favored. If the data volume is or might grow beyond RAM capacity or is even unbounded, a multi-node system that scales horizontally might be more appropriate. The most important decision in this case is whether to favor availability (AP) or consistency (CP) as described by the CAP theorem. Systems like Cassandra and Riak can deliver an always-on experience, while systems like HBase, MongoDB and DynamoDB deliver strong consistency.

The right half of the tree covers applications requiring more complex queries than simple lookups. Here, too, we first distinguish the systems by the data volume they have to handle, according to whether single-node systems are feasible (HDD-size) or distribution is required (unbounded volume). For common OLTP (online transaction processing) workloads on moderately large data volumes, traditional RDBMSs or graph databases like Neo4J are optimal, because they offer ACID semantics. If, however, availability is of the essence, distributed systems like MongoDB, CouchDB or DocumentDB are preferable.

If the data volume exceeds the limits of a single machine, the choice of the right system depends on the prevalent query pattern: When complex queries have to be optimised for latency, as for example in social networking applications, MongoDB is very attractive, because it facilitates expressive ad-hoc queries. HBase and Cassandra are also useful in such a scenario, but excel at throughput-optimised Big Data analytics, when combined with Hadoop.

In summary, we are convinced that the proposed top-down model is an effective decision support to filter the vast amount of NoSQL database systems based on central requirements. The NoSQL Toolbox furthermore provides a mapping from functional and non-functional requirements to common implementation techniques to categorize the constantly evolving NoSQL space.

If you want to cite this article, please use this DBLP reference:

@article{GessertWFR17,
  author  = {Felix Gessert and
             Wolfram Wingerath and
             Steffen Friedrich and
             Norbert Ritter},
  title   = {NoSQL database systems: a survey and decision guidance},
  journal = {Computer Science -- R{\&}D},
  volume  = {32},
  number  = {3-4},
  pages   = {353--365},
  year    = {2017},
  doi     = {10.1007/s00450-016-0334-3}
}


Introduction to NoSQL Databases


Over the past few years, large tech companies such as Netflix, Hulu, Uber and Facebook have brought notoriety to NoSQL, or non-relational, databases. In this blog post I will give a brief introduction to the types of NoSQL databases and how they compare to traditional SQL databases.

Relational vs Non-relational

The key difference between NoSQL and SQL is that a SQL database is a relational database. A relational database stores data in tables organized into columns; each column holds one datatype (integer, real number, string, date, etc.) and each row represents one record in the table. Non-relational databases do not store data in tables. Instead, NoSQL databases offer several different storage models (key-value, document-based, column-based), which I will briefly summarize below.
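To make the relational side of the contrast concrete, here is a minimal sketch using Python's bundled sqlite3 module. The table name and columns are invented for illustration; the point is that every row has the same typed columns.

```python
import sqlite3

# In a relational database, data lives in tables with typed columns;
# each row is one record.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id        INTEGER PRIMARY KEY,
        name      TEXT,
        signed_up DATE
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', '2018-01-15')")
conn.execute("INSERT INTO customers VALUES (2, 'Grace', '2018-02-03')")

# Every row has the same columns, and each column holds one datatype.
rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(rows)  # [(1, 'Ada'), (2, 'Grace')]
```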

Key-value Store (Oracle NoSQL, Redis, Amazon Dynamo)

Key-value databases are the simplest of all the NoSQL databases. The basic data structure is a dictionary or map. A value can be stored as an integer, string, JSON document, or array, with a key used to reference that value. For example, the key could be a customer ID and the value a string containing the customer's name. Using a JSON value adds richness: the key can then point to a whole dictionary of information about that customer. Benefits include rapid storage of data (thanks to the model's simplicity), integrated caching that lets users store and retrieve data as quickly as possible, and a good fit for unrelated data; for example, one user's web-app activity is independent of another user's. However, key-value databases make it difficult to perform queries beyond basic CRUD (create, read, update, delete) operations. Additionally, as the volume of data increases, maintaining unique keys becomes more difficult.
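The key-value model can be sketched in a few lines of Python, with a plain dict standing in for a real store such as Redis. The key scheme and field names below are purely illustrative.

```python
import json

# A key-value store maps opaque keys to values.
store = {}

# Simple value: key -> string
store["customer:42:name"] = "Ada Lovelace"

# Richer value: key -> JSON document describing the customer
store["customer:42"] = json.dumps({"name": "Ada Lovelace", "plan": "pro"})

# Lookups are fast by key, but there is no built-in way to ask
# "which customers are on the pro plan?" without scanning every key.
profile = json.loads(store["customer:42"])
print(profile["plan"])  # pro
```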

Document-Based Store (MongoDB and Couchbase)

The next type of NoSQL database is the document-oriented database. Data is structured in the form of documents and collections. A document can be a PDF, Microsoft Word file, XML or JSON file. Instead of columns with fixed datatypes, a document contains key-value pairs, and each document does not have to share its structure with other documents. Because of this, you can add new data simply by adding more documents, without changing the structure of the entire database. Documents are grouped into collections, which serve a purpose similar to a relational table, and document databases provide querying functions to search collections for documents with particular attributes. Benefits include flexible data modeling (no need to force-fit a relational model, since structured, semi-structured and unstructured data are all handled) and fast write performance favored over strict consistency (great for Agile teams and quick iteration). Document databases also allow collections to be separated by entity (e.g. orders and customer profiles). However, they too are limited for advanced queries and do not support joins.
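A document "collection" can be sketched as a list of Python dicts. The `find` helper below is a stand-in for a MongoDB-style attribute query; all field names are invented for illustration.

```python
# Two lists of dicts stand in for document collections here.
customers = [
    {"_id": 1, "name": "Ada", "city": "London"},
    # Documents need not share a structure -- the extra field is fine.
    {"_id": 2, "name": "Grace", "city": "New York", "tags": ["navy"]},
]
orders = [
    {"_id": 100, "customer_id": 1, "total": 25.0},
    {"_id": 101, "customer_id": 1, "total": 40.0},
]

def find(collection, **filters):
    """Return documents whose fields match all the given filters."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in filters.items())]

# Query by attribute, much like a MongoDB find({"city": "London"}).
london = find(customers, city="London")

# No joins: relating orders to customers means a second lookup in code.
ada_orders = find(orders, customer_id=1)
print(len(ada_orders))  # 2
```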

Column-based Store (Google’s Bigtable, Cassandra, HBase)

Column-oriented databases store data in grouped columns rather than in rows. They use a concept called a keyspace, which is similar to a schema in the relational model. The keyspace contains multiple column families, which are similar to tables in the relational model. However, instead of containing just rows, a column family contains rows of columns: each row in a column family has a unique key, and each column within the row contains a name, a value and a timestamp.

The benefits of a column-based database include data compression, high performance with aggregate functions (SUM, COUNT, etc.) and scalability across many machines. However, writing new data to a columnar database can take more time: a row-based database can write a new record in one operation, while a columnar database must write each column one by one. Columnar databases are therefore better suited to data with a small number of columns and a large number of rows.
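The keyspace / column-family / row / column layout described above can be sketched as nested Python dicts; every name below is illustrative.

```python
import time

# Sketch of the wide-column layout: a keyspace holds column families,
# each row key maps to its own set of columns, and every column carries
# a name, a value, and a timestamp.
keyspace = {
    "app": {                      # keyspace (like a schema)
        "users": {                # column family (like a table)
            "row-1": {            # row key
                "name":  {"value": "Ada", "ts": time.time()},
                "email": {"value": "ada@example.com", "ts": time.time()},
            },
            "row-2": {            # rows need not share the same columns
                "name": {"value": "Grace", "ts": time.time()},
            },
        },
    },
}

# Aggregates over one column only touch that column's data:
names = [row["name"]["value"]
         for row in keyspace["app"]["users"].values()
         if "name" in row]
print(sorted(names))  # ['Ada', 'Grace']
```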

Pros and Cons of NoSQL

Cons:

  1. Lack of standardization, support and maturity: as the name indicates, non-relational databases do not use the Structured Query Language, and they have no standardized language of their own, which makes it difficult to migrate processes over to NoSQL. As CRUD operations get a little more complex, the nested JSON syntax can become confusing compared to the equivalent SQL. This goes hand in hand with NoSQL's relative immaturity: there are far fewer NoSQL experts than SQL experts at the moment, and for the same reason commercial support is thin; most support is provided on a community basis.
  2. Analytics and BI: SQL has a wide array of tools for business intelligence. However, there are not many comparable tools for NoSQL analytics at the moment.


Pros:

  1. Flexibility: NoSQL databases are easier to manage and more adept at dealing with newer data models, so they can fit the very specific needs of a company. This article explains how Netflix adopted SimpleDB, Cassandra and HBase for specific business needs back in 2011.
  2. Highly scalable at low cost: many NoSQL options are open source, making them a more affordable option for smaller organizations. In addition, the top NoSQL options (MongoDB, Amazon DynamoDB) allow big-data processing at a relatively affordable price. Commercial SQL deployments often rely on proprietary servers and storage systems, which can end up costing more than NoSQL's per-GB or per-transaction pricing. Additionally, NoSQL databases scale out rather than up: in the past, database administrators scaled up by buying bigger servers to handle heavier loads, whereas a NoSQL database is distributed across more servers as load increases. NoSQL databases are therefore usually designed with low-cost commodity hardware in mind.

Additional Notes:

  1. Although there are many benefits to NoSQL databases, SQL databases are still more widely used at this point, as the current DB-Engines database rankings show.
  2. SQL databases emphasize ACID (Atomicity, Consistency, Isolation, Durability), whereas NoSQL focuses on BASE (Basically Available, Soft-state, Eventual consistency). Both are grounded in Brewer's CAP theorem. A common informal statement of CAP is: "In the face of network partitions, you can't always have both perfect consistency and 100% availability. Plan accordingly." For more information on BASE, ACID and CAP, please see the links below.


MongoDB World 2018 June 26/27 NY!

MongoDB World is an educational tech conference coming to NYC on June 26-27.

Join us for hands-on technical sessions taught by the team that builds the database. Sessions will cover topics including analytics, app modernization, performance, security, and more. You'll walk away with immediately applicable MongoDB skills. Register here.

Hope to see you there!


Welcome ReasonDB!


ReasonDB is a 100% JavaScript, automatically synchronizing multi-model database with a SQL-like syntax (JOQULAR) and swappable persistence stores. It supports joins, nested matches, projections and live object result sets, asynchronous cursors, streaming analytics, 18 built-in predicates, in-line predicates, predicate extensibility, indexable computed values, fully indexed Dates and Arrays, and built-in statistical sampling. Persistence engines include files, Redis, LocalStorage, block storage, and more.

Good luck!


The inaugural “Database Discussion” round table

Please have a look: The inaugural “Database Discussion” round table.

Each month we’ll post a thought-provoking question to several of our Toad World experts. Below are their thoughts, but, we’d like your opinions, too! Join the conversation.

And, if you have any questions on the subject from your experience please post them to be answered by the expert panel of contributors.

We’ll even entice you to share:

Each month we’ll randomly select one response to win a $25 Amazon gift card.

Speak up – don’t be silent!

Follow this link for more information!

