A customer asked me the other day how to choose the right NoSQL database. Unfortunately, this is not an easy task.
There are over 150 different offerings, and they differ significantly from one another.
The best advice I can give is: “Choose the database that best matches your problem.”
NoSQL databases can be categorized into four major groups:
- Key-value databases
- Wide-column (column-family store) databases
- Document databases
- Graph databases
Key Value Database
In a key-value database, information is organized as key-value pairs. A pure key-value database doesn’t understand what is stored in the value and limits developers to a simple interface of SET and GET operations. In return, key-value databases provide good scalability, high performance, high availability, and reliability at scale.
Examples: Redis, Riak, Voldemort.
Pros:
- Simple data model
Cons:
· You have to create your own "foreign keys"
· Poor fit for complex data
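To make the SET/GET model and the “foreign key” drawback concrete, here is a minimal in-memory sketch in Python. It is not how Redis or Riak work internally; the `KeyValueStore` class and the `user:42`-style key names are hypothetical conventions chosen for illustration. The store treats every value as opaque bytes, so any relationship between records has to be maintained and resolved by the application.

```python
from __future__ import annotations

import json


class KeyValueStore:
    """Hypothetical toy store: maps string keys to opaque byte values."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        # The store never inspects `value`; it only remembers key -> bytes.
        self._data[key] = value

    def get(self, key: str) -> bytes | None:
        # Returns None for a missing key instead of raising.
        return self._data.get(key)


store = KeyValueStore()
store.set("user:42", json.dumps({"name": "Ada"}).encode())
store.set("order:1001", json.dumps({"total": 99.5, "user": "user:42"}).encode())
# Hand-made "foreign key": the application keeps the list of order keys.
store.set("user:42:orders", json.dumps(["order:1001"]).encode())

# A "join" is done by hand: one GET for the key list, one GET per order,
# and the application decodes the bytes itself.
order_keys = json.loads(store.get("user:42:orders"))
orders = [json.loads(store.get(k)) for k in order_keys]
print(orders)  # [{'total': 99.5, 'user': 'user:42'}]
```

Nothing stops the application from deleting `order:1001` while `user:42:orders` still points at it; keeping such references consistent is the developer's job.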
Wide column (column-family stores) Database
In a wide-column database, information is organized in groups of key-value pairs (columns), but key-value pairs can be nested, so a key can refer to multiple sub-key-value pairs. The data model is a big table with column families, and the query model is based on a MapReduce pattern.
Examples: HBase, Hypertable, Cassandra
Pros:
- Supports semi-structured data
- Naturally indexed (columns)
Cons:
· Poor fit for interconnected data
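The nested layout described above can be pictured as a map of maps. The sketch below is a hypothetical in-memory model, assuming HBase-style addressing of row key → column family → column qualifier → value; real stores add timestamps and versions, persistence, and distribution across nodes, none of which is shown here.

```python
from __future__ import annotations


class WideColumnTable:
    """Hypothetical toy table: row key -> column family -> column -> value."""

    def __init__(self, families: list[str]) -> None:
        # Column families are declared up front; columns inside them are not.
        self.families = set(families)
        self.rows: dict[str, dict[str, dict[str, str]]] = {}

    def put(self, row: str, family: str, column: str, value: str) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {}).setdefault(family, {})[column] = value

    def get_family(self, row: str, family: str) -> dict[str, str]:
        # Columns come back sorted by name, imitating stores that keep
        # columns ordered within a row.
        columns = self.rows.get(row, {}).get(family, {})
        return dict(sorted(columns.items()))


table = WideColumnTable(["profile", "visits"])
table.put("user:42", "profile", "name", "Ada")
table.put("user:42", "visits", "2024-01-02", "/home")
table.put("user:42", "visits", "2024-01-01", "/pricing")
# A different row may use entirely different columns in the same family.
table.put("user:7", "profile", "email", "bob@example.com")

print(table.get_family("user:42", "visits"))
# {'2024-01-01': '/pricing', '2024-01-02': '/home'}
```

Two rows in the same table carry different columns, which is what “semi-structured” means here; only the column families are fixed.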
Document Database
A document database is, at its core, a key-value store with one major exception: instead of storing an arbitrary blob, a document database requires the data to be stored in a format it can understand. The format can be XML, JSON, Binary JSON (BSON), or just about anything else, as long as the database can understand it. To query the data, document databases use MapReduce and powerful indexing capabilities.
Examples: RavenDB, MongoDB, CouchDB
Pros:
- Powerful data model
Cons:
· Poor fit for interconnected data
· Query model limited to keys and indexes
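The key difference from a plain key-value store is that the database parses the value. The following hypothetical sketch stores JSON documents, maintains a secondary index on a field chosen up front, and runs a tiny map/reduce pass over all documents. The class, its method names, and the `city`/`orders` fields are illustrative assumptions, not the API of RavenDB, MongoDB, or CouchDB.

```python
from __future__ import annotations

import json
from collections import defaultdict
from typing import Any, Callable, Iterable


class DocumentStore:
    """Hypothetical toy document store: values are JSON the store parses."""

    def __init__(self, indexed_fields: list[str]) -> None:
        self.docs: dict[str, dict[str, Any]] = {}
        # One secondary index per field: field value -> set of document ids.
        self.indexes: dict[str, defaultdict[Any, set[str]]] = {
            field: defaultdict(set) for field in indexed_fields
        }

    def put(self, doc_id: str, raw_json: str) -> None:
        doc = json.loads(raw_json)  # raises ValueError on invalid JSON
        old = self.docs.get(doc_id)
        for field, index in self.indexes.items():
            if old is not None and field in old:
                index[old[field]].discard(doc_id)  # drop stale index entry
            if field in doc:
                index[doc[field]].add(doc_id)
        self.docs[doc_id] = doc

    def find(self, field: str, value: Any) -> list[dict[str, Any]]:
        # Only indexed fields can be queried this way (KeyError otherwise).
        ids = self.indexes[field].get(value, set())
        return [self.docs[doc_id] for doc_id in sorted(ids)]

    def map_reduce(
        self,
        mapper: Callable[[dict[str, Any]], Iterable[tuple[Any, Any]]],
        reducer: Callable[[list[Any]], Any],
    ) -> dict[Any, Any]:
        # Map every document to (key, value) pairs, group by key, reduce.
        groups: dict[Any, list[Any]] = defaultdict(list)
        for doc in self.docs.values():
            for key, value in mapper(doc):
                groups[key].append(value)
        return {key: reducer(values) for key, values in groups.items()}


store = DocumentStore(indexed_fields=["city"])
store.put("u1", '{"name": "Ada", "city": "London", "orders": 3}')
store.put("u2", '{"name": "Bob", "city": "Paris", "orders": 1}')
store.put("u3", '{"name": "Eve", "city": "London", "orders": 2}')

londoners = [doc["name"] for doc in store.find("city", "London")]
orders_per_city = store.map_reduce(
    lambda doc: [(doc["city"], doc["orders"])],  # map: emit (city, orders)
    sum,                                          # reduce: add the counts
)
print(londoners)        # ['Ada', 'Eve']
print(orders_per_city)  # {'London': 5, 'Paris': 1}
```

Note that `find` only works on indexed fields, which mirrors the “limited to keys and indexes” drawback; any other question needs a full scan such as the map/reduce pass.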
Graph Database
In a graph database, information is organized as a graph of nodes and relationships. A graph is the most generic of data structures, capable of elegantly representing any kind of data in a highly accessible way.
Examples: Neo4j, OrientDB, InfiniteGraph, AllegroGraph
Pros:
- Powerful data model, as general as an RDBMS
- Connected data is locally indexed
- Easy to author powerful queries
Cons:
· Not trivial to learn
· Complex sharding and scalability
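Graph queries are expressed as traversals over relationships. The sketch below is a hypothetical in-memory property graph in which every node keeps its own list of outgoing relationships, so following an edge is a local lookup (one way to read “connected data is locally indexed”). The `Graph` class and the `KNOWS` relationship type are illustrative assumptions; this is not the Neo4j or OrientDB API.

```python
from __future__ import annotations

from collections import deque
from typing import Any


class Graph:
    """Hypothetical toy property graph with typed, directed relationships."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, Any]] = {}
        # Each node stores its outgoing (relationship type, target) pairs.
        self.out: dict[str, list[tuple[str, str]]] = {}

    def add_node(self, node_id: str, **props: Any) -> None:
        self.nodes[node_id] = props
        self.out.setdefault(node_id, [])

    def relate(self, src: str, rel_type: str, dst: str) -> None:
        self.out[src].append((rel_type, dst))

    def reachable(self, start: str, rel_type: str, max_depth: int) -> dict[str, int]:
        """Breadth-first traversal; returns node -> hop count (start excluded)."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            depth = seen[node]
            if depth == max_depth:
                continue  # do not expand beyond the requested depth
            for rel, nxt in self.out[node]:
                if rel == rel_type and nxt not in seen:
                    seen[nxt] = depth + 1
                    queue.append(nxt)
        del seen[start]
        return seen


g = Graph()
for name in ["ada", "bob", "eve", "dan"]:
    g.add_node(name, name=name.title())
g.relate("ada", "KNOWS", "bob")
g.relate("bob", "KNOWS", "eve")
g.relate("eve", "KNOWS", "dan")

print(g.reachable("ada", "KNOWS", 2))  # {'bob': 1, 'eve': 2}
friends_of_friends = {n for n, d in g.reachable("ada", "KNOWS", 2).items() if d == 2}
```

A friends-of-friends question like this would need self-joins in a relational schema; here it is a bounded breadth-first walk.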
With these definitions in mind, you can look at your problem and find the best family of databases to choose from. Still, there are so many offerings…
In this table you can find a well-organized comparison of the top industry databases out there. I also recommend having a look at this complete list of databases, organized according to the categories I just described.
Benchmarking and comparing NoSQL databases is not trivial. Databases, even within the same category, differ so much in their implementation that it is difficult to conduct a fair test. YCSB was developed to help with exactly this.
YCSB (Yahoo! Cloud Serving Benchmark) is a popular tool for evaluating the performance of different key-value and cloud serving stores.
You can find a nice benchmark here. I recommend reading it before you start your own.
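To give a feel for what YCSB measures, here is a toy harness in the same spirit: a load phase that inserts records, then a timed run phase with a fixed mix of reads and updates. The 50/50 default mirrors YCSB's core workload A (update heavy); unlike YCSB, which uses a Zipfian key distribution for that workload, this sketch picks keys uniformly. The `run_workload` function and its parameters are hypothetical, and timings from an in-process Python dict say nothing about a real networked database.

```python
from __future__ import annotations

import random
import time
from statistics import mean
from typing import Callable


def run_workload(
    read: Callable[[str], object],
    update: Callable[[str, bytes], None],
    record_count: int = 1000,
    operation_count: int = 10_000,
    read_proportion: float = 0.5,
    seed: int = 1,
) -> dict[str, dict[str, float]]:
    """Hypothetical YCSB-style run: a fixed read/update mix over preloaded keys."""
    rng = random.Random(seed)  # fixed seed so the operation mix is repeatable
    keys = [f"user{i}" for i in range(record_count)]

    # Load phase (untimed): insert every record once via the update callable.
    for key in keys:
        update(key, b"x" * 100)

    # Run phase: time each operation individually.
    latencies: dict[str, list[float]] = {"read": [], "update": []}
    for _ in range(operation_count):
        key = rng.choice(keys)  # uniform here; YCSB workload A uses Zipfian
        op = "read" if rng.random() < read_proportion else "update"
        start = time.perf_counter()
        if op == "read":
            read(key)
        else:
            update(key, b"y" * 100)
        latencies[op].append(time.perf_counter() - start)

    return {
        op: {"count": len(vals), "avg_us": mean(vals) * 1e6 if vals else 0.0}
        for op, vals in latencies.items()
    }


data: dict[str, bytes] = {}  # the "database" under test
report = run_workload(data.get, data.__setitem__)
print(report)
```

Passing the store's operations as plain callables keeps the harness independent of any particular client library, which is roughly the role YCSB's per-database bindings play.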
So the conclusion is that there is no simple answer. Only after you understand your problem can you look at the lists, comparisons, and benchmarks above and find the database that best suits you.
Hope this helps.