The DB jungle guide: "How to select the right database"
This list is compiled from 2 years of NoSQL consulting and has been presented at many conferences (video here),
in articles (e.g. here) and in the world's first NoSQL books (in German).
Cluster 1: Know & Segment your data
Analyze & Categorize it:
- critical Data
- temp Data
- Geo Data
Data / Storage Model:
- etc. (beyond bit-bucket)
Data / Type constraints:
- Data Amount?
- Data Complexity (Deep XML?)
- Schema flexibility?
- Schema support needed?
(Reference: (C) highscalability link to be inserted)
- Durability? On power failure?
- Memtable/SSTable; Append-only B-tree; B-tree; On-disk linked lists;
In-memory replicated; In-memory snapshots; In-memory only; Hash; Pluggable.
Cluster 2: Consistency Model
Global consistency model:
- ACID / BASE / WATER?
- Ability to (fine) tune the consistency model
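One common form of tunable consistency is the quorum setting found in Dynamo-style stores: with N replicas, a read quorum R and a write quorum W, every read overlaps the latest acknowledged write whenever R + W > N. The following is a minimal sketch of that check only. The function name and the example settings are hypothetical and do not correspond to any particular product's API.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum must intersect every write quorum.

    n: number of replicas, r: replicas contacted per read,
    w: replicas that must acknowledge a write.
    """
    if n < 1 or not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need n >= 1 and 1 <= r, w <= n")
    return r + w > n


# Hypothetical settings for a 3-replica cluster: (n, r, w).
settings = {
    "cheap reads, expensive writes": (3, 1, 3),
    "balanced quorum": (3, 2, 2),
    "fast but only eventually consistent": (3, 1, 1),
}

for name, (n, r, w) in settings.items():
    print(f"{name}: overlapping quorums = {quorums_overlap(n, r, w)}")
```

A store that exposes R and W per request lets you pick a different row of this table for each kind of data identified in Cluster 1.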
Cluster 3: Performance Dimensions
- Latency / Request behaviour / distribution [High = 10, Low = 0]
- Throughput [High = 10, Low = 0]
- High Concurrency?
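The 0 to 10 scale above lends itself to a simple weighted comparison. The sketch below assumes you rate each candidate per dimension on that scale and assign your own weights. The candidate names, ratings, and weights are made-up placeholders, not measurements of any real database.

```python
def weighted_score(ratings: dict, weights: dict) -> float:
    """Weighted average of 0-10 ratings over the dimensions named in weights."""
    for dim in weights:
        if not 0 <= ratings[dim] <= 10:
            raise ValueError(f"rating for {dim!r} must be between 0 and 10")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[dim] * weights[dim] for dim in weights) / total_weight


def rank(candidates: dict, weights: dict) -> list:
    """Return (name, score) pairs, best score first."""
    scored = [(name, weighted_score(r, weights)) for name, r in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: latency matters twice as much as the other dimensions.
weights = {"latency": 2.0, "throughput": 1.0, "concurrency": 1.0}

# Hypothetical ratings (10 = fully meets the requirement, 0 = not at all).
candidates = {
    "candidate_a": {"latency": 8, "throughput": 6, "concurrency": 4},
    "candidate_b": {"latency": 5, "throughput": 9, "concurrency": 9},
}

for name, score in rank(candidates, weights):
    print(f"{name}: {score:.2f}")
```

The same pattern extends to the other clusters; a number like this is only a conversation starter and should not replace a benchmark with your own workload.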
Cluster 4: Query Requirements
- Typical queries look like?
- SQL needed? LINQ needed?
- BI / Analytic-Tools needed? (M/R sufficient?)
- Ad-Hoc Queries needed?
- Map/Reduce needed? Background data analytics?
- Secondary Indices
- Range queries
- Weird aggregations
- ColumnDB needed for Analytics?
Cluster 5: Architecture and Patterns
Architecture looks like:
- local, parallel, distributed / grid, service, cloud, mobile, p2p, …
- Hosted? Cloud? Local? Datacenter?
Data Access Patterns
- read / write distribution?
- random / sequential access?
- Access Design Patterns
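The read / write and random / sequential questions can often be answered from an existing access trace. The sketch below assumes a trace of (operation, key) pairs with integer keys, and counts an access as sequential when its key directly follows the previous one. The trace format and the function name are assumptions made for illustration.

```python
from collections import Counter


def access_profile(ops: list) -> dict:
    """Summarize a trace of (operation, key) pairs.

    operation is "read" or "write"; key is an integer position or row id.
    """
    if not ops:
        raise ValueError("trace is empty")
    counts = Counter(kind for kind, _ in ops)
    keys = [key for _, key in ops]
    # An access is counted as sequential if its key is the previous key + 1.
    sequential = sum(1 for prev, cur in zip(keys, keys[1:]) if cur == prev + 1)
    return {
        "read_fraction": counts["read"] / len(ops),
        "write_fraction": counts["write"] / len(ops),
        "sequential_fraction": sequential / max(len(ops) - 1, 1),
    }


# Hypothetical trace: three reads, one write, mostly sequential keys.
trace = [("read", 1), ("read", 2), ("write", 3), ("read", 10)]
print(access_profile(trace))
```

A read-heavy, sequential profile points toward different storage structures than a write-heavy, random one, so it is worth measuring before comparing products.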
Cluster 6: Non-functional Requirements
- Replication needed? = Robustness
- Automatic load balancing, partitioning, and repartitioning?
- Text search integration? Lucene / Solr?
- Refactoring Frequency?
- 24/7 System? Live add and remove?
- Developer Qualification
- DB simplicity? (installation, configuration, development, deployment, upgrade)
- Company restrictions?
- DB diversity (allowed?)
- Security? (authentication, authorization, validation?)
- Licence Model?
- Vendor trustworthiness?
- Community support?
- Company and DB dev in the future?
- DB-Support? (responsiveness, SLA)
- Costs in general, Scaling Costs
- Sysadmin costs
- Operational Costs: (noOps)
- Safety / Backup & Restore
- Crash Resistance, Disaster Management