How to Choose the Right Database

July 1, 2026 · 18 min read

Picking a database is not about finding the fastest engine — it is about matching storage to the shape of your data. Start with one question: is your data structured, unstructured, or semi-structured? That single decision narrows the field from hundreds of options to a handful of realistic candidates.

From there, factor in your cloud platform, query patterns, consistency requirements, and whether you need portability across vendors. A startup on AWS has different defaults than an enterprise running hybrid on-premise and Azure. This guide walks the full decision tree — from data type through category to specific databases on AWS, Azure, Google Cloud, and cloud-agnostic alternatives.

Decision tree: data type → category → database
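As a rough sketch, the first branch of the tree can be written as a lookup from data shape to the categories covered in this guide. The names DECISION_TREE and candidate_categories are hypothetical, and a real decision also weighs the platform, query patterns, and consistency needs mentioned above.

```python
# Hypothetical lookup table mirroring the categories discussed in this guide.
DECISION_TREE = {
    "structured": ["relational", "columnar"],
    "unstructured": ["blob/object storage", "text search"],
    "semi-structured": [
        "key-value", "in-memory", "wide-column", "document",
        "graph", "time-series", "ledger", "geospatial",
    ],
}


def candidate_categories(data_shape: str) -> list[str]:
    """Return the database categories worth evaluating for a data shape."""
    key = data_shape.strip().lower()
    if key not in DECISION_TREE:
        raise ValueError(f"unknown data shape: {data_shape!r}")
    return DECISION_TREE[key]


print(candidate_categories("Structured"))
```

The lookup only narrows the field; choosing a specific engine within a category is what the remaining sections cover.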

Structured Data: Relational Databases

Relational databases remain the backbone of most business applications. They store data in tables with fixed schemas, enforce referential integrity, and support ACID transactions — the gold standard when correctness matters more than raw write speed.

Use relational databases when your domain maps cleanly to rows and columns: user accounts, orders, invoices, inventory, and financial records. Joins across normalized tables are a feature, not a problem. ORMs like Prisma, Entity Framework, and SQLAlchemy are built around this model, so application code maps onto tables with little friction.
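To make referential integrity and atomic transactions concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a managed relational database. The users and orders tables and the place_orders helper are illustrative assumptions, not part of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " total_cents INTEGER NOT NULL)"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
conn.commit()


def place_orders(rows):
    """Insert all rows atomically; any failure rolls back the whole batch."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany(
                "INSERT INTO orders (user_id, total_cents) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = place_orders([(1, 1999), (1, 500)])
rejected = place_orders([(1, 700), (42, 100)])  # user 42 does not exist
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The second batch fails on the foreign key, so its first row is rolled back as well and only the two orders from the first batch remain.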

Relational — all platforms

AWS: RDS, Aurora
Azure: Azure SQL Database
Google Cloud: Cloud SQL, Cloud Spanner
Cloud Agnostic: PostgreSQL, MySQL, SQL Server, Oracle, CockroachDB

•RDS and Aurora on AWS cover most relational workloads, with managed backups and failover.
•Cloud Spanner is the choice when you need global consistency at Google-scale — but it comes with cost and complexity.
•PostgreSQL is the default cloud-agnostic pick: powerful, open-source, and supported everywhere.
•Avoid relational stores for unstructured blobs, high-velocity time-series, or graph traversals.

Takeaway: Choose relational when your schema is stable and transactions must be correct.

Structured Data: Columnar Stores

Columnar databases flip the storage model: instead of storing rows together, they store columns together. That makes aggregations — SUM, AVG, COUNT over millions of rows — dramatically faster and cheaper because the engine reads only the columns you query.
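The difference in layout can be illustrated with plain Python structures. This is only a toy model of the idea, assuming a hypothetical orders dataset; real columnar engines add compression, vectorized execution, and on-disk formats on top of it.

```python
rows = [
    {"order_id": 1, "region": "eu", "amount": 120.0},
    {"order_id": 2, "region": "us", "amount": 80.0},
    {"order_id": 3, "region": "eu", "amount": 50.0},
]

# Row-oriented: every full record is visited just to read one field.
row_total = sum(record["amount"] for record in rows)
values_read_row_layout = len(rows) * len(rows[0])

# Column-oriented: one list per column, so a SUM touches a single list.
columns = {name: [record[name] for record in rows] for name in rows[0]}
col_total = sum(columns["amount"])
values_read_col_layout = len(columns["amount"])
```

Both layouts give the same total, but the column layout reads three values where the row layout walks past nine, and the gap grows with every extra column in the table.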

These are analytics engines, not application databases. Do not use them for OLTP workloads with frequent single-row updates. They shine in data warehouses, BI dashboards, log analytics, and batch reporting pipelines.

Columnar — analytics platforms

AWS: Redshift
Azure: Azure Synapse Analytics
Google Cloud: BigQuery
Cloud Agnostic: Snowflake, Databricks, Hive

•BigQuery is serverless: with on-demand pricing you pay for the bytes each query scans, which suits variable analytics workloads.
•Snowflake and Databricks work across clouds, ideal for teams avoiding vendor lock-in.
•Provisioned Redshift clusters and Synapse dedicated SQL pools require sizing, so plan capacity and cost upfront.
•Pair columnar stores with ETL pipelines (dbt, Airflow) to keep data fresh.

Takeaway: Use columnar for analytics at scale — never as your primary application database.

Unstructured Data: Blob & Object Storage

Not everything is a table row. Images, videos, PDFs, backups, ML model artifacts, and raw log files are unstructured blobs — binary objects accessed by key, not by SQL query.

Object storage is cheap, durable, and scales to effectively unlimited capacity. AWS S3 alone stores trillions of objects. Use it as the system of record for files, and keep each object's key or URL alongside its metadata in your application database.
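A minimal sketch of that split between bytes and metadata follows. A plain dict stands in for the bucket and sqlite3 for the application database; the files table, the key layout, and the upload and find_keys helpers are assumptions made for illustration, not a real SDK.

```python
import hashlib
import sqlite3

# Stand-in for a bucket. A real system would call an object storage SDK here.
object_store: dict[str, bytes] = {}

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE files ("
    " object_key TEXT PRIMARY KEY,"
    " filename TEXT NOT NULL,"
    " size_bytes INTEGER NOT NULL,"
    " sha256 TEXT NOT NULL)"
)


def upload(filename: str, data: bytes) -> str:
    """Write the bytes to object storage, then record metadata in the database."""
    digest = hashlib.sha256(data).hexdigest()
    key = f"uploads/{digest[:16]}/{filename}"  # hypothetical key layout
    object_store[key] = data
    with db:
        db.execute(
            "INSERT INTO files VALUES (?, ?, ?, ?)",
            (key, filename, len(data), digest),
        )
    return key


def find_keys(filename: str) -> list[str]:
    """Query metadata in SQL; the blob itself never enters the database."""
    rows = db.execute(
        "SELECT object_key FROM files WHERE filename = ?", (filename,)
    ).fetchall()
    return [row[0] for row in rows]


key = upload("report.pdf", b"%PDF-1.7 example bytes")
```

The database row stays a few hundred bytes regardless of file size, which keeps backups, replication, and queries fast.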

Blob / object storage

AWS: S3
Azure: Blob Storage
Google Cloud: Cloud Storage
Cloud Agnostic: HDFS

•S3 tiers (Standard, Infrequent Access, Glacier) let you optimize cost by access frequency.
•Enable versioning to protect against accidental deletes, and use lifecycle policies to archive or expire old versions.
•HDFS remains relevant in Hadoop/Spark on-premise environments.
•Never store large blobs inside PostgreSQL or MongoDB — use object storage instead.

Takeaway: Files belong in object storage. Keep only metadata in your database.

Unstructured Data: Text Search

Full-text search is a specialized problem. Users type keywords, expect relevance-ranked results in milliseconds, and want fuzzy matching, synonyms, and faceted filters. SQL LIKE queries cannot do this at scale.

Search engines build inverted indexes — mapping every word to the documents containing it. They power product catalogs, knowledge bases, log exploration, and site-wide search bars.
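The core idea of an inverted index fits in a few lines. The sketch below is deliberately naive: it assumes whitespace tokenization and AND-only matching, with none of the ranking, stemming, or fuzzy matching a real search engine provides. The build_index and search names are illustrative.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map every lowercased word to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,!?")].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "Red running shoes", 2: "Blue running jacket", 3: "Red wool jacket"}
index = build_index(docs)
matches = search(index, "red jacket")
```

A lookup costs one dictionary access per query term instead of a scan over every document, which is why this structure holds up where LIKE queries do not.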

Text search engines

AWS: OpenSearch, CloudSearch
Azure: Azure AI Search
Google Cloud: Vertex AI Search
Cloud Agnostic: Elasticsearch, Solr, Elassandra

•OpenSearch (a fork of Elasticsearch) is the default on AWS, available as the managed Amazon OpenSearch Service.
•Azure AI Search (formerly Cognitive Search) adds AI enrichment — OCR, entity extraction, language detection.
•Elasticsearch is the most portable option with a massive ecosystem of plugins.
•Sync your primary database to the search index via CDC or event streams — do not dual-write manually.
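The sync advice in the last bullet can be sketched as a consumer that applies an ordered change stream to the index. The event shape (seq, op, id, doc) and the apply_changes function are hypothetical and do not follow any particular CDC tool's format; a plain dict stands in for the search index.

```python
def apply_changes(index: dict, events: list[dict], last_seq: int = 0) -> int:
    """Apply change events in sequence order and return the new checkpoint.

    Events at or below last_seq are skipped, so replaying a batch is safe.
    """
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["seq"] <= last_seq:
            continue  # already applied
        if event["op"] == "delete":
            index.pop(event["id"], None)
        else:  # treat anything else as an upsert
            index[event["id"]] = event["doc"]
        last_seq = event["seq"]
    return last_seq


search_index: dict = {}
events = [
    {"seq": 1, "op": "upsert", "id": "p1", "doc": {"title": "Red shoes"}},
    {"seq": 2, "op": "upsert", "id": "p2", "doc": {"title": "Blue jacket"}},
    {"seq": 3, "op": "delete", "id": "p1"},
]
checkpoint = apply_changes(search_index, events)
checkpoint = apply_changes(search_index, events, checkpoint)  # replay is a no-op
```

Because the index is derived from a single ordered stream, it cannot drift from the primary database the way two independent writes can when one of them fails.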

Takeaway: When users search text, use a search engine — not SQL.

Semi-Structured: Key-Value & In-Memory

Semi-structured data does not fit rigid tables but still has identifiable keys. Session tokens, user preferences, feature flags, shopping carts, and rate-limit counters all map naturally to key-value access patterns.

In-memory layers sit in front of slower databases to absorb read-heavy traffic. A cache miss falls through to the origin store; a cache hit typically returns in under a millisecond. Redis is among the most widely deployed in-memory datastores.
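The hit and miss flow described above is often called cache-aside. Here is a minimal in-process sketch, assuming a hypothetical TTLCache class and a dict standing in for the slow origin database; a production cache such as Redis would also handle eviction, memory limits, and sharing across processes.

```python
import time


class TTLCache:
    """Cache-aside helper: serve fresh entries, reload missing or expired ones."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        entry = self._data.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        value = loader(key)  # cache miss: fall through to the origin store
        self._data[key] = (now + self.ttl, value)
        return value


slow_db = {"user:1": {"name": "Ada"}}  # stand-in for the origin database
cache = TTLCache(ttl_seconds=30)
profile = cache.get("user:1", slow_db.__getitem__)  # miss, loads from origin
profile = cache.get("user:1", slow_db.__getitem__)  # hit, served from memory
```

The expiry timestamp bounds how stale a read can be: after 30 seconds the next request reloads from the origin.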

Semi-structured branches

AWS: DynamoDB, ElastiCache
Azure: Cosmos DB, Azure Cache for Redis
Google Cloud: Bigtable, Memorystore
Cloud Agnostic: Redis, RocksDB, Memcached, Hazelcast, Ignite

•DynamoDB scales to very high throughput with single-digit millisecond latency, but model your access patterns first.
•Cosmos DB offers multiple consistency levels and APIs (document, graph, table) in one service.
•Redis supports strings, hashes, lists, sets, sorted sets, and pub/sub — far more than a simple cache.
•Define TTLs on cached data to prevent stale reads and unbounded memory growth.

Takeaway: Key-value for fast lookups. In-memory for hot data that must be instant.

Semi-Structured: Wide Column & Document

When a single key is not enough, you need richer data models. Document databases store JSON-like records with flexible, evolving schemas — perfect for content management, user profiles, and mobile app backends. Wide-column stores distribute massive datasets across clusters with tunable consistency.

MongoDB is the most popular document database. Cassandra and ScyllaDB handle write-heavy workloads at Netflix scale. Pick based on whether your schema changes frequently (document) or your write volume is extreme (wide-column).

Semi-structured branches

AWS: DocumentDB, Keyspaces
Azure: Cosmos DB
Google Cloud: Firestore, Bigtable
Cloud Agnostic: MongoDB, Couchbase, Cassandra, HBase, ScyllaDB

•DocumentDB is AWS's MongoDB-compatible managed service — good for teams already on MongoDB.
•Firestore offers real-time sync for mobile and web apps with offline support.
•Cassandra's partition key design is critical — get it wrong and queries become full scans.
•Document stores trade join flexibility for horizontal scale and schema freedom.
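Why partition key design matters can be shown with a toy model of hash partitioning. This is a simplified sketch, not Cassandra's actual implementation: the four in-memory partitions, the SHA-256 based partition_for function, and the sensor data are all assumptions for illustration.

```python
import hashlib

NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(key: str) -> int:
    """Hash the partition key to pick a partition (stand-in for a real partitioner)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def write(sensor_id: str, ts: int, value: float) -> None:
    partitions[partition_for(sensor_id)].append((sensor_id, ts, value))


def read_by_partition_key(sensor_id: str) -> list[tuple]:
    """The key tells us where the data lives, so exactly one partition is read."""
    target = partitions[partition_for(sensor_id)]
    return [row for row in target if row[0] == sensor_id]


def read_by_other_column(min_value: float) -> list[tuple]:
    """No partition key in the filter, so every partition must be scanned."""
    return [row for part in partitions for row in part if row[2] >= min_value]


for i, sensor in enumerate(["sensor-a", "sensor-b", "sensor-a", "sensor-c"]):
    write(sensor, i, 20.0 + i)

by_key = read_by_partition_key("sensor-a")
by_value = read_by_other_column(22.0)
```

With four partitions the full scan is harmless; across hundreds of nodes the same query shape touches the whole cluster, which is why the key should match the most frequent query.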

Takeaway: Documents for flexible schemas. Wide-column for write-heavy distributed scale.

Semi-Structured: Graph, Time-Series, Ledger & Geospatial

Some problems have dedicated database engines that outperform general-purpose stores by orders of magnitude. Graph databases traverse relationships — friends-of-friends, fraud rings, supply chains. Time-series databases compress and query metrics efficiently. Ledgers provide immutable audit trails. Geospatial engines index latitude/longitude for location queries.

Modeling a social graph in SQL with recursive CTEs works at small scale but tends to degrade badly at production size. Likewise, storing IoT sensor readings in plain PostgreSQL, without the TimescaleDB extension or careful partitioning, often leads to painful query performance.
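For reference, this is roughly what the recursive CTE approach looks like, run here against an in-memory SQLite database. The follows table, the sample names, and the depth limit are assumptions for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve")],
)

# Walk outward from :start, one hop per recursion step, up to :max_depth hops.
FRIENDS_WITHIN = """
WITH RECURSIVE reach(person, depth) AS (
    SELECT dst, 1 FROM follows WHERE src = :start
    UNION
    SELECT f.dst, r.depth + 1
    FROM follows AS f JOIN reach AS r ON f.src = r.person
    WHERE r.depth < :max_depth
)
SELECT person, MIN(depth) FROM reach GROUP BY person ORDER BY person
"""

rows = conn.execute(FRIENDS_WITHIN, {"start": "ann", "max_depth": 2}).fetchall()
```

Each extra hop is another self-join over the edge table, so the work grows quickly with depth and fan-out. Graph engines are designed to follow stored relationships directly instead.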

AWS: Neptune, Timestream
Azure: Cosmos DB, Azure SQL Ledger
Google Cloud: Bigtable, BigQuery
Cloud Agnostic: Neo4j, InfluxDB, TimescaleDB, PostGIS, Hyperledger Fabric, TigerGraph

•Neptune supports property graphs and RDF. Choose based on your query language: Gremlin or openCypher for property graphs, SPARQL for RDF.
•TimescaleDB extends PostgreSQL with time-series hypertables — a gentle migration path for teams on Postgres.
•For immutable audit ledgers on AWS, use Aurora with audit logging or Hyperledger Fabric — AWS QLDB was retired in July 2025.
•PostGIS adds geospatial indexing to PostgreSQL — ideal if you already run Postgres.
•For cryptographically verifiable audit trails across clouds, Hyperledger Fabric is the portable standard.
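On the time-series side, the typical query is an aggregate over fixed time buckets. The sketch below shows that operation in plain Python, assuming hypothetical (timestamp, value) readings in seconds; time-series engines run the same kind of rollup natively and at far larger scale.

```python
from collections import defaultdict


def downsample(readings, bucket_seconds):
    """Average (timestamp, value) readings into fixed-width time buckets."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [total, count]
    for ts, value in readings:
        bucket = ts - ts % bucket_seconds
        sums[bucket][0] += value
        sums[bucket][1] += 1
    return {bucket: total / n for bucket, (total, n) in sorted(sums.items())}


readings = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 23.0), (125, 30.0)]
per_minute = downsample(readings, 60)
```

Five raw readings collapse into three per-minute averages, which is the shape most dashboards and alerts actually query.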

Takeaway: Use purpose-built engines for graphs, metrics, audit trails, and location data.

Final Thoughts

There is no universal best database — only the best fit for your data shape, access patterns, and platform. Walk the decision tree: structured, unstructured, or semi-structured first. Then narrow by category — relational, columnar, blob, search, key-value, document, graph, or specialized.

Finally, pick the managed service on your cloud or a portable open-source engine. Factor in operational cost, team expertise, and migration path. The right database is the one your team can run reliably in production — not the one with the most hype.