Vector Databases & AI Search

Vector databases store embeddings: high-dimensional numerical representations of text, images, or audio produced by machine learning models. Instead of exact keyword matching, vector search finds results that are semantically similar. A search for "heart attack" can return results about "myocardial infarction" because the two phrases have embeddings that lie close together in vector space.

The core operation is Approximate Nearest Neighbor (ANN) search. You embed the query with the same model that produced the stored embeddings, then find the K stored vectors closest to the query vector. The search is "approximate" because comparing the query against every stored vector gets slow as the collection grows. ANN indexes such as HNSW or IVF skip most of those comparisons and return nearly the closest K, trading a small amount of recall for much faster queries.

Pinecone, Weaviate, Qdrant, and pgvector (a Postgres extension) are widely used options. pgvector is the simplest path if you already run Postgres: you add the extension and a vector column, with no new infrastructure to operate.
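Here is a minimal sketch of that pgvector setup. It assumes an existing `documents` table with `id`, `title`, and `content` columns, matching the query later in this section. The dimension 1536 is only an assumption for a hypothetical embedding model and must equal your model's actual output size. The HNSW index type requires pgvector 0.5.0 or later.

```sql
-- Enable pgvector in this database (the extension must already be installed on the server).
CREATE EXTENSION IF NOT EXISTS vector;

-- Add an embedding column to the existing table.
-- 1536 is an assumed dimension: it must match your embedding model's output size.
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Optional ANN index (HNSW, pgvector 0.5.0+). The operator class vector_cosine_ops
-- pairs with the cosine-distance operator <=>, so ORDER BY embedding <=> ... can use it.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```

Without the index, pgvector still answers the query by scanning every row, which gives exact results. With the index, results are approximate but much faster on large tables. The embeddings themselves are computed by your model outside the database and written into the column like any other value.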

Before

Keyword search (misses synonyms)

SELECT * FROM documents
WHERE content ILIKE '%database performance%';
-- Misses: "query optimization", "slow queries", "index tuning"

After

Vector search (finds semantic matches)

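-- pgvector: find the 5 documents closest in meaning to the query
SELECT id, title, content,
       embedding <=> $1 AS distance  -- cosine distance: smaller = more similar
FROM documents
ORDER BY embedding <=> $1            -- ascending, so nearest neighbors come first
LIMIT 5;
-- $1 = embedding of "database performance", from the same model as the stored rows
-- Can surface documents on "query optimization", "slow queries", "index tuning"
-- even when they never contain the phrase "database performance"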
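For reference, pgvector's `<=>` operator computes cosine distance, which is one minus the cosine similarity of the two vectors. It ranges from 0 (same direction) to 2 (opposite directions), and that is why the query sorts in ascending order. The worked example below uses toy two-dimensional vectors; real embeddings have hundreds or thousands of dimensions.

```latex
d_{\cos}(\mathbf{a}, \mathbf{b}) = 1 - \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}

\mathbf{a} = (1, 0),\quad \mathbf{b} = (1, 1):\qquad
d_{\cos} = 1 - \frac{1 \cdot 1 + 0 \cdot 1}{1 \cdot \sqrt{2}} = 1 - \frac{1}{\sqrt{2}} \approx 0.293
```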

Key Takeaway

Vector search matches on meaning rather than on exact characters, which makes it a core building block of AI-powered search and retrieval-augmented generation (RAG) pipelines.