Technology · 12 min read · March 10, 2026

Vector Databases Explained: What Engineering Leaders Need to Know

Vector databases are the infrastructure layer behind every enterprise AI search and RAG system. Here's what they actually do, when you need one, and how to choose between the major options.

Mark Natale
CTO

Vector databases have become the infrastructure topic that every engineering leader is getting asked about. Your team is building a RAG system, or a semantic search feature, or a recommendation engine — and someone on the architecture review says you need a vector database. Maybe you do. Maybe you don’t.

The problem is that most explanations of vector databases fall into two camps: marketing pages that treat them as magic, or academic papers that assume you already have a PhD in information retrieval. Neither is useful when you’re trying to make an infrastructure decision that your team will live with for the next three years.

Here’s the practical breakdown: what vector databases actually do, when you genuinely need one, how the major options compare, and where we see teams make expensive mistakes.


What Vector Databases Actually Do

Start with the core concept. When you pass text through an embedding model — Azure OpenAI’s text-embedding-3-large, for example — the model converts that text into a high-dimensional numerical vector: in that model’s case, a 3,072-dimension array of floating-point numbers. That vector captures the meaning of the text, not just its keywords.

Two pieces of text that mean similar things produce vectors that are close together in that high-dimensional space. “How do I reset my password?” and “I need to change my login credentials” have almost no keyword overlap, but their embedding vectors will be nearly identical. This is the fundamental insight behind semantic search.
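The geometry behind “close together” is usually cosine similarity. Here is a minimal sketch using tiny hand-made 4-dimension vectors as stand-ins for real 1,536+-dimension embeddings — in practice the numbers come from an embedding model, not hand-picked values:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings of the three texts.
password_reset = [0.91, 0.12, 0.05, 0.40]  # "How do I reset my password?"
login_change   = [0.88, 0.15, 0.07, 0.43]  # "I need to change my login credentials"
pizza_recipe   = [0.02, 0.95, 0.30, 0.01]  # an unrelated document

print(cosine_similarity(password_reset, login_change))  # high — close to 1.0
print(cosine_similarity(password_reset, pizza_recipe))  # low
```

Real embeddings behave the same way: semantically related texts score near 1.0 despite having no keywords in common, and unrelated texts score much lower.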

A vector database is purpose-built to store millions or billions of these vectors and find the nearest neighbors to a query vector — fast. That “fast” part is the entire reason these databases exist.

Traditional relational databases can store vectors. PostgreSQL with pgvector can do it. But when you run a similarity search across 10 million vectors, you’re asking the database to calculate the distance between your query vector and every stored vector, then sort by distance. That’s a brute-force scan. It works at small scale. At enterprise scale, it takes seconds instead of milliseconds — and seconds kill user experience.
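Here is what that brute-force scan looks like in code — a toy corpus and an exact k-nearest-neighbor search that touches every stored vector (the names and sizes are illustrative):

```python
import math
import random

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_knn(query, corpus, k=5):
    """Exact nearest-neighbor search: one distance computation per stored
    vector, then a sort — O(n) per query with no index to help."""
    scored = [(l2_distance(query, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort()
    return [doc_id for _, doc_id in scored[:k]]

random.seed(0)
dim = 64
corpus = {f"doc-{i}": [random.random() for _ in range(dim)] for i in range(10_000)}

query = corpus["doc-42"]  # query with a known vector: doc-42 must rank first
print(brute_force_knn(query, corpus, k=3)[0])  # doc-42
```

Every query costs one distance computation per stored vector. At 10,000 vectors that is instant; at 10 million with concurrent traffic, that linear cost is exactly what pushes latency from milliseconds into seconds.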

Vector databases solve this with specialized indexing algorithms. HNSW (Hierarchical Navigable Small World) graphs, IVF (Inverted File) indexes, and quantization techniques let them find approximate nearest neighbors without scanning every vector. The trade-off is a small accuracy loss for a massive speed gain. In practice, you get 95-99% recall at 10-100x the speed of brute force.
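To make the IVF idea concrete, here is a deliberately simplified sketch. Real implementations train centroids with k-means and tune how many buckets to probe; the core trick — search only the buckets nearest the query instead of the whole corpus — is the same:

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVFIndex:
    """Simplified IVF index. Real systems train centroids with k-means and
    tune nprobe; here centroids are just randomly sampled vectors."""

    def __init__(self, vectors: dict, n_buckets: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.centroids = rng.sample(list(vectors.values()), n_buckets)
        self.buckets = [[] for _ in range(n_buckets)]
        # Index time: assign each vector to its nearest centroid's bucket.
        for doc_id, vec in vectors.items():
            nearest = min(range(n_buckets), key=lambda i: l2(vec, self.centroids[i]))
            self.buckets[nearest].append((doc_id, vec))

    def search(self, query, k=5, nprobe=2):
        # Query time: probe only the nprobe closest buckets, not the corpus.
        # Approximate results for a fraction of the distance computations.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(query, self.centroids[i]))
        candidates = [pair for i in order[:nprobe] for pair in self.buckets[i]]
        candidates.sort(key=lambda pair: l2(query, pair[1]))
        return [doc_id for doc_id, _ in candidates[:k]]

random.seed(1)
corpus = {f"doc-{i}": [random.random() for _ in range(8)] for i in range(500)}
index = ToyIVFIndex(corpus, n_buckets=16)
print(index.search(corpus["doc-42"], k=3)[0])  # doc-42 — found in a probed bucket
```

The recall loss comes from the vectors sitting in buckets you didn’t probe; raising `nprobe` trades speed back for accuracy, which is the same knob production IVF indexes expose.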

A library analogy makes this concrete. A keyword search looks for books that contain the exact word “thermodynamics.” A vector search asks a librarian who has read every book to find the ones about heat transfer in engineering systems, regardless of what words the authors used. The vector database is the indexing system that lets that librarian answer in milliseconds instead of walking every aisle.


When You Actually Need One

Not every AI project needs a dedicated vector database. We’ve seen teams over-engineer their infrastructure because a conference talk convinced them they needed Pinecone when PostgreSQL with pgvector would have been fine.

You need a dedicated vector database when:

  • You’re building a production RAG system over more than 100K documents with sub-second latency requirements
  • You’re running semantic search at scale — millions of records, hundreds of concurrent queries
  • You need hybrid search (vector + keyword + metadata filtering) with complex query patterns
  • Your recommendation or similarity matching system is latency-sensitive and high-throughput

You probably don’t need one when:

  • Your vector corpus is under 100K items and query volume is modest — pgvector handles this fine
  • You’re doing classification or structured analytics where the output is a label, not a ranked list of similar items
  • You’re prototyping and haven’t validated product-market fit yet
  • Your search is primarily keyword-based with occasional semantic enhancement

The question isn’t whether vector databases are useful. They are. The question is whether the operational complexity they add is justified by your actual scale and latency requirements. For many teams, it isn’t — yet.

The decision framework is straightforward: start with the simplest infrastructure that meets your current requirements. If you’re on PostgreSQL already, add pgvector and benchmark it against your actual workload. If it holds up, ship it. If you hit limits — query latency at the 95th percentile, index build times, memory pressure — then you have concrete evidence to justify a dedicated solution.
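A minimal harness for that benchmark might look like the following. Here `run_query` is a stand-in for whatever search call you are actually evaluating — a pgvector query, a managed-service request, anything callable:

```python
import statistics
import time

def p95_latency_ms(run_query, queries, warmup=5):
    """Time each query and report the 95th-percentile latency in milliseconds.
    `run_query` is a placeholder for your real search call."""
    for q in queries[:warmup]:          # warm caches before measuring
        run_query(q)
    samples = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    # statistics.quantiles with n=20 yields 19 cut points; the last is p95.
    return statistics.quantiles(samples, n=20)[-1]

def fake_query(q):                      # stand-in workload for the demo
    return sum(range(200))

queries = list(range(40))
print(round(p95_latency_ms(fake_query, queries), 3), "ms")
```

The important part is measuring the tail, not the average: an index that looks fine at the median can still blow your latency budget at p95 under your real filter patterns and data distribution.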


The Major Options Compared

Azure AI Search

Azure AI Search is not a pure vector database. It’s a managed search platform that added vector search capabilities natively, alongside its existing BM25 keyword search and semantic ranking. For Microsoft-stack teams, this combination is powerful.

The key advantage: hybrid search out of the box. A single query runs vector similarity, keyword matching, and semantic reranking — then fuses the results. You don’t orchestrate three separate systems. You don’t manage your own reranking pipeline. We covered this in depth in our Azure AI Search vs. Elasticsearch comparison.
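For illustration, a hybrid request body might look like the sketch below. The field name (`contentVector`), the tiny embedding, and the exact payload shape are placeholders — verify against the Azure AI Search REST documentation for the API version you target:

```python
import json

# Illustrative hybrid-search request body: a keyword leg ("search"), a vector
# leg ("vectorQueries"), and semantic reranking in one query. Field names and
# shape are assumptions to be checked against the current API docs.
hybrid_query = {
    "search": "how do I reset my password",   # BM25 keyword leg
    "vectorQueries": [{
        "kind": "vector",
        "vector": [0.01, 0.02, 0.03],         # query embedding (truncated here)
        "fields": "contentVector",            # hypothetical vector field name
        "k": 10,
    }],
    "queryType": "semantic",                  # enables semantic reranking
    "top": 10,
}
print(json.dumps(hybrid_query, indent=2))
```

The point of the sketch is the shape of the decision: one request, three retrieval strategies, fused results — versus orchestrating three systems yourself.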

Integrated vectorization means Azure AI Search can call your embedding model during both indexing and query time, so you don’t build a separate embedding pipeline. For teams already on Azure with Azure OpenAI, the integration is remarkably smooth.

The trade-off: you’re locked into the Azure ecosystem, and the vector search capabilities, while solid, aren’t as tunable as purpose-built vector databases. Fine-grained control over HNSW parameters, quantization strategies, and similarity metrics is more limited.

Pinecone

Pinecone is the vector database that most teams encounter first. It’s fully managed, serverless, and the API is dead simple — create an index, upsert vectors, query. There’s no infrastructure to operate.

It’s fast to get started with and handles scale well. The serverless pricing model means you’re not paying for idle capacity. For teams that want to focus on their application and treat vector search as a commodity service, Pinecone removes a lot of friction.

The concern: vendor lock-in. Your vectors and indexes live entirely in Pinecone’s infrastructure. There’s no self-hosted option, no data portability standard, and migrating away means re-indexing everything in a different system. For a piece of infrastructure that sits in your critical path, that dependency deserves scrutiny.

Weaviate

Weaviate is open-source, supports both self-hosted and managed cloud deployments, and offers built-in vectorization (it can call embedding models directly, similar to Azure AI Search). It supports hybrid search, multi-tenancy, and generative search modules.

The flexibility is the draw. You can run it in your own Kubernetes cluster, keep data in your own infrastructure, and customize the deployment to your requirements. The community is active, and the documentation is above average for the space.

The trade-off is operational complexity. Running a distributed vector database in production requires the same care as running any distributed system — monitoring, scaling, backup, upgrades. If you chose Weaviate to avoid cloud vendor lock-in, make sure you’re prepared to operate it.

pgvector (PostgreSQL)

pgvector adds vector similarity search to PostgreSQL. If your application already runs on PostgreSQL, this is the lowest-friction option by far. No new infrastructure, no new operational burden, no new vendor relationship.

It supports HNSW and IVFFlat indexes, cosine similarity, L2 distance, and inner product. For datasets under a few hundred thousand vectors with moderate query loads, performance is genuinely adequate.
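For reference, pgvector exposes those metrics as SQL operators: `<->` for L2 distance, `<=>` for cosine distance, and `<#>` for negative inner product (negated so that smaller always means closer). Pure-Python equivalents of their semantics:

```python
import math

# Pure-Python mirrors of pgvector's distance operators:
#   <->  L2 (Euclidean) distance
#   <=>  cosine distance (1 - cosine similarity)
#   <#>  negative inner product

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms

def negative_inner_product(a, b):
    return -sum(x * y for x, y in zip(a, b))

print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 — orthogonal vectors
```

Which metric to use depends on your embedding model; OpenAI-family embeddings are typically compared with cosine similarity, so `<=>` with a matching index opclass is the usual pairing.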

The limits are real, though. pgvector’s query performance degrades faster than purpose-built systems as data volume grows. It doesn’t have the sophisticated caching, memory management, or distributed query execution that dedicated vector databases provide. And running your vector workload on the same PostgreSQL instance as your transactional data means competing for resources.

Others Worth Knowing

Qdrant is a Rust-based vector database with strong performance characteristics and a clean API. It supports filtering, payload storage, and both self-hosted and cloud deployment. If performance benchmarks matter to your use case, Qdrant consistently ranks well.

Milvus targets large-scale similarity search workloads and has strong support for GPU-accelerated indexing. It’s more operationally complex but handles billion-vector-scale deployments. If you’re building at that scale, it belongs on your shortlist.

Decision Guidance

| Factor | Azure AI Search | Pinecone | Weaviate | pgvector |
| --- | --- | --- | --- | --- |
| Best for | Microsoft-stack teams building hybrid search/RAG | Teams wanting zero-ops vector search | Teams needing open-source flexibility | Small-to-mid scale, already on PostgreSQL |
| Hybrid search | Native (vector + keyword + semantic) | Vector only (metadata filtering) | Native (vector + keyword) | Vector only (combine with FTS manually) |
| Managed option | Fully managed | Fully managed (only option) | Weaviate Cloud | Managed PostgreSQL providers |
| Self-hosted | No | No | Yes | Yes (it’s PostgreSQL) |
| Operational burden | Low | Very low | Medium-High | Low (if already running PostgreSQL) |
| Scale ceiling | High | High | High | Medium |
| Vendor lock-in | Azure | Pinecone | Low | None |

Architecture Patterns

In a production system, the vector database is one component in a larger retrieval pipeline. Understanding where it fits prevents architectural mistakes.

Pure vector search has a weakness: it can miss exact matches. A user searching for part number “XR-4200-B” needs keyword precision, not semantic similarity. Hybrid search combines vector similarity with BM25 keyword matching and fuses the results using reciprocal rank fusion or a learned reranker.
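Reciprocal rank fusion itself is only a few lines. A sketch with hypothetical result lists, using the k=60 constant from the original RRF paper:

```python
def reciprocal_rank_fusion(ranked_lists, k: int = 60):
    """Fuse ranked result lists: each doc scores sum(1 / (k + rank)) across
    every list it appears in. k=60 is the constant from the original paper."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc-7", "doc-2", "doc-9"]   # semantic neighbors
keyword_hits = ["doc-4", "doc-7", "doc-1"]   # exact-match BM25 hits
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# doc-7 ranks first — it appears high in both lists
```

RRF’s appeal is that it needs no score normalization across systems — only ranks — which is why it is the default fusion strategy in several platforms. A learned reranker can beat it, at the cost of training and serving another model.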

Azure AI Search does this natively. If you’re using a dedicated vector database like Pinecone or Qdrant, you’ll need a separate keyword search system and fusion logic in your application layer. This is manageable but adds architectural complexity.

Metadata Filtering

Every production vector search query should include metadata filters. When a user asks about current safety procedures, you filter by document_status: active and document_type: safety before running vector similarity. Pre-filtering narrows the search space and ensures results are contextually appropriate. Most vector databases support this, but the performance characteristics of filtered queries vary significantly between platforms — benchmark with your actual filter patterns.
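The order of operations matters: filter first, rank second. A toy sketch with hypothetical documents and metadata:

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query_vec, docs, filters, k=5):
    """Pre-filter on metadata, then rank only the survivors by similarity —
    the order of operations a production vector query should use."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda d: cosine_sim(query_vec, d["vector"]), reverse=True)
    return [d["id"] for d in candidates[:k]]

docs = [
    {"id": "sop-12", "vector": [0.90, 0.10],
     "metadata": {"document_status": "active", "document_type": "safety"}},
    {"id": "sop-03", "vector": [0.95, 0.05],
     "metadata": {"document_status": "archived", "document_type": "safety"}},
    {"id": "memo-7", "vector": [0.10, 0.90],
     "metadata": {"document_status": "active", "document_type": "memo"}},
]
print(filtered_search([1.0, 0.0], docs,
                      {"document_status": "active", "document_type": "safety"}))
# ['sop-12'] — sop-03 is more similar but archived, so the filter excludes it
```

Note that the archived document loses despite being the closest vector — exactly the behavior you want when a user asks about *current* procedures.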

Multi-Tenancy

If you’re building a SaaS product or serving multiple business units, tenant isolation in your vector database matters. Some platforms (Weaviate, Qdrant) support native multi-tenancy with isolated indexes per tenant. Others require you to implement isolation through metadata filtering or separate indexes. Get this wrong and you have a data leakage risk.

Embedding Pipeline

Your embedding model (typically Azure OpenAI’s text-embedding-3-large or text-embedding-3-small) sits upstream of the vector database. Documents get embedded during ingestion; queries get embedded at search time. The critical requirement: the same model must be used for both. Mixing models produces vectors in incompatible spaces, and your search results will be meaningless.

Plan your embedding pipeline for model upgrades from day one. When you switch embedding models — and you will — every vector in your database needs to be re-embedded. Design your ingestion pipeline to support full re-indexing without downtime.


The Mistakes We See Most Often

After working on vector search implementations across document intelligence and data platform projects, these failure patterns repeat consistently.

Choosing a platform before understanding requirements. A team reads a blog post, picks Pinecone, builds on it for three months, then discovers they need hybrid search with keyword matching and metadata filtering that would have been trivial in Azure AI Search. Start with requirements. Pick the platform that fits.

Over-indexing on benchmark performance. Synthetic benchmarks measure throughput on uniform data distributions with no filtering. Your production workload has complex metadata filters, skewed data distributions, and bursty traffic. Benchmark against your actual data with your actual query patterns.

Ignoring operational complexity. A self-hosted Weaviate cluster needs monitoring, backup, scaling, and upgrades — just like any distributed system. If your team doesn’t have experience operating distributed databases, the managed options cost less in practice, even if they cost more on paper.

Not planning for embedding model upgrades. Embedding models improve. When you upgrade from text-embedding-ada-002 to text-embedding-3-large, every vector in your index is obsolete. If your architecture can’t re-embed and re-index your entire corpus without downtime, you’ve locked yourself into your current model.


What Experienced AI Teams Do Differently

The teams that get vector infrastructure right share a few practices.

They start with hybrid search, not pure vector search. Keyword matching handles precision queries. Vector search handles semantic queries. Combining both covers the full spectrum of how users actually search. Teams that start with vector-only search always end up bolting on keyword matching later.

They plan for embedding model changes from the start. The ingestion pipeline is designed to re-embed the entire corpus in the background while the current index serves queries. Blue-green indexing — building a new index alongside the old one and switching over atomically — is the pattern that works.
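A blue-green cutover can be as simple as an alias that points at whichever index is live. A toy sketch — the `embed` functions stand in for real embedding models, and in production the alias would live in the search platform itself, not in application memory:

```python
class IndexAlias:
    """Blue-green indexing sketch: queries resolve through an alias; a new
    index is built in the background and the alias flips once it's complete."""

    def __init__(self):
        self.indexes = {}
        self.live = None

    def build(self, name, documents, embed):
        # Re-embed the entire corpus into a fresh index while the live
        # index keeps serving queries untouched.
        self.indexes[name] = {doc_id: embed(text) for doc_id, text in documents.items()}

    def cutover(self, name):
        if name not in self.indexes:
            raise ValueError(f"index {name!r} not built yet")
        self.live = name  # single reference swap — the "atomic switch"

    def query_index(self):
        return self.indexes[self.live]

# Stand-ins for two generations of embedding model (different dimensions).
embed_v1 = lambda text: [float(len(text))]
embed_v2 = lambda text: [float(len(text)), 1.0]

alias = IndexAlias()
alias.build("blue", {"d1": "hello"}, embed_v1)
alias.cutover("blue")                              # v1 index serves traffic
alias.build("green", {"d1": "hello"}, embed_v2)    # background rebuild, new model
alias.cutover("green")                             # switch with no downtime
```

The same pattern scales up directly: keep two named indexes, rebuild the idle one, flip the alias, and keep the old index around briefly in case you need to roll back.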

They treat the vector database as a long-term architectural decision, not a tactical tool choice. The vector database touches your ingestion pipeline, your query layer, your caching strategy, your backup and recovery process, and your cost model. Switching later is possible but expensive. Invest the time upfront to choose well.

The best vector search implementation we’ve built retrieves from three sources — vector similarity, keyword match, and structured metadata lookup — and fuses the results with a semantic reranker. The vector database is essential, but it’s one layer in a system designed for relevance, not just similarity.


Making the Right Infrastructure Decision

Vector databases are real infrastructure solving real problems. They’re not hype. But they’re also not required for every AI project, and the differences between platforms matter more than most comparison articles acknowledge.

If you’re building enterprise AI systems and evaluating vector infrastructure, the takeaway is this: the platform choice matters. The architecture around it matters more.

Tags: Vector Databases · Azure AI Search · Enterprise AI · RAG · Semantic Search · AI Infrastructure

If this is the kind of thinking you want in your inbox, The Logit covers AI strategy for industrial operators every two weeks. No vendor content. No hype. Just honest takes from practitioners.

Subscribe to The Logit
About the author
Mark Natale
CTO at Ryshe

Cloud architecture veteran with 20+ years designing mission-critical systems for finance, healthcare, and retail. Led large-scale AWS and Azure migrations for multiple Fortune 500 enterprises.

Want to Discuss This Topic?

Let's talk about how these insights apply to your organization.