Which vector database is best for a legal or document-heavy RAG system?

Weaviate performed best for our legal RAG use case because its hybrid search (BM25 + vector) is a first-class feature and its schema system maps naturally to structured documents. Pinecone would be appropriate if you want to eliminate database operations entirely. Qdrant is the right call if raw query latency is the top priority.

Picking a vector DB in 2025

What is a vector database? A vector database stores high-dimensional embeddings (numerical representations of text, images, or other data) and enables fast approximate nearest-neighbor search. In retrieval-augmented generation (RAG) and other AI applications (as of 2025), vector databases are the retrieval layer that connects large language models (LLMs) to domain-specific knowledge.
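To make the retrieval step concrete, here is a minimal sketch of nearest-neighbor search over embeddings. It is an exact brute-force scan, not the approximate index a real vector database builds, and the document IDs, the three-dimensional vectors, and the function names are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Exact search: score every stored vector and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical); real ones have hundreds
# or thousands of dimensions and come from an embedding model.
toy_index = {
    "lease_agreement": [0.9, 0.1, 0.0],
    "employment_contract": [0.7, 0.3, 0.1],
    "court_ruling": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A scan like this costs one comparison per stored vector on every query, which is why dedicated databases trade a little accuracy for approximate indexes once a corpus grows.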

TL;DR: We benchmarked Pinecone, Weaviate, and Qdrant against a 400K-document legal corpus. Qdrant won on latency. Weaviate won on hybrid search. Pinecone won on operational simplicity. We shipped with Weaviate and would do it again — but not for the reasons we expected.

We benchmarked three vector databases against the same production corpus: 400,000 legal documents, a mixed query load of exact lookups and semantic searches, and a team of two engineers responsible for operating whatever we chose.

The results were counterintuitive enough that we think they're worth writing down.

The three contenders

Pinecone is the managed option: serverless, no infrastructure to own, fast to start. Query latency in our tests was 18ms at p50 and 45ms at p99. Index updates were near-real-time. The tradeoff: pricing scales with index size in ways that surprised us under higher load, and you have limited control over the underlying infrastructure.

Weaviate is open-source with a managed cloud option. Hybrid search (BM25 + vector) is a first-class feature, not a bolt-on. Index rebuild on large corpora took 40–60 minutes in our tests. Developer experience was strong — the schema system maps naturally to how we were thinking about our documents.
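As a rough illustration of what combining BM25 and vector scores involves, the sketch below normalizes each score set and blends them with a weight. This is a generic weighted-fusion technique, not Weaviate's implementation; the `alpha` parameter, the function names, the document IDs, and the scores are all assumptions made for the example.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Blend keyword and vector scores.

    alpha=0.0 uses keyword scores only, alpha=1.0 uses vector scores only.
    A document missing from one result set contributes 0 for that side.
    """
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Highest fused score first; ties broken by doc_id for a stable order.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical raw scores: BM25 values are unbounded, similarities sit near 0..1.
keyword_scores = {"statute_12": 8.0, "memo_7": 2.0, "brief_3": 0.0}
vector_scores = {"memo_7": 0.92, "brief_3": 0.90, "statute_12": 0.40}

ranking = hybrid_rank(keyword_scores, vector_scores, alpha=0.5)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With this toy data and `alpha=0.5`, `memo_7` ranks first because it scores reasonably on both sides, while the exact-match favorite `statute_12` and the semantic near-match `brief_3` each win only when the weight is pushed to one extreme. Normalizing before blending matters because the two score scales are not comparable.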

Qdrant is open-source, Rust-based, and the fastest of the three in our benchmarks: 11ms p50, 28ms p99 on the same hardware. Filtering is flexible. The tradeoff: hybrid search required more configuration than Weaviate, and the ecosystem tooling was thinner at the time of our evaluation (late 2025).
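For readers less used to percentile latencies, the sketch below shows one common way to compute p50 and p99 from raw timings, using the nearest-rank method. The sample latencies are invented for the example and are not the benchmark data reported above; the function name is hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up query latencies in milliseconds, with one slow outlier.
latencies_ms = [9, 10, 10, 11, 11, 11, 12, 12, 13, 30]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50}ms p99={p99}ms")
```

The p50 describes the typical query, while the p99 is dominated by the slowest requests, which is why the two figures can sit far apart for the same system.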

What we chose and why

We shipped with Weaviate. Not because it won on latency — it didn't — but because the operational profile matched our team. Two engineers maintaining a legal AI system need hybrid search to work reliably without ongoing tuning, and Weaviate's hybrid implementation was more production-stable in our testing.

Qdrant would have been the right call for a latency-sensitive consumer product with a larger team to own the infrastructure. Pinecone would have been right if we'd wanted to eliminate database operations entirely and were willing to pay for it.

The reversal

Our initial evaluation over-weighted benchmark latency and under-weighted operability. A 7ms difference in p50 query time, such as the gap between Pinecone's 18ms and Qdrant's 11ms, is not meaningful to end users. An index rebuild that blocks your deployment window is.

The question to ask first is not "which is fastest?" It's "which failure modes can my team handle?" That reframe changes the answer for most teams.

Choosing a vector database: a practical decision framework

| Priority | Best fit (as of 2025) |
|----------|----------------------|
| Minimal ops overhead | Pinecone (managed) |
| Hybrid search out of the box | Weaviate |
| Raw query speed, self-hosted | Qdrant |
| Embedded / edge deployment | Chroma or LanceDB |
| Full SQL + vector in one DB | pgvector (PostgreSQL extension) |
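If it helps to keep this framework next to your evaluation scripts, the table can be written down as a simple lookup. This is only a sketch: the dictionary mirrors the table rows, and the `recommend` function name and its case-insensitive matching are assumptions for illustration.

```python
# Rows copied from the decision table; keys are lowercased for lookup.
BEST_FIT = {
    "minimal ops overhead": "Pinecone (managed)",
    "hybrid search out of the box": "Weaviate",
    "raw query speed, self-hosted": "Qdrant",
    "embedded / edge deployment": "Chroma or LanceDB",
    "full sql + vector in one db": "pgvector (PostgreSQL extension)",
}


def recommend(priority):
    """Return the table's best fit for a priority, or None if it is not listed."""
    return BEST_FIT.get(priority.strip().lower())


print(recommend("Hybrid search out of the box"))
```

A lookup like this only captures a team's single top priority; weighing several priorities against each other is still a judgment call.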

Q: Do I need a dedicated vector database for a RAG application? Not always. For smaller corpora (under ~100K documents) or early-stage projects, pgvector running inside your existing PostgreSQL instance is a reasonable starting point. The overhead of a dedicated vector DB is justified when you need sub-20ms retrieval at scale, advanced filtering, or hybrid search at production volume.

Q: What is the difference between Pinecone, Weaviate, and Qdrant? Pinecone is a fully managed service optimized for simplicity and fast time-to-production. Weaviate is open-source with strong hybrid search and a well-designed schema system. Qdrant is open-source, Rust-based, and benchmarks fastest for pure vector search — at the cost of more configuration for hybrid workloads.

If you're evaluating vector database options for a production RAG system, we've done this evaluation in production across legal and construction AI domains and can shortcut the process.


- About the author

Andrea Phillips
