voyage-large-2 generally delivers high accuracy for semantic retrieval when used in a standard “embed → store → search” pipeline, but the exact accuracy you get depends on your data, your chunking strategy, and how you evaluate relevance. The model is described as a general-purpose embedding model optimized for retrieval quality and intended for semantic similarity search: it is built to place related texts close together in vector space, so nearest-neighbor search can reliably surface the right passages even when the query and document share no exact keywords.
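A minimal sketch of that pipeline is shown below. It assumes the voyageai Python client and pymilvus MilvusClient are installed and that VOYAGE_API_KEY is set; the collection name, sample documents, and the local Milvus Lite file are illustrative placeholders, not part of any official recipe.

```python
# Minimal "embed -> store -> search" sketch.
# Assumes: voyageai and pymilvus are installed, VOYAGE_API_KEY is set in the environment.
import voyageai
from pymilvus import MilvusClient

vo = voyageai.Client()                      # reads VOYAGE_API_KEY from the environment
milvus = MilvusClient("retrieval_demo.db")  # local Milvus Lite file; swap for a server or Zilliz Cloud URI

docs = [
    "Milvus supports approximate nearest-neighbor indexes such as HNSW and IVF.",
    "voyage-large-2 is a general-purpose embedding model optimized for retrieval.",
    "Chunking long documents usually improves retrieval precision.",
]

# 1) Embed documents (input_type="document" is the retrieval-oriented setting for corpus text).
doc_vectors = vo.embed(docs, model="voyage-large-2", input_type="document").embeddings

# 2) Store vectors plus the original text as metadata.
milvus.create_collection(collection_name="chunks", dimension=len(doc_vectors[0]))
milvus.insert(
    collection_name="chunks",
    data=[{"id": i, "vector": v, "text": t} for i, (v, t) in enumerate(zip(doc_vectors, docs))],
)

# 3) Embed the query with input_type="query" and run nearest-neighbor search.
query_vector = vo.embed(
    ["which index types does Milvus offer?"], model="voyage-large-2", input_type="query"
).embeddings[0]
hits = milvus.search(collection_name="chunks", data=[query_vector], limit=3, output_fields=["text"])
for hit in hits[0]:
    print(hit["distance"], hit["entity"]["text"])
```

Even with matching keywords absent, the query above should surface the index-related chunk, because the comparison happens in embedding space rather than on surface terms.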
In practice, retrieval accuracy is driven by system choices as much as by model quality. If you embed whole documents that mix multiple topics, you’ll usually see “close but not correct” matches, because one vector can’t represent everything. A better approach is to chunk content (for example, 300–1,000 tokens per chunk with a small overlap), embed each chunk, and store chunk metadata (doc_id, section title, URL, language, updated_at). Then build a small evaluation set of 30–100 real queries plus the passages you consider correct, and measure recall@k (did a correct chunk appear in the top 5 or top 10?) and precision@k (what fraction of the top-k results are actually relevant?). This evaluation loop tells you whether accuracy issues come from the embeddings or from chunking, filtering, or index parameters, and it helps you detect cases where numeric-heavy text, version strings, or code snippets need tighter chunking so the exact details appear in the retrieved text.
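A small sketch of that evaluation loop follows. The `search_fn` callable and the `labeled` mapping are placeholders for your own retrieval call and your hand-labeled query set; they are assumptions for illustration, not part of any library API.

```python
# Sketch of a recall@k / precision@k harness over a hand-labeled query set.
# search_fn(query, k) is a placeholder for your own retrieval call
# (e.g. embed the query with voyage-large-2, then search Milvus and return chunk IDs).
from typing import Callable, Dict, List, Set


def evaluate(
    search_fn: Callable[[str, int], List[str]],
    labeled_queries: Dict[str, Set[str]],   # query -> set of relevant chunk IDs
    k: int = 10,
) -> Dict[str, float]:
    recalls, precisions = [], []
    for query, relevant_ids in labeled_queries.items():
        retrieved = search_fn(query, k)[:k]
        hits = sum(1 for chunk_id in retrieved if chunk_id in relevant_ids)
        # recall@k: fraction of the relevant chunks that appeared in the top k
        recalls.append(hits / len(relevant_ids) if relevant_ids else 0.0)
        # precision@k: fraction of the top-k results that are relevant
        precisions.append(hits / k)
    n = len(labeled_queries)
    return {"recall@k": sum(recalls) / n, "precision@k": sum(precisions) / n}


# Example with 30-100 real queries and the chunk IDs you consider correct:
# labeled = {"how do I rotate api keys?": {"doc12#3", "doc12#4"}, ...}
# print(evaluate(my_search, labeled, k=10))
```

Re-running this harness after each change to chunk size, filters, or index parameters is what separates embedding problems from pipeline problems.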
Accuracy also depends on the vector database configuration that runs the nearest-neighbor search. A vector database such as Milvus or Zilliz Cloud (managed Milvus) typically uses approximate indexes for speed, and the index parameters influence recall. If you tune for very low latency, you may trade away some recall (a relevant chunk exists but is never visited by the search graph); if you tune for higher recall, latency and CPU cost rise. The most useful way to frame “accuracy” for voyage-large-2 is therefore: it provides strong retrieval-oriented embeddings, and you validate that strength on your own corpus with a repeatable relevance test harness, tuning chunking and index parameters until you hit your target recall and latency.
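As a sketch of that tuning loop, the snippet below builds an HNSW index and sweeps the search-time `ef` parameter in Milvus. It assumes a Milvus server or Zilliz Cloud endpoint and a populated "chunks" collection whose vector field has not been indexed yet; the parameter values are illustrative starting points, not recommendations.

```python
# Sketch: recall/latency tuning knobs for an HNSW index in Milvus.
# Assumes a Milvus server (or Zilliz Cloud URI + token) and a "chunks" collection
# with an un-indexed vector field; all parameter values are illustrative only.
import time
from pymilvus import MilvusClient

milvus = MilvusClient(uri="http://localhost:19530")  # or your Zilliz Cloud URI and token

# Build-time parameters: larger M / efConstruction build a denser graph
# (higher recall ceiling, more memory and longer build time).
index_params = milvus.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 200},
)
milvus.create_index(collection_name="chunks", index_params=index_params)


def search_with_ef(query_vectors, ef, k=10):
    """Search with a given search-time ef; return top-k IDs and mean latency in ms."""
    results, latencies = [], []
    for vec in query_vectors:
        start = time.perf_counter()
        hits = milvus.search(
            collection_name="chunks",
            data=[vec],
            limit=k,
            search_params={"params": {"ef": ef}},  # ef: how much of the graph each search visits
        )
        latencies.append((time.perf_counter() - start) * 1000)
        results.append([hit["id"] for hit in hits[0]])
    return results, sum(latencies) / len(latencies)


# Sweep ef and score each setting with the recall@k harness above:
# for ef in (16, 64, 256):
#     ids, avg_ms = search_with_ef(query_vectors, ef, k=10)
#     # higher ef typically raises recall at the cost of latency
```

Pairing this sweep with the evaluation harness from the previous section gives you a concrete recall-versus-latency curve for your own corpus rather than a generic accuracy claim.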
For more information, see: https://zilliz.com/ai-models/voyage-large-2
