# Embedding Model
BGE-M3 multi-vector embeddings — dense, sparse, and ColBERT in a single pass.
QANATIX uses BGE-M3 as its embedding engine. It's the only model that produces all three vector types — dense, sparse, and ColBERT — in a single pass, powering the hybrid search pipeline.
## Why BGE-M3
| Attribute | Value |
|---|---|
| Model | BAAI/bge-m3 |
| Dimensions | 1024 |
| Cost | Free — runs locally, no API key |
| Vectors | Dense + Sparse + ColBERT |
| Languages | 100+ (multilingual natively) |
| Self-hosted | Yes — works fully offline |
## Three vector types
- Dense vectors — semantic understanding ("luxury with pool" matches high-end hotels with pools)
- Sparse vectors — keyword precision (learned BM25-style, "Vienna" boosts Vienna results)
- ColBERT vectors — token-level late interaction for fine-grained reranking
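To make the three scoring modes concrete, here is a toy sketch (shapes and weights are made up for illustration; real BGE-M3 output is 1024-dimensional dense vectors, high-dimensional sparse weights, and a per-token ColBERT matrix — this is not the QANATIX scoring code):

```python
import math

def dense_score(q, d):
    """Cosine similarity between two dense vectors: overall semantic match."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(a * a for a in d))
    return dot / norm

def sparse_score(q, d):
    """Dot product over shared tokens (learned BM25-style term weights)."""
    return sum(w * d[t] for t, w in q.items() if t in d)

def colbert_score(q_tokens, d_tokens):
    """Late interaction (MaxSim): each query token takes its best-matching
    document token, and the per-token maxima are summed."""
    return sum(max(dense_score(qt, dt) for dt in d_tokens) for qt in q_tokens)
```

Dense scoring captures overall meaning, sparse scoring rewards exact token overlap (the "Vienna" case), and MaxSim lets every query token independently find its closest document token, which is what makes ColBERT useful for fine-grained reranking.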
All three result lists are fused via DBSF (Distribution-Based Score Fusion), then optionally reranked by a cross-encoder.
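A minimal sketch of DBSF, assuming the usual formulation (normalize each list's scores to [0, 1] using mean ± 3 standard deviations, then sum the normalized scores per document) — illustrative only, not the actual fusion code:

```python
from statistics import mean, stdev

def dbsf(result_lists):
    """Distribution-Based Score Fusion.

    result_lists: one {doc_id: raw_score} dict per vector type
    (e.g. dense, sparse, ColBERT). Returns doc ids sorted by fused score.
    """
    fused = {}
    for scores in result_lists:
        vals = list(scores.values())
        mu = mean(vals)
        sigma = stdev(vals) if len(vals) > 1 else 0.0
        lo, hi = mu - 3 * sigma, mu + 3 * sigma
        for doc_id, s in scores.items():
            # Min-max normalize against the 3-sigma bounds of this list.
            norm = (s - lo) / (hi - lo) if hi > lo else 0.5
            fused[doc_id] = fused.get(doc_id, 0.0) + norm
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Because each list is normalized against its own score distribution, dense cosine scores (roughly 0–1) and sparse dot products (unbounded) can be summed without one vector type dominating the other.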
## Configuration

```
EMBEDDING_MODEL=BAAI/bge-m3
EMBEDDING_DIMENSIONS=1024
```

No API key needed. The model downloads automatically on first run (~2 GB, cached at ~/.cache/huggingface/).
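Reading these settings with their documented defaults might look like this (`embedding_settings` is a hypothetical helper, not the actual QANATIX config module):

```python
import os

def embedding_settings(env=os.environ):
    """Resolve embedding config from the environment, falling back to the
    documented defaults (hypothetical helper for illustration)."""
    return {
        "model": env.get("EMBEDDING_MODEL", "BAAI/bge-m3"),
        "dimensions": int(env.get("EMBEDDING_DIMENSIONS", "1024")),
    }
```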
## Hardware guidance
| Dataset size | CPU | GPU |
|---|---|---|
| < 10K entities | Fine, ~1-5s per batch of 20 | Not needed |
| 10K – 100K entities | Works, slower pipeline | Recommended (NVIDIA, 8GB+ VRAM) |
| > 100K entities | Slow | Strongly recommended |
## GPU recommendations
| GPU | VRAM | Throughput | Cost tier |
|---|---|---|---|
| NVIDIA T4 | 16 GB | ~100 embeddings/sec | Budget |
| NVIDIA L4 | 24 GB | ~300 embeddings/sec | Best value |
| NVIDIA A10G | 24 GB | ~300 embeddings/sec | Cloud standard |
| NVIDIA A100 | 40/80 GB | ~800 embeddings/sec | High throughput |
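The throughput column gives a quick way to size the embedding pass; a back-of-envelope sketch (illustrative arithmetic, not QANATIX code):

```python
def embedding_pass_seconds(num_entities: int, embeddings_per_sec: float) -> float:
    """Rough wall-clock estimate for the embedding pass alone
    (ignores description generation, Qdrant upserts, and batching overhead)."""
    return num_entities / embeddings_per_sec

# e.g. 100K entities on an L4 at ~300 embeddings/sec → roughly 333 s, under 6 minutes
```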
## How it works

1. Entity ingested
2. Worker picks up entity (SAQ background job)
3. description_llm text → BGE-M3 → dense + sparse + ColBERT vectors
4. Vectors indexed in Qdrant (all three types)
5. Entity marked as "indexed" — now searchable

## Caching
| Cache | TTL | Key |
|---|---|---|
| Query embedding | 1 hour | model + dimensions + query hash |
| Entity embedding | 7 days | model + dimensions + text hash |
Embeddings are cached in Redis. Re-ingesting identical text skips encoding entirely.
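Putting the pipeline and the cache together, the worker step can be sketched roughly as follows (an in-memory dict stands in for Redis, `encode` and `index_in_qdrant` are injected stand-ins for BGE-M3 and Qdrant, and the key scheme mirrors the table above — this is a sketch, not the actual job code):

```python
import hashlib

ENTITY_TTL = 7 * 24 * 3600  # 7 days, per the entity-embedding row above

def cache_key(model: str, dims: int, text: str) -> str:
    """Cache key from model + dimensions + hash of the exact input text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"emb:{model}:{dims}:{digest}"

def embed_entity(entity, cache, encode, index_in_qdrant):
    """One worker job: check the cache, encode on miss, index, mark as indexed."""
    key = cache_key("BAAI/bge-m3", 1024, entity["description_llm"])
    vectors = cache.get(key)
    if vectors is None:                      # cache miss: run BGE-M3
        vectors = encode(entity["description_llm"])
        cache[key] = vectors                 # with Redis: SETEX key ENTITY_TTL ...
    index_in_qdrant(entity["id"], vectors)   # dense + sparse + ColBERT
    entity["status"] = "indexed"
    return vectors
```

Because the key is derived from the text hash, re-ingesting an entity whose description has not changed hits the cache and skips the encoding step entirely.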