Embedding Model

BGE-M3 multi-vector embeddings — dense, sparse, and ColBERT in a single pass.

QANATIX uses BGE-M3 as its embedding engine. It is one of the few open models that produces all three vector types — dense, sparse, and ColBERT — in a single forward pass, which is what powers the hybrid search pipeline.

Why BGE-M3

| Attribute | Value |
| --- | --- |
| Model | BAAI/bge-m3 |
| Dimensions | 1024 |
| Cost | Free — runs locally, no API key |
| Vectors | Dense + Sparse + ColBERT |
| Languages | 100+ (natively multilingual) |
| Self-hosted | Yes — works fully offline |

Three vector types

  1. Dense vectors — semantic understanding ("luxury with pool" matches high-end hotels with pools)
  2. Sparse vectors — keyword precision (learned BM25-style, "Vienna" boosts Vienna results)
  3. ColBERT vectors — token-level late interaction for fine-grained reranking

All three are fused via Distribution-Based Score Fusion (DBSF) for maximum retrieval quality, then optionally reranked by a cross-encoder.
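To make the fusion step concrete, here is a minimal sketch of DBSF in pure Python. It follows one common formulation (normalize each retriever's scores to a common scale using mean ± 3 standard deviations, then sum per document); the exact normalization QANATIX/Qdrant uses internally is an assumption, and the document IDs and scores are illustrative.

```python
import statistics

def dbsf_fuse(score_lists: dict[str, dict[str, float]]) -> dict[str, float]:
    """Distribution-Based Score Fusion: rescale each retriever's scores
    using mean +/- 3 standard deviations, then sum per document ID."""
    fused: dict[str, float] = {}
    for scores in score_lists.values():
        values = list(scores.values())
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        lo = mean - 3 * std
        span = (mean + 3 * std) - lo or 1.0  # avoid div-by-zero if all equal
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + (score - lo) / span
    return fused

# Dense (cosine) and sparse (BM25-style) scores live on different scales;
# DBSF makes them comparable before summing.
ranked = dbsf_fuse({
    "dense":  {"hotel_a": 0.92, "hotel_b": 0.85, "hotel_c": 0.40},
    "sparse": {"hotel_b": 12.0, "hotel_a": 7.5,  "hotel_c": 1.2},
})
best = max(ranked, key=ranked.get)
```

A cross-encoder reranker would then re-score only the top fused hits.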

Configuration

EMBEDDING_MODEL=BAAI/bge-m3
EMBEDDING_DIMENSIONS=1024

No API key needed. The model downloads automatically on first run (~2 GB, cached at ~/.cache/huggingface/).
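A hypothetical settings loader for these two variables might look like the sketch below; the variable names match the configuration above and the defaults mirror the documented values, but how QANATIX actually reads its config is an assumption.

```python
import os

# Read the documented settings, falling back to the documented defaults.
EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "BAAI/bge-m3")
EMBEDDING_DIMENSIONS = int(os.environ.get("EMBEDDING_DIMENSIONS", "1024"))
```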

Hardware guidance

| Dataset size | CPU | GPU |
| --- | --- | --- |
| < 10K entities | Fine, ~1-5s per batch of 20 | Not needed |
| 10K – 100K entities | Works, slower pipeline | Recommended (NVIDIA, 8GB+ VRAM) |
| > 100K entities | Slow | Strongly recommended |

GPU recommendations

| GPU | VRAM | Throughput | Cost tier |
| --- | --- | --- | --- |
| NVIDIA T4 | 16 GB | ~100 embeddings/sec | Budget |
| NVIDIA L4 | 24 GB | ~300 embeddings/sec | Best value |
| NVIDIA A10G | 24 GB | ~300 embeddings/sec | Cloud standard |
| NVIDIA A100 | 40/80 GB | ~800 embeddings/sec | High throughput |
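The throughput figures above translate directly into indexing-time estimates. The sketch below is a back-of-envelope calculation only, assuming one embedding per entity and ignoring ingestion and Qdrant indexing overhead.

```python
def indexing_time_hours(num_entities: int, embeddings_per_sec: float) -> float:
    """Rough wall-clock estimate for embedding a dataset."""
    return num_entities / embeddings_per_sec / 3600

# 100K entities on an L4 (~300 embeddings/sec) vs. a T4 (~100 embeddings/sec):
l4_hours = indexing_time_hours(100_000, 300)
t4_hours = indexing_time_hours(100_000, 100)
```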

How it works

  1. Entity ingested.
  2. Worker picks up the entity (SAQ background job).
  3. The entity's description_llm text is encoded by BGE-M3 into dense, sparse, and ColBERT vectors.
  4. All three vector types are indexed in Qdrant.
  5. The entity is marked as "indexed" and becomes searchable.
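The ingestion flow above can be sketched as follows. The encoder here is a stand-in that fabricates deterministic toy vectors so the control flow is runnable; the real worker calls BGE-M3, and the field names (id, description_llm, status) follow the text but the exact schema is an assumption.

```python
import hashlib

def fake_bge_m3_encode(text: str) -> dict:
    """Stand-in for BGE-M3: returns the three vector families the real model
    produces in one pass (shapes illustrative, values fabricated)."""
    seed = int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)
    dense = [((seed >> i) & 0xFF) / 255 for i in range(0, 32, 8)]  # real model: 1024 dims
    sparse = {tok: 1.0 for tok in set(text.lower().split())}       # token -> learned weight
    colbert = [dense for _ in text.split()]                        # one vector per token
    return {"dense": dense, "sparse": sparse, "colbert": colbert}

def index_entity(entity: dict, index: dict) -> None:
    """Mirror of the worker's job: encode the description, store all three
    vector types, then flip the entity's status to 'indexed'."""
    vectors = fake_bge_m3_encode(entity["description_llm"])
    index[entity["id"]] = vectors  # real system: upsert into Qdrant
    entity["status"] = "indexed"   # entity is now searchable

entity = {"id": "hotel_a", "description_llm": "Luxury hotel with pool", "status": "pending"}
index: dict = {}
index_entity(entity, index)
```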

Caching

| Cache | TTL | Key |
| --- | --- | --- |
| Query embedding | 1 hour | model + dimensions + query hash |
| Entity embedding | 7 days | model + dimensions + text hash |

Embeddings are cached in Redis. Re-ingesting identical text skips encoding entirely.
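The cache keys in the table can be derived along these lines. The key layout and TTL constants below are an illustrative assumption, not QANATIX's actual Redis key format; only the ingredients (model, dimensions, text hash) come from the table.

```python
import hashlib

# Assumed TTLs in seconds, mirroring the table above.
TTL_SECONDS = {"query": 3600, "entity": 7 * 24 * 3600}

def embedding_cache_key(kind: str, model: str, dimensions: int, text: str) -> str:
    """Build a Redis-style cache key from model + dimensions + text hash,
    so re-ingesting identical text under the same model hits the cache."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return f"emb:{kind}:{model}:{dimensions}:{digest}"

k1 = embedding_cache_key("query", "BAAI/bge-m3", 1024, "luxury hotel with pool")
k2 = embedding_cache_key("query", "BAAI/bge-m3", 1024, "luxury hotel with pool")
# identical text -> identical key -> a cache hit skips encoding entirely
```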