Data Import Overview
How data flows into QANATIX — 10 sources, instantly queryable.
QANATIX accepts data from 10 sources. All data is instantly queryable the moment it's uploaded — no async processing or indexing delays.
Supported sources
| Source | Method | Format |
|---|---|---|
| CSV | File upload | .csv |
| JSON | File upload or batch API | .json |
| NDJSON | File upload or streaming | .ndjson |
| PDF | File upload | .pdf — extracted to markdown |
| XML / BMEcat | File upload | .xml — catalog standards |
| SAP IDoc | File upload | .xml — MATMAS, DEBMAS, CREMAS |
| PostgreSQL | Database connector | Server-side cursors |
| MySQL | Database connector | Streaming query |
| MongoDB | Database connector | Collection sync |
| Neo4j | Database connector | Cypher queries |
| REST API | Push / webhook | JSON payload |
| NDJSON stream | Streaming endpoint | Backpressure-controlled |
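Of the formats above, NDJSON is the simplest to handle incrementally: one JSON object per line, which is why it works for both file upload and streaming. A minimal parsing sketch (the field names in the sample chunk are illustrative, not part of the QANATIX schema):

```python
import json

def parse_ndjson(text: str) -> list[dict]:
    """Parse NDJSON: one JSON object per non-empty line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines between records
            records.append(json.loads(line))
    return records

chunk = '{"sku": "A-1", "name": "Widget"}\n{"sku": "A-2", "name": "Gadget"}\n'
records = parse_ndjson(chunk)
print(len(records))  # 2
```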
Pipeline
Every record goes through 3 stages:
Extract → Normalize → Validate + Store

- Extract — parse the source format (CSV rows, JSON objects, PDF pages, XML elements, DB rows)
- Normalize — map to the QANATIX record structure (`name`, `record_type`, `collection_data`)
- Validate + Store — store in Postgres and make instantly queryable via full-text search and structured filters
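The Normalize stage can be pictured as a plain field mapping. The sketch below is illustrative only — the `name_field` parameter and the sample CSV row are assumptions, and real mapping rules are source-specific:

```python
def normalize(row: dict, record_type: str, name_field: str) -> dict:
    """Map a parsed source row to the QANATIX record structure.

    `name_field` picks which source column becomes the record name;
    everything else lands in collection_data. (Illustrative sketch —
    actual mapping rules depend on the source.)
    """
    return {
        "name": row[name_field],
        "record_type": record_type,
        "collection_data": {k: v for k, v in row.items() if k != name_field},
    }

row = {"product_name": "Widget", "price": "9.99", "sku": "A-1"}
record = normalize(row, record_type="product", name_field="product_name")
print(record["name"])  # Widget
```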
Deduplication
QANATIX deduplicates at the record level by content hash. Duplicate records are skipped and reported as `dedup_skipped` in the response summary. A fully duplicate batch still returns `201`, with `accepted: 0` and `dedup_skipped: N`. Re-uploading with the same `source_id` upserts existing records instead of creating duplicates.
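Content-hash deduplication can be approximated as hashing a canonical JSON serialization of each record and skipping hashes already seen. This is a sketch of the idea, not QANATIX's exact hashing scheme:

```python
import hashlib
import json

def dedupe(records: list[dict]) -> tuple[list[dict], int]:
    """Return (accepted records, dedup_skipped count) by content hash."""
    seen: set[str] = set()
    accepted, skipped = [], 0
    for rec in records:
        # Canonical serialization so key order doesn't change the hash.
        digest = hashlib.sha256(
            json.dumps(rec, sort_keys=True).encode()
        ).hexdigest()
        if digest in seen:
            skipped += 1
        else:
            seen.add(digest)
            accepted.append(rec)
    return accepted, skipped

batch = [{"name": "Widget"}, {"name": "Widget"}, {"name": "Gadget"}]
accepted, dedup_skipped = dedupe(batch)
print(len(accepted), dedup_skipped)  # 2 1
```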
Batch sizes
| Setting | Default |
|---|---|
| JSON batch max records | 5,000 |
| File upload max size | 50 MB |
| Database connector batch | 5,000 rows |
| Streaming buffer | 100 records or 5s flush |
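The streaming buffer's "100 records or 5s flush" default is a standard size-or-time flush policy. A minimal sketch, where the flush callback and thresholds are illustrative rather than QANATIX internals:

```python
import time

class StreamBuffer:
    """Flush buffered records when a size or age threshold is hit."""

    def __init__(self, flush_fn, max_records: int = 100, max_age_s: float = 5.0):
        self.flush_fn = flush_fn
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.buffer: list[dict] = []
        self.opened_at = time.monotonic()

    def add(self, record: dict) -> None:
        if not self.buffer:
            self.opened_at = time.monotonic()  # age counts from first record
        self.buffer.append(record)
        age = time.monotonic() - self.opened_at
        if len(self.buffer) >= self.max_records or age >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []

flushed = []
buf = StreamBuffer(flushed.append, max_records=3)
for i in range(7):
    buf.add({"seq": i})
buf.flush()  # drain the tail on shutdown
print([len(batch) for batch in flushed])  # [3, 3, 1]
```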
Guides
- File Upload — CSV, JSON, NDJSON, PDF
- XML Catalogs — BMEcat, SAP IDoc
- API Upload — REST push, webhooks
- Database Connectors — Postgres, MySQL, MongoDB, Neo4j
- Streaming — NDJSON streaming with backpressure