Data Import Overview
How data flows into QANATIX — 10 sources, instantly queryable.
QANATIX accepts data from 10 sources. All data is instantly queryable the moment it's uploaded — no async processing or indexing delays.
Supported sources
| Source | Method | Format |
|---|---|---|
| CSV | File upload | .csv |
| JSON | File upload or batch API | .json |
| NDJSON | File upload or streaming | .ndjson |
| PDF | File upload | .pdf — extracted to markdown |
| XML / BMEcat | File upload | .xml — catalog standards |
| SAP IDoc | File upload | .xml — MATMAS, DEBMAS, CREMAS |
| PostgreSQL | Database connector | Server-side cursors |
| MySQL | Database connector | Streaming query |
| MongoDB | Database connector | Collection sync |
| Neo4j | Database connector | Cypher queries |
| REST API | Push / webhook | JSON payload |
| NDJSON stream | Streaming endpoint | Backpressure-controlled |
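Of the formats above, NDJSON is the simplest to handle incrementally: one JSON object per line, which is why it works for both file upload and streaming. A minimal parsing sketch (the field names in the sample chunk are illustrative, not part of the QANATIX schema):

```python
import json

def parse_ndjson(text: str) -> list[dict]:
    """Parse NDJSON: one JSON object per non-empty line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines between records
            records.append(json.loads(line))
    return records

chunk = '{"sku": "A-1", "name": "Widget"}\n{"sku": "A-2", "name": "Gadget"}\n'
records = parse_ndjson(chunk)
print(len(records))  # 2
```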
Pipeline
Every record goes through 3 stages:
Extract → Normalize → Validate + Store

- Extract — parse the source format (CSV rows, JSON objects, PDF pages, XML elements, DB rows)
- Normalize — map to the QANATIX record structure (`name`, `record_type`, `collection_data`)
- Validate + Store — store in Postgres and make instantly queryable via full-text search and structured filters
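The Normalize stage can be pictured as a plain field mapping. The sketch below is illustrative only — the `name_field` parameter and the sample CSV row are assumptions, and real mapping rules are source-specific:

```python
def normalize(row: dict, record_type: str, name_field: str) -> dict:
    """Map a parsed source row to the QANATIX record structure.

    `name_field` picks which source column becomes the record name;
    everything else lands in collection_data. (Illustrative sketch —
    actual mapping rules depend on the source.)
    """
    return {
        "name": row[name_field],
        "record_type": record_type,
        "collection_data": {k: v for k, v in row.items() if k != name_field},
    }

row = {"product_name": "Widget", "price": "9.99", "sku": "A-1"}
record = normalize(row, record_type="product", name_field="product_name")
print(record["name"])  # Widget
```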
Deduplication
QANATIX deduplicates at the record level by content hash. Duplicate records are skipped and reported as `dedup_skipped` in the response summary. A fully duplicate batch still returns `201`, with `accepted: 0` and `dedup_skipped: N`. Re-uploading with the same `source_id` upserts existing records instead of creating duplicates.
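Content-hash deduplication can be approximated as hashing a canonical JSON serialization of each record and skipping hashes already seen. This is a sketch of the idea, not QANATIX's exact hashing scheme:

```python
import hashlib
import json

def dedupe(records: list[dict]) -> tuple[list[dict], int]:
    """Return (accepted records, dedup_skipped count) by content hash."""
    seen: set[str] = set()
    accepted, skipped = [], 0
    for rec in records:
        # Canonical serialization so key order doesn't change the hash.
        digest = hashlib.sha256(
            json.dumps(rec, sort_keys=True).encode()
        ).hexdigest()
        if digest in seen:
            skipped += 1
        else:
            seen.add(digest)
            accepted.append(rec)
    return accepted, skipped

batch = [{"name": "Widget"}, {"name": "Widget"}, {"name": "Gadget"}]
accepted, dedup_skipped = dedupe(batch)
print(len(accepted), dedup_skipped)  # 2 1
```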
Batch sizes
| Setting | Default |
|---|---|
| JSON batch max records | 5,000 |
| File upload max size | 50 MB |
| Database connector batch | 5,000 rows |
| Streaming buffer | 100 records or 5s flush |
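The streaming buffer's "100 records or 5s flush" default is a standard size-or-time flush policy. A minimal sketch, where the flush callback and thresholds are illustrative rather than QANATIX internals:

```python
import time

class StreamBuffer:
    """Flush buffered records when a size or age threshold is hit."""

    def __init__(self, flush_fn, max_records: int = 100, max_age_s: float = 5.0):
        self.flush_fn = flush_fn
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.buffer: list[dict] = []
        self.opened_at = time.monotonic()

    def add(self, record: dict) -> None:
        if not self.buffer:
            self.opened_at = time.monotonic()  # age counts from first record
        self.buffer.append(record)
        age = time.monotonic() - self.opened_at
        if len(self.buffer) >= self.max_records or age >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []

flushed = []
buf = StreamBuffer(flushed.append, max_records=3)
for i in range(7):
    buf.add({"seq": i})
buf.flush()  # drain the tail on shutdown
print([len(batch) for batch in flushed])  # [3, 3, 1]
```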
Guides
- File Upload — CSV, JSON, NDJSON, PDF
- XML Catalogs — BMEcat, SAP IDoc
- API Upload — REST push, webhooks
- Database Connectors — Postgres, MySQL, MongoDB, Neo4j
- Streaming — NDJSON streaming with backpressure