
Data Import Overview

How data flows into QANATIX: 12 sources, instantly queryable.

QANATIX accepts data from the 12 sources listed below. All data is instantly queryable the moment it's uploaded, with no async processing or indexing delays.
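
As an illustration of the upload-then-query flow over the REST push source, here is a minimal sketch in Python. The base URL, endpoint paths, and auth header are assumptions for illustration, not documented API; the record fields (name, record_type, collection_data) follow the record structure described under Pipeline below.

```python
import requests

BASE = "https://api.qanatix.example/v1"          # hypothetical base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # hypothetical auth header

# Upload a small JSON batch. Records are queryable as soon as the
# request returns; there is no indexing job to wait for.
records = [{
    "name": "ACME Widget",
    "record_type": "product",
    "collection_data": {"sku": "W-100", "price": "19.99"},
}]
resp = requests.post(f"{BASE}/records/batch",
                     json={"records": records}, headers=HEADERS)
resp.raise_for_status()

# Query immediately after the upload succeeds.
hits = requests.get(f"{BASE}/search",
                    params={"q": "ACME Widget"}, headers=HEADERS)
print(hits.json())
```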

Supported sources

| Source | Method | Format / notes |
| --- | --- | --- |
| CSV | File upload | .csv |
| JSON | File upload or batch API | .json |
| NDJSON | File upload or streaming | .ndjson |
| PDF | File upload | .pdf (extracted to markdown) |
| XML / BMEcat | File upload | .xml (catalog standards) |
| SAP IDoc | File upload | .xml (MATMAS, DEBMAS, CREMAS) |
| PostgreSQL | Database connector | Server-side cursors |
| MySQL | Database connector | Streaming query |
| MongoDB | Database connector | Collection sync |
| Neo4j | Database connector | Cypher queries |
| REST API | Push / webhook | JSON payload |
| NDJSON stream | Streaming endpoint | Backpressure-controlled |
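
For the NDJSON streaming endpoint, a client can send the request body as a generator so records are written incrementally rather than buffered in memory. A minimal sketch, assuming a hypothetical /v1/stream endpoint (the URL and auth header are illustrative):

```python
import json
import requests

records = [
    {"name": "ACME Widget", "record_type": "product",
     "collection_data": {"sku": "W-100"}},
    {"name": "ACME Gadget", "record_type": "product",
     "collection_data": {"sku": "G-200"}},
]

def ndjson_lines(recs):
    # One JSON object per line, newline-terminated: the NDJSON framing.
    for rec in recs:
        yield (json.dumps(rec) + "\n").encode("utf-8")

# Passing a generator makes requests use chunked transfer encoding, so
# the server can apply backpressure by reading the body at its own pace.
resp = requests.post(
    "https://api.qanatix.example/v1/stream",     # hypothetical endpoint
    data=ndjson_lines(records),
    headers={"Content-Type": "application/x-ndjson",
             "Authorization": "Bearer <api-key>"},
)
resp.raise_for_status()
```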

Pipeline

Every record goes through 3 stages:

Extract → Normalize → Validate + Store
  1. Extract — parse the source format (CSV rows, JSON objects, PDF pages, XML elements, DB rows)
  2. Normalize — map to the QANATIX record structure (name, record_type, collection_data); see the sketch after this list
  3. Validate + Store — store in Postgres and make instantly queryable via full-text search and structured filters
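
As a sketch of the normalize stage for a CSV source (the mapping rules here are illustrative; only the three record fields come from the documented structure):

```python
import csv
import io

def normalize_csv_row(row: dict) -> dict:
    """Map a parsed CSV row onto the QANATIX record structure."""
    return {
        "name": row.pop("name"),       # display name for the record
        "record_type": "product",      # assumed type for this feed
        "collection_data": row,        # remaining columns as structured fields
    }

raw = "name,sku,price\nACME Widget,W-100,19.99\n"
normalized = [normalize_csv_row(dict(r))
              for r in csv.DictReader(io.StringIO(raw))]
# -> [{'name': 'ACME Widget', 'record_type': 'product',
#      'collection_data': {'sku': 'W-100', 'price': '19.99'}}]
```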

Deduplication

QANATIX deduplicates at the record level by content hash. Duplicate records are skipped and reported as dedup_skipped in the response summary; a fully duplicate batch still returns 201 with accepted: 0 and dedup_skipped: N. Re-uploading with a source_id upserts the existing records instead of creating duplicates.
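
The exact hashing scheme isn't specified here, but record-level content hashing typically means hashing a canonical serialization of the record, so that field order never affects the result. A sketch of that idea:

```python
import hashlib
import json

def content_hash(record: dict) -> str:
    # Canonical form: sorted keys, no whitespace. The actual scheme
    # QANATIX uses internally is an assumption; this shows the idea.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"name": "ACME Widget", "record_type": "product",
     "collection_data": {"sku": "W-100"}}
b = {"record_type": "product", "name": "ACME Widget",
     "collection_data": {"sku": "W-100"}}
assert content_hash(a) == content_hash(b)  # same content, same hash
```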

Batch sizes

| Setting | Default |
| --- | --- |
| JSON batch max records | 5,000 |
| File upload max size | 50 MB |
| Database connector batch | 5,000 rows |
| Streaming buffer | 100 records or 5 s flush |
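
Clients with more than 5,000 records per JSON batch need to split uploads client-side. A minimal chunking sketch (base URL and endpoint are the same illustrative assumptions as above):

```python
import requests

BASE = "https://api.qanatix.example/v1"          # hypothetical base URL
HEADERS = {"Authorization": "Bearer <api-key>"}

all_records = [{"name": f"rec-{i}", "record_type": "item",
                "collection_data": {"i": i}} for i in range(12_000)]

def chunks(records, size=5_000):
    # Stay under the JSON batch max of 5,000 records per request.
    for i in range(0, len(records), size):
        yield records[i:i + size]

for batch in chunks(all_records):
    resp = requests.post(f"{BASE}/records/batch",
                         json={"records": batch}, headers=HEADERS)
    resp.raise_for_status()
```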
