
File Upload

Upload CSV, JSON, NDJSON, XML, and PDF files to QANATIX, from kilobytes to multiple gigabytes.

Small files (under 50 MB) are processed synchronously: the full result is returned in the response. Files of 50 MB or more are accepted immediately and processed in the background; poll the status endpoint for progress.

Plan Limits

Plan         Max Upload Size
Free         10 MB
Pro          1 GB
Scale        10 GB
Enterprise   10 GB

CSV

curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/supplier/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@suppliers.csv"

CSV columns are mapped to record fields automatically. The name column (if present) becomes the record name. All other columns go into the record's data field. Encoding is auto-detected (UTF-8, Latin-1, etc.).
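The mapping rule above can be sketched in Python. This is a minimal illustration of the documented behavior, not the server's actual implementation; the sample columns are made up:

```python
import csv
import io

def csv_row_to_record(row: dict) -> dict:
    """Sketch of the documented mapping: the 'name' column (if present)
    becomes the record name; every other column lands in the data field."""
    name = row.pop("name", None)
    return {"name": name, "data": row}

sample = "name,country,rating\nAcme GmbH,DE,A"
reader = csv.DictReader(io.StringIO(sample))
records = [csv_row_to_record(row) for row in reader]
# → [{'name': 'Acme GmbH', 'data': {'country': 'DE', 'rating': 'A'}}]
```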

JSON

Upload a JSON file containing an array of records:

curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/product/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@products.json"

Expected format:

[
  {"name": "Product A", "sku": "SKU-001", "price": 12.50},
  {"name": "Product B", "sku": "SKU-002", "price": 8.75}
]

NDJSON

Newline-delimited JSON — one record per line:

curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/event/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@events.ndjson"

NDJSON is the recommended format for large files (100 MB+). Each line is parsed independently, so a single malformed line won't reject the entire file.
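The per-line independence is what makes NDJSON robust. A client-side sketch of the same idea (parse each line on its own, collect errors instead of aborting):

```python
import json

def parse_ndjson(text: str):
    """Parse NDJSON line by line; keep good records and record per-line
    errors rather than failing the whole file."""
    records, errors = [], []
    for line_num, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            errors.append({"line": line_num, "error": str(exc)})
    return records, errors

good, bad = parse_ndjson('{"id": 1}\nnot json\n{"id": 2}')
# good → [{'id': 1}, {'id': 2}]; bad → one entry for line 2
```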

XML

QANATIX supports BMEcat catalog XML, SAP IDoc XML, and generic XML:

curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/product/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@catalog.xml"

For generic XML, use the record_tag parameter to specify which element represents a record:

curl -X POST "https://api.qanatix.com/api/v1/upload/manufacturing/product/file?record_tag=item" \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@products.xml"

BMEcat and SAP IDoc formats are auto-detected from the XML content.
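To make the record_tag semantics concrete, here is a minimal sketch of what the parameter selects. The XML parsing happens server-side; the element names below are invented for illustration:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<catalog>
  <item><name>Widget</name><sku>SKU-001</sku></item>
  <item><name>Gadget</name><sku>SKU-002</sku></item>
</catalog>"""

def extract_records(xml_text: str, record_tag: str):
    """Each element matching record_tag becomes one record; its child
    elements become fields (a simplified sketch of generic XML parsing)."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in elem}
        for elem in root.iter(record_tag)
    ]

records = extract_records(SAMPLE, "item")
# → [{'name': 'Widget', 'sku': 'SKU-001'}, {'name': 'Gadget', 'sku': 'SKU-002'}]
```

With record_tag=item, each <item> element becomes one record.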

PDF

QANATIX extracts text from PDFs using pymupdf4llm, producing clean markdown optimized for LLM consumption.

curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/document/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@datasheet.pdf"

Chunking strategy:

  • PDFs with 1-3 pages: combined into one record
  • PDFs with 4+ pages: one record per page

Each record includes page numbers and document metadata.
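The chunking rule can be sketched as follows, assuming the extracted pages arrive as a list of markdown strings (a simplified model; the real records also carry document metadata):

```python
def chunk_pdf_pages(pages: list[str]) -> list[dict]:
    """Apply the documented rule: 1-3 pages collapse into a single record;
    4+ pages yield one record per page (page numbers are 1-based)."""
    if len(pages) <= 3:
        return [{"pages": list(range(1, len(pages) + 1)),
                 "text": "\n\n".join(pages)}]
    return [{"pages": [i], "text": text}
            for i, text in enumerate(pages, start=1)]

short = chunk_pdf_pages(["p1", "p2"])                # one combined record
long = chunk_pdf_pages(["p1", "p2", "p3", "p4"])     # one record per page
```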

Batch JSON API

For programmatic upload, use the batch endpoint (max 10,000 records per request):

curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/fastener/batch \
  -H "Authorization: Bearer sk_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '[
    {
      "name": "Steel Bolt M12x80",
      "source_id": "ERP-001",
      "data": {
        "material": "Carbon Steel",
        "price_eur": 0.15
      }
    }
  ]'

Use source_id for upsert behavior — if a record with the same source_id exists, it's updated instead of duplicated.
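To stay under the 10,000-record cap when pushing a large dataset, split it client-side and POST each chunk to the batch endpoint. A minimal sketch (the record contents here are placeholders):

```python
def chunk_records(records: list[dict], size: int = 10_000):
    """Yield successive slices of at most `size` records, each small
    enough for one batch request."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# 25,000 records → 3 requests: 10,000 + 10,000 + 5,000
data = [{"source_id": f"ERP-{n:05d}", "name": f"Part {n}"} for n in range(25_000)]
batches = list(chunk_records(data))
```

Each batch would then be sent as the JSON body of its own POST to the batch endpoint.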

Response

Small file (synchronous) — 201

{
  "upload_id": "a1b2c3d4-...",
  "status": "complete",
  "summary": {
    "submitted": 150,
    "accepted": 147,
    "rejected": 0,
    "dedup_skipped": 3
  },
  "errors": [],
  "metadata": {
    "file_hash": "sha256:abc...",
    "content_type": "text/csv",
    "encoding_detected": "utf-8"
  }
}

Large file (async) — 202

Files over 50 MB return immediately with status processing:

{
  "upload_id": "e5f6g7h8-...",
  "status": "processing",
  "message": "Large file accepted. Poll GET /uploads/{upload_id} for progress.",
  "summary": { "submitted": 0, "accepted": 0, "rejected": 0, "dedup_skipped": 0 },
  "errors": [],
  "metadata": { "file_hash": "sha256:def..." }
}

Polling for Progress

For large file uploads, poll the status endpoint:

curl https://api.qanatix.com/api/v1/uploads/{upload_id} \
  -H "Authorization: Bearer sk_live_abc123..."

Response while processing:

{
  "id": "e5f6g7h8-...",
  "status": "processing",
  "record_count": 5300000,
  "records_processed": 1250000,
  "accepted": 1249800,
  "rejected": 200,
  "file_name": "german_companies.csv",
  "file_size": 1932735283,
  "created_at": "2026-03-15T10:30:00Z",
  "completed_at": null
}

The records_processed field updates as batches complete, so you can track live progress.

Response when complete:

{
  "id": "e5f6g7h8-...",
  "status": "complete",
  "record_count": 5300000,
  "records_processed": 5300000,
  "accepted": 5299200,
  "rejected": 800,
  "file_name": "german_companies.csv",
  "file_size": 1932735283,
  "created_at": "2026-03-15T10:30:00Z",
  "completed_at": "2026-03-15T11:15:00Z"
}

Possible status values: processing, complete, partial (some records failed), failed (critical error).

Upload Errors

If records are rejected, retrieve error details:

curl "https://api.qanatix.com/api/v1/uploads/{upload_id}/errors?limit=100" \
  -H "Authorization: Bearer sk_live_abc123..."

Example response:

[
  {
    "id": "...",
    "upload_id": "e5f6g7h8-...",
    "source_row_num": 42,
    "raw_payload": { "name": "", "sku": null },
    "error_type": "validation",
    "error_details": { "message": "name is required" },
    "retry_count": 0,
    "status": "pending"
  }
]
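Once fetched, the error list can be bucketed by error_type so validation failures are fixed in the source file before re-uploading. A client-side sketch over a sample payload shaped like the response above:

```python
from collections import defaultdict

def group_errors(errors: list[dict]) -> dict[str, list[int]]:
    """Bucket rejected rows by error_type, keeping the source row numbers
    so the offending rows can be located in the original file."""
    by_type: dict[str, list[int]] = defaultdict(list)
    for err in errors:
        by_type[err["error_type"]].append(err["source_row_num"])
    return dict(by_type)

sample = [
    {"source_row_num": 42, "error_type": "validation",
     "error_details": {"message": "name is required"}},
    {"source_row_num": 97, "error_type": "validation",
     "error_details": {"message": "name is required"}},
]
grouped = group_errors(sample)
# → {'validation': [42, 97]}
```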

How It Works

QANATIX streams uploaded files to disk in 256 KB chunks — memory usage stays under 20 MB regardless of file size.

For large files:

  1. File is streamed to a temp file on the server
  2. A background task reads the file in 500-record batches
  3. Each batch is validated, deduplicated, and written to the database
  4. Progress is updated after each batch
  5. If the error rate exceeds 10%, processing halts (circuit breaker)
  6. The temp file is deleted when processing completes
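The batch loop and circuit breaker above can be sketched as follows. This models the documented behavior only (the status names are borrowed from the polling section; the real pipeline also deduplicates and writes to the database):

```python
def process_batches(batches, error_threshold: float = 0.10):
    """Process batches in order, tracking a running error rate; halt with
    a failed status once the rate exceeds the circuit-breaker threshold."""
    processed = accepted = rejected = 0
    for batch in batches:
        for record in batch:
            processed += 1
            if record.get("valid", True):
                accepted += 1
            else:
                rejected += 1
        if processed and rejected / processed > error_threshold:
            return {"status": "failed", "processed": processed,
                    "accepted": accepted, "rejected": rejected}
    return {"status": "complete", "processed": processed,
            "accepted": accepted, "rejected": rejected}

ok = process_batches([[{"valid": True}] * 500])    # clean batch completes
bad = process_batches([[{"valid": False}] * 500])  # breaker trips after batch 1
```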

Concurrency: Up to 2 large uploads per tenant can run simultaneously. If the server is at capacity, you'll get a 503 — retry after a few minutes.

Deduplication

Every upload is content-hashed (SHA-256). If you upload the same file twice, the second upload is skipped and returns the original upload ID. Individual records within a batch are also deduplicated by content hash.
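You can compute the same content hash locally to predict whether a re-upload will be skipped. A sketch that streams the file in 256 KB chunks (mirroring the chunked streaming described below) and formats the digest like the file_hash field in the upload response:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 256 * 1024) -> str:
    """Hash a file in 256 KB chunks so memory stays flat, and return the
    digest in the 'sha256:<hex>' form used by the response metadata."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"
```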

Python Example

import requests
import time

API = "https://api.qanatix.com/api/v1"
HEADERS = {"Authorization": "Bearer sk_live_abc123..."}

# Upload a large CSV
with open("companies.csv", "rb") as f:
    resp = requests.post(
        f"{API}/upload/business/company/file",
        headers=HEADERS,
        files={"file": ("companies.csv", f, "text/csv")},
    )

data = resp.json()
upload_id = data["upload_id"]

if resp.status_code == 202:
    # Large file — poll for progress
    while True:
        status = requests.get(
            f"{API}/uploads/{upload_id}", headers=HEADERS
        ).json()

        processed = status.get("records_processed") or 0
        total = status.get("record_count") or 0
        print(f"Progress: {processed:,} / {total:,} records")

        if status["status"] in ("complete", "partial", "failed"):
            print(f"Done: {status['status']}")
            break

        time.sleep(5)
else:
    # Small file — already done
    print(f"Status: {data['status']}, accepted: {data['summary']['accepted']}")
