File Upload
Upload CSV, JSON, NDJSON, XML, and PDF files directly to QANATIX, from kilobytes to multiple gigabytes.
Small files (under 50 MB) are processed synchronously — you get the full result in the response. Large files (over 50 MB) are accepted immediately and processed in the background — poll for progress.
Plan Limits
| Plan | Max Upload Size |
|---|---|
| Free | 10 MB |
| Pro | 1 GB |
| Scale | 10 GB |
| Enterprise | 10 GB |
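If you want to fail fast before uploading, you can check the file size against your plan's limit client-side. A minimal sketch: the dictionary below just restates the table (binary megabytes/gigabytes are an assumption), and the API still enforces the limit server-side.
```python
import os

# Maximum upload size per plan, in bytes (values from the table above;
# binary units are an assumption).
PLAN_LIMITS = {
    "free": 10 * 1024**2,
    "pro": 1 * 1024**3,
    "scale": 10 * 1024**3,
    "enterprise": 10 * 1024**3,
}

def fits_plan(path: str, plan: str) -> bool:
    """Return True if the file is within the given plan's upload limit."""
    return os.path.getsize(path) <= PLAN_LIMITS[plan]

print(fits_plan("suppliers.csv", "pro"))
```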
CSV
```bash
curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/supplier/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@suppliers.csv"
```
CSV columns are mapped to record fields automatically. The name column (if present) becomes the record name. All other columns go into the record's data field. Encoding is auto-detected (UTF-8, Latin-1, etc.).
JSON
Upload a JSON file containing an array of records:
```bash
curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/product/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@products.json"
```
Expected format:
```json
[
  {"name": "Product A", "sku": "SKU-001", "price": 12.50},
  {"name": "Product B", "sku": "SKU-002", "price": 8.75}
]
```
NDJSON
Newline-delimited JSON — one record per line:
```bash
curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/event/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@events.ndjson"
```
NDJSON is the recommended format for large files (100 MB+). Each line is parsed independently, so a single malformed line won't reject the entire file.
XML
QANATIX supports BMEcat catalog XML, SAP IDoc XML, and generic XML:
```bash
curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/product/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@catalog.xml"
```
For generic XML, use the record_tag parameter to specify which element represents a record:
```bash
curl -X POST "https://api.qanatix.com/api/v1/upload/manufacturing/product/file?record_tag=item" \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@products.xml"
```
BMEcat and SAP IDoc formats are auto-detected from the XML content.
PDF
QANATIX extracts text from PDFs using pymupdf4llm, producing clean markdown optimized for LLM consumption.
```bash
curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/document/file \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@datasheet.pdf"
```
Chunking strategy:
- PDFs with 1-3 pages: combined into one record
- PDFs with 4+ pages: one record per page
Each record includes page numbers and document metadata.
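If you want to predict how many records a PDF will produce, the chunking rule is easy to encode. A small client-side helper, not part of the API, just the rule above expressed in Python:
```python
def expected_record_count(page_count: int) -> int:
    """Predict how many records a PDF upload produces, per the rule above."""
    if page_count <= 3:
        return 1           # 1-3 pages are combined into a single record
    return page_count      # 4+ pages become one record per page

assert expected_record_count(2) == 1
assert expected_record_count(10) == 10
```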
Batch JSON API
For programmatic upload, use the batch endpoint (max 10,000 records per request):
```bash
curl -X POST https://api.qanatix.com/api/v1/upload/manufacturing/fastener/batch \
  -H "Authorization: Bearer sk_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '[
    {
      "name": "Steel Bolt M12x80",
      "source_id": "ERP-001",
      "data": {
        "material": "Carbon Steel",
        "price_eur": 0.15
      }
    }
  ]'
```
Use source_id for upsert behavior: if a record with the same source_id exists, it's updated instead of duplicated.
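The same batch call from Python, using source_id so that re-running the script updates existing records instead of duplicating them. The payload mirrors the curl request above; the second fastener is added for illustration.
```python
import requests

API = "https://api.qanatix.com/api/v1"
HEADERS = {"Authorization": "Bearer sk_live_abc123..."}

fasteners = [
    {
        "name": "Steel Bolt M12x80",
        "source_id": "ERP-001",  # stable ID from your ERP enables upserts
        "data": {"material": "Carbon Steel", "price_eur": 0.15},
    },
    {
        "name": "Steel Bolt M12x100",
        "source_id": "ERP-002",
        "data": {"material": "Carbon Steel", "price_eur": 0.18},
    },
]

# Remember the 10,000-record cap per request; split larger lists into chunks.
resp = requests.post(
    f"{API}/upload/manufacturing/fastener/batch",
    headers=HEADERS,
    json=fasteners,
)
print(resp.json()["summary"])
```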
Response
Small file (synchronous) — 201
```json
{
  "upload_id": "a1b2c3d4-...",
  "status": "complete",
  "summary": {
    "submitted": 150,
    "accepted": 147,
    "rejected": 0,
    "dedup_skipped": 3
  },
  "errors": [],
  "metadata": {
    "file_hash": "sha256:abc...",
    "content_type": "text/csv",
    "encoding_detected": "utf-8"
  }
}
```
Large file (async) — 202
Files over 50 MB return immediately with status processing:
```json
{
  "upload_id": "e5f6g7h8-...",
  "status": "processing",
  "message": "Large file accepted. Poll GET /uploads/{upload_id} for progress.",
  "summary": { "submitted": 0, "accepted": 0, "rejected": 0, "dedup_skipped": 0 },
  "errors": [],
  "metadata": { "file_hash": "sha256:def..." }
}
```
Polling for Progress
For large file uploads, poll the status endpoint:
```bash
curl https://api.qanatix.com/api/v1/uploads/{upload_id} \
  -H "Authorization: Bearer sk_live_abc123..."
```
Response while processing:
```json
{
  "id": "e5f6g7h8-...",
  "status": "processing",
  "record_count": 5300000,
  "records_processed": 1250000,
  "accepted": 1249800,
  "rejected": 200,
  "file_name": "german_companies.csv",
  "file_size": 1932735283,
  "created_at": "2026-03-15T10:30:00Z",
  "completed_at": null
}
```
The records_processed field updates as batches complete, so you can track live progress.
Response when complete:
```json
{
  "id": "e5f6g7h8-...",
  "status": "complete",
  "record_count": 5300000,
  "records_processed": 5300000,
  "accepted": 5299200,
  "rejected": 800,
  "file_name": "german_companies.csv",
  "file_size": 1932735283,
  "created_at": "2026-03-15T10:30:00Z",
  "completed_at": "2026-03-15T11:15:00Z"
}
```
Possible status values: processing, complete, partial (some records failed), failed (critical error).
Upload Errors
If records are rejected, retrieve error details:
curl "https://api.qanatix.com/api/v1/uploads/{upload_id}/errors?limit=100" \
-H "Authorization: Bearer sk_live_abc123..."[
{
"id": "...",
"upload_id": "e5f6g7h8-...",
"source_row_num": 42,
"raw_payload": { "name": "", "sku": null },
"error_type": "validation",
"error_details": { "message": "name is required" },
"retry_count": 0,
"status": "pending"
}
]How It Works
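A small sketch that pulls rejected rows and prints the source row number and validation message, based on the error fields shown above:
```python
import requests

API = "https://api.qanatix.com/api/v1"
HEADERS = {"Authorization": "Bearer sk_live_abc123..."}

upload_id = "e5f6g7h8-..."  # from the upload response

resp = requests.get(
    f"{API}/uploads/{upload_id}/errors",
    headers=HEADERS,
    params={"limit": 100},
)
for err in resp.json():
    message = err.get("error_details", {}).get("message")
    print(f"row {err.get('source_row_num')}: {err.get('error_type')}: {message}")
```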
How It Works
QANATIX streams uploaded files to disk in 256 KB chunks, so memory usage stays under 20 MB regardless of file size.
For large files:
- File is streamed to a temp file on the server
- A background task reads the file in 500-record batches
- Each batch is validated, deduplicated, and written to the database
- Progress is updated after each batch
- If the error rate exceeds 10%, processing halts (circuit breaker)
- The temp file is deleted when processing completes
Concurrency: Up to 2 large uploads per tenant can run simultaneously. If the server is at capacity, you'll get a 503 — retry after a few minutes.
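If you hit the concurrency cap, a simple retry loop is usually enough. A hedged sketch: the wait time and attempt count are arbitrary choices, not server-recommended values.
```python
import time
import requests

API = "https://api.qanatix.com/api/v1"
HEADERS = {"Authorization": "Bearer sk_live_abc123..."}

def upload_with_retry(path: str, url: str, attempts: int = 5, wait: int = 180):
    """Retry an upload while the server is at capacity (HTTP 503)."""
    for _ in range(attempts):
        with open(path, "rb") as f:
            resp = requests.post(url, headers=HEADERS, files={"file": f})
        if resp.status_code != 503:
            return resp
        time.sleep(wait)  # server at capacity: wait a few minutes, try again
    raise RuntimeError("upload still rejected with 503 after retries")

resp = upload_with_retry("companies.csv", f"{API}/upload/business/company/file")
print(resp.status_code)
```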
Deduplication
Every upload is content-hashed (SHA-256). If you upload the same file twice, the second upload is skipped and returns the original upload ID. Individual records within a batch are also deduplicated by content hash.
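To predict whether a re-upload will be skipped, you can hash the file locally and compare the result with metadata.file_hash from an earlier response. This assumes the hash is a plain SHA-256 of the raw file bytes, which matches the sha256: prefix in the examples but is not explicitly documented.
```python
import hashlib

def local_file_hash(path: str) -> str:
    """SHA-256 of the raw file bytes, formatted like metadata.file_hash
    (assumption: the server hashes the unmodified upload)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return f"sha256:{h.hexdigest()}"

print(local_file_hash("companies.csv"))
```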
Python Example
```python
import requests
import time

API = "https://api.qanatix.com/api/v1"
HEADERS = {"Authorization": "Bearer sk_live_abc123..."}

# Upload a large CSV
with open("companies.csv", "rb") as f:
    resp = requests.post(
        f"{API}/upload/business/company/file",
        headers=HEADERS,
        files={"file": ("companies.csv", f, "text/csv")},
    )

data = resp.json()
upload_id = data["upload_id"]

if resp.status_code == 202:
    # Large file — poll for progress
    while True:
        status = requests.get(
            f"{API}/uploads/{upload_id}", headers=HEADERS
        ).json()
        processed = status.get("records_processed") or 0
        total = status.get("record_count") or 0
        print(f"Progress: {processed:,} / {total:,} records")
        if status["status"] in ("complete", "partial", "failed"):
            print(f"Done: {status['status']}")
            break
        time.sleep(5)
else:
    # Small file — already done
    print(f"Status: {data['status']}, accepted: {data['summary']['accepted']}")
```