Quick example
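A minimal sketch of submitting a document, assuming the endpoint accepts a JSON body. Only the `POST /ocr/v1/process` path and the engine and format names come from this page. The base URL, the bearer-token header, and the `document_url`, `engine`, and `format` field names are placeholders for illustration, so check the API reference for the real request schema.

```python
import json
import urllib.request

# Hypothetical values: replace with your real base URL and API key.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_process_request(document_url, engine="doctr", output_format="text"):
    """Build (but do not send) a POST /ocr/v1/process request.

    The body field names are assumptions, not the documented schema.
    """
    body = {"document_url": document_url, "engine": engine, "format": output_format}
    return urllib.request.Request(
        url=f"{BASE_URL}/ocr/v1/process",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON response.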
Optimized for Legal
| Feature | Why it matters for your app |
|---|---|
| Handwriting Recognition | Extract notes and annotations from uploaded documents |
| Table Reconstruction | Preserve structure for financial statements and forms |
| Bates Stamp Handling | Identify and index reference numbers separately |
| Searchable PDF (HOCR) | Return documents with text layers your users can search |
Engine Selection
Choose based on your users’ document types:

| Engine | Best for | Speed |
|---|---|---|
| doctr | Standard documents. High speed, good accuracy for typed text. | Fast |
| paddleocr | Tables and forms. Best-in-class table structure recognition. | Slower |
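
The guidance above reduces to one rule: use `paddleocr` when table or form structure matters, and `doctr` otherwise. A small sketch of that choice follows. The function name and the `has_tables_or_forms` flag are hypothetical, and how you detect tables is left to your app.

```python
def choose_engine(has_tables_or_forms: bool) -> str:
    """Pick an OCR engine name following the engine table above."""
    # paddleocr: slower, but strongest on table and form structure.
    # doctr: fast, with good accuracy on standard typed documents.
    return "paddleocr" if has_tables_or_forms else "doctr"
```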
Output formats
| Format | Description |
|---|---|
| text | Plain text extraction |
| json | Structured output with coordinates, confidence scores |
| pdf | Searchable PDF (original with text layer) |
Endpoints
Process
POST /ocr/v1/process - Submit a document for OCR
Status
GET /ocr/v1/:id - Check processing status
Download
GET /ocr/v1/:id/download/:type - Download results
Common patterns
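The three endpoints above combine into a submit, poll, download flow. The sketch below covers the polling step and the download URL. It assumes the status response is JSON with a `status` field whose terminal values are `completed` and `failed`, and that `:type` accepts the output format names (`text`, `json`, `pdf`). None of that is confirmed on this page. `fetch_status` is a hypothetical callable you supply, for example a thin wrapper around your HTTP client.

```python
import time

BASE_URL = "https://api.example.com"  # hypothetical base URL


def status_url(job_id: str) -> str:
    return f"{BASE_URL}/ocr/v1/{job_id}"


def download_url(job_id: str, result_type: str) -> str:
    return f"{BASE_URL}/ocr/v1/{job_id}/download/{result_type}"


def wait_for_job(job_id, fetch_status, interval=2.0, max_attempts=30):
    """Poll GET /ocr/v1/:id until the job reaches a terminal state.

    fetch_status(url) must return the decoded JSON status as a dict.
    The "status" key and its values are assumptions about the response shape.
    """
    for _ in range(max_attempts):
        status = fetch_status(status_url(job_id))
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} polls")
```

Passing `fetch_status` in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.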
With webhooks (recommended for large files)
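With a webhook, the service calls your server when processing finishes, so you do not need to poll. Below is a sketch of the receiving side. It assumes the callback is a JSON POST carrying an `id` and a `status` field. Both names are hypothetical, and the real payload and any signature verification scheme are not described on this page.

```python
import json


def handle_webhook(raw_body: bytes):
    """Parse a webhook callback and return the download path, or None.

    Assumes a JSON body like {"id": "...", "status": "completed"};
    these field names are illustrative, not the documented payload.
    """
    event = json.loads(raw_body)
    if event.get("status") != "completed":
        return None  # still processing or failed: nothing to download yet
    # "pdf" is one of the output formats listed on this page.
    return f"/ocr/v1/{event['id']}/download/pdf"
```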
From S3
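When the document already lives in S3, you would presumably pass its location instead of uploading the file. The sketch below validates an `s3://bucket/key` URI before building the request body. The `document_url` field name is a placeholder, and how the service is granted access to your bucket is not covered here.

```python
from urllib.parse import urlparse


def s3_process_body(s3_uri: str) -> dict:
    """Build a request body for a document stored in S3.

    The "document_url" field name is a placeholder, not the documented schema.
    """
    parsed = urlparse(s3_uri)
    key = parsed.path.lstrip("/")
    # Require the s3 scheme, a bucket name, and a non-empty object key.
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"expected s3://bucket/key, got {s3_uri!r}")
    return {"document_url": s3_uri}
```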
With table extraction
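For table-heavy documents such as financial statements, the tables on this page point to the `paddleocr` engine with `json` output, which carries coordinates and confidence scores. A sketch that checks those names before building a request body follows. The body field names are assumptions and may differ from the real schema.

```python
# Engine and format names as listed on this page.
ENGINES = {"doctr", "paddleocr"}
FORMATS = {"text", "json", "pdf"}


def process_body(document_url: str, engine: str, output_format: str) -> dict:
    """Validate the engine/format names and build a request body.

    The body field names are placeholders, not the documented schema.
    """
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    if output_format not in FORMATS:
        raise ValueError(f"unknown format: {output_format}")
    return {"document_url": document_url, "engine": engine, "format": output_format}


# Table-heavy document: slower engine, structured output with coordinates.
body = process_body("https://example.com/balance-sheet.pdf", "paddleocr", "json")
```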
Related services
Vault
Store OCR’d documents and make them searchable with semantic search
LLMs
Analyze extracted text with AI: summarize, classify, and extract entities

