OCR Overview - Case.dev

Specialized OCR for the messy reality of legal documents. We handle what generic providers can’t: handwriting, poor scans, fax headers, and complex tables.

Quick example

curl -X POST https://api.case.dev/ocr/v1/process \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
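
The request body above is left empty. As a minimal sketch of a populated request, the helpers below assume the JSON field names mirror the CLI flags used under Common patterns (`document_url` for `--document-url`, `engine` for `--engine`). Those field names, the helper names, and the `doctr` fallback are assumptions for illustration, not confirmed by this page.

```shell
# Build a JSON body for POST /ocr/v1/process.
# ASSUMPTION: "document_url" and "engine" mirror the CLI flags
# --document-url and --engine; check the API reference for the real schema.
# Values are inserted as-is, so they must not contain quotes or backslashes.
build_ocr_payload() {
  # $1 = document URL, $2 = engine (falls back to doctr in this sketch only)
  printf '{"document_url":"%s","engine":"%s"}' "$1" "${2:-doctr}"
}

# Submit a document for OCR. Requires CASEDEV_API_KEY in the environment.
submit_ocr() {
  curl -sS -X POST "https://api.case.dev/ocr/v1/process" \
    -H "Authorization: Bearer $CASEDEV_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_ocr_payload "$@")"
}
```

Calling `submit_ocr "$DOCUMENT_URL" paddleocr` would then send the request with the chosen engine.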

Optimized for Legal

Feature	Why it matters for your app
Handwriting Recognition	Extract notes and annotations from uploaded documents
Table Reconstruction	Preserve structure for financial statements and forms
Bates Stamp Handling	Identify and index reference numbers separately
Searchable PDF (hOCR)	Return documents with text layers your users can search

Engine Selection

Choose based on your users’ document types:

Engine	Best for	Speed
`doctr`	Standard documents. High speed, good accuracy for typed text.	Fast
`paddleocr`	Tables and forms. Best-in-class table structure recognition.	Slower
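
The choice in the table above can be sketched as a small helper. The category labels (`table`, `form`, and anything else) are hypothetical names for this example; only the engine names come from this page.

```shell
# Pick an OCR engine from a coarse document category.
# ASSUMPTION: the category labels are invented for this sketch;
# map your own document types onto them as needed.
choose_engine() {
  case "$1" in
    table|form) echo "paddleocr" ;;  # tables and forms: slower, better structure recognition
    *)          echo "doctr" ;;      # standard typed documents: fast
  esac
}
```

The result can be passed straight to the CLI, for example `--engine "$(choose_engine table)"`.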

Output formats

Format	Description
`text`	Plain text extraction
`json`	Structured output with coordinates and confidence scores
`pdf`	Searchable PDF (original with text layer)

Endpoints

Process

POST /ocr/v1/process — Submit a document for OCR

Status

GET /ocr/v1/:id — Check processing status

Download

GET /ocr/v1/:id/download/:type — Download results
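
A rough sketch of how the Status and Download paths fit together, using the base URL from the quick example. It assumes that `:type` accepts the output format names `text`, `json`, and `pdf`, which this page does not state explicitly; the helper names and the job ID in the usage note are hypothetical.

```shell
OCR_BASE_URL="https://api.case.dev/ocr/v1"

# Print the status URL for a job ID (GET /ocr/v1/:id).
ocr_status_url() {
  printf '%s/%s\n' "$OCR_BASE_URL" "$1"
}

# Print the download URL for a job ID and result type
# (GET /ocr/v1/:id/download/:type).
# ASSUMPTION: :type matches the output format names text, json, pdf.
ocr_download_url() {
  case "$2" in
    text|json|pdf) printf '%s/%s/download/%s\n' "$OCR_BASE_URL" "$1" "$2" ;;
    *) echo "unsupported type: $2" >&2; return 1 ;;
  esac
}

# Download a result to a local file. Requires CASEDEV_API_KEY.
# $1 = job ID, $2 = type, $3 = output path
ocr_download() {
  url="$(ocr_download_url "$1" "$2")" || return 1
  curl -sS -f -H "Authorization: Bearer $CASEDEV_API_KEY" -o "$3" "$url"
}
```

For example, `ocr_download job_123 pdf searchable.pdf` would save the searchable PDF for a job with the placeholder ID `job_123`.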

Common patterns

With webhooks (recommended for large files)

casedev ocr:v1 process \
  --document-url "$DOCUMENT_URL" \
  --callback-url "https://your-app.com/webhooks/ocr-complete"

From S3

casedev ocr:v1 process \
  --document-url "s3://your-bucket/documents/upload.pdf"

With table extraction

casedev ocr:v1 process \
  --document-url "$DOCUMENT_URL" \
  --engine paddleocr \
  --features.tables '{"format": "csv"}'

Vault

Store OCR’d documents and make them searchable with semantic search.

LLMs

Analyze extracted text with AI: summarize, classify, and extract entities.
