Submit a document for OCR. We extract text, detect tables, and optionally generate a searchable PDF. Processing is asynchronous: the request returns a job ID immediately, and you then poll for results or receive a webhook when the job finishes.
cURL
curl -X POST https://api.case.dev/ocr/v1/process \
  -H "Authorization: Bearer sk_case_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_url": "https://storage.example.com/scanned-deposition.pdf"
  }'
{
  "id": "1f4a195e-026b-41ff-b367-c61089f5f367",
  "status": "pending",
  "document_url": "https://storage.example.com/scanned-deposition.pdf",
  "engine": "doctr",
  "created_at": "2025-11-04T09:30:12Z",
  "links": {
    "self": "https://api.case.dev/ocr/v1/1f4a195e-026b-41ff-b367-c61089f5f367",
    "text": "https://api.case.dev/ocr/v1/1f4a195e-026b-41ff-b367-c61089f5f367/download/text",
    "json": "https://api.case.dev/ocr/v1/1f4a195e-026b-41ff-b367-c61089f5f367/download/json"
  }
}
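If you are not using an SDK, the same request can be sketched with `fetch`. The endpoint, headers, and `document_url` field come from the cURL example above; the function names `buildProcessRequest` and `submitDocument` are hypothetical, and the error handling is a generic HTTP status check rather than documented API behavior.

```typescript
// Endpoint and header format are taken from the cURL example above.
const OCR_PROCESS_URL = "https://api.case.dev/ocr/v1/process";

interface OcrRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the HTTP request for a new OCR job without sending it.
function buildProcessRequest(apiKey: string, documentUrl: string): OcrRequest {
  return {
    url: OCR_PROCESS_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ document_url: documentUrl }),
    },
  };
}

// Send the request and return the job ID from the JSON response.
async function submitDocument(apiKey: string, documentUrl: string): Promise<string> {
  const { url, init } = buildProcessRequest(apiKey, documentUrl);
  const response = await fetch(url, init);
  if (!response.ok) {
    // Assumption: any non-2xx status is treated as a failed submission.
    throw new Error(`OCR submit failed with HTTP ${response.status}`);
  }
  const job = (await response.json()) as { id: string; status: string };
  return job.id;
}
```

Keeping the request builder separate from the network call makes the payload easy to inspect or unit test before anything is sent.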
Parameters
Required
| Parameter | Type | Description |
| --- | --- | --- |
| document_url | string | URL to your document. HTTP/HTTPS or s3:// |
Optional
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_id | string | auto-generated | Your internal reference ID |
| engine | string | doctr | OCR engine (see below) |
| callback_url | string | none | Webhook URL for completion notification |
| features | object | {} | Additional processing options |
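The parameters above can be captured as a TypeScript type with a small client-side check. This is a sketch, not the SDK's own typings: the names `OcrProcessParams` and `validateParams` are hypothetical, the engine names are the two listed under OCR engines, and requiring `callback_url` to be HTTP/HTTPS is an assumption.

```typescript
// Engine names as listed in the OCR engines section.
type OcrEngine = "doctr" | "paddleocr";

// Request body fields from the Required and Optional parameter tables.
interface OcrProcessParams {
  document_url: string;               // required: HTTP/HTTPS or s3:// URL
  document_id?: string;               // auto-generated when omitted
  engine?: OcrEngine;                 // defaults to "doctr"
  callback_url?: string;              // webhook URL for completion notification
  features?: Record<string, unknown>; // additional processing options
}

// Return a list of problems found before sending the request.
function validateParams(params: OcrProcessParams): string[] {
  const errors: string[] = [];
  if (!/^(https?|s3):\/\//i.test(params.document_url)) {
    errors.push("document_url must start with http://, https://, or s3://");
  }
  // Assumption: webhooks are delivered over HTTP/HTTPS only.
  if (params.callback_url !== undefined && !/^https?:\/\//i.test(params.callback_url)) {
    errors.push("callback_url must start with http:// or https://");
  }
  return errors;
}
```

For example, `validateParams({ document_url: "ftp://host/file.pdf" })` returns one error, while an `s3://` URL passes.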
OCR engines
| Engine | Best for | Speed |
| --- | --- | --- |
| doctr | Clean printed text, typed documents | Fast |
| paddleocr | Tables, forms, complex layouts, handwriting | Medium |
For legal documents: Start with doctr. If you’re getting poor results on forms or tables, try paddleocr.
Features
Enable additional processing:
{
  "features": {
    "embed": {},        // Generate searchable PDF
    "tables": {         // Extract tables as CSV
      "format": "csv"
    }
  }
}
Checking status
Poll the job to check if processing is complete:
TypeScript
const result = await client.ocr.v1.retrieve(job.id);
if (result.status === 'completed') {
  // Download the extracted text
  const text = await client.ocr.v1.download(job.id, 'text');
  console.log(text);
}
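The check above runs once. A polling loop with exponential backoff might look like the sketch below. `waitForJob` and its options are hypothetical names, the retrieve call is passed in as a function so the loop does not depend on a particular client, and the `failed` status is an assumption: only `pending` and `completed` appear in this page.

```typescript
interface OcrJob {
  id: string;
  status: string;
}

// Any function that fetches the current job state, e.g. a wrapper around the SDK.
type RetrieveFn = (jobId: string) => Promise<OcrJob>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the job completes, doubling the wait between attempts up to a cap.
async function waitForJob(
  retrieve: RetrieveFn,
  jobId: string,
  options: { initialDelayMs?: number; maxDelayMs?: number; timeoutMs?: number } = {},
): Promise<OcrJob> {
  const { initialDelayMs = 1000, maxDelayMs = 15000, timeoutMs = 600000 } = options;
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;

  while (true) {
    const job = await retrieve(jobId);
    if (job.status === "completed") return job;
    // Assumption: a terminal failure is reported as status "failed".
    if (job.status === "failed") throw new Error(`OCR job ${jobId} failed`);
    if (Date.now() + delay > deadline) {
      throw new Error(`Timed out waiting for OCR job ${jobId}`);
    }
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs);
  }
}
```

With the client from the snippet above, usage would be `const done = await waitForJob((id) => client.ocr.v1.retrieve(id), job.id);`.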
Using webhooks
For large documents, use webhooks instead of polling:
TypeScript
const job = await client.ocr.v1.process({
  document_url: 'https://storage.example.com/500-page-discovery.pdf',
  callback_url: 'https://your-app.com/api/ocr-complete'
});
We POST the completed job to your callback URL when processing finishes.
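On the receiving side, your endpoint needs to parse that POST body. The sketch below assumes the payload is JSON with the same `id`, `status`, and `links` fields as the response to the process request; the exact webhook schema is not specified here, and `parseOcrCallback` is a hypothetical helper name.

```typescript
// Assumed payload shape: mirrors the job object returned when the job is created.
interface CompletedOcrJob {
  id: string;
  status: string;
  links?: { self?: string; text?: string; json?: string };
}

// Parse and minimally validate the raw request body of a callback POST.
function parseOcrCallback(rawBody: string): CompletedOcrJob {
  let payload: unknown;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    throw new Error("Callback body is not valid JSON");
  }
  if (typeof payload !== "object" || payload === null) {
    throw new Error("Callback body is not a JSON object");
  }
  const { id, status, links } = payload as Record<string, unknown>;
  if (typeof id !== "string" || typeof status !== "string") {
    throw new Error("Callback is missing id or status");
  }
  return { id, status, links: links as CompletedOcrJob["links"] };
}
```

Call it from whatever HTTP framework serves your callback URL, and reject the request if it throws.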
S3 URLs
If your document is in S3, use an s3:// URL:
TypeScript
const job = await client.ocr.v1.process({
  document_url: 's3://your-bucket/documents/deposition.pdf'
});
We automatically generate a presigned URL to access the file.
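If your application stores the bucket and object key separately, a small helper can assemble the `s3://` URL before submission. `toS3Url` is a hypothetical name, and the checks are basic sanity checks rather than full S3 naming validation.

```typescript
// Build an s3:// URL from a bucket name and object key.
function toS3Url(bucket: string, key: string): string {
  if (!bucket || bucket.includes("/")) {
    throw new Error("Invalid bucket name");
  }
  // Drop leading slashes so the result has exactly one separator after the bucket.
  const cleanKey = key.replace(/^\/+/, "");
  if (!cleanKey) {
    throw new Error("Object key is empty");
  }
  return `s3://${bucket}/${cleanKey}`;
}
```

The result can be passed directly as `document_url`.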
Examples
Scanned deposition
TypeScript
const job = await client.ocr.v1.process({
  document_url: 'https://storage.example.com/deposition-smith.pdf',
  document_id: 'smith-depo-2024',
  engine: 'doctr',
  features: { embed: {} } // Generate searchable PDF
});
Medical records with tables
TypeScript
const job = await client.ocr.v1.process({
  document_url: 'https://storage.example.com/patient-records.pdf',
  engine: 'paddleocr', // Better for tables and forms
  features: {
    tables: { format: 'csv' },
    embed: {}
  },
  callback_url: 'https://your-app.com/webhooks/ocr'
});
Handwritten notes
TypeScript
const job = await client.ocr.v1.process({
  document_url: 'https://storage.example.com/witness-notes.jpg',
  engine: 'paddleocr' // Better for handwriting
});