This is the core endpoint for all AI-powered features: summarization, extraction, analysis, and drafting.
POST /llm/v1/chat/completions
cURL
curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer sk_case_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "Summarize this deposition in 3 bullet points."}
    ]
  }'
{
  "id": "gen_01K972J7KV4Y0MJZ3SRTA6YYMH",
  "object": "chat.completion",
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are the key points:\n\n• Witness testified that...\n• Documents reviewed include...\n• Timeline established from..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 245,
    "completion_tokens": 87,
    "total_tokens": 332,
    "cost": 0.000105
  }
}
Parameters
Required
| Parameter | Type | Description |
|---|---|---|
| `messages` | array | The conversation. Each message has a `role` and `content`. |
Optional
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | `casemark/casemark-core-1` | Which model to use. Browse all 195+ models → |
| `max_tokens` | number | 4096 | Maximum tokens to generate |
| `temperature` | number | 1 | Randomness (0–2). Use 0 for factual tasks. |
| `stream` | boolean | false | Stream response token-by-token |
| `stop` | array | null | Stop generation when these strings appear |
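Taken together, the tables above describe the full request body. A minimal sketch of that shape as a TypeScript interface (illustrative only, not an official SDK type):

```typescript
// Illustrative shape of the request body, mirroring the parameter tables
// above. A sketch, not an official SDK type.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatCompletionRequest {
  messages: ChatMessage[];   // required
  model?: string;            // default: casemark/casemark-core-1
  max_tokens?: number;       // default: 4096
  temperature?: number;      // 0–2; default: 1
  stream?: boolean;          // default: false
  stop?: string[] | null;    // default: null
}

// Example body for a low-temperature factual task.
const request: ChatCompletionRequest = {
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'List the parties named in this filing.' }],
  temperature: 0,
  max_tokens: 1024,
};
```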
Messages
Each message in the messages array:
| Field | Type | Description |
|---|---|---|
| `role` | string | `system`, `user`, or `assistant` |
| `content` | string | The message text |
System prompts
Set the AI’s behavior with a system message:
TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: 'You are a legal assistant. Be concise. Cite case law when relevant.'
    },
    {
      role: 'user',
      content: 'What are the elements of negligence?'
    }
  ]
});
Multi-turn conversations
Include previous messages to maintain context:
TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    { role: 'user', content: 'What is a deposition?' },
    { role: 'assistant', content: 'A deposition is sworn testimony taken outside of court...' },
    { role: 'user', content: 'How long do they typically last?' }
  ]
});
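Building the `messages` array by hand gets repetitive as a conversation grows. A minimal history helper (a sketch, not part of the SDK) keeps each turn and hands you the array to send on the next request:

```typescript
// Minimal conversation-history helper: record each turn, then pass
// `messages` as the request's messages array. A sketch, not an SDK class.
type Role = 'system' | 'user' | 'assistant';

interface Message {
  role: Role;
  content: string;
}

class Conversation {
  private history: Message[] = [];

  constructor(systemPrompt?: string) {
    if (systemPrompt) {
      this.history.push({ role: 'system', content: systemPrompt });
    }
  }

  add(role: Role, content: string): void {
    this.history.push({ role, content });
  }

  // The array to send as `messages` in the next request.
  get messages(): Message[] {
    return [...this.history];
  }
}

const convo = new Conversation('You are a legal assistant.');
convo.add('user', 'What is a deposition?');
convo.add('assistant', 'A deposition is sworn testimony taken outside of court...');
convo.add('user', 'How long do they typically last?');
```

After each response, call `convo.add('assistant', ...)` with the model's reply so the next request carries the full context.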
Streaming
Get responses token-by-token as they’re generated:
TypeScript
const stream = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'Write a case summary.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Vision
Send images to models that support vision (Claude, GPT-4o):
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What medical equipment is visible in this image?' },
        { type: 'image_url', image_url: { url: 'https://example.com/exhibit-a.jpg' } }
      ]
    }
  ]
});
Usage and costs
Every response includes token counts and cost:
{
  "usage": {
    "prompt_tokens": 1245,
    "completion_tokens": 387,
    "total_tokens": 1632,
    "cost": 0.004896
  }
}
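Because every response reports its own `usage.cost`, you can total spend client-side across many calls. A minimal sketch (field names taken from the response above; the tracker itself is not part of the SDK):

```typescript
// Accumulates token counts and cost from the `usage` object each
// response returns. Field names match the example responses above;
// the tracker class itself is a sketch, not part of the SDK.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cost: number;
}

class UsageTracker {
  totalTokens = 0;
  totalCost = 0;

  record(usage: Usage): void {
    this.totalTokens += usage.total_tokens;
    this.totalCost += usage.cost;
  }
}

// Feed it the `usage` object from each response.
const tracker = new UsageTracker();
tracker.record({ prompt_tokens: 245, completion_tokens: 87, total_tokens: 332, cost: 0.000105 });
tracker.record({ prompt_tokens: 1245, completion_tokens: 387, total_tokens: 1632, cost: 0.004896 });
```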
Reduce costs: Use `temperature: 0` for factual extraction. Try cheaper models like `deepseek/deepseek-chat` or `qwen/qwen-2.5-72b-instruct` for simpler tasks.
Common patterns
Deposition summary
TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: `Summarize depositions with:
1. Key admissions
2. Timeline of events
3. Credibility issues
4. Contradictions with other testimony`
    },
    { role: 'user', content: depositionText }
  ],
  temperature: 0.3,
  max_tokens: 2000
});
Contract clause extraction
TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'Extract all indemnification clauses. Return JSON: [{clause_text, page, party_protected}]'
    },
    { role: 'user', content: contractText }
  ],
  temperature: 0
});
Medical record review
TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-opus-4',
  messages: [
    {
      role: 'system',
      content: 'You are a medical-legal expert. Identify standard-of-care deviations and timeline inconsistencies.'
    },
    { role: 'user', content: medicalRecords }
  ],
  max_tokens: 5000
});