Persistent, multi-turn agent sessions with real-time streaming
Chat sessions give you an interactive, multi-turn agent that stays alive between messages. Unlike runs (fire-and-forget batch jobs), a chat session keeps its sandbox running so the agent retains full context — files, environment, and conversation history — across every message you send.
Messages are proxied to the agent running in the sandbox. The agent has the same full tool access as batch runs — vaults, legal research, OCR, web search, and the casedev CLI.
The response contains the agent’s output plus a usage object when token data is available. usage.costMicros is the assistant turn’s LLM cost. usage.summary and usage.entries aggregate all Case.dev billable activity correlated to that turn, including downstream tool/API calls that happened under the session key.
usage.entries[] is the audit log. usage.summary is the sum of those entries. For compatibility, the top-level usage.model, token counts, and usage.costMicros still reflect the assistant turn’s direct LLM usage.
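The relationship between the audit entries and the summary can be sketched as follows. These interfaces are illustrative only, based on the fields described above; the real SDK types may name or nest things differently.

```typescript
// Hypothetical shapes based on the fields described in this doc.
interface UsageEntry {
  costMicros: number
  // ...other audit fields (service, timestamp, etc.) omitted
}

interface TurnUsage {
  costMicros: number // direct LLM cost for the assistant turn
  summary: { costMicros: number } // sum across all aggregated entries
  entries: UsageEntry[] // the per-activity audit log
}

// summary.costMicros should equal the sum of entries[].costMicros,
// which may exceed the turn's direct LLM cost (usage.costMicros)
// when downstream tool/API calls were billed under the session key.
function sumEntryCost(usage: TurnUsage): number {
  return usage.entries.reduce((total, e) => total + e.costMicros, 0)
}
```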
If the sandbox was snapshotted due to idle timeout, sending a message automatically restores it.
There is a brief resume delay (~5-10s) but no context is lost.
Use respond when you want one request that both submits the user message and streams only the current assistant turn. respond returns a turn-scoped SSE stream with normalized events:
turn.started
turn.status
message.created
message.part.updated
message.completed
session.usage
turn.completed
It excludes historical replay and raw upstream session.* events, so your UI can render a clean, deterministic per-turn stream.
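A minimal client-side sketch for consuming that stream: parse the raw SSE wire format into (event, data) frames, then dispatch on the event names listed above. The parsing follows the standard SSE field syntax; how you handle each event is up to your UI.

```typescript
// Parse a raw SSE payload into (event, data) frames.
// `event:` names the event type; consecutive `data:` lines are joined.
function parseSse(raw: string): { event: string; data: string }[] {
  const frames: { event: string; data: string }[] = []
  for (const block of raw.split('\n\n')) {
    let event = 'message'
    const data: string[] = []
    for (const line of block.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim()
      else if (line.startsWith('data:')) data.push(line.slice(5).trim())
    }
    if (data.length) frames.push({ event, data: data.join('\n') })
  }
  return frames
}
```

Because the respond stream excludes replay and raw upstream events, a switch over `turn.started`, `message.part.updated`, `turn.completed`, etc. is all a renderer needs.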
```bash
curl -N -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/respond" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [{"type": "text", "text": "Summarize the last answer in 3 bullets."}]
  }'
```
costMicros is measured in microdollars: 1,000,000 = $1.00. In session.usage, usage.costMicros is the direct LLM portion for that turn, while summary.costMicros is the total across all aggregated entries.
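Converting microdollars to a display value is a single division. A small helper (the four-decimal formatting mirrors the session-cost example later in this doc; choose whatever precision your UI needs):

```typescript
// 1,000,000 costMicros = $1.00
function microsToDollars(costMicros: number): string {
  return `$${(costMicros / 1_000_000).toFixed(4)}`
}
```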
Use respond for request/response-style streaming per turn. Use /chat/:id/stream when you want a long-lived session event feed with reconnect replay.
Open an SSE connection to receive real-time events as the agent works. Events are buffered server-side, so you can reconnect without missing anything. Buffered replay includes synthetic session.usage events emitted after completed turns, so reconnecting clients can recover billing data without calling a separate endpoint.
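Since replay re-delivers events you may already have rendered, a reconnecting client typically deduplicates. The sketch below assumes each buffered event carries a unique identifier (an assumption; check the stream's actual event shape) and filters out anything already seen:

```typescript
// Hypothetical: assumes each replayed event has a unique `id`.
// Returns only events not yet in `seen`, recording them as it goes.
function dedupeReplay<T extends { id: string }>(events: T[], seen: Set<string>): T[] {
  const fresh: T[] = []
  for (const ev of events) {
    if (!seen.has(ev.id)) {
      seen.add(ev.id)
      fresh.push(ev)
    }
  }
  return fresh
}
```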
When the agent needs input during a turn (e.g., clarification, confirmation), it emits a question event with a requestID. Use this endpoint to send the reply.
```typescript
await client.agent.v1.chat.reply(chat.id, requestId, {
  text: 'Yes, include the summary of all three depositions.',
})
```
The requestID comes from the SSE question event. The agent blocks until the reply is received, then continues its turn.
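Wiring the two together usually means extracting the requestID from the question event's payload and passing it back with the answer. A tiny helper along those lines (the event payload shape here is assumed, not documented; adjust field names to what the stream actually sends):

```typescript
// Assumed shape of a `question` event payload; real fields may differ.
interface QuestionEvent {
  requestID: string
  prompt: string
}

// Map a question event plus the user's answer onto the arguments
// for client.agent.v1.chat.reply(chatId, requestId, body).
function buildReply(ev: QuestionEvent, answer: string) {
  return { requestId: ev.requestID, body: { text: answer } }
}
```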
Both message and respond enforce single-turn concurrency per session. If the agent is still processing a previous turn, the server returns 409 Conflict with details to help you retry:
409 Response
```json
{
  "error": {
    "message": "A turn is already active on this session",
    "code": "TURN_CONFLICT"
  }
}
```
The response includes two headers:
Retry-After — suggested wait time in seconds before retrying
X-Active-Turn-Id — the ID of the currently active turn
Wait for the active turn to complete (via the stream or polling), then retry your message. Do not cancel and immediately resend — the agent may still be writing tool outputs.
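A retry loop that honors Retry-After might look like this. The request function is injected so the sketch works with any HTTP client; the 5-second fallback when the header is missing is an assumption, not documented behavior.

```typescript
type Send = () => Promise<{ status: number; headers: Map<string, string> }>

// Retry on 409 TURN_CONFLICT, waiting Retry-After seconds between attempts.
async function sendWithRetry(
  send: Send,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await send()
    if (res.status !== 409) return res
    // Assumed fallback of 5s when Retry-After is absent.
    const waitSeconds = Number(res.headers.get('Retry-After') ?? '5')
    if (attempt < maxAttempts) await sleep(waitSeconds * 1000)
  }
  throw new Error('Turn still active after retries')
}
```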
Use runs for fire-and-forget batch tasks. Use chat when you need back-and-forth interaction with the agent or when the task requires iterative refinement.
Chat endpoints require an API key with agent:read (for streaming) and agent:write (for create, message, cancel, delete) permissions. Session-based or OAuth authentication is not supported — all downstream token usage and billing is attributed to the API key’s organization.
```typescript
import Casedev from '@case.dev/sdk'

const client = new Casedev({ apiKey: process.env.CASEDEV_API_KEY })

// 1. Create a session
const chat = await client.agent.v1.chat.create({
  title: 'Deposition Analysis',
  model: 'anthropic/claude-sonnet-4.6',
})

// 2. First message
await client.agent.v1.chat.sendMessage(chat.id, {
  parts: [
    {
      type: 'text',
      text: 'Search vault vault_depo for all witness testimony about the accident timeline.',
    },
  ],
})

// 3. Follow-up based on results
await client.agent.v1.chat.sendMessage(chat.id, {
  parts: [
    {
      type: 'text',
      text: 'Now cross-reference that with the police report in vault vault_evidence.',
    },
  ],
})

// 4. End session when done
const result = await client.agent.v1.chat.delete(chat.id)
console.log(`Session cost: $${result.cost.toFixed(4)}`)
```