Chat sessions give you an interactive, multi-turn agent that stays alive between messages. Unlike runs (fire-and-forget batch jobs), a chat session keeps its sandbox running so the agent retains full context — files, environment, and conversation history — across every message you send.
Chat sessions also emit webhook events through the Case.dev Events API. Subscribe to these when your
application needs to react to sandbox readiness, scope activation, or turn progress without racing
the SSE stream:
| Event | Use when |
| --- | --- |
| agent.runtime.reused | An existing sandbox was reused for a chat session |
| agent.scope.activated | The sandbox has the requested matter or vault authority loaded |
| agent.worker.ready | The agent worker inside the sandbox is ready to accept messages |
| agent.chat.session.created | A new chat session was created |
| agent.chat.turn.started | A chat turn began executing |
| agent.chat.turn.completed | A chat turn completed successfully |
| agent.chat.turn.failed | A chat turn failed |
| agent.chat.turn.conflict | A turn was rejected because another turn was already active |
See Event Types for the full generated catalog and payload fields.
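As a rough illustration of reacting to these events, the sketch below routes a webhook delivery by event name. The event names come from the list above; the "type" field on the payload, the function name, and the action labels are assumptions made for this example, so check Event Types for the actual payload fields.

```python
import json

# Hypothetical mapping from event name to an application-level action.
# Event names are from the documentation; the actions are illustrative only.
ACTIONS = {
    "agent.worker.ready": "send_first_message",
    "agent.chat.turn.completed": "fetch_result",
    "agent.chat.turn.failed": "report_failure",
    "agent.chat.turn.conflict": "retry_later",
}


def route_event(raw_body: bytes) -> str:
    """Return the action for one webhook delivery, or "ignore" if unhandled."""
    event = json.loads(raw_body)
    # ASSUMPTION: the event name is carried in a top-level "type" field.
    event_type = event.get("type")
    return ACTIONS.get(event_type, "ignore")
```

Keeping the routing in a plain lookup table makes it easy to subscribe to more events later without touching the handler logic.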
Messages are proxied to the agent running in the sandbox. The agent has the same full tool access as batch runs — vaults, legal research, OCR, web search, and the casedev CLI.
The response contains the agent’s output plus a usage object when token data is available. usage.costMicros is the assistant turn’s LLM cost. usage.summary and usage.entries aggregate all Case.dev billable activity correlated to that turn, including downstream tool/API calls that happened under the session key.
usage.entries[] is the audit log. usage.summary is the sum of those entries. For compatibility, the top-level usage.model, token counts, and usage.costMicros still reflect the assistant turn’s direct LLM usage.
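The relationship between the direct LLM cost, the entries, and the summary can be checked client-side. This is a minimal sketch that assumes each entry carries its own costMicros field and that usage.summary exposes costMicros; the function name and the returned keys are hypothetical.

```python
def reconcile_usage(usage: dict) -> dict:
    """Compare the direct LLM cost with the aggregated totals for one turn."""
    entries = usage.get("entries") or []
    # ASSUMPTION: every entry in usage.entries[] has an integer costMicros.
    entries_total = sum(int(entry.get("costMicros", 0)) for entry in entries)
    summary_total = (usage.get("summary") or {}).get("costMicros")
    return {
        # Top-level costMicros is the assistant turn's direct LLM usage.
        "direct_llm_micros": usage.get("costMicros", 0),
        "entries_total_micros": entries_total,
        "summary_total_micros": summary_total,
        # The summary is described as the sum of the entries.
        "consistent": summary_total is None or summary_total == entries_total,
    }
```

A mismatch in the "consistent" flag would suggest the assumed entry shape is wrong rather than a billing error, so treat it as a prompt to inspect the raw payload.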
If the sandbox was snapshotted due to idle timeout, sending a message automatically restores it.
There is a brief resume delay (~5-10s) but no context is lost.
Use respond when you want one request that both submits the user message and streams only the current assistant turn.
respond returns a turn-scoped SSE stream with normalized events:
turn.started
turn.status
message.created
message.part.updated
message.completed
session.usage
turn.completed
It excludes historical replay and raw upstream session.* events, so your UI can render a clean, deterministic per-turn stream.
curl -N -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/respond" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [{"type": "text", "text": "Summarize the last answer in 3 bullets."}]
  }'
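On the client side, the turn-scoped stream can be consumed with a small SSE line parser. The sketch below follows the standard SSE framing (event: and data: fields, blank line as delimiter). It assumes the event names listed above arrive in the event: field, which this page does not confirm; the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space from the value
        if field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)


def until_turn_completed(events: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Pass events through and stop after the first turn.completed."""
    for name, data in events:
        yield name, data
        if name == "turn.completed":
            return
```

Feed it the decoded lines of the HTTP response body; stopping at turn.completed lets the caller close the connection as soon as the turn is done.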
costMicros is measured in microdollars: 1,000,000 = $1.00. In session.usage, usage.costMicros is the direct LLM portion for that turn, while summary.costMicros is the total across all aggregated entries.
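The microdollar conversion is simple arithmetic. A minimal sketch, using Decimal to avoid floating-point drift when totals are summed; the helper names are hypothetical:

```python
from decimal import Decimal

MICROS_PER_DOLLAR = 1_000_000  # 1,000,000 microdollars = $1.00


def micros_to_dollars(cost_micros: int) -> Decimal:
    """Convert a costMicros value to dollars as an exact Decimal."""
    return Decimal(cost_micros) / Decimal(MICROS_PER_DOLLAR)


def format_cost(cost_micros: int, places: int = 2) -> str:
    """Format a costMicros value as a dollar string with fixed decimals."""
    return f"${micros_to_dollars(cost_micros):.{places}f}"
```

For example, format_cost(1_000_000) gives "$1.00", and format_cost(2_500, places=4) gives "$0.0025", which is useful because single-turn costs are often well under a cent.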
Use respond for request/response-style streaming per turn. Use /chat/:id/stream when you want
a long-lived session event feed with reconnect replay.
Open an SSE connection to receive real-time events as the agent works. Events are buffered server-side, so you can reconnect without missing anything.
Buffered replay includes synthetic session.usage events emitted after completed turns, so reconnecting clients can recover billing data without calling a separate endpoint.
When the agent needs input during a turn (e.g., clarification, confirmation), it emits a question event with a requestID. Use this endpoint to send the reply.
curl -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/question/$REQUEST_ID/reply" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Yes, include the summary of all three depositions."}'
The requestID comes from the SSE question event. The agent blocks until the reply is received, then continues its turn.
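The same reply can be prepared from application code. This sketch only builds the request, mirroring the URL, headers, and body of the curl example; it does not send it. The function name is hypothetical, and the caller is assumed to have already read the requestID out of the question event.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.case.dev/agent/v1"


def build_question_reply(chat_id: str, request_id: str, text: str,
                         api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that answers a question event."""
    url = (
        f"{BASE_URL}/chat/{quote(chat_id, safe='')}"
        f"/question/{quote(request_id, safe='')}/reply"
    )
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would submit the reply; separating construction from sending keeps the request easy to inspect and test.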
Both message and respond enforce single-turn concurrency per session. If the agent is still processing a previous turn, the server returns 409 Conflict with details to help you retry:
409 Response
{
  "error": {
    "message": "A turn is already active on this session",
    "code": "TURN_CONFLICT"
  }
}
The response includes two headers:
Retry-After — suggested wait time in seconds before retrying
X-Active-Turn-Id — the ID of the currently active turn
Wait for the active turn to complete (via the stream or polling), then retry your message. Do not
cancel and immediately resend — the agent may still be writing tool outputs.
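One way to honor the Retry-After header is a small retry wrapper. The sketch below is transport-agnostic: the send callable is a hypothetical function you supply that performs the request and returns (status, headers, body). The attempt limit and fallback wait are illustrative defaults, not values from the API.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]


def parse_retry_after(value: Optional[str], default: float) -> float:
    """Read Retry-After as seconds, falling back to default if unusable."""
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        return default
    return seconds if seconds >= 0 else default


def send_with_conflict_retry(send: Callable[[], Response],
                             max_attempts: int = 5,
                             default_wait: float = 2.0,
                             sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), waiting and retrying while the server answers 409."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        # Return on any non-conflict response, or when attempts are exhausted.
        if status != 409 or attempt == max_attempts - 1:
            return status, headers, body
        # ASSUMPTION: headers is a mapping keyed exactly as "Retry-After".
        sleep(parse_retry_after(headers.get("Retry-After"), default_wait))
    raise AssertionError("unreachable")
```

This only waits and resends; it never cancels the active turn, in line with the guidance above. A caller that is already watching the stream could instead wait for the turn identified by X-Active-Turn-Id to complete.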
Use runs for fire-and-forget batch tasks. Use chat when you need back-and-forth
interaction with the agent or when the task requires iterative refinement.
Chat endpoints require an API key with agent:read (for streaming) and agent:write (for create, message, cancel, delete) permissions. Session-based and OAuth authentication are not supported: all downstream token usage and billing are attributed to the API key’s organization.