Chat sessions give you an interactive, multi-turn agent that stays alive between messages. Unlike runs (fire-and-forget batch jobs), a chat session keeps its sandbox running so the agent retains full context — files, environment, and conversation history — across every message you send.

Chat session lifecycle

Lifecycle
  create ──→ active ──→ idle (snapshot) ──→ resumed ──→ active
                │                                        │
                └──→ ended (delete)    ←─────────────────┘
| Status | Description |
| --- | --- |
| active | Sandbox is running, ready for messages |
| idle | Sandbox snapshotted after idle timeout, restorable on next message |
| ended | Session terminated, sandbox destroyed |

Webhook events

Chat sessions also emit webhook events through the Case.dev Events API. Subscribe to these when your application needs to react to sandbox readiness, scope activation, or turn progress without racing the SSE stream:
| Event | Use when |
| --- | --- |
| agent.runtime.reused | An existing sandbox was reused for a chat session |
| agent.scope.activated | The sandbox has the requested matter or vault authority loaded |
| agent.worker.ready | The agent worker inside the sandbox is ready to accept messages |
| agent.chat.session.created | A new chat session was created |
| agent.chat.turn.started | A chat turn began executing |
| agent.chat.turn.completed | A chat turn completed successfully |
| agent.chat.turn.failed | A chat turn failed |
| agent.chat.turn.conflict | A turn was rejected because another turn was already active |
See Event Types for the full generated catalog and payload fields.
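As a rough illustration of how a receiver might route these events, the sketch below groups the event names from the table into coarse categories. It assumes each delivery is a JSON object whose `type` field holds the event name; that field name and the `classify_event` helper are hypothetical, so check Event Types for the real payload shape.

```python
import json

# Event names from the table above, grouped by how a client might react to them.
READINESS_EVENTS = {
    "agent.runtime.reused",
    "agent.scope.activated",
    "agent.worker.ready",
}
TURN_FINISHED_EVENTS = {
    "agent.chat.turn.completed",
    "agent.chat.turn.failed",
}


def classify_event(raw_body: str) -> str:
    """Return a coarse category for one webhook delivery.

    ASSUMPTION: the payload is a JSON object whose "type" field is the event name.
    """
    event_type = json.loads(raw_body).get("type", "")
    if event_type in READINESS_EVENTS:
        return "readiness"
    if event_type == "agent.chat.session.created":
        return "session"
    if event_type == "agent.chat.turn.started":
        return "turn-started"
    if event_type in TURN_FINISHED_EVENTS:
        return "turn-finished"
    if event_type == "agent.chat.turn.conflict":
        return "turn-rejected"
    return "ignored"
```

Returning "ignored" for unknown names keeps the receiver tolerant of event types added later.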

Step 1: Create a session

Endpoint
POST /agent/v1/chat
curl -X POST https://api.case.dev/agent/v1/chat \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Contract Review Session",
    "model": "anthropic/claude-sonnet-4.6",
    "idleTimeoutMs": 300000
  }'
Response
{
  "id": "chat_abc123",
  "status": "active",
  "idleTimeoutMs": 300000,
  "createdAt": "2026-03-03T21:23:18.434Z"
}

Create parameters

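| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| title | string | no | Human-readable session name |
| model | string | no | LLM model (default: anthropic/claude-sonnet-4.6) |
| idleTimeoutMs | integer | no | Idle time in milliseconds before the sandbox becomes eligible for snapshotting (default: 900000 = 15 min, min: 60000 = 1 min, max: 86400000 = 24 hr) |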
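A minimal sketch of building the create body from the parameters above. The `build_create_body` helper is hypothetical, and the client-side range check is an assumption: the source gives the limits but does not say whether the server rejects or clamps out-of-range values.

```python
import json

MIN_IDLE_TIMEOUT_MS = 60_000       # 1 minute
MAX_IDLE_TIMEOUT_MS = 86_400_000   # 24 hours


def build_create_body(title=None, model=None, idle_timeout_ms=None) -> str:
    """Build the JSON body for POST /agent/v1/chat.

    All three parameters are optional, so unset ones are left out and the
    server defaults apply.
    """
    body = {}
    if title is not None:
        body["title"] = title
    if model is not None:
        body["model"] = model
    if idle_timeout_ms is not None:
        # ASSUMPTION: fail early on the client instead of relying on server validation.
        if not MIN_IDLE_TIMEOUT_MS <= idle_timeout_ms <= MAX_IDLE_TIMEOUT_MS:
            raise ValueError("idleTimeoutMs must be between 60000 and 86400000")
        body["idleTimeoutMs"] = int(idle_timeout_ms)
    return json.dumps(body)
```

For example, `build_create_body(title="Contract Review Session", idle_timeout_ms=300000)` produces a body with a 5-minute idle timeout and the default model.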

Step 2: Send messages

Endpoint
POST /agent/v1/chat/:id/message
Messages are proxied to the agent running in the sandbox. The agent has the same full tool access as batch runs — vaults, legal research, OCR, web search, and the casedev CLI.
curl -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/message" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [{"type": "text", "text": "Search vault vault_abc for indemnification clauses."}]
  }'
The response contains the agent’s output plus a usage object when token data is available. usage.costMicros is the assistant turn’s LLM cost. usage.summary and usage.entries aggregate all Case.dev billable activity correlated to that turn, including downstream tool/API calls that happened under the session key.
Response excerpt
{
  "info": {
    "id": "msg_abc123",
    "role": "assistant"
  },
  "parts": [
    {
      "id": "part_abc123",
      "type": "text",
      "text": "Here is the summary..."
    }
  ],
  "usage": {
    "turnId": "2f4d75dc-6ea7-45ab-8010-c53fb4b776c6",
    "messageId": "msg_abc123",
    "idempotencyKey": "msg_abc123",
    "model": "anthropic/claude-sonnet-4.6",
    "totalInputTokens": 4200,
    "totalOutputTokens": 1800,
    "totalTokens": 6000,
    "costMicros": 42000,
    "summary": {
      "costMicros": 53000,
      "totalInputTokens": 4200,
      "totalOutputTokens": 1800,
      "totalTokens": 6000
    },
    "entries": [
      {
        "id": "usage_llm_123",
        "kind": "api",
        "service": "chat",
        "endpoint": "/llm/v1/chat/completions",
        "method": "POST",
        "statusCode": 200,
        "costMicros": 42000,
        "promptTokens": 4200,
        "completionTokens": 1800,
        "totalTokens": 6000,
        "model": "anthropic/claude-sonnet-4.6",
        "timestamp": "2026-03-03T21:23:20.100Z",
        "metadata": {
          "cost": 0.042
        }
      },
      {
        "id": "usage_search_456",
        "kind": "api",
        "service": "search",
        "endpoint": "/search/v1/search",
        "method": "POST",
        "statusCode": 200,
        "costMicros": 11000,
        "promptTokens": null,
        "completionTokens": null,
        "totalTokens": null,
        "model": null,
        "timestamp": "2026-03-03T21:23:20.800Z",
        "metadata": null
      }
    ]
  }
}
usage.entries[] is the audit log. usage.summary is the sum of those entries. For compatibility, the top-level usage.model, token counts, and usage.costMicros still reflect the assistant turn’s direct LLM usage.
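To make the relationship between these fields concrete, here is a small sketch that splits a turn's cost into the direct LLM portion and the rest, and checks that the entries add up to the summary. The helper names are hypothetical; the conversion relies on costMicros being microdollars (1,000,000 = $1.00).

```python
def micros_to_usd(cost_micros: int) -> float:
    """Convert microdollars to dollars (1,000,000 micros = $1.00)."""
    return cost_micros / 1_000_000


def split_turn_cost(usage: dict) -> dict:
    """Break a usage object into LLM cost, other billable cost, and total."""
    llm_micros = usage["costMicros"]                # direct LLM usage for the turn
    total_micros = usage["summary"]["costMicros"]   # all correlated billable activity
    entries_micros = sum(entry["costMicros"] for entry in usage["entries"])
    return {
        "llm_usd": micros_to_usd(llm_micros),
        "other_usd": micros_to_usd(total_micros - llm_micros),
        "total_usd": micros_to_usd(total_micros),
        "entries_match_summary": entries_micros == total_micros,
    }
```

With the response excerpt above, the LLM portion is $0.042, the search call adds $0.011, and the total is $0.053.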
If the sandbox was snapshotted due to idle timeout, sending a message automatically restores it. There is a brief resume delay (~5-10s) but no context is lost.

Step 2B: Stream a single turn with respond

Endpoint
POST /agent/v1/chat/:id/respond
Use respond when you want one request that both submits the user message and streams only the current assistant turn. respond returns a turn-scoped SSE stream with normalized events:
  • turn.started
  • turn.status
  • message.created
  • message.part.updated
  • message.completed
  • session.usage
  • turn.completed
It excludes historical replay and raw upstream session.* events, so your UI can render a clean, deterministic per-turn stream.
curl -N -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/respond" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [{"type": "text", "text": "Summarize the last answer in 3 bullets."}]
  }'
Example SSE events
event: turn.started
data: {"turnId":"turn_...","chatId":"chat_..."}

event: message.part.updated
data: {"turnId":"turn_...","messageId":"msg_...","partId":"part_...","text":"..."}

event: session.usage
data: {"type":"session.usage","properties":{"turnId":"turn_...","messageId":"msg_...","usage":{"totalInputTokens":4200,"totalOutputTokens":1800,"totalTokens":6000,"costMicros":42000,"model":"anthropic/claude-sonnet-4.6"},"summary":{"costMicros":53000,"totalInputTokens":4200,"totalOutputTokens":1800,"totalTokens":6000},"entries":[{"service":"chat","endpoint":"/llm/v1/chat/completions","costMicros":42000},{"service":"search","endpoint":"/search/v1/search","costMicros":11000}]}}

event: turn.completed
data: {"turnId":"turn_...","status":"completed"}
costMicros is measured in microdollars: 1,000,000 = $1.00. In session.usage, usage.costMicros is the direct LLM portion for that turn, while summary.costMicros is the total across all aggregated entries.
Use respond for request/response-style streaming per turn. Use /chat/:id/stream when you want a long-lived session event feed with reconnect replay.
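A minimal sketch of consuming a respond stream on the client, assuming the body has already been read as text. It parses only the `event:` and `data:` fields shown in the example and treats a blank line as the end of an event; `parse_sse` and `turn_total_micros` are hypothetical helper names, not part of any Case.dev SDK.

```python
import json


def parse_sse(text: str):
    """Parse SSE text into a list of (event_name, data) tuples.

    A blank line ends each event. Data lines are joined and decoded as JSON.
    """
    events = []
    event_name, data_lines = "message", []
    for line in text.splitlines() + [""]:  # the extra "" flushes the last event
        if line == "":
            if data_lines:
                events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = "message", []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip(" "))
    return events


def turn_total_micros(events):
    """Return summary.costMicros from the turn's session.usage event, if present."""
    for name, data in events:
        if name == "session.usage":
            return data["properties"]["summary"]["costMicros"]
    return None
```

A production client would parse incrementally as bytes arrive instead of buffering the whole turn, but the field handling stays the same.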

Step 3: Stream events (optional)

Endpoint
GET /agent/v1/chat/:id/stream
Open an SSE connection to receive real-time events as the agent works. Events are buffered server-side, so you can reconnect without missing anything. Buffered replay includes synthetic session.usage events emitted after completed turns, so reconnecting clients can recover billing data without calling a separate endpoint.
curl -N "https://api.case.dev/agent/v1/chat/$CHAT_ID/stream" \
  -H "Authorization: Bearer $CASEDEV_API_KEY"

Replay from a sequence number

Each SSE event has a numeric id. Pass lastEventId to replay events after a given sequence — useful for reconnecting after a network drop:
cURL
# Replay events after sequence 42
curl -N "https://api.case.dev/agent/v1/chat/$CHAT_ID/stream?lastEventId=42" \
  -H "Authorization: Bearer $CASEDEV_API_KEY"
The Last-Event-ID HTTP header is also supported, following the SSE spec.
Events are buffered up to 500 per session. For long-running sessions with high event volume, connect the stream early to avoid gaps.
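One way to handle reconnects is to remember the highest event id seen and pass it back as lastEventId. The sketch below only builds the URL and tracks the cursor; `stream_url` and `EventCursor` are hypothetical names, and opening the connection is left to whichever HTTP client you use.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.case.dev/agent/v1"


def stream_url(chat_id: str, last_event_id=None) -> str:
    """Build the stream URL, adding lastEventId when resuming after a drop."""
    url = f"{BASE_URL}/chat/{quote(chat_id, safe='')}/stream"
    if last_event_id is not None:
        url += "?" + urlencode({"lastEventId": last_event_id})
    return url


class EventCursor:
    """Tracks the highest numeric SSE id seen so far."""

    def __init__(self):
        self.last_event_id = None

    def observe(self, event_id: str) -> None:
        """Record an event id; ids lower than the current cursor are ignored."""
        seq = int(event_id)
        if self.last_event_id is None or seq > self.last_event_id:
            self.last_event_id = seq
```

After a network drop, calling `stream_url(chat_id, cursor.last_event_id)` gives the URL that replays only the events the client has not yet seen, provided they are still within the 500-event buffer.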

Reply to agent questions

Endpoint
POST /agent/v1/chat/:id/question/:requestID/reply
When the agent needs input during a turn (e.g., clarification, confirmation), it emits a question event with a requestID. Use this endpoint to send the reply.
curl -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/question/$REQUEST_ID/reply" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Yes, include the summary of all three depositions."}'
The requestID comes from the SSE question event. The agent blocks until the reply is received, then continues its turn.

Turn conflict (409)

Both message and respond enforce single-turn concurrency per session. If the agent is still processing a previous turn, the server returns 409 Conflict with details to help you retry:
409 Response
{
  "error": {
    "message": "A turn is already active on this session",
    "code": "TURN_CONFLICT"
  }
}
The response includes two headers:
  • Retry-After — suggested wait time in seconds before retrying
  • X-Active-Turn-Id — the ID of the currently active turn
Wait for the active turn to complete (via the stream or polling), then retry your message. Do not cancel and immediately resend — the agent may still be writing tool outputs.
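The retry guidance above could be sketched as follows. The `send` callable is hypothetical: it is assumed to perform the request and return `(status_code, headers, body)` with `headers` as a plain dict and `body` as parsed JSON. Because an ended session also returns 409 (see End the session), the sketch retries only when the error code is TURN_CONFLICT.

```python
import time


def is_turn_conflict(status, body) -> bool:
    """True only for the 409 described above (error.code == "TURN_CONFLICT")."""
    error = (body or {}).get("error") or {}
    return status == 409 and error.get("code") == "TURN_CONFLICT"


def send_with_conflict_retry(send, max_attempts=5, default_wait_s=1.0, sleep=time.sleep):
    """Call send() until it stops reporting a turn conflict, honoring Retry-After.

    Returns (status, body) of the first non-conflict response. Raises
    TimeoutError if every attempt hits a turn conflict.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if not is_turn_conflict(status, body):
            return status, body
        if attempt < max_attempts:
            try:
                wait_s = float(headers.get("Retry-After", default_wait_s))
            except (TypeError, ValueError):
                wait_s = default_wait_s  # header missing or not a number of seconds
            sleep(max(wait_s, 0.0))
    raise TimeoutError("a turn was still active after all retry attempts")
```

Injecting `sleep` keeps the helper easy to test. In an interactive UI, waiting on the stream for the active turn to complete is usually a better signal than a fixed number of retries.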

Cancel generation

Endpoint
POST /agent/v1/chat/:id/cancel
Abort the agent’s current generation without ending the session. The sandbox stays alive and you can send another message immediately.
curl -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/cancel" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'

End the session

Endpoint
DELETE /agent/v1/chat/:id
Snapshots the sandbox, terminates it, and marks the session as ended. The response includes runtime billing data.
curl -X DELETE "https://api.case.dev/agent/v1/chat/$CHAT_ID" \
  -H "Authorization: Bearer $CASEDEV_API_KEY"
Response
{
  "id": "chat_abc123",
  "status": "ended",
  "snapshotImageId": "im-abc123",
  "runtimeMs": 48230,
  "cost": 0.00268
}
| Field | Type | Description |
| --- | --- | --- |
| status | string | Always "ended" |
| snapshotImageId | string | Final sandbox snapshot (nullable) |
| runtimeMs | integer | Total sandbox uptime in milliseconds |
| cost | number | Runtime cost in USD ($0.20/hr) |
Sending a message to an ended session returns 409 Conflict. Create a new session to continue.
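The runtime cost in the example response can be reproduced from runtimeMs and the $0.20/hr rate. This is a sketch for sanity-checking invoices; the `runtime_cost_usd` helper is hypothetical, and the rounding the server applies is not documented here.

```python
RATE_USD_PER_HOUR = 0.20
MS_PER_HOUR = 3_600_000


def runtime_cost_usd(runtime_ms: int) -> float:
    """Estimate sandbox runtime cost in USD from uptime in milliseconds."""
    return runtime_ms / MS_PER_HOUR * RATE_USD_PER_HOUR
```

For the response above, 48230 ms is about 0.0134 hours, which at $0.20/hr comes to roughly $0.00268.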

Idle timeout and snapshots

Chat sessions have a configurable idle timeout (default: 15 minutes). When no messages are sent within the timeout window:
  1. The sandbox is snapshotted (memory + filesystem persisted)
  2. The sandbox is terminated to stop billing
  3. The next message automatically restores the sandbox from the snapshot
This means you only pay for active compute time, not idle wait. A background reaper runs every 5 minutes to clean up idle sessions.
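A client can use the same timeout to guess whether the next message will hit a snapshotted sandbox and therefore incur the resume delay. The sketch below is an estimate only: it assumes the idle window is measured from the last message sent, and `is_snapshot_eligible` is a hypothetical helper. The actual snapshot happens on the server's schedule.

```python
from datetime import datetime, timedelta, timezone


def is_snapshot_eligible(last_message_at: datetime, now: datetime,
                         idle_timeout_ms: int = 900_000) -> bool:
    """Return True once the idle window (default 15 minutes) has fully elapsed.

    ASSUMPTION: the window starts at the time of the last message sent.
    """
    return now - last_message_at >= timedelta(milliseconds=idle_timeout_ms)
```

A UI might show a "resuming session" indicator before sending when this returns True.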

Runs vs. chat

| | Runs | Chat |
| --- | --- | --- |
| Pattern | Single prompt in, result out | Multi-turn conversation |
| Sandbox lifetime | One execution | Persists across messages |
| Streaming | Poll or webhook | Real-time SSE |
| Context | Fresh each run | Retained across turns |
| Billing | Per-execution | Per-second of sandbox uptime |
| Best for | Batch processing, scheduled tasks | Interactive workflows, iterative analysis |
Use runs for fire-and-forget batch tasks. Use chat when you need back-and-forth interaction with the agent or when the task requires iterative refinement.

Authentication

Chat endpoints require an API key with agent:read (for streaming) and agent:write (for create, message, cancel, delete) permissions. Session-based or OAuth authentication is not supported — all downstream token usage and billing is attributed to the API key’s organization.

Complete example

# 1. Create session
CHAT=$(curl -s -X POST https://api.case.dev/agent/v1/chat \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"title":"Deposition Analysis","model":"anthropic/claude-sonnet-4.6"}')
# Quote the variable so the shell passes the JSON to jq unmodified
CHAT_ID=$(echo "$CHAT" | jq -r '.id')

# 2. First message
curl -s -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/message" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"parts":[{"type":"text","text":"Search vault vault_depo for all witness testimony about the accident timeline."}]}'

# 3. Follow-up
curl -s -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/message" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"parts":[{"type":"text","text":"Now cross-reference that with the police report in vault vault_evidence."}]}'

# 4. End session
curl -X DELETE "https://api.case.dev/agent/v1/chat/$CHAT_ID" \
  -H "Authorization: Bearer $CASEDEV_API_KEY"
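For comparison, the same flow can be sketched in Python using only the standard library. The helper names (`build_request`, `send`, `run_session`) are hypothetical and this is not an official SDK; the endpoints, headers, and body shapes are taken from the curl commands above. The transport is injectable so the flow can be exercised without a network call.

```python
import json
import urllib.request

BASE_URL = "https://api.case.dev/agent/v1"


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request for a chat endpoint."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response (performs a real HTTP call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def text_message(text):
    """Wrap plain text in the parts structure used by the message endpoint."""
    return {"parts": [{"type": "text", "text": text}]}


def run_session(api_key, prompts, title="Deposition Analysis", send=send):
    """Create a session, send each prompt in order, then end the session."""
    chat = send(build_request("POST", "/chat", api_key, {"title": title}))
    chat_id = chat["id"]
    replies = []
    try:
        for prompt in prompts:
            path = f"/chat/{chat_id}/message"
            replies.append(send(build_request("POST", path, api_key, text_message(prompt))))
    finally:
        # End the session even if a message fails, so the sandbox stops billing.
        summary = send(build_request("DELETE", f"/chat/{chat_id}", api_key))
    return replies, summary
```

Calling `run_session(api_key, ["first prompt", "follow-up"])` with a real key performs the four steps above. Passing a fake `send` lets you verify the request sequence offline.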