Chat sessions give you an interactive, multi-turn agent that stays alive between messages. Unlike runs (fire-and-forget batch jobs), a chat session keeps its sandbox running so the agent retains full context — files, environment, and conversation history — across every message you send.

Chat session lifecycle

  create ──→ active ──→ idle (snapshot) ──→ resumed ──→ active
                │                                        │
                └──→ ended (delete)    ←─────────────────┘
Status   Description
active   Sandbox is running, ready for messages
idle     Sandbox snapshotted after idle timeout, restorable on next message
ended    Session terminated, sandbox destroyed

Step 1: Create a session

Endpoint
POST /agent/v1/chat
const chat = await client.agent.v1.chat.create({
  title: 'Contract Review Session',
  model: 'anthropic/claude-sonnet-4.6',
  idleTimeoutMs: 300000, // 5 minutes
})

console.log(chat.id) // chat_xxx
console.log(chat.status) // "active"
Response
{
  "id": "chat_abc123",
  "status": "active",
  "idleTimeoutMs": 300000,
  "createdAt": "2026-03-03T21:23:18.434Z"
}

Create parameters

Parameter       Type      Required   Description
title           string    no         Human-readable session name
model           string    no         LLM model (default: anthropic/claude-sonnet-4.6)
idleTimeoutMs   integer   no         Idle time before snapshot eligibility (default: 15 min, min: 1 min, max: 24 hr)
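As a sketch of how the bounds in the table interact, a client could clamp a requested timeout before creating a session. The helper name and the clamping behavior are illustrative assumptions, not part of the API:

```typescript
// Hypothetical helper: clamp a requested idle timeout to the documented
// bounds (default 15 min, min 1 min, max 24 hr).
const MINUTE_MS = 60_000
const DEFAULT_IDLE_MS = 15 * MINUTE_MS
const MIN_IDLE_MS = 1 * MINUTE_MS
const MAX_IDLE_MS = 24 * 60 * MINUTE_MS

function clampIdleTimeoutMs(requested?: number): number {
  if (requested === undefined) return DEFAULT_IDLE_MS
  return Math.min(MAX_IDLE_MS, Math.max(MIN_IDLE_MS, requested))
}

console.log(clampIdleTimeoutMs(300_000)) // 300000 (5 minutes, within bounds)
```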

Step 2: Send messages

Endpoint
POST /agent/v1/chat/:id/message
Messages are proxied to the agent running in the sandbox. The agent has the same full tool access as batch runs — vaults, legal research, OCR, web search, and the casedev CLI.
const response = await client.agent.v1.chat.sendMessage(chat.id, {
  parts: [{ type: 'text', text: 'Search vault vault_abc for indemnification clauses.' }],
})
The response contains the agent’s output plus a usage object when token data is available. usage.costMicros is the assistant turn’s LLM cost. usage.summary and usage.entries aggregate all Case.dev billable activity correlated to that turn, including downstream tool/API calls that happened under the session key.
Response excerpt
{
  "info": {
    "id": "msg_abc123",
    "role": "assistant"
  },
  "parts": [
    {
      "id": "part_abc123",
      "type": "text",
      "text": "Here is the summary..."
    }
  ],
  "usage": {
    "turnId": "2f4d75dc-6ea7-45ab-8010-c53fb4b776c6",
    "messageId": "msg_abc123",
    "idempotencyKey": "msg_abc123",
    "model": "anthropic/claude-sonnet-4.6",
    "totalInputTokens": 4200,
    "totalOutputTokens": 1800,
    "totalTokens": 6000,
    "costMicros": 42000,
    "summary": {
      "costMicros": 53000,
      "totalInputTokens": 4200,
      "totalOutputTokens": 1800,
      "totalTokens": 6000
    },
    "entries": [
      {
        "id": "usage_llm_123",
        "kind": "api",
        "service": "chat",
        "endpoint": "/llm/v1/chat/completions",
        "method": "POST",
        "statusCode": 200,
        "costMicros": 42000,
        "promptTokens": 4200,
        "completionTokens": 1800,
        "totalTokens": 6000,
        "model": "anthropic/claude-sonnet-4.6",
        "timestamp": "2026-03-03T21:23:20.100Z",
        "metadata": {
          "cost": 0.042
        }
      },
      {
        "id": "usage_search_456",
        "kind": "api",
        "service": "search",
        "endpoint": "/search/v1/search",
        "method": "POST",
        "statusCode": 200,
        "costMicros": 11000,
        "promptTokens": null,
        "completionTokens": null,
        "totalTokens": null,
        "model": null,
        "timestamp": "2026-03-03T21:23:20.800Z",
        "metadata": null
      }
    ]
  }
}
usage.entries[] is the audit log. usage.summary is the sum of those entries. For compatibility, the top-level usage.model, token counts, and usage.costMicros still reflect the assistant turn’s direct LLM usage.
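The relationship above (summary is the sum of the entries) can be sketched as a small verification helper. The entry and summary shapes are trimmed to the one field used here; the real objects carry more fields:

```typescript
interface UsageEntry { costMicros: number }
interface UsageSummary { costMicros: number }

// Check that summary.costMicros equals the sum of entries[].costMicros.
function summaryMatchesEntries(summary: UsageSummary, entries: UsageEntry[]): boolean {
  const total = entries.reduce((sum, e) => sum + e.costMicros, 0)
  return total === summary.costMicros
}

// Using the values from the response excerpt: 42000 (LLM) + 11000 (search) = 53000
console.log(summaryMatchesEntries({ costMicros: 53000 }, [{ costMicros: 42000 }, { costMicros: 11000 }])) // true
```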
If the sandbox was snapshotted due to idle timeout, sending a message automatically restores it. There is a brief resume delay (~5-10s) but no context is lost.

Step 2B: Stream a single turn with respond

Endpoint
POST /agent/v1/chat/:id/respond
Use respond when you want one request that both submits the user message and streams only the current assistant turn. respond returns a turn-scoped SSE stream with normalized events:
  • turn.started
  • turn.status
  • message.created
  • message.part.updated
  • message.completed
  • session.usage
  • turn.completed
It excludes historical replay and raw upstream session.* events, so your UI can render a clean, deterministic per-turn stream.
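A minimal parser for the turn-scoped stream might look like the following. It is a sketch of standard SSE framing (blank-line-delimited blocks of event:/data: lines), not the SDK's implementation:

```typescript
interface SseEvent { event: string; data: string }

// Split a raw SSE payload into events. Per the SSE format, events are
// separated by blank lines; each line carries an "event:" or "data:" field.
function parseSseEvents(raw: string): SseEvent[] {
  return raw
    .split('\n\n')
    .filter((block) => block.trim().length > 0)
    .map((block) => {
      let event = 'message' // SSE default event type
      const data: string[] = []
      for (const line of block.split('\n')) {
        if (line.startsWith('event:')) event = line.slice('event:'.length).trim()
        else if (line.startsWith('data:')) data.push(line.slice('data:'.length).trim())
      }
      return { event, data: data.join('\n') }
    })
}
```

Feeding it the example SSE events later in this section yields one object per turn.started, message.part.updated, session.usage, and turn.completed block.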
curl -N -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/respond" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [{"type": "text", "text": "Summarize the last answer in 3 bullets."}]
  }'
Example SSE events
event: turn.started
data: {"turnId":"turn_...","chatId":"chat_..."}

event: message.part.updated
data: {"turnId":"turn_...","messageId":"msg_...","partId":"part_...","text":"..."}

event: session.usage
data: {"type":"session.usage","properties":{"turnId":"turn_...","messageId":"msg_...","usage":{"totalInputTokens":4200,"totalOutputTokens":1800,"totalTokens":6000,"costMicros":42000,"model":"anthropic/claude-sonnet-4.6"},"summary":{"costMicros":53000,"totalInputTokens":4200,"totalOutputTokens":1800,"totalTokens":6000},"entries":[{"service":"chat","endpoint":"/llm/v1/chat/completions","costMicros":42000},{"service":"search","endpoint":"/search/v1/search","costMicros":11000}]}}

event: turn.completed
data: {"turnId":"turn_...","status":"completed"}
costMicros is measured in microdollars: 1,000,000 = $1.00. In session.usage, usage.costMicros is the direct LLM portion for that turn, while summary.costMicros is the total across all aggregated entries.
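The microdollar conversion is trivial but easy to get wrong by a factor of 1,000; a quick sketch:

```typescript
// Convert costMicros (microdollars) to USD. 1,000,000 micros = $1.00.
function microsToUsd(costMicros: number): number {
  return costMicros / 1_000_000
}

console.log(microsToUsd(42_000)) // 0.042  (direct LLM cost for the turn)
console.log(microsToUsd(53_000)) // 0.053  (total across aggregated entries)
```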
Use respond for request/response-style streaming per turn. Use /chat/:id/stream when you want a long-lived session event feed with reconnect replay.

Step 3: Stream events (optional)

Endpoint
GET /agent/v1/chat/:id/stream
Open an SSE connection to receive real-time events as the agent works. Events are buffered server-side, so a reconnecting client can replay what it missed while disconnected (up to the buffer limit described below). Buffered replay includes synthetic session.usage events emitted after completed turns, so reconnecting clients can recover billing data without calling a separate endpoint.
const stream = client.agent.v1.chat.stream(chat.id)

for await (const event of stream) {
  console.log(event.type, event.data)
}

Replay from a sequence number

Each SSE event has a numeric id. Pass lastEventId to replay events after a given sequence — useful for reconnecting after a network drop:
cURL
# Replay events after sequence 42
curl -N "https://api.case.dev/agent/v1/chat/$CHAT_ID/stream?lastEventId=42" \
  -H "Authorization: Bearer $CASEDEV_API_KEY"
The Last-Event-ID HTTP header is also supported, following the SSE spec.
Events are buffered up to 500 per session. For long-running sessions with high event volume, connect the stream early to avoid gaps.
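The reconnect pattern amounts to tracking the last seen event id and building the replay URL from it. The helper below is illustrative; only the path and the lastEventId query parameter come from the docs:

```typescript
// Build the stream URL, appending lastEventId when resuming after a drop.
function streamUrl(baseUrl: string, chatId: string, lastEventId?: number): string {
  const url = `${baseUrl}/agent/v1/chat/${chatId}/stream`
  return lastEventId === undefined ? url : `${url}?lastEventId=${lastEventId}`
}

console.log(streamUrl('https://api.case.dev', 'chat_abc123'))
// https://api.case.dev/agent/v1/chat/chat_abc123/stream
console.log(streamUrl('https://api.case.dev', 'chat_abc123', 42))
// https://api.case.dev/agent/v1/chat/chat_abc123/stream?lastEventId=42
```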

Reply to agent questions

Endpoint
POST /agent/v1/chat/:id/question/:requestID/reply
When the agent needs input during a turn (e.g., clarification, confirmation), it emits a question event with a requestID. Use this endpoint to send the reply.
await client.agent.v1.chat.reply(chat.id, requestId, {
  text: 'Yes, include the summary of all three depositions.',
})
The requestID comes from the SSE question event. The agent blocks until the reply is received, then continues its turn.
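Plumbing the requestID from the stream into the reply call can be sketched as a filter over received events. The exact event shape here is an assumption; check the events your stream actually emits:

```typescript
// Sketch: pull the requestID out of a question event seen on the stream.
interface QuestionEvent { type: string; requestID?: string }

function findQuestionRequestId(events: QuestionEvent[]): string | undefined {
  return events.find((e) => e.type === 'question')?.requestID
}

const seen = [
  { type: 'message.part.updated' },
  { type: 'question', requestID: 'req_abc123' },
]
console.log(findQuestionRequestId(seen)) // req_abc123
```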

Turn conflict (409)

Both message and respond enforce single-turn concurrency per session. If the agent is still processing a previous turn, the server returns 409 Conflict with details to help you retry:
409 Response
{
  "error": {
    "message": "A turn is already active on this session",
    "code": "TURN_CONFLICT"
  }
}
The response includes two headers:
  • Retry-After — suggested wait time in seconds before retrying
  • X-Active-Turn-Id — the ID of the currently active turn
Wait for the active turn to complete (via the stream or polling), then retry your message. Do not cancel and immediately resend — the agent may still be writing tool outputs.
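A retry delay can be derived from the Retry-After header, with a fallback when the header is missing or malformed. The fallback value is an arbitrary choice for illustration:

```typescript
// Derive a retry delay (ms) from a 409 response's Retry-After header.
function retryDelayMs(retryAfterHeader: string | null, fallbackMs = 2_000): number {
  const seconds = retryAfterHeader === null ? NaN : Number(retryAfterHeader)
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1_000 : fallbackMs
}

console.log(retryDelayMs('3'))  // 3000
console.log(retryDelayMs(null)) // 2000
```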

Cancel generation

Endpoint
POST /agent/v1/chat/:id/cancel
Abort the agent’s current generation without ending the session. The sandbox stays alive and you can send another message immediately.
const result = await client.agent.v1.chat.cancel(chat.id)
console.log(result.ok) // true

End the session

Endpoint
DELETE /agent/v1/chat/:id
Snapshots the sandbox, terminates it, and marks the session as ended. The response includes runtime billing data.
const result = await client.agent.v1.chat.delete(chat.id)
console.log(result.status) // "ended"
console.log(result.runtimeMs) // 48230
console.log(result.cost) // 0.00268
Response
{
  "id": "chat_abc123",
  "status": "ended",
  "snapshotImageId": "im-abc123",
  "runtimeMs": 48230,
  "cost": 0.00268
}
Field            Type      Description
status           string    Always "ended"
snapshotImageId  string    Final sandbox snapshot (nullable)
runtimeMs        integer   Total sandbox uptime in milliseconds
cost             number    Runtime cost in USD ($0.20/hr)
Sending a message to an ended session returns 409 Conflict. Create a new session to continue.
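The relationship between runtimeMs and cost can be checked with a quick sketch (the rounding here is illustrative; the API may round differently):

```typescript
const RATE_USD_PER_HOUR = 0.2
const MS_PER_HOUR = 3_600_000

// Estimate runtime cost in USD from sandbox uptime in milliseconds.
function runtimeCostUsd(runtimeMs: number): number {
  return (runtimeMs / MS_PER_HOUR) * RATE_USD_PER_HOUR
}

// Matches the example response above: 48230 ms of uptime ≈ $0.00268
console.log(runtimeCostUsd(48_230).toFixed(5)) // 0.00268
```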

Idle timeout and snapshots

Chat sessions have a configurable idle timeout (default: 15 minutes). When no messages are sent within the timeout window:
  1. The sandbox is snapshotted (memory + filesystem persisted)
  2. The sandbox is terminated to stop billing
  3. The next message automatically restores the sandbox from the snapshot
This means you only pay for active compute time, not idle wait. A background reaper runs every 5 minutes to clean up idle sessions.

Runs vs. chat

                   Runs                                Chat
Pattern            Single prompt in, result out        Multi-turn conversation
Sandbox lifetime   One execution                       Persists across messages
Streaming          Poll or webhook                     Real-time SSE
Context            Fresh each run                      Retained across turns
Billing            Per-execution                       Per-second of sandbox uptime
Best for           Batch processing, scheduled tasks   Interactive workflows, iterative analysis
Use runs for fire-and-forget batch tasks. Use chat when you need back-and-forth interaction with the agent or when the task requires iterative refinement.

Authentication

Chat endpoints require an API key with agent:read (for streaming) and agent:write (for create, message, cancel, delete) permissions. Session-based or OAuth authentication is not supported — all downstream token usage and billing is attributed to the API key’s organization.

Complete example

import Casedev from '@case.dev/sdk'

const client = new Casedev({ apiKey: process.env.CASEDEV_API_KEY })

// 1. Create a session
const chat = await client.agent.v1.chat.create({
  title: 'Deposition Analysis',
  model: 'anthropic/claude-sonnet-4.6',
})

// 2. First message
await client.agent.v1.chat.sendMessage(chat.id, {
  parts: [
    {
      type: 'text',
      text: 'Search vault vault_depo for all witness testimony about the accident timeline.',
    },
  ],
})

// 3. Follow-up based on results
await client.agent.v1.chat.sendMessage(chat.id, {
  parts: [
    {
      type: 'text',
      text: 'Now cross-reference that with the police report in vault vault_evidence.',
    },
  ],
})

// 4. End session when done
const result = await client.agent.v1.chat.delete(chat.id)
console.log(`Session cost: $${result.cost.toFixed(4)}`)