Chat sessions give you an interactive, multi-turn agent that stays alive between messages. Unlike runs (fire-and-forget batch jobs), a chat session keeps its sandbox running so the agent retains full context — files, environment, and conversation history — across every message you send.

Chat session lifecycle

  create ──→ active ──→ idle (snapshot) ──→ resumed ──→ active
                │                                        │
                └──→ ended (delete)    ←─────────────────┘
Status   Description
active   Sandbox is running, ready for messages
idle     Sandbox snapshotted after idle timeout, restorable on next message
ended    Session terminated, sandbox destroyed

Step 1: Create a session

Endpoint
POST /agent/v1/chat
const chat = await client.agent.v1.chat.create({
  title: 'Contract Review Session',
  model: 'anthropic/claude-sonnet-4.6',
  idleTimeoutMs: 300000, // 5 minutes
})

console.log(chat.id) // chat_xxx
console.log(chat.status) // "active"
Response
{
  "id": "chat_abc123",
  "status": "active",
  "idleTimeoutMs": 300000,
  "createdAt": "2026-03-03T21:23:18.434Z"
}

Create parameters

Parameter       Type      Required   Description
title           string    no         Human-readable session name
model           string    no         LLM model (default: anthropic/claude-sonnet-4.6)
idleTimeoutMs   integer   no         Idle time before snapshot eligibility (default: 15 min, min: 1 min, max: 24 hr)
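As a sketch of how the bounds in the table interact, a client could clamp a requested timeout before creating a session. The helper name and the clamping behavior are illustrative assumptions, not part of the API:

```typescript
// Hypothetical helper: clamp a requested idle timeout to the documented
// bounds (default 15 min, min 1 min, max 24 hr).
const MINUTE_MS = 60_000
const DEFAULT_IDLE_MS = 15 * MINUTE_MS
const MIN_IDLE_MS = 1 * MINUTE_MS
const MAX_IDLE_MS = 24 * 60 * MINUTE_MS

function clampIdleTimeoutMs(requested?: number): number {
  if (requested === undefined) return DEFAULT_IDLE_MS
  return Math.min(MAX_IDLE_MS, Math.max(MIN_IDLE_MS, requested))
}

console.log(clampIdleTimeoutMs(300_000)) // 300000 (5 minutes, within bounds)
```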

Step 2: Send messages

Endpoint
POST /agent/v1/chat/:id/message
Messages are proxied to the agent running in the sandbox. The agent has the same full tool access as batch runs — vaults, legal research, OCR, web search, and the casedev CLI.
const response = await client.agent.v1.chat.sendMessage(chat.id, {
  parts: [{ type: 'text', text: 'Search vault vault_abc for indemnification clauses.' }],
})
The response contains the agent’s output plus a usage object when token data is available. usage.costMicros is the assistant turn’s LLM cost. usage.summary and usage.entries aggregate all Case.dev billable activity correlated to that turn, including downstream tool/API calls that happened under the session key.
Response excerpt
{
  "info": {
    "id": "msg_abc123",
    "role": "assistant"
  },
  "parts": [
    {
      "id": "part_abc123",
      "type": "text",
      "text": "Here is the summary..."
    }
  ],
  "usage": {
    "turnId": "2f4d75dc-6ea7-45ab-8010-c53fb4b776c6",
    "messageId": "msg_abc123",
    "idempotencyKey": "msg_abc123",
    "model": "anthropic/claude-sonnet-4.6",
    "totalInputTokens": 4200,
    "totalOutputTokens": 1800,
    "totalTokens": 6000,
    "costMicros": 42000,
    "summary": {
      "costMicros": 53000,
      "totalInputTokens": 4200,
      "totalOutputTokens": 1800,
      "totalTokens": 6000
    },
    "entries": [
      {
        "id": "usage_llm_123",
        "kind": "api",
        "service": "chat",
        "endpoint": "/llm/v1/chat/completions",
        "method": "POST",
        "statusCode": 200,
        "costMicros": 42000,
        "promptTokens": 4200,
        "completionTokens": 1800,
        "totalTokens": 6000,
        "model": "anthropic/claude-sonnet-4.6",
        "timestamp": "2026-03-03T21:23:20.100Z",
        "metadata": {
          "cost": 0.042
        }
      },
      {
        "id": "usage_search_456",
        "kind": "api",
        "service": "search",
        "endpoint": "/search/v1/search",
        "method": "POST",
        "statusCode": 200,
        "costMicros": 11000,
        "promptTokens": null,
        "completionTokens": null,
        "totalTokens": null,
        "model": null,
        "timestamp": "2026-03-03T21:23:20.800Z",
        "metadata": null
      }
    ]
  }
}
usage.entries[] is the audit log. usage.summary is the sum of those entries. For compatibility, the top-level usage.model, token counts, and usage.costMicros still reflect the assistant turn’s direct LLM usage.
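The relationship above (summary is the sum of the entries) can be sketched as a small verification helper. The entry and summary shapes are trimmed to the one field used here; the real objects carry more fields:

```typescript
interface UsageEntry { costMicros: number }
interface UsageSummary { costMicros: number }

// Check that summary.costMicros equals the sum of entries[].costMicros.
function summaryMatchesEntries(summary: UsageSummary, entries: UsageEntry[]): boolean {
  const total = entries.reduce((sum, e) => sum + e.costMicros, 0)
  return total === summary.costMicros
}

// Using the values from the response excerpt: 42000 (LLM) + 11000 (search) = 53000
console.log(summaryMatchesEntries({ costMicros: 53000 }, [{ costMicros: 42000 }, { costMicros: 11000 }])) // true
```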
If the sandbox was snapshotted due to idle timeout, sending a message automatically restores it. There is a brief resume delay (~5-10s) but no context is lost.

Step 2B: Stream a single turn with respond

Endpoint
POST /agent/v1/chat/:id/respond
Use respond when you want one request that both submits the user message and streams only the current assistant turn. respond returns a turn-scoped SSE stream with normalized events:
  • turn.started
  • turn.status
  • message.created
  • message.part.updated
  • message.completed
  • session.usage
  • turn.completed
It excludes historical replay and raw upstream session.* events, so your UI can render a clean, deterministic per-turn stream.
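A minimal parser for the turn-scoped stream might look like the following. It is a sketch of standard SSE framing (blank-line-delimited blocks of event:/data: lines), not the SDK's implementation:

```typescript
interface SseEvent { event: string; data: string }

// Split a raw SSE payload into events. Per the SSE format, events are
// separated by blank lines; each line carries an "event:" or "data:" field.
function parseSseEvents(raw: string): SseEvent[] {
  return raw
    .split('\n\n')
    .filter((block) => block.trim().length > 0)
    .map((block) => {
      let event = 'message' // SSE default event type
      const data: string[] = []
      for (const line of block.split('\n')) {
        if (line.startsWith('event:')) event = line.slice('event:'.length).trim()
        else if (line.startsWith('data:')) data.push(line.slice('data:'.length).trim())
      }
      return { event, data: data.join('\n') }
    })
}
```

Feeding it the example SSE events later in this section yields one object per turn.started, message.part.updated, session.usage, and turn.completed block.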
curl -N -X POST "https://api.case.dev/agent/v1/chat/$CHAT_ID/respond" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [{"type": "text", "text": "Summarize the last answer in 3 bullets."}]
  }'
Example SSE events
event: turn.started
data: {"turnId":"turn_...","chatId":"chat_..."}

event: message.part.updated
data: {"turnId":"turn_...","messageId":"msg_...","partId":"part_...","text":"..."}

event: session.usage
data: {"type":"session.usage","properties":{"turnId":"turn_...","messageId":"msg_...","usage":{"totalInputTokens":4200,"totalOutputTokens":1800,"totalTokens":6000,"costMicros":42000,"model":"anthropic/claude-sonnet-4.6"},"summary":{"costMicros":53000,"totalInputTokens":4200,"totalOutputTokens":1800,"totalTokens":6000},"entries":[{"service":"chat","endpoint":"/llm/v1/chat/completions","costMicros":42000},{"service":"search","endpoint":"/search/v1/search","costMicros":11000}]}}

event: turn.completed
data: {"turnId":"turn_...","status":"completed"}
costMicros is measured in microdollars: 1,000,000 = $1.00. In session.usage, usage.costMicros is the direct LLM portion for that turn, while summary.costMicros is the total across all aggregated entries.
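The microdollar conversion is trivial but easy to get wrong by a factor of 1,000; a quick sketch:

```typescript
// Convert costMicros (microdollars) to USD. 1,000,000 micros = $1.00.
function microsToUsd(costMicros: number): number {
  return costMicros / 1_000_000
}

console.log(microsToUsd(42_000)) // 0.042  (direct LLM cost for the turn)
console.log(microsToUsd(53_000)) // 0.053  (total across aggregated entries)
```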
Use respond for request/response-style streaming per turn. Use /chat/:id/stream when you want a long-lived session event feed with reconnect replay.

Step 3: Stream events (optional)

Endpoint
GET /agent/v1/chat/:id/stream
Open an SSE connection to receive real-time events as the agent works. Events are buffered server-side, so a reconnecting client can replay what it missed while disconnected (up to the buffer limit described below). Buffered replay includes synthetic session.usage events emitted after completed turns, so reconnecting clients can recover billing data without calling a separate endpoint.
const stream = client.agent.v1.chat.stream(chat.id)

for await (const event of stream) {
  console.log(event.type, event.data)
}

Replay from a sequence number

Each SSE event has a numeric id. Pass lastEventId to replay events after a given sequence — useful for reconnecting after a network drop:
cURL
# Replay events after sequence 42
curl -N "https://api.case.dev/agent/v1/chat/$CHAT_ID/stream?lastEventId=42" \
  -H "Authorization: Bearer $CASEDEV_API_KEY"
The Last-Event-ID HTTP header is also supported, following the SSE spec.
Events are buffered up to 500 per session. For long-running sessions with high event volume, connect the stream early to avoid gaps.
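The reconnect pattern amounts to tracking the last seen event id and building the replay URL from it. The helper below is illustrative; only the path and the lastEventId query parameter come from the docs:

```typescript
// Build the stream URL, appending lastEventId when resuming after a drop.
function streamUrl(baseUrl: string, chatId: string, lastEventId?: number): string {
  const url = `${baseUrl}/agent/v1/chat/${chatId}/stream`
  return lastEventId === undefined ? url : `${url}?lastEventId=${lastEventId}`
}

console.log(streamUrl('https://api.case.dev', 'chat_abc123'))
// https://api.case.dev/agent/v1/chat/chat_abc123/stream
console.log(streamUrl('https://api.case.dev', 'chat_abc123', 42))
// https://api.case.dev/agent/v1/chat/chat_abc123/stream?lastEventId=42
```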

Reply to agent questions

Endpoint
POST /agent/v1/chat/:id/question/:requestID/reply
When the agent needs input during a turn (e.g., clarification, confirmation), it emits a question event with a requestID. Use this endpoint to send the reply.
await client.agent.v1.chat.reply(chat.id, requestId, {
  text: 'Yes, include the summary of all three depositions.',
})
The requestID comes from the SSE question event. The agent blocks until the reply is received, then continues its turn.
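Plumbing the requestID from the stream into the reply call can be sketched as a filter over received events. The exact event shape here is an assumption; check the events your stream actually emits:

```typescript
// Sketch: pull the requestID out of a question event seen on the stream.
interface QuestionEvent { type: string; requestID?: string }

function findQuestionRequestId(events: QuestionEvent[]): string | undefined {
  return events.find((e) => e.type === 'question')?.requestID
}

const seen = [
  { type: 'message.part.updated' },
  { type: 'question', requestID: 'req_abc123' },
]
console.log(findQuestionRequestId(seen)) // req_abc123
```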

Turn conflict (409)

Both message and respond enforce single-turn concurrency per session. If the agent is still processing a previous turn, the server returns 409 Conflict with details to help you retry:
409 Response
{
  "error": {
    "message": "A turn is already active on this session",
    "code": "TURN_CONFLICT"
  }
}
The response includes two headers:
  • Retry-After — suggested wait time in seconds before retrying
  • X-Active-Turn-Id — the ID of the currently active turn
Wait for the active turn to complete (via the stream or polling), then retry your message. Do not cancel and immediately resend — the agent may still be writing tool outputs.
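A retry delay can be derived from the Retry-After header, with a fallback when the header is missing or malformed. The fallback value is an arbitrary choice for illustration:

```typescript
// Derive a retry delay (ms) from a 409 response's Retry-After header.
function retryDelayMs(retryAfterHeader: string | null, fallbackMs = 2_000): number {
  const seconds = retryAfterHeader === null ? NaN : Number(retryAfterHeader)
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1_000 : fallbackMs
}

console.log(retryDelayMs('3'))  // 3000
console.log(retryDelayMs(null)) // 2000
```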

Cancel generation

Endpoint
POST /agent/v1/chat/:id/cancel
Abort the agent’s current generation without ending the session. The sandbox stays alive and you can send another message immediately.
const result = await client.agent.v1.chat.cancel(chat.id)
console.log(result.ok) // true

End the session

Endpoint
DELETE /agent/v1/chat/:id
Snapshots the sandbox, terminates it, and marks the session as ended. The response includes runtime billing data.
const result = await client.agent.v1.chat.delete(chat.id)
console.log(result.status) // "ended"
console.log(result.runtimeMs) // 48230
console.log(result.cost) // 0.00268
Response
{
  "id": "chat_abc123",
  "status": "ended",
  "snapshotImageId": "im-abc123",
  "runtimeMs": 48230,
  "cost": 0.00268
}
Field            Type      Description
status           string    Always "ended"
snapshotImageId  string    Final sandbox snapshot (nullable)
runtimeMs        integer   Total sandbox uptime in milliseconds
cost             number    Runtime cost in USD ($0.20/hr)
Sending a message to an ended session returns 409 Conflict. Create a new session to continue.
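The relationship between runtimeMs and cost can be checked with a quick sketch (the rounding here is illustrative; the API may round differently):

```typescript
const RATE_USD_PER_HOUR = 0.2
const MS_PER_HOUR = 3_600_000

// Estimate runtime cost in USD from sandbox uptime in milliseconds.
function runtimeCostUsd(runtimeMs: number): number {
  return (runtimeMs / MS_PER_HOUR) * RATE_USD_PER_HOUR
}

// Matches the example response above: 48230 ms of uptime ≈ $0.00268
console.log(runtimeCostUsd(48_230).toFixed(5)) // 0.00268
```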

Idle timeout and snapshots

Chat sessions have a configurable idle timeout (default: 15 minutes). When no messages are sent within the timeout window:
  1. The sandbox is snapshotted (memory + filesystem persisted)
  2. The sandbox is terminated to stop billing
  3. The next message automatically restores the sandbox from the snapshot
This means you only pay for active compute time, not idle wait. A background reaper runs every 5 minutes to clean up idle sessions.

Runs vs. chat

                   Runs                                Chat
Pattern            Single prompt in, result out        Multi-turn conversation
Sandbox lifetime   One execution                       Persists across messages
Streaming          Poll or webhook                     Real-time SSE
Context            Fresh each run                      Retained across turns
Billing            Per-execution                       Per-second of sandbox uptime
Best for           Batch processing, scheduled tasks   Interactive workflows, iterative analysis
Use runs for fire-and-forget batch tasks. Use chat when you need back-and-forth interaction with the agent or when the task requires iterative refinement.

Authentication

Chat endpoints require an API key with agent:read (for streaming) and agent:write (for create, message, cancel, delete) permissions. Session-based or OAuth authentication is not supported — all downstream token usage and billing is attributed to the API key’s organization.

Complete example

import Casedev from '@case.dev/sdk'

const client = new Casedev({ apiKey: process.env.CASEDEV_API_KEY })

// 1. Create a session
const chat = await client.agent.v1.chat.create({
  title: 'Deposition Analysis',
  model: 'anthropic/claude-sonnet-4.6',
})

// 2. First message
await client.agent.v1.chat.sendMessage(chat.id, {
  parts: [
    {
      type: 'text',
      text: 'Search vault vault_depo for all witness testimony about the accident timeline.',
    },
  ],
})

// 3. Follow-up based on results
await client.agent.v1.chat.sendMessage(chat.id, {
  parts: [
    {
      type: 'text',
      text: 'Now cross-reference that with the police report in vault vault_evidence.',
    },
  ],
})

// 4. End session when done
const result = await client.agent.v1.chat.delete(chat.id)
console.log(`Session cost: $${result.cost.toFixed(4)}`)