# Claude JSONL Format Specification

## File Format

- **Format:** Newline-delimited JSON (NDJSON/JSONL)
- **Encoding:** UTF-8
- **Line terminator:** `\n` (LF)
- **One JSON object per line:** no array wrapper

## Message Envelope (Common Fields)

Lines in a Claude JSONL file share a common envelope. The Field Reference below notes which fields are required, conditional, or nullable:

```json
{
  "parentUuid": "uuid-string or null",
  "isSidechain": false,
  "userType": "external",
  "cwd": "/full/path/to/working/directory",
  "sessionId": "session-uuid-v4",
  "version": "2.1.20",
  "gitBranch": "branch-name or empty string",
  "type": "user|assistant|progress|system|summary|file-history-snapshot",
  "message": { ... },
  "uuid": "unique-message-uuid-v4",
  "timestamp": "ISO-8601 timestamp"
}
```

### Field Reference

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | Yes | Message type identifier |
| `uuid` | string (uuid) | Yes* | Unique identifier for this event |
| `parentUuid` | string (uuid) or null | Yes | Links to parent message (null for root) |
| `timestamp` | string (ISO-8601) | Yes* | When the event occurred (UTC) |
| `sessionId` | string (uuid) | Yes | Session identifier |
| `version` | string (semver) | Yes | Claude Code version (e.g., "2.1.20") |
| `cwd` | string (path) | Yes | Working directory at event time |
| `gitBranch` | string | No | Git branch name (empty if not in a repo) |
| `isSidechain` | boolean | Yes | `true` for subagent sessions |
| `userType` | string | Yes | Always "external" for user sessions |
| `message` | object | Conditional | Message content (user/assistant types) |
| `agentId` | string | Conditional | Agent identifier (subagent sessions only) |

\* May be null in metadata-only entries such as `file-history-snapshot`.

## Content Structure

### User Message Content

User messages have `message.content` as either a string or an array.

**String (direct input):**

```json
{
  "message": {
    "role": "user",
    "content": "Your question or instruction"
  }
}
```

**Array (tool results):**

```json
{
  "message": {
    "role": "user",
    "content": [
      {
        "type": "tool_result",
        "tool_use_id": "toolu_01XYZ",
        "content": "Tool output text"
      }
    ]
  }
}
```

### Assistant Message Content

Assistant messages always have `message.content` as an **array**:

```json
{
  "message": {
    "role": "assistant",
    "type": "message",
    "model": "claude-opus-4-5-20251101",
    "id": "msg_bdrk_01Abc123",
    "content": [
      {"type": "thinking", "thinking": "..."},
      {"type": "text", "text": "..."},
      {"type": "tool_use", "id": "toolu_01XYZ", "name": "Read", "input": {...}}
    ],
    "stop_reason": "end_turn",
    "stop_sequence": null,
    "usage": {...}
  }
}
```

## Content Block Types

### Text Block

```json
{
  "type": "text",
  "text": "Response text content"
}
```

### Thinking Block

```json
{
  "type": "thinking",
  "thinking": "Internal reasoning (extended thinking mode)",
  "signature": "base64-signature (optional)"
}
```

### Tool Use Block

```json
{
  "type": "tool_use",
  "id": "toolu_01Abc123XYZ",
  "name": "ToolName",
  "input": {
    "param1": "value1",
    "param2": 123
  }
}
```

### Tool Result Block

Tool results appear inside user messages. Their `tool_use_id` matches the `id` of the tool use block they answer.

```json
{
  "type": "tool_result",
  "tool_use_id": "toolu_01Abc123XYZ",
  "content": "Result text or structured output",
  "is_error": false
}
```

## Usage Object

Token consumption is reported in assistant messages:

```json
{
  "usage": {
    "input_tokens": 1000,
    "output_tokens": 500,
    "cache_creation_input_tokens": 200,
    "cache_read_input_tokens": 400,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 200,
      "ephemeral_1h_input_tokens": 0
    },
    "service_tier": "standard"
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `input_tokens` | int | Input tokens consumed (excluding cached tokens) |
| `output_tokens` | int | Output tokens generated |
| `cache_creation_input_tokens` | int | Input tokens written to the cache |
| `cache_read_input_tokens` | int | Input tokens read from the cache |
| `cache_creation` | object | Breakdown of cache-creation tokens by cache lifetime (`ephemeral_5m_input_tokens`, `ephemeral_1h_input_tokens`) |
| `service_tier` | string | API tier ("standard", etc.) |

## Model Identifiers

Common model names in `message.model`:

| Model | Identifier |
|-------|------------|
| Claude Opus 4.5 | `claude-opus-4-5-20251101` |
| Claude Sonnet 4.5 | `claude-sonnet-4-5-20250929` |
| Claude Haiku 4.5 | `claude-haiku-4-5-20251001` |

## Version History

| Version | Changes |
|---------|---------|
| 2.1.20 | Extended thinking, permission modes, todos |
| 2.1.17 | Subagent support with agentId |
| 2.1.x | Progress events, hook metadata |
| 2.0.x | Basic message/tool_use/tool_result |

## Conversation Graph

Messages form a DAG (directed acyclic graph) via parent-child relationships. Each event names at most one parent in `parentUuid`; a root has `parentUuid: null`.

```
Root (parentUuid: null)
├── User message (uuid: A)
│   └── Assistant (uuid: B, parentUuid: A)
│       ├── Progress: Tool (uuid: C, parentUuid: A)
│       └── Progress: Hook (uuid: D, parentUuid: A)
└── User message (uuid: E, parentUuid: B)
    └── Assistant (uuid: F, parentUuid: E)
```

The indentation in this diagram groups events by turn. When reconstructing the graph, follow the `parentUuid` labels rather than the indentation: progress events C and D point to A, and user message E points to B.

## Parsing Recommendations

1. **Line-by-line:** Don't load the entire file into memory.
2. **Skip invalid lines:** Wrap `JSON.parse` (or your language's equivalent) in try/catch.
3. **Handle missing fields:** Check for existence before access.
4. **Ignore unknown types:** The format evolves with new event types.
5. **Check content type:** User content can be a string OR an array.
6. **Sum token variants:** Cached input is reported in separate fields, so total input is `input_tokens` + `cache_creation_input_tokens` + `cache_read_input_tokens`.
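
The parsing recommendations above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference parser: the helper names (`iter_events`, `user_text`, `total_input_tokens`) and the `KNOWN_TYPES` set are hypothetical, and the sample lines are made up to exercise each rule.

```python
import json
from typing import Any, Iterable, Iterator

# Assumption: the event types listed in the envelope section of this document.
KNOWN_TYPES = {"user", "assistant", "progress", "system",
               "summary", "file-history-snapshot"}


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed event per valid line, lazily (recommendation 1)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # recommendation 2: skip invalid lines
        # Recommendation 4: ignore event types this sketch does not know.
        if isinstance(event, dict) and event.get("type") in KNOWN_TYPES:
            yield event


def _dict_field(obj: dict, key: str) -> dict:
    """Return obj[key] if it is a dict, else an empty dict (recommendation 3)."""
    value: Any = obj.get(key)
    return value if isinstance(value, dict) else {}


def user_text(event: dict) -> str:
    """Return user content as text, whether stored as a string or an array."""
    content = _dict_field(event, "message").get("content")
    if isinstance(content, str):  # recommendation 5: string form
        return content
    if isinstance(content, list):  # recommendation 5: array of tool results
        parts = [block.get("content") for block in content
                 if isinstance(block, dict) and block.get("type") == "tool_result"]
        return "\n".join(p for p in parts if isinstance(p, str))
    return ""


def total_input_tokens(event: dict) -> int:
    """Sum the three input-token fields (recommendation 6)."""
    usage = _dict_field(_dict_field(event, "message"), "usage")
    fields = ("input_tokens", "cache_creation_input_tokens",
              "cache_read_input_tokens")
    values = (usage.get(name) for name in fields)
    return sum(v for v in values if isinstance(v, int))


# Made-up sample: one user line, one broken line, one assistant line,
# and one line with an event type this sketch does not recognize.
sample = [
    '{"type": "user", "message": {"role": "user", "content": "hi"}}',
    'not json',
    '{"type": "assistant", "message": {"role": "assistant", "content": [], '
    '"usage": {"input_tokens": 1000, "output_tokens": 500, '
    '"cache_creation_input_tokens": 200, "cache_read_input_tokens": 400}}}',
    '{"type": "future-event"}',
]
events = list(iter_events(sample))
```

Because `iter_events` accepts any iterable of strings, an open file handle (`open(path, encoding="utf-8")`) can be passed in directly and is read one line at a time rather than loaded whole.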
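
The parent links described under Conversation Graph can be indexed in a single pass. The sketch below is illustrative only: `build_indexes` and `ancestors` are hypothetical helper names, and the sample events simply reuse the uuids and `parentUuid` labels from the diagram in that section.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def build_indexes(
    events: Iterable[dict],
) -> Tuple[Dict[str, Optional[str]], Dict[Optional[str], List[str]]]:
    """Return (parent_of, children_of) maps keyed by uuid, in file order."""
    parent_of: Dict[str, Optional[str]] = {}
    children_of: Dict[Optional[str], List[str]] = defaultdict(list)
    for event in events:
        uuid = event.get("uuid")
        if uuid is None:
            continue  # metadata-only entries may carry a null uuid
        parent = event.get("parentUuid")
        parent_of[uuid] = parent
        children_of[parent].append(uuid)  # roots are grouped under None
    return parent_of, children_of


def ancestors(uuid: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Walk parentUuid links from uuid up to its root, guarding against cycles."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = uuid
    while current in parent_of and current not in seen:
        seen.add(current)
        chain.append(current)
        current = parent_of[current]
    return chain


# Events mirroring the labels in the Conversation Graph diagram.
diagram_events = [
    {"uuid": "A", "parentUuid": None, "type": "user"},
    {"uuid": "B", "parentUuid": "A", "type": "assistant"},
    {"uuid": "C", "parentUuid": "A", "type": "progress"},
    {"uuid": "D", "parentUuid": "A", "type": "progress"},
    {"uuid": "E", "parentUuid": "B", "type": "user"},
    {"uuid": "F", "parentUuid": "E", "type": "assistant"},
]
parent_of, children_of = build_indexes(diagram_events)
thread = ancestors("F", parent_of)  # the chain from F back to the root
```

Keeping both maps makes it cheap to walk upward (to recover the thread that led to a message) and downward (to list the replies and progress events attached to one).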