# Claude JSONL Format Specification

## File Format

- **Format:** Newline-delimited JSON (NDJSON/JSONL)
- **Encoding:** UTF-8
- **Line terminator:** `\n` (LF)
- **One JSON object per line** — no array wrapper
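
As a minimal sketch of these rules, the Python helper below serializes events so that each one occupies exactly one LF-terminated, UTF-8 encoded line. The name `encode_jsonl` is hypothetical and is not part of any Claude tooling.

```python
import json
from typing import Any, Iterable


def encode_jsonl(events: Iterable[dict[str, Any]]) -> bytes:
    """Serialize events as UTF-8 JSONL: one object per line, each ending in LF."""
    lines = []
    for event in events:
        # With no indent argument, json.dumps emits a single line; newlines
        # inside string values are escaped as \n, so they never split a record.
        lines.append(json.dumps(event, ensure_ascii=False) + "\n")
    # No surrounding "[" ... "]": the file is a sequence of objects, not an array.
    return "".join(lines).encode("utf-8")
```

Returning bytes instead of writing through a text-mode file avoids platform newline translation, so the terminator stays `\n` everywhere.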

## Message Envelope (Common Fields)

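Each line in a Claude JSONL file is a JSON object built on the envelope below. Not every field appears on every line; the Field Reference marks which ones are conditional or nullable.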

```json
{
  "parentUuid": "uuid-string or null",
  "isSidechain": false,
  "userType": "external",
  "cwd": "/full/path/to/working/directory",
  "sessionId": "session-uuid-v4",
  "version": "2.1.20",
  "gitBranch": "branch-name or empty string",
  "type": "user|assistant|progress|system|summary|file-history-snapshot",
  "message": { ... },
  "uuid": "unique-message-uuid-v4",
  "timestamp": "ISO-8601 timestamp"
}
```

### Field Reference

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | Yes | Message type identifier |
| `uuid` | string (uuid) | Yes* | Unique identifier for this event |
| `parentUuid` | string (uuid) or null | Yes | Links to parent message (null for root) |
| `timestamp` | string (ISO-8601) | Yes* | When event occurred (UTC) |
| `sessionId` | string (uuid) | Yes | Session identifier |
| `version` | string (semver) | Yes | Claude Code version (e.g., "2.1.20") |
| `cwd` | string (path) | Yes | Working directory at event time |
| `gitBranch` | string | No | Git branch name (empty if not in repo) |
| `isSidechain` | boolean | Yes | `true` for subagent sessions |
| `userType` | string | Yes | Always "external" for user sessions |
| `message` | object | Conditional | Message content (user/assistant types) |
| `agentId` | string | Conditional | Agent identifier (subagent sessions only) |

*May be null in metadata-only entries like `file-history-snapshot`
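
The table above can be mirrored in a small typed reader. This is a sketch under stated assumptions: the `Envelope` class, `parse_envelope`, and the `"unknown"` fallback are hypothetical names chosen for illustration, and only a subset of the fields is modeled.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Envelope:
    """Subset of the envelope fields, with nullable ones typed as Optional."""
    type: str
    session_id: Optional[str]
    uuid: Optional[str]                # may be null in metadata-only entries
    parent_uuid: Optional[str]         # null for the root message
    timestamp: Optional[str]           # may be null in metadata-only entries
    is_sidechain: bool
    agent_id: Optional[str]            # subagent sessions only
    message: Optional[dict[str, Any]]  # user/assistant types only


def parse_envelope(event: dict[str, Any]) -> Envelope:
    """Read envelope fields with .get() so missing keys become None."""
    return Envelope(
        type=event.get("type", "unknown"),  # "unknown" is a placeholder chosen here
        session_id=event.get("sessionId"),
        uuid=event.get("uuid"),
        parent_uuid=event.get("parentUuid"),
        timestamp=event.get("timestamp"),
        is_sidechain=bool(event.get("isSidechain", False)),
        agent_id=event.get("agentId"),
        message=event.get("message"),
    )
```

Using `.get()` throughout means a `file-history-snapshot` line with no `uuid` or `message` still parses, with those attributes left as `None`.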

## Content Structure

### User Message Content

In user messages, `message.content` is either a string or an array:

**String (direct input):**
```json
{
  "message": {
    "role": "user",
    "content": "Your question or instruction"
  }
}
```

**Array (tool results):**
```json
{
  "message": {
    "role": "user",
    "content": [
      {
        "type": "tool_result",
        "tool_use_id": "toolu_01XYZ",
        "content": "Tool output text"
      }
    ]
  }
}
```
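
Because the two shapes differ, consumers usually normalize them first. The sketch below assumes one reasonable convention, wrapping a plain string in a single `text` block, so downstream code only ever sees a list of blocks; `user_content_blocks` is a hypothetical helper name.

```python
from typing import Any


def user_content_blocks(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Return user content as a list of blocks, whatever shape it was stored in."""
    content = message.get("content")
    if isinstance(content, str):
        # Direct input: wrap the string so it looks like a text block.
        return [{"type": "text", "text": content}]
    if isinstance(content, list):
        # Tool results (and any other blocks): keep only well-formed objects.
        return [block for block in content if isinstance(block, dict)]
    # Missing or unexpected content: treat as empty instead of raising.
    return []
```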

### Assistant Message Content

Assistant messages always have `message.content` as an **array**:

```json
{
  "message": {
    "role": "assistant",
    "type": "message",
    "model": "claude-opus-4-5-20251101",
    "id": "msg_bdrk_01Abc123",
    "content": [
      {"type": "thinking", "thinking": "..."},
      {"type": "text", "text": "..."},
      {"type": "tool_use", "id": "toolu_01XYZ", "name": "Read", "input": {...}}
    ],
    "stop_reason": "end_turn",
    "stop_sequence": null,
    "usage": {...}
  }
}
```
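
A common first step is to separate the visible reply from the tool calls. The following is a minimal sketch, assuming `thinking` blocks and any unrecognized block types should be skipped; `split_assistant_content` is a hypothetical name.

```python
from typing import Any


def split_assistant_content(message: dict[str, Any]) -> tuple[str, list[dict[str, Any]]]:
    """Return (joined text, tool_use blocks) from an assistant message."""
    text_parts: list[str] = []
    tool_calls: list[dict[str, Any]] = []
    for block in message.get("content") or []:
        if not isinstance(block, dict):
            continue
        kind = block.get("type")
        if kind == "text":
            text_parts.append(block.get("text", ""))
        elif kind == "tool_use":
            tool_calls.append(block)
        # "thinking" and unknown block types are deliberately ignored here.
    return "\n".join(text_parts), tool_calls
```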

## Content Block Types

### Text Block
```json
{
  "type": "text",
  "text": "Response text content"
}
```

### Thinking Block
```json
{
  "type": "thinking",
  "thinking": "Internal reasoning (extended thinking mode)",
  "signature": "base64-signature (optional)"
}
```

### Tool Use Block
```json
{
  "type": "tool_use",
  "id": "toolu_01Abc123XYZ",
  "name": "ToolName",
  "input": {
    "param1": "value1",
    "param2": 123
  }
}
```

### Tool Result Block
```json
{
  "type": "tool_result",
  "tool_use_id": "toolu_01Abc123XYZ",
  "content": "Result text or structured output",
  "is_error": false
}
```
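
A `tool_result` block refers back to its `tool_use` block through `tool_use_id`. The sketch below pairs them up across a list of parsed events. It assumes each `tool_use` appears earlier in the file than its result, and `pair_tool_calls` is a hypothetical helper name.

```python
from typing import Any


def pair_tool_calls(events: list[dict[str, Any]]) -> list[tuple[dict[str, Any], dict[str, Any]]]:
    """Match each tool_result block to the tool_use block with the same id."""
    uses: dict[str, dict[str, Any]] = {}
    pairs: list[tuple[dict[str, Any], dict[str, Any]]] = []
    for event in events:
        content = (event.get("message") or {}).get("content")
        if not isinstance(content, list):
            continue  # string content carries no tool blocks
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use" and "id" in block:
                uses[block["id"]] = block
            elif block.get("type") == "tool_result":
                use = uses.get(block.get("tool_use_id"))
                if use is not None:  # results with no known call are skipped
                    pairs.append((use, block))
    return pairs
```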

## Usage Object

Assistant messages report token consumption in `message.usage`:

```json
{
  "usage": {
    "input_tokens": 1000,
    "output_tokens": 500,
    "cache_creation_input_tokens": 200,
    "cache_read_input_tokens": 400,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 200,
      "ephemeral_1h_input_tokens": 0
    },
    "service_tier": "standard"
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
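| `input_tokens` | int | Input tokens consumed (counted separately from the cache fields) |
| `output_tokens` | int | Output tokens generated |
| `cache_creation_input_tokens` | int | Input tokens written to the cache |
| `cache_read_input_tokens` | int | Input tokens read from the cache |
| `cache_creation` | object | Breakdown of cache-creation tokens by lifetime (`ephemeral_5m_input_tokens`, `ephemeral_1h_input_tokens`) |
| `service_tier` | string | API tier ("standard", etc.) |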
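
A minimal sketch of totaling these counters follows. It assumes the two cache fields are counted separately from `input_tokens`, and that the nested `cache_creation` object only breaks down `cache_creation_input_tokens` (as the sample values above suggest), so it is not added again. The function names are hypothetical.

```python
from typing import Any

INPUT_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def total_input_tokens(usage: dict[str, Any]) -> int:
    """Sum uncached and cached input tokens; missing or null fields count as 0."""
    return sum(int(usage.get(field) or 0) for field in INPUT_FIELDS)


def total_tokens(usage: dict[str, Any]) -> int:
    """Total input plus output tokens for one assistant message."""
    return total_input_tokens(usage) + int(usage.get("output_tokens") or 0)
```

For the sample object above this gives 1600 input tokens (1000 + 200 + 400) and 2100 tokens overall.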

## Model Identifiers

Common model names in `message.model`:

| Model | Identifier |
|-------|------------|
| Claude Opus 4.5 | `claude-opus-4-5-20251101` |
| Claude Sonnet 4.5 | `claude-sonnet-4-5-20250929` |
| Claude Haiku 4.5 | `claude-haiku-4-5-20251001` |

## Version History

| Version | Changes |
|---------|---------|
| 2.1.20 | Extended thinking, permission modes, todos |
| 2.1.17 | Subagent support with agentId |
| 2.1.x | Progress events, hook metadata |
| 2.0.x | Basic message/tool_use/tool_result |

## Conversation Graph

Events link to their parent through `parentUuid`, forming a DAG (directed acyclic graph). Because each event names at most one parent, the graph is a tree whose root has `parentUuid: null`:

```
User message (uuid: A, parentUuid: null)
├── Assistant (uuid: B, parentUuid: A)
│   └── User message (uuid: E, parentUuid: B)
│       └── Assistant (uuid: F, parentUuid: E)
├── Progress: Tool (uuid: C, parentUuid: A)
└── Progress: Hook (uuid: D, parentUuid: A)
```
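
The parent links can be turned into a traversable structure with one pass over the events. The sketch below is one way to do it, not a prescribed method: it indexes children by `parentUuid` and walks them depth-first with an explicit stack, so long conversations do not hit Python's recursion limit. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Optional


def build_children_index(events: list[dict[str, Any]]) -> dict[Optional[str], list[str]]:
    """Map each parentUuid (None for roots) to its child uuids in file order."""
    children: dict[Optional[str], list[str]] = defaultdict(list)
    for event in events:
        uuid = event.get("uuid")
        if uuid is None:
            continue  # metadata-only entries have no place in the graph
        children[event.get("parentUuid")].append(uuid)
    return dict(children)


def walk_depth_first(children: dict[Optional[str], list[str]]) -> list[tuple[str, int]]:
    """Return (uuid, depth) pairs in depth-first order, starting from the roots."""
    stack = [(uuid, 0) for uuid in reversed(children.get(None, []))]
    order: list[tuple[str, int]] = []
    while stack:
        uuid, depth = stack.pop()
        order.append((uuid, depth))
        # Push in reverse so the first child in file order is visited first.
        for child in reversed(children.get(uuid, [])):
            stack.append((child, depth + 1))
    return order
```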

## Parsing Recommendations

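1. **Line-by-line:** Stream the file one line at a time instead of loading it whole into memory.
2. **Skip invalid lines:** Wrap each parse call (e.g., `JSON.parse`) in try/catch so that one malformed line does not abort the run.
3. **Handle missing fields:** Check that a field exists and is non-null before accessing it.
4. **Ignore unknown types:** The format evolves with new event types, so skip the ones you do not recognize instead of failing.
5. **Check content type:** User `message.content` can be a string OR an array.
6. **Sum token variants:** Input tokens are split across `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`; add all three for the total input.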
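
The first four recommendations can be combined into one generator. This is a sketch in Python, where `json.loads` inside `try`/`except` plays the role of `JSON.parse` inside try/catch. The name `iter_events` is hypothetical, and `KNOWN_TYPES` is taken from the `type` values listed in the envelope above.

```python
import json
from typing import Any, Iterable, Iterator

KNOWN_TYPES = {
    "user", "assistant", "progress", "system", "summary", "file-history-snapshot",
}


def iter_events(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield parsed events one at a time, skipping anything unusable."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed or truncated line
        if not isinstance(event, dict):
            continue  # valid JSON, but not an object
        if event.get("type") not in KNOWN_TYPES:
            continue  # unknown or missing event type
        yield event
```

An open file object is itself an iterable of lines, so `iter_events(open(path, encoding="utf-8"))` streams a session file without reading it whole; `path` here stands for whatever file you are parsing.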