OpenClaw Reference (Mirrored)

Transcript Hygiene (Provider Fixups)

Mirrored from OpenClaw (MIT)
This mirror is provided for convenience. OpenClawdBots is not affiliated with or endorsed by OpenClaw.

This document describes the provider-specific fixes applied to a transcript when model context is built before a run. The fixes are in-memory adjustments that satisfy strict provider requirements; they do not rewrite the stored JSONL transcript on disk. A separate session-file repair pass, however, may rewrite a malformed JSONL file by dropping invalid lines before the session is loaded. When a repair occurs, the original file is backed up alongside the session file.

Scope includes:

  • Tool call id sanitization
  • Tool call input validation
  • Tool result pairing repair
  • Turn validation / ordering
  • Thought signature cleanup
  • Image payload sanitization
  • User-input provenance tagging (for inter-session routed prompts)

If you need transcript storage details, see:


Where this runs

All transcript hygiene is centralized in the embedded runner:

  • Policy selection: src/agents/transcript-policy.ts
  • Sanitization/repair application: sanitizeSessionHistory in src/agents/pi-embedded-runner/google.ts

The policy uses provider, modelApi, and modelId to decide what to apply.

Separate from transcript hygiene, session files are repaired (if needed) before load:

  • repairSessionFileIfNeeded in src/agents/session-file-repair.ts
  • Called from run/attempt.ts and compact.ts (embedded runner)
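
The line-dropping part of the repair pass can be pictured as a pure function over the file's text. This is a minimal sketch under stated assumptions, not the code in session-file-repair.ts: the name repairJsonlText and the RepairResult shape are hypothetical, and file I/O and the backup step are left out.

```typescript
// Hypothetical sketch: drop lines that do not parse as JSON from JSONL text.
interface RepairResult {
  repaired: string;     // JSONL text with invalid lines removed
  droppedLines: number; // number of non-blank lines that failed to parse
}

function repairJsonlText(raw: string): RepairResult {
  const kept: string[] = [];
  let droppedLines = 0;
  for (const line of raw.split("\n")) {
    if (line.trim() === "") continue; // blank lines are not entries
    try {
      JSON.parse(line); // only validity matters here, not the parsed value
      kept.push(line);
    } catch {
      droppedLines += 1;
    }
  }
  const repaired = kept.length > 0 ? kept.join("\n") + "\n" : "";
  return { repaired, droppedLines };
}
```

A caller would presumably rewrite the file (and write the backup) only when droppedLines is greater than zero, so well-formed session files are never touched.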

Global rule: image sanitization

Image payloads are always sanitized, regardless of provider: oversized base64 images are downscaled and recompressed so that providers do not reject them for exceeding size limits.

This also helps control image-driven token pressure for vision-capable models. Lower max dimensions generally reduce token usage; higher dimensions preserve detail.

Implementation:

  • sanitizeSessionMessagesImages in src/agents/pi-embedded-helpers/images.ts
  • sanitizeContentBlocksImages in src/agents/tool-images.ts
  • Max image side is configurable via agents.defaults.imageMaxDimensionPx (default: 1200).
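
To illustrate how a maximum side length bounds an image, the sketch below computes target dimensions only. It assumes the limit applies to the longer side and that aspect ratio is preserved, which the source does not state; fitWithinMaxSide is a hypothetical helper, and decoding, resizing, and recompression are not shown.

```typescript
// Default taken from the documented agents.defaults.imageMaxDimensionPx value.
const DEFAULT_MAX_DIMENSION_PX = 1200;

interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: scale a size down so its longer side fits maxSide.
// Assumption: aspect ratio is preserved and small images are left alone.
function fitWithinMaxSide(size: Size, maxSide: number = DEFAULT_MAX_DIMENSION_PX): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxSide) return size; // already within bounds
  const scale = maxSide / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```

With the default limit, a 2400x1200 image maps to 1200x600, while an 800x600 image is returned unchanged.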

Global rule: malformed tool calls

Assistant tool-call blocks that are missing both input and arguments are dropped before model context is built. This prevents provider rejections from partially persisted tool calls (for example, after a rate limit failure).

Implementation:

  • sanitizeToolCallInputs in src/agents/session-transcript-repair.ts
  • Applied in sanitizeSessionHistory in src/agents/pi-embedded-runner/google.ts
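
The rule above can be sketched as a filter over an assistant message's content blocks. The block and message shapes here are assumptions made for illustration (the real transcript types may differ), and dropMalformedToolCalls is a hypothetical name, not the signature of sanitizeToolCallInputs.

```typescript
// Assumed block shapes, for illustration only.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; id: string; name: string; input?: unknown; arguments?: unknown };

interface AssistantMessage {
  role: "assistant";
  content: ContentBlock[];
}

// Drop tool-call blocks that carry neither `input` nor `arguments`.
function dropMalformedToolCalls(message: AssistantMessage): AssistantMessage {
  const content = message.content.filter((block) => {
    if (block.type !== "toolCall") return true; // non-tool blocks pass through
    return block.input !== undefined || block.arguments !== undefined;
  });
  return { ...message, content };
}
```

The function returns a new message rather than mutating its argument, which matches the in-memory, non-persisting nature of these fixes.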

Global rule: inter-session input provenance

When an agent sends a prompt into another session via sessions_send (including agent-to-agent reply/announce steps), OpenClaw persists the created user turn with:

  • message.provenance.kind = "inter_session"

This metadata is written at transcript append time and does not change role (role: "user" remains for provider compatibility). Transcript readers can use this to avoid treating routed internal prompts as end-user-authored instructions.

During context rebuild, OpenClaw also prepends a short [Inter-session message] marker to those user turns in-memory so the model can distinguish them from external end-user instructions.
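
A minimal sketch of the in-memory marker step follows. Only message.provenance.kind and the marker text come from the description above; the flat text field, the single-space separator, and the markInterSession name are assumptions.

```typescript
// Assumed message shape; the real transcript entry is likely richer.
interface UserMessage {
  role: "user";
  text: string;
  provenance?: { kind: string };
}

const INTER_SESSION_MARKER = "[Inter-session message]";

// Prepend the marker to routed prompts; leave end-user turns untouched.
function markInterSession(message: UserMessage): UserMessage {
  if (message.provenance?.kind !== "inter_session") return message;
  if (message.text.startsWith(INTER_SESSION_MARKER)) return message; // avoid double-marking
  return { ...message, text: `${INTER_SESSION_MARKER} ${message.text}` };
}
```

The role stays "user" throughout, so the provider sees a normal user turn while the model still receives a visible cue.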


Provider matrix (current behavior)

OpenAI / OpenAI Codex

  • Image sanitization.
  • For OpenAI Responses/Codex transcripts, drop orphaned reasoning signatures (standalone reasoning items without a following content block).
  • No tool call id sanitization.
  • No tool result pairing repair.
  • No turn validation or reordering.
  • No synthetic tool results.
  • No thought signature stripping.

Google (Generative AI / Gemini CLI / Antigravity)

  • Tool call id sanitization: strict alphanumeric.
  • Tool result pairing repair and synthetic tool results.
  • Turn validation (Gemini-style turn alternation).
  • Google turn ordering fixup (prepend a tiny user bootstrap if history starts with assistant).
  • Antigravity Claude: normalize thinking signatures; drop unsigned thinking blocks.
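
Two of the Google fixups above can be sketched as small pure functions. This is an illustration under assumptions: the ChatTurn shape, the fallback id, and the bootstrap text are hypothetical, and the real sanitizer may resolve id collisions differently.

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

// Strict mode: keep only ASCII letters and digits.
// The fallback for an id that becomes empty is a hypothetical choice.
function sanitizeToolCallIdStrict(id: string): string {
  const cleaned = id.replace(/[^a-zA-Z0-9]/g, "");
  return cleaned.length > 0 ? cleaned : "toolcall";
}

// Ordering fixup: if history opens with an assistant turn, prepend a tiny
// user turn. The bootstrap text is an assumption.
function ensureStartsWithUser(history: ChatTurn[], bootstrapText = "(session start)"): ChatTurn[] {
  const first = history[0];
  if (first === undefined || first.role === "user") return history;
  return [{ role: "user", text: bootstrapText }, ...history];
}
```

Whatever sanitizer is used has to be applied identically to a tool call and to its result, otherwise the pair no longer matches after rewriting.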

Anthropic / Minimax (Anthropic-compatible)

  • Tool result pairing repair and synthetic tool results.
  • Turn validation (merge consecutive user turns to satisfy strict alternation).
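
The two Anthropic-style fixups above can be sketched as follows. The source does not spell out the repair rules, so this version makes explicit assumptions: results are expected directly after the assistant turn that issued the calls, missing results get a synthetic placeholder, and results with no matching call are dropped. The Msg shape, the placeholder text, and both function names are hypothetical.

```typescript
type Msg =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCallIds: string[] }
  | { role: "toolResult"; toolCallId: string; text: string };

// Ensure every tool call is followed by exactly one result.
function repairToolResultPairing(history: Msg[]): Msg[] {
  const out: Msg[] = [];
  let i = 0;
  while (i < history.length) {
    const msg = history[i]!;
    i += 1;
    if (msg.role === "toolResult") continue; // orphan: no call claimed it
    out.push(msg);
    if (msg.role !== "assistant" || msg.toolCallIds.length === 0) continue;
    const seen = new Set<string>();
    while (i < history.length) {
      const next = history[i]!;
      if (next.role !== "toolResult") break;
      i += 1;
      if (msg.toolCallIds.includes(next.toolCallId) && !seen.has(next.toolCallId)) {
        seen.add(next.toolCallId);
        out.push(next);
      }
    }
    for (const id of msg.toolCallIds) {
      if (!seen.has(id)) {
        out.push({ role: "toolResult", toolCallId: id, text: "[synthetic] tool result missing" });
      }
    }
  }
  return out;
}

// Merge adjacent user turns so roles alternate strictly.
function mergeConsecutiveUserTurns(history: Msg[]): Msg[] {
  const out: Msg[] = [];
  for (const msg of history) {
    const last = out[out.length - 1];
    if (msg.role === "user" && last !== undefined && last.role === "user") {
      out[out.length - 1] = { role: "user", text: `${last.text}\n\n${msg.text}` };
    } else {
      out.push(msg);
    }
  }
  return out;
}
```

Running the pairing repair first and the merge second keeps the two concerns separate: the first pass only adds or removes tool results, the second only touches user turns.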

Mistral (including model-id based detection)

  • Tool call id sanitization: strict9 (alphanumeric length 9).
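
As an illustration of what a nine-character alphanumeric id looks like, the sketch below strips disallowed characters, truncates, and pads. How the actual strict9 mode derives its nine characters is not stated in the source; truncation and "0" padding are assumptions chosen for simplicity, and sanitizeToolCallIdStrict9 is a hypothetical name.

```typescript
// Hypothetical strict9 sketch: alphanumeric only, exactly 9 characters.
// Assumption: truncate long ids and right-pad short ones with "0".
function sanitizeToolCallIdStrict9(id: string): string {
  const cleaned = id.replace(/[^a-zA-Z0-9]/g, "");
  return cleaned.slice(0, 9).padEnd(9, "0");
}
```

The mapping is deterministic, so a tool call and its result are rewritten to the same id. Truncation can make two distinct ids collide, which a production sanitizer would need to guard against.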

OpenRouter Gemini

  • Thought signature cleanup: strip non-base64 thought_signature values (keep base64).
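
The cleanup above amounts to a base64 test on each thought_signature value. The sketch below assumes standard padded base64 (no URL-safe alphabet) and a simplified part shape; isBase64, PartWithSignature, and stripInvalidThoughtSignature are hypothetical names, and the real check may be looser or stricter.

```typescript
// Standard base64 alphabet with optional "=" padding.
const BASE64_PATTERN = /^[A-Za-z0-9+/]+={0,2}$/;

function isBase64(value: string): boolean {
  return value.length > 0 && value.length % 4 === 0 && BASE64_PATTERN.test(value);
}

// Assumed part shape, for illustration only.
interface PartWithSignature {
  text: string;
  thought_signature?: string;
}

// Keep base64 signatures; remove anything else.
function stripInvalidThoughtSignature(part: PartWithSignature): PartWithSignature {
  if (part.thought_signature === undefined || isBase64(part.thought_signature)) return part;
  const copy: PartWithSignature = { ...part };
  delete copy.thought_signature;
  return copy;
}
```

For example, a value such as "aGVsbG8=" is kept, while "msg_123!" is removed because it contains characters outside the base64 alphabet.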

Everything else

  • Image sanitization only.
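
The matrix above can be summarized as a single policy lookup. This is a sketch, not the contents of src/agents/transcript-policy.ts: the provider strings, the substring-based model checks, the field names, and selectPolicy itself are all hypothetical. The sketch also ignores modelApi, which the real policy takes into account, and omits the Antigravity Claude thinking-block handling.

```typescript
type ToolCallIdMode = "none" | "strict" | "strict9";

interface TranscriptPolicy {
  sanitizeImages: boolean; // always true: the global rule
  toolCallIdMode: ToolCallIdMode;
  repairToolResultPairing: boolean; // includes synthetic tool results
  validateTurns: boolean;
  stripNonBase64ThoughtSignatures: boolean;
  dropOrphanedReasoning: boolean;
}

interface ModelRef {
  provider: string;
  modelApi: string;
  modelId: string;
}

function selectPolicy(ref: ModelRef): TranscriptPolicy {
  const base: TranscriptPolicy = {
    sanitizeImages: true,
    toolCallIdMode: "none",
    repairToolResultPairing: false,
    validateTurns: false,
    stripNonBase64ThoughtSignatures: false,
    dropOrphanedReasoning: false,
  };
  // Provider and model strings below are assumptions for illustration.
  const provider = ref.provider.toLowerCase();
  const modelId = ref.modelId.toLowerCase();

  if (provider === "openai" || provider === "openai-codex") {
    return { ...base, dropOrphanedReasoning: true };
  }
  if (provider === "google") {
    return { ...base, toolCallIdMode: "strict", repairToolResultPairing: true, validateTurns: true };
  }
  if (provider === "anthropic" || provider === "minimax") {
    return { ...base, repairToolResultPairing: true, validateTurns: true };
  }
  if (provider === "mistral" || modelId.includes("mistral")) {
    return { ...base, toolCallIdMode: "strict9" };
  }
  if (provider === "openrouter" && modelId.includes("gemini")) {
    return { ...base, stripNonBase64ThoughtSignatures: true };
  }
  return base; // everything else: image sanitization only
}
```

Returning a flat record of flags keeps the runner's application step simple: each fixup checks one field rather than re-deriving provider logic.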

Historical behavior (pre-2026.1.22)

Before the 2026.1.22 release, OpenClaw applied multiple layers of transcript hygiene:

  • A transcript-sanitize extension ran on every context build and could:
    • Repair tool use/result pairing.
    • Sanitize tool call ids (including a non-strict mode that preserved _/-).
  • The runner also performed provider-specific sanitization, which duplicated work.
  • Additional mutations occurred outside the provider policy, including:
    • Stripping <final> tags from assistant text before persistence.
    • Dropping empty assistant error turns.
    • Trimming assistant content after tool calls.

This complexity caused cross-provider regressions (notably in openai-responses call_id|fc_id pairing). The 2026.1.22 cleanup removed the extension, centralized the logic in the runner, and left OpenAI transcripts untouched apart from image sanitization.