Prompt Caching for Agent Workflows

Prompt caching helps when many requests share the same long prefix. In an agent, that prefix often includes the system prompt, policy instructions, tool definitions, examples, project context, or reference documents.

Put stable context first

The cache works best when stable content appears before the changing request. Put tool definitions, policies, reusable examples, and large reference documents ahead of the user-specific question when your provider's prompt format supports that pattern.

Keep cacheable blocks stable

Small changes can reduce cache hits. Avoid reordering documents, regenerating tool descriptions, or injecting timestamps into the cacheable prefix unless those changes are necessary.

Measure first request versus steady state

The first request may still pay the normal processing cost. The benefit appears when later requests reuse the same prefix before the cache expires.

Agent-specific uses

Cache a coding agent's repository summary, a support agent's policy manual, a research agent's source bundle, or a workflow agent's stable tool definitions. Recompute the cache when the underlying material changes.

Sources

Anthropic prompt caching documentation: https://platform.claude.com/docs/en/build-with-claude/prompt-caching

Prompt Caching for Agent Workflows

Key insights

Use cases

Limitations & trade-offs

Prompt Caching for Agent Workflows

Put stable context first

Keep cacheable blocks stable

Measure first request versus steady state

Agent-specific uses

Sources