Prompt Caching for Agent Workflows
Prompt caching helps when many requests share the same long prefix. In an agent, that prefix often includes the system prompt, policy instructions, tool definitions, examples, project context, or reference documents.
Put stable context first
The cache works best when stable content appears before the changing request. Put tool definitions, policies, reusable examples, and large reference documents ahead of the user-specific question when your provider's prompt format supports that pattern.
Keep cacheable blocks stable
Small changes can reduce cache hits. Avoid reordering documents, regenerating tool descriptions, or injecting timestamps into the cacheable prefix unless those changes are necessary.
Measure first request versus steady state
The first request may still pay the normal processing cost. The benefit appears when later requests reuse the same prefix before the cache expires.
Agent-specific uses
Cache a coding agent's repository summary, a support agent's policy manual, a research agent's source bundle, or a workflow agent's stable tool definitions. Recompute the cache when the underlying material changes.
Sources
- Anthropic prompt caching documentation: https://platform.claude.com/docs/en/build-with-claude/prompt-caching