GenAIWiki
intermediate

Prompt Caching for Agent Workflows

How to use prompt caching to reduce latency and cost when agents reuse long system prompts, tools, documents, and examples.
prompt cachingagentslatencycost optimizationlong context

8 min read

Updated todayVerified recentlyInformation score 92

Key insights

Concrete technical or product signals.

  • Prompt caching is strongest for repeated long prefixes, not one-off short prompts.
  • Stable tool definitions and policy context are good cache candidates.
  • Cache hit rate should be tracked alongside latency and token cost.

Use cases

Where this shines in production.

  • Reducing cost for agents that reuse large policy or codebase context.
  • Speeding up repeated analysis over a fixed source bundle.
  • Optimizing long multi-turn conversations with stable instructions.

Limitations & trade-offs

What to watch for.

  • Cache duration and cache-key behavior are provider-specific.
  • Changing the prefix can turn an expected cache hit into a miss.

Prompt Caching for Agent Workflows

Prompt caching helps when many requests share the same long prefix. In an agent, that prefix often includes the system prompt, policy instructions, tool definitions, examples, project context, or reference documents.

Put stable context first

The cache works best when stable content appears before the changing request. Put tool definitions, policies, reusable examples, and large reference documents ahead of the user-specific question when your provider's prompt format supports that pattern.

Keep cacheable blocks stable

Small changes can reduce cache hits. Avoid reordering documents, regenerating tool descriptions, or injecting timestamps into the cacheable prefix unless those changes are necessary.

Measure first request versus steady state

The first request may still pay the normal processing cost. The benefit appears when later requests reuse the same prefix before the cache expires.

Agent-specific uses

Cache a coding agent's repository summary, a support agent's policy manual, a research agent's source bundle, or a workflow agent's stable tool definitions. Recompute the cache when the underlying material changes.

Sources