Prompt caching

Prompt caching reuses a previously processed prompt prefix so repeated requests with the same context can be faster and cheaper.

Expanded definition

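Prompt caching is useful when many requests share the same long system prompt, tool definitions, examples, documents, or conversation prefix. The provider caches the processed state of a prefix, and later calls that begin with the same prefix can resume from that cached state rather than reprocessing the entire repeated context. It is especially valuable for agent workflows with stable tool definitions, long reference documents, or repeated task instructions. Because the match is on the prefix, stable content should come first and request-specific content last; a change early in the prompt typically invalidates everything cached after it. Teams still need to understand cache lifetime, invalidation, prefix ordering, the provider's privacy policy, and first-request latency, since the first request with a new prefix still pays the full processing cost.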
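The behavior described above can be illustrated with a toy simulation. This is a minimal sketch, not any provider's implementation: the `PrefixCache` class, its block-level granularity, the 300-second lifetime, and the refresh-on-use policy are all assumptions chosen for illustration. It shows why prefix ordering and cache lifetime matter.

```python
import hashlib


class PrefixCache:
    """Toy model of a provider-side prompt prefix cache (hypothetical)."""

    def __init__(self, ttl_seconds=300.0):
        # Assumed lifetime; real providers document their own values.
        self.ttl_seconds = ttl_seconds
        self._expiry = {}  # cumulative prefix hash -> expiry time

    @staticmethod
    def _prefix_keys(blocks):
        """Return one hash per cumulative prefix: [b0], [b0, b1], ..."""
        running = hashlib.sha256()
        keys = []
        for block in blocks:
            data = block.encode("utf-8")
            # Length-prefix each block so block boundaries affect the hash.
            running.update(len(data).to_bytes(8, "big"))
            running.update(data)
            keys.append(running.copy().hexdigest())
        return keys

    def process(self, blocks, now):
        """Simulate one request; return (cached_blocks, reprocessed_blocks)."""
        keys = self._prefix_keys(blocks)
        cached = 0
        for key in keys:
            if key in self._expiry and self._expiry[key] > now:
                cached += 1
            else:
                break  # the first miss ends the reusable prefix
        # Assumption: every prefix seen in this request is (re)stored,
        # which also refreshes the lifetime of the prefixes that hit.
        for key in keys:
            self._expiry[key] = now + self.ttl_seconds
        return cached, len(blocks) - cached


cache = PrefixCache(ttl_seconds=300.0)
system, tools = "long system prompt", "tool definitions"

first = cache.process([system, tools, "question A"], now=0.0)
second = cache.process([system, tools, "question B"], now=10.0)
reordered = cache.process(["question C", system, tools], now=20.0)
expired = cache.process([system, tools, "question D"], now=1000.0)
```

In this sketch the first request reprocesses everything, the second reuses the two stable blocks, the reordered request misses entirely because its first block is new, and the last request misses because the cached entries have expired.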
