---
title: "Cache"
description: "How Aurora response caching works, how to enable exact and semantic cache layers, and what is included in the exact-cache key."
icon: "database"
---
Overview
Aurora ships with two response-cache layers for non-streaming requests on:
- /v1/chat/completions
- /v1/responses
- /v1/embeddings
Exact-match cache returns byte-identical responses with:
X-Cache: HIT (exact)
Semantic cache uses embeddings plus vector search so meaning-equivalent prompts can reuse a stored response:
X-Cache: HIT (semantic)
Enable the exact cache
Point response caching at Redis:
cache:
  response:
    simple:
      redis:
        url: redis://localhost:6379
        ttl: 3600
You can also configure it with environment variables:
- RESPONSE_CACHE_SIMPLE_ENABLED=true
- REDIS_URL
- REDIS_KEY_RESPONSES
- REDIS_TTL_RESPONSES
RESPONSE_CACHE_SIMPLE_ENABLED=true is the opt-in switch for env-only deployments when config.yaml does not contain a cache.response.simple block.
Enable semantic caching
Add a semantic block with an embedder provider and a vector store:
cache:
  response:
    semantic:
      enabled: true
      embedder:
        provider: openai
      vector_store:
        type: qdrant
        qdrant:
          url: http://localhost:6333
          collection: aurora_semantic
Supported vector stores:
- qdrant
- pgvector
- pinecone
- weaviate
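
To make the embeddings-plus-vector-search idea concrete, here is a toy in-memory sketch of a semantic lookup. It is not Aurora's implementation: the `ToySemanticCache` class, the hand-written vectors, and the `threshold` parameter are all assumptions for illustration. In a real deployment the embedder provider produces the vectors and the configured vector store performs the search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    """In-memory stand-in for an embedder plus vector store (illustration only)."""

    def __init__(self, threshold=0.9):
        # Hypothetical knob: the docs above do not name a similarity threshold.
        self.threshold = threshold
        self.entries = []  # list of (embedding, stored_response)

    def put(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        """Return the closest stored response if it is similar enough, else None."""
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = ToySemanticCache()
cache.put([1.0, 0.0], "stored answer")
near_hit = cache.get([0.99, 0.05])  # nearly the same direction as the stored vector
far_miss = cache.get([0.0, 1.0])    # orthogonal to the stored vector
```

Here `near_hit` reuses the stored response, while `far_miss` is `None` and would fall through to the upstream provider.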
Env-only semantic cache deployments use the SEMANTIC_CACHE_* variables:
SEMANTIC_CACHE_ENABLED=true
SEMANTIC_CACHE_EMBEDDER_PROVIDER=openai
SEMANTIC_CACHE_EMBEDDER_MODEL=text-embedding-3-small
SEMANTIC_CACHE_VECTOR_STORE_TYPE=qdrant
SEMANTIC_CACHE_QDRANT_URL=http://localhost:6333
SEMANTIC_CACHE_QDRANT_COLLECTION=aurora_semantic
All semantic cache environment variables:
Both cache layers run after workflow and guardrail patching, so they operate on the final request sent upstream. Use Cache-Control: no-cache or Cache-Control: no-store to bypass caching per request.
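
A minimal client-side sketch of the per-request bypass and of reading the `X-Cache` header. The base URL, the Bearer-token client authentication, and both helper names are assumptions made for this example; only the `Cache-Control` values and the two `X-Cache` hit values come from this page.

```python
import json
import urllib.request

# Assumption: placeholder host, not a value from the Aurora docs.
AURORA_URL = "http://localhost:8080"


def build_chat_request(prompt, model, api_key, bypass_cache=False):
    """Build (but do not send) a non-streaming chat completion request."""
    headers = {
        # Assumption: clients authenticate with a Bearer token.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if bypass_cache:
        # no-cache (or no-store) bypasses caching for this request only.
        headers["Cache-Control"] = "no-cache"
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{AURORA_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def cache_layer(x_cache):
    """Map an X-Cache header value to 'exact', 'semantic', or None."""
    if x_cache == "HIT (exact)":
        return "exact"
    if x_cache == "HIT (semantic)":
        return "semantic"
    return None  # any other value, or a missing header, is treated as no hit
```

To send the request, pass it to `urllib.request.urlopen` and call `cache_layer(resp.headers.get("X-Cache"))` on the response.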
For the full semantic-cache design and storage options, see the gateway source configuration examples and .env.template.
What the exact cache keys on
The exact cache hashes:
- the request path
- the resolved workflow context used for execution, specifically execution mode, provider type, and resolved model
- the final request body
This means guardrails and workflows affect cache keys when they change the resolved workflow or the final body sent through execution.
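
The following sketch shows how a key built from those three inputs behaves. It is an illustration, not Aurora's code: the serialization (canonical JSON), the hash function (SHA-256), the function name, and the example mode, provider, and model strings are all assumptions.

```python
import hashlib
import json


def exact_cache_key(path, execution_mode, provider_type, resolved_model, final_body):
    """Hash the request path, resolved workflow context, and final body.

    Assumption: canonical JSON plus SHA-256 stands in for whatever
    serialization and hash Aurora actually uses.
    """
    material = {
        "path": path,
        "workflow": {
            "execution_mode": execution_mode,
            "provider_type": provider_type,
            "resolved_model": resolved_model,
        },
        "body": final_body,
    }
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


body = {"model": "example-model", "messages": [{"role": "user", "content": "Hello"}]}
base = exact_cache_key("/v1/chat/completions", "example-mode", "openai", "example-model", body)

# A guardrail that prepends a system message changes the final body, and so the key.
patched_body = dict(body, messages=[{"role": "system", "content": "Be brief."}] + body["messages"])
patched = exact_cache_key("/v1/chat/completions", "example-mode", "openai", "example-model", patched_body)

# A workflow that resolves to a different model changes the key even with the same body.
rerouted = exact_cache_key("/v1/chat/completions", "example-mode", "openai", "other-model", body)
```

All three keys differ, while repeating the first call with identical inputs yields `base` again.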
`user_path` behavior
For the exact cache, user_path is not added to the cache key by itself.
That is intentional. If two requests end up with the same path, resolved workflow, and final request body, they can share the same exact-cache entry even when they originate from different user_path values.
Common patterns when requests from different user_path values should not share an entry:
- disable cache in a scoped workflow
- use different scoped workflows for different user_path values
- include scope-specific context so the final request body differs
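
As one way to picture the last pattern, the sketch below adds a scope marker to the request body so that two scopes never produce the same final body. The helper name, the marker wording, and the `/team-a` style paths are hypothetical; a system message also reaches the model, so a real deployment would pick a marker that does not change behavior.

```python
import copy
import json


def with_scope_context(body, user_path):
    """Return a copy of a chat body with a scope marker prepended as a system message.

    Hypothetical helper: the marker format is made up for this sketch.
    """
    scoped = copy.deepcopy(body)  # leave the caller's body untouched
    marker = {"role": "system", "content": f"scope: {user_path}"}
    scoped["messages"] = [marker] + scoped.get("messages", [])
    return scoped


body = {"model": "example-model", "messages": [{"role": "user", "content": "Hello"}]}
team_a = with_scope_context(body, "/team-a")
team_b = with_scope_context(body, "/team-b")

# The final bodies differ, so the exact cache sees different inputs per scope.
bodies_differ = json.dumps(team_a, sort_keys=True) != json.dumps(team_b, sort_keys=True)
```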
Cache analytics
When response caching and usage tracking are enabled, the admin API exposes a cached-only overview at:
/admin/api/v1/cache/overview
Cached usage entries are also visible in the regular usage log and summary endpoints.
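
A small sketch of calling the overview endpoint from a script. It assumes the endpoint answers a GET and accepts the same Bearer master key as the debug call shown later on this page; the helper name and base URL are placeholders, and the response fields are not documented here, so the sketch stops at building the request.

```python
import urllib.request


def build_overview_request(base_url, master_key):
    """Build (but do not send) a request for the cached-only overview.

    Assumptions: GET method and Bearer master-key authentication.
    """
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/admin/api/v1/cache/overview",
        headers={"Authorization": f"Bearer {master_key}"},
        method="GET",
    )


req = build_overview_request("http://localhost:8080/", "example-master-key")
```

Pass `req` to `urllib.request.urlopen` and decode the body with `json.loads` to inspect whatever fields the server returns.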
Manual and API usage
Cache backends are boot-time configuration: enable the exact or semantic cache through YAML or environment variables before starting Aurora. Per-scope cache behavior is controlled by workflows, so operators can manually disable caching for a team, provider, or model in the dashboard under Workflows.
Server-side automation should use workflows for policy changes and the cache debug endpoint for explaining how a specific request is handled.
For the endpoint reference, see the Admin API section.
Example debug call:
curl -X POST http://your-aurora-host/admin/api/v1/cache/debug \
  -H "Authorization: Bearer $AURORA_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "path": "/v1/chat/completions",
    "body": {
      "model": "<model id returned by /v1/models>",
      "messages": [{"role": "user", "content": "Hello"}]
    }
  }'