---
title: "Cache"
description: "How Aurora response caching works, how to enable exact and semantic cache layers, and what is included in the exact-cache key."
icon: "database"
---

Overview

Aurora ships with two response-cache layers for non-streaming requests on:

/v1/chat/completions
/v1/responses
/v1/embeddings

The exact-match cache returns byte-identical responses and marks them with this header:

http

X-Cache: HIT (exact)

The semantic cache uses embeddings plus vector search so that meaning-equivalent prompts can reuse a stored response. Semantic hits carry this header:

http

X-Cache: HIT (semantic)
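A client can tell the two layers apart by reading the X-Cache response header. The helper below is a minimal sketch: only the two HIT values come from this page, and the function name is hypothetical. Any other value, or a missing header, is treated as "not served from cache" because the header value for a miss is not documented here.

```python
from typing import Optional

# The only two header values documented on this page.
_HIT_LAYERS = {
    "HIT (exact)": "exact",
    "HIT (semantic)": "semantic",
}


def cache_layer(x_cache: Optional[str]) -> Optional[str]:
    """Return "exact" or "semantic" for a cache hit, otherwise None.

    Assumption: anything other than the two documented HIT values
    (including an absent header) means the response was not cached.
    """
    if x_cache is None:
        return None
    return _HIT_LAYERS.get(x_cache.strip())
```

Pass it the header value from whichever HTTP client you use, for example `cache_layer(response.headers.get("X-Cache"))`.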

Enable the exact cache

Point response caching at Redis:

yaml

cache:
  response:
    simple:
      redis:
        url: redis://localhost:6379
        ttl: 3600

You can also configure it with environment variables:

RESPONSE_CACHE_SIMPLE_ENABLED=true
REDIS_URL
REDIS_KEY_RESPONSES
REDIS_TTL_RESPONSES

RESPONSE_CACHE_SIMPLE_ENABLED=true is the opt-in switch for env-only deployments when config.yaml does not contain a cache.response.simple block.
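The opt-in rule above can be sketched as a small predicate. This is an illustration only, not the gateway's actual loader: the function name is hypothetical, and it assumes that a present `cache.response.simple` block enables the exact cache on its own and that the environment value is compared case-insensitively.

```python
from typing import Any, Mapping


def exact_cache_enabled(config: Mapping[str, Any], env: Mapping[str, str]) -> bool:
    """Decide whether the exact cache is switched on.

    Assumption: a cache.response.simple block in config.yaml is enough;
    without it, RESPONSE_CACHE_SIMPLE_ENABLED=true is the opt-in.
    """
    cache = config.get("cache") or {}
    response = cache.get("response") or {}
    if response.get("simple") is not None:
        return True
    flag = env.get("RESPONSE_CACHE_SIMPLE_ENABLED", "")
    return flag.strip().lower() == "true"
```

In practice `config` would be the parsed config.yaml and `env` would be `os.environ`.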

Enable semantic caching

Add a semantic block with an embedder provider and a vector store:

yaml

cache:
  response:
    semantic:
      enabled: true
      embedder:
        provider: openai
      vector_store:
        type: qdrant
        qdrant:
          url: http://localhost:6333
          collection: aurora_semantic

Supported vector stores:

qdrant
pgvector
pinecone
weaviate

Env-only semantic cache deployments use the SEMANTIC_CACHE_* variables:

env

SEMANTIC_CACHE_ENABLED=true
SEMANTIC_CACHE_EMBEDDER_PROVIDER=openai
SEMANTIC_CACHE_EMBEDDER_MODEL=text-embedding-3-small
SEMANTIC_CACHE_VECTOR_STORE_TYPE=qdrant
SEMANTIC_CACHE_QDRANT_URL=http://localhost:6333
SEMANTIC_CACHE_QDRANT_COLLECTION=aurora_semantic

All semantic cache environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| `SEMANTIC_CACHE_ENABLED` | Enable semantic cache | `false` |
| `SEMANTIC_CACHE_THRESHOLD` | Similarity threshold for vector matches (0.0–1.0) | _(code default)_ |
| `SEMANTIC_CACHE_PROMPT_SIMILARITY` | Prompt similarity threshold (0.0–1.0) | _(code default)_ |
| `SEMANTIC_CACHE_TTL` | Cache TTL in seconds | _(not set)_ |
| `SEMANTIC_CACHE_MAX_CONV_MESSAGES` | Max conversation messages to consider | _(not set)_ |
| `SEMANTIC_CACHE_EXCLUDE_SYSTEM_PROMPT` | Exclude system prompt from embedding | `false` |
| `SEMANTIC_CACHE_EMBEDDER_PROVIDER` | Embedder provider name | _(required)_ |
| `SEMANTIC_CACHE_EMBEDDER_MODEL` | Embedder model name | _(required)_ |
| `SEMANTIC_CACHE_VECTOR_STORE_TYPE` | Vector store: `qdrant`, `pgvector`, `pinecone`, `weaviate` | _(required)_ |
| `SEMANTIC_CACHE_QDRANT_URL` | Qdrant server URL | - |
| `SEMANTIC_CACHE_QDRANT_COLLECTION` | Qdrant collection name | - |
| `SEMANTIC_CACHE_QDRANT_API_KEY` | Qdrant API key | - |
| `SEMANTIC_CACHE_PGVECTOR_URL` | PostgreSQL pgvector connection string | - |
| `SEMANTIC_CACHE_PGVECTOR_TABLE` | pgvector table name | - |
| `SEMANTIC_CACHE_PGVECTOR_DIMENSION` | pgvector vector dimension | - |
| `SEMANTIC_CACHE_PINECONE_HOST` | Pinecone host | - |
| `SEMANTIC_CACHE_PINECONE_API_KEY` | Pinecone API key | - |
| `SEMANTIC_CACHE_PINECONE_NAMESPACE` | Pinecone namespace | - |
| `SEMANTIC_CACHE_PINECONE_DIMENSION` | Pinecone vector dimension | - |
| `SEMANTIC_CACHE_WEAVIATE_URL` | Weaviate server URL | - |
| `SEMANTIC_CACHE_WEAVIATE_CLASS` | Weaviate class name | - |
| `SEMANTIC_CACHE_WEAVIATE_API_KEY` | Weaviate API key | - |
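For env-only deployments it can help to check the variables before starting the gateway. The pre-flight check below is a sketch built only from the table above: it verifies the three variables marked required, the supported vector-store names, and the 0.0–1.0 range of the two thresholds. The function name is hypothetical, and the check does not assume which store-specific variables are mandatory, since this page does not say.

```python
from typing import List, Mapping

REQUIRED = (
    "SEMANTIC_CACHE_EMBEDDER_PROVIDER",
    "SEMANTIC_CACHE_EMBEDDER_MODEL",
    "SEMANTIC_CACHE_VECTOR_STORE_TYPE",
)
SUPPORTED_STORES = ("qdrant", "pgvector", "pinecone", "weaviate")
THRESHOLDS = ("SEMANTIC_CACHE_THRESHOLD", "SEMANTIC_CACHE_PROMPT_SIMILARITY")


def check_semantic_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems: List[str] = []
    if env.get("SEMANTIC_CACHE_ENABLED", "false").strip().lower() != "true":
        return problems  # semantic cache is off, nothing to validate

    for name in REQUIRED:
        if not env.get(name):
            problems.append(f"{name} is required")

    store = env.get("SEMANTIC_CACHE_VECTOR_STORE_TYPE")
    if store and store not in SUPPORTED_STORES:
        problems.append(f"unsupported vector store: {store}")

    for name in THRESHOLDS:
        raw = env.get(name)
        if raw is None:
            continue  # unset: the gateway falls back to its code default
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{name} must be a number")
            continue
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be between 0.0 and 1.0")
    return problems
```

Call it as `check_semantic_env(os.environ)` in a startup script and abort if the returned list is non-empty.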

Both cache layers run after workflow and guardrail patching, so they operate on the final request sent upstream. Use Cache-Control: no-cache or Cache-Control: no-store to bypass caching per request.
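A per-request bypass only needs the extra header. The sketch below builds, but does not send, a chat-completions request using Python's standard library. The function name and the `your-aurora-host` placeholder are illustrative, and authentication headers are left out because this page does not describe them for the `/v1` endpoints.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional

BYPASS_DIRECTIVES = ("no-cache", "no-store")


def build_chat_request(
    base_url: str,
    body: Mapping[str, Any],
    cache_control: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST to /v1/chat/completions, optionally bypassing the cache."""
    headers = {"Content-Type": "application/json"}
    if cache_control is not None:
        if cache_control not in BYPASS_DIRECTIVES:
            raise ValueError(f"expected one of {BYPASS_DIRECTIVES}")
        headers["Cache-Control"] = cache_control
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Send the result with `urllib.request.urlopen(req)`, adding whatever credentials your deployment requires.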

For the full semantic-cache design and storage options, see the gateway source configuration examples and .env.template.

What the exact cache keys on

The exact cache hashes:

the request path
the resolved workflow context used for execution (specifically execution mode, provider type, and resolved model)
the final request body

This means guardrails and workflows affect cache keys when they change the resolved workflow or the final body sent through execution.
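To make the key composition concrete, here is a sketch that hashes the three documented inputs. It is not Aurora's implementation: the hash function (SHA-256), the canonical JSON serialization, and all names and sample values are assumptions chosen for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class WorkflowContext:
    """The workflow fields this page lists as part of the key."""
    execution_mode: str
    provider_type: str
    resolved_model: str


def exact_cache_key(
    path: str, ctx: WorkflowContext, final_body: Mapping[str, Any]
) -> str:
    """Hash path + resolved workflow context + final request body.

    Assumption: keys are sorted so logically equal bodies hash the same;
    the real gateway may hash the raw bytes instead.
    """
    material = {"path": path, "workflow": asdict(ctx), "body": final_body}
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Changing any one of the three inputs produces a different key, while anything that is not an input (for example request metadata that never reaches the final body) has no effect.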

`user_path` behavior

For the exact cache, user_path is not added to the cache key by itself.

That is intentional. If two requests end up with the same path, resolved workflow, and final request body, they can share the same exact-cache entry even when they originate from different user_path values.

Common patterns for keeping cache entries separate per user_path:

disable cache in a scoped workflow
use different scoped workflows for different user_path values
include scope-specific context so the final request body differs
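The third pattern can be sketched as a small body transform. Everything here is hypothetical: the helper name, the choice of a leading system message as the carrier, and the sample `user_path` values. The point is only that once the final body differs per scope, the exact cache sees different inputs.

```python
import copy
from typing import Any, Dict


def add_scope_context(body: Dict[str, Any], user_path: str) -> Dict[str, Any]:
    """Return a copy of the body with a scope marker prepended.

    Assumption: a leading system message is an acceptable place for the
    scope marker; the original body is left untouched.
    """
    scoped = copy.deepcopy(body)
    marker = {"role": "system", "content": f"scope: {user_path}"}
    scoped.setdefault("messages", []).insert(0, marker)
    return scoped
```

Two callers sending the same prompt from `/team/a` and `/team/b` now produce different final bodies and therefore different exact-cache entries. Note that the marker is also visible to the model, so keep it short and neutral.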

Cache analytics

When response caching and usage tracking are enabled, the admin API exposes a cached-only overview at:

text

/admin/api/v1/cache/overview

Cached usage entries are also visible in the regular usage log and summary endpoints.

Manual and API usage

Cache backends are boot-time configuration: enable the exact or semantic cache through YAML or environment variables before starting Aurora. Per-scope cache behavior is controlled by workflows, so operators can disable caching for a team, provider, or model under Workflows in the dashboard.

Server-side automation should use workflows for policy changes and the cache debug endpoint for explaining how a specific request is handled.

For endpoint reference see the Admin API section.

Example debug call:

bash

curl -X POST http://your-aurora-host/admin/api/v1/cache/debug \
  -H "Authorization: Bearer $AURORA_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "path": "/v1/chat/completions",
    "body": {
      "model": "<model id returned by /v1/models>",
      "messages": [{"role": "user", "content": "Hello"}]
    }
  }'
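For automation written in Python, the same debug call can be built with the standard library. This sketch mirrors the curl command above. The function name is hypothetical, and the response format is not described on this page, so the example stops at constructing the request.

```python
import json
import urllib.request
from typing import Any, Mapping


def build_cache_debug_request(
    base_url: str,
    master_key: str,
    path: str,
    body: Mapping[str, Any],
) -> urllib.request.Request:
    """Build the POST to /admin/api/v1/cache/debug shown in the curl example."""
    payload = {"path": path, "body": body}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/admin/api/v1/cache/debug",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A typical call reads the key from the environment, for example `build_cache_debug_request("http://your-aurora-host", os.environ["AURORA_MASTER_KEY"], "/v1/chat/completions", body)`, and then passes the result to `urllib.request.urlopen`.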
