Response Caching

AI Controller includes a sophisticated response caching system that can dramatically reduce costs, improve performance, and ensure consistency in LLM responses.

Caching System Overview

The AI Controller caching system stores responses from LLM providers and serves them for identical requests, eliminating the need to repeatedly send the same queries to external APIs. This approach provides multiple benefits while maintaining full control over cache behavior. To understand how caching fits into the request processing flow, see Data Flow.

The Caching Process

When processing a request, AI Controller follows this sequence:

  1. Cache Key Generation: Hashes the complete request payload to produce the cache key
  2. Cache Lookup: Checks whether a valid response already exists for this key
  3. Cache Hit: If found, returns the cached response immediately
  4. Cache Miss: If not found, forwards the request to the LLM provider
  5. Cache Storage: Stores new responses in the cache for future use
  6. Cache Persistence: Cached entries are stored indefinitely

```mermaid
flowchart TD
    A[Client Request] --> B[Generate Cache Key]
    B --> C{Cache Lookup}
    C -->|Cache Hit| D[Retrieve Cached Response]
    C -->|Cache Miss| E[Forward Request to LLM Provider]
    E --> F[Receive LLM Response]
    F --> G[Store Response in Cache]
    G --> H[Return Response to Client]
    D --> H

    subgraph "Cache Management"
    I[Permanent Storage] -.-> J[No Automatic Eviction]
    end
```

Diagram showing the flow of a request through the AI Controller caching system, with decision points for cache hit/miss. For more on the caching system's role in AI Controller's architecture, see Architecture Overview.
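The sequence above can be expressed as a short sketch. The Python below is purely illustrative, assuming an in-memory dictionary as the store and a SHA-256 hash of the serialized request payload as the cache key; it is not AI Controller's actual implementation.

```python
import hashlib
import json

# Illustrative in-memory store; AI Controller's real backing store may differ.
_cache: dict[str, dict] = {}

def cache_key(payload: dict) -> str:
    """Hash the complete request payload to produce a deterministic cache key."""
    canonical = json.dumps(payload, sort_keys=True)  # stable serialization
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def get_response(payload: dict, call_provider) -> dict:
    """Return a cached response if present; otherwise call the provider and cache the result."""
    key = cache_key(payload)
    if key in _cache:                      # cache hit: return immediately
        return _cache[key]
    response = call_provider(payload)      # cache miss: forward to the LLM provider
    _cache[key] = response                 # store indefinitely (no automatic eviction)
    return response
```

Because the key is derived from the entire payload, any change to the model, messages, or parameters produces a different key and bypasses the cached entry.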

Key Benefits of Response Caching

Cost Optimization

By reducing the number of API calls to LLM providers, caching can significantly lower your operational costs:

| Provider | Average Cost per 1K Tokens | Potential Savings with 50% Cache Hit Rate |
| --- | --- | --- |
| OpenAI GPT-4 | $0.03 input / $0.06 output | $45 per 1M tokens |
| Anthropic Claude | $0.025 input / $0.08 output | $52.50 per 1M tokens |
| Mistral Large | $0.007 input / $0.021 output | $14 per 1M tokens |

For high-volume applications, these savings can amount to thousands of dollars per month.
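As a sanity check on these figures, the sketch below reproduces the GPT-4 row under the assumption that "per 1M tokens" means one million input tokens plus one million output tokens, half of which are served from cache; the function and its parameters are illustrative, not part of AI Controller.

```python
def estimated_savings(input_price_per_1k: float,
                      output_price_per_1k: float,
                      input_tokens: int,
                      output_tokens: int,
                      cache_hit_rate: float) -> float:
    """Estimate spend avoided when a fraction of identical requests is served from cache."""
    full_cost = (input_tokens / 1000) * input_price_per_1k \
              + (output_tokens / 1000) * output_price_per_1k
    return full_cost * cache_hit_rate

# GPT-4 pricing from the table, 1M input + 1M output tokens, 50% cache hit rate
print(estimated_savings(0.03, 0.06, 1_000_000, 1_000_000, 0.5))  # -> 45.0
```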

Performance Improvement

Cached responses are served directly from memory, drastically reducing response times:

| Source | Typical Response Time |
| --- | --- |
| Direct LLM API Call | 500ms - 5000ms |
| AI Controller Cache Hit | 10ms - 50ms |

This performance boost creates a more responsive user experience and higher throughput for your applications.
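One simple way to observe this difference is to time two identical requests sent through the proxy. The endpoint URL and payload below are hypothetical placeholders for your own deployment, and actual timings will vary.

```python
import time
import requests  # third-party HTTP client

# Hypothetical AI Controller endpoint and payload; adjust for your deployment.
URL = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is response caching?"}],
}

for attempt in ("first request (expected cache miss)", "second request (expected cache hit)"):
    start = time.perf_counter()
    requests.post(URL, json=payload, timeout=60)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{attempt}: {elapsed_ms:.0f} ms")
```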

Response Consistency

LLMs can produce varying responses to identical prompts. Caching ensures consistent answers:

  • Critical for FAQ systems and knowledge bases
  • Essential for compliance and auditing
  • Helpful for development and testing
  • Reduces the impact of "hallucinations" in production

Reliability Enhancement

Caching improves system resilience:

  • Continues functioning during provider outages
  • Handles rate limit issues gracefully
  • Reduces impact of network problems
  • Provides stability during traffic spikes

Performance Metrics

For detailed benchmarks and performance improvement data with AI Controller caching, please refer to the Performance Optimization documentation.


Updated: 2025-05-15