Response Caching

AI Controller includes a sophisticated response caching system that can dramatically reduce costs, improve performance, and ensure consistency in LLM responses.

Caching System Overview

The AI Controller caching system stores responses from LLM providers and serves them for identical requests, eliminating the need to repeatedly send the same queries to external APIs. This approach provides multiple benefits while maintaining full control over cache behavior. To understand how caching fits into the request processing flow, see Data Flow.

The Caching Process

When processing a request, AI Controller follows this sequence:

  1. Cache Key Generation: Hashes the complete request payload to produce the cache key
  2. Cache Lookup: Checks whether a valid response already exists for this key
  3. Cache Hit: If found, returns the cached response immediately
  4. Cache Miss: If not found, forwards the request to the LLM provider
  5. Cache Storage: Stores new responses in the cache for future use
  6. Cache Persistence: Cached entries are stored indefinitely

```mermaid
flowchart TD
    A[Client Request] --> B[Generate Cache Key]
    B --> C{Cache Lookup}
    C -->|Cache Hit| D[Retrieve Cached Response]
    C -->|Cache Miss| E[Forward Request to LLM Provider]
    E --> F[Receive LLM Response]
    F --> G[Store Response in Cache]
    G --> H[Return Response to Client]
    D --> H

    subgraph "Cache Management"
    I[Permanent Storage] -.-> J[No Automatic Eviction]
    end
```

Diagram showing the flow of a request through the AI Controller caching system, with decision points for cache hit/miss. For more on the caching system's role in AI Controller's architecture, see Architecture Overview.
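The sequence above can be expressed as a short sketch. The Python below is purely illustrative, assuming an in-memory dictionary as the store and a SHA-256 hash of the serialized request payload as the cache key; it is not AI Controller's actual implementation.

```python
import hashlib
import json

# Illustrative in-memory store; AI Controller's real backing store may differ.
_cache: dict[str, dict] = {}

def cache_key(payload: dict) -> str:
    """Hash the complete request payload to produce a deterministic cache key."""
    canonical = json.dumps(payload, sort_keys=True)  # stable serialization
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def get_response(payload: dict, call_provider) -> dict:
    """Return a cached response if present; otherwise call the provider and cache the result."""
    key = cache_key(payload)
    if key in _cache:                      # cache hit: return immediately
        return _cache[key]
    response = call_provider(payload)      # cache miss: forward to the LLM provider
    _cache[key] = response                 # store indefinitely (no automatic eviction)
    return response
```

Because the key is derived from the entire payload, any change to the model, messages, or parameters produces a different key and bypasses the cached entry.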

Key Benefits of Response Caching

Cost Optimization

By reducing the number of API calls to LLM providers, caching can significantly lower your operational costs:

| Provider | Average Cost per 1K Tokens | Potential Savings with 50% Cache Hit Rate |
| --- | --- | --- |
| OpenAI GPT-4 | $0.03 input / $0.06 output | $45 per 1M tokens |
| Anthropic Claude | $0.025 input / $0.08 output | $52.50 per 1M tokens |
| Mistral Large | $0.007 input / $0.021 output | $14 per 1M tokens |

For high-volume applications, these savings can amount to thousands of dollars per month.
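As a sanity check on these figures, the sketch below reproduces the GPT-4 row under the assumption that "per 1M tokens" means one million input tokens plus one million output tokens, half of which are served from cache; the function and its parameters are illustrative, not part of AI Controller.

```python
def estimated_savings(input_price_per_1k: float,
                      output_price_per_1k: float,
                      input_tokens: int,
                      output_tokens: int,
                      cache_hit_rate: float) -> float:
    """Estimate spend avoided when a fraction of identical requests is served from cache."""
    full_cost = (input_tokens / 1000) * input_price_per_1k \
              + (output_tokens / 1000) * output_price_per_1k
    return full_cost * cache_hit_rate

# GPT-4 pricing from the table, 1M input + 1M output tokens, 50% cache hit rate
print(estimated_savings(0.03, 0.06, 1_000_000, 1_000_000, 0.5))  # -> 45.0
```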

Performance Improvement

Cached responses are served directly from memory, drastically reducing response times:

| Source | Typical Response Time |
| --- | --- |
| Direct LLM API Call | 500ms - 5000ms |
| AI Controller Cache Hit | 10ms - 50ms |

This performance boost creates a more responsive user experience and higher throughput for your applications.
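One simple way to observe this difference is to time two identical requests sent through the proxy. The endpoint URL and payload below are hypothetical placeholders for your own deployment, and actual timings will vary.

```python
import time
import requests  # third-party HTTP client

# Hypothetical AI Controller endpoint and payload; adjust for your deployment.
URL = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is response caching?"}],
}

for attempt in ("first request (expected cache miss)", "second request (expected cache hit)"):
    start = time.perf_counter()
    requests.post(URL, json=payload, timeout=60)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{attempt}: {elapsed_ms:.0f} ms")
```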

Response Consistency

LLMs can produce varying responses to identical prompts. Caching ensures consistent answers:

  • Critical for FAQ systems and knowledge bases
  • Essential for compliance and auditing
  • Helpful for development and testing
  • Reduces the impact of "hallucinations" in production

Reliability Enhancement

Caching improves system resilience:

  • Continues functioning during provider outages
  • Handles rate limit issues gracefully
  • Reduces impact of network problems
  • Provides stability during traffic spikes

Performance Metrics

For detailed benchmarks and performance improvement data with AI Controller caching, please refer to the Performance Optimization documentation.


Updated: 2025-05-15