Response Caching
AI Controller includes a sophisticated response caching system that can dramatically reduce costs, improve performance, and ensure consistency in LLM responses.
Caching System Overview
The AI Controller caching system stores responses from LLM providers and serves them for identical requests, eliminating the need to repeatedly send the same queries to external APIs. This approach provides multiple benefits while maintaining full control over cache behavior. To understand how caching fits into the request processing flow, see Data Flow.
The Caching Process
When processing a request, AI Controller follows this sequence:
1. Cache Key Generation: Hashes the complete request payload to produce the cache key
2. Cache Lookup: Checks whether a valid response already exists for this hash
3. Cache Hit: If found, returns the cached response immediately
4. Cache Miss: If not found, forwards the request to the LLM provider
5. Cache Storage: Stores the new response in the cache for future use
6. Cache Persistence: Keeps cached entries indefinitely; there is no automatic eviction
```mermaid
flowchart TD
    A[Client Request] --> B[Generate Cache Key]
    B --> C{Cache Lookup}
    C -->|Cache Hit| D[Retrieve Cached Response]
    C -->|Cache Miss| E[Forward Request to LLM Provider]
    E --> F[Receive LLM Response]
    F --> G[Store Response in Cache]
    G --> H[Return Response to Client]
    D --> H

    subgraph "Cache Management"
        I[Permanent Storage] -.-> J[No Automatic Eviction]
    end
```
Diagram showing the flow of a request through the AI Controller caching system, with decision points for cache hit/miss. For more on the caching system's role in AI Controller's architecture, see Architecture Overview.
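The sketch below illustrates the hit/miss logic described above in Python. It is illustrative only: the function names, the in-memory dictionary, and the choice of a SHA-256 hash over a canonicalized JSON payload are assumptions made for the example, not AI Controller's actual implementation.

```python
import hashlib
import json

# Illustrative in-memory store; AI Controller's real storage backend is not shown here.
_cache: dict[str, dict] = {}

def cache_key(request_payload: dict) -> str:
    """Hash a canonicalized copy of the request payload into a deterministic key."""
    canonical = json.dumps(request_payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def get_response(request_payload: dict, call_provider) -> dict:
    """Serve from cache on a hit; otherwise forward to the provider and cache the result."""
    key = cache_key(request_payload)
    if key in _cache:                          # cache hit: return immediately
        return _cache[key]
    response = call_provider(request_payload)  # cache miss: forward to the LLM provider
    _cache[key] = response                     # store for future identical requests
    return response                            # entries persist; no automatic eviction
```

Because the key is derived from the full payload, any change to the prompt, model, or parameters produces a different key, so only truly identical requests share a cached response.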
Key Benefits of Response Caching
Cost Optimization
By reducing the number of API calls to LLM providers, caching can significantly lower your operational costs:
| Provider | Average Cost per 1K Tokens | Potential Savings at a 50% Cache Hit Rate (per 1M input + 1M output tokens) |
|---|---|---|
| OpenAI GPT-4 | $0.03 input / $0.06 output | $45 |
| Anthropic Claude | $0.025 input / $0.08 output | $52.50 |
| Mistral Large | $0.007 input / $0.021 output | $14 |
For high-volume applications, these savings can amount to thousands of dollars per month.
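To make the table's arithmetic explicit, here is a small back-of-the-envelope helper. The function name and the simplifying assumption that the cache hit rate applies equally to input and output tokens are ours for the example, not part of AI Controller.

```python
def estimated_savings(input_tokens: int, output_tokens: int,
                      input_cost_per_1k: float, output_cost_per_1k: float,
                      cache_hit_rate: float) -> float:
    """Spend avoided by serving a fraction of identical requests from cache."""
    baseline_cost = (input_tokens / 1000) * input_cost_per_1k \
                  + (output_tokens / 1000) * output_cost_per_1k
    return baseline_cost * cache_hit_rate

# Reproduces the GPT-4 row above: 1M input + 1M output tokens at a 50% hit rate.
print(estimated_savings(1_000_000, 1_000_000, 0.03, 0.06, 0.5))  # 45.0
```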
Performance Improvement
Cached responses are served locally by AI Controller instead of requiring a round trip to an external API, drastically reducing response times:
| Source | Typical Response Time |
|---|---|
| Direct LLM API Call | 500ms - 5000ms |
| AI Controller Cache Hit | 10ms - 50ms |
This performance boost creates a more responsive user experience and higher throughput for your applications.
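One way to observe the difference yourself is to time two identical requests routed through AI Controller: the second should be served from cache. The endpoint URL, path, model name, and header in this sketch are placeholders for the example, not a documented configuration; substitute the details of your own deployment.

```python
import time
import requests  # third-party HTTP client: pip install requests

# Placeholder endpoint, model name, and API key for illustration only.
CONTROLLER_URL = "http://localhost:8080/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <your-api-key>"}
PAYLOAD = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is response caching?"}],
}

def timed_request() -> float:
    """Send the request and return the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    requests.post(CONTROLLER_URL, json=PAYLOAD, headers=HEADERS, timeout=60)
    return (time.perf_counter() - start) * 1000

first = timed_request()   # likely a cache miss: forwarded to the provider
second = timed_request()  # identical payload: expected to be a cache hit
print(f"first request: {first:.0f} ms, repeat request: {second:.0f} ms")
```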
Response Consistency
LLMs can produce varying responses to identical prompts. Caching ensures consistent answers:
- Critical for FAQ systems and knowledge bases
- Essential for compliance and auditing
- Helpful for development and testing
- Reduces the impact of "hallucinations" in production
Reliability Enhancement
Caching improves system resilience:
- Continues functioning during provider outages
- Handles rate limit issues gracefully
- Reduces impact of network problems
- Provides stability during traffic spikes
Performance Metrics
For detailed benchmarks and performance data for AI Controller caching, see the Performance Optimization documentation.
Related Documentation
- Performance Optimization
- Cost Management
- Rules Engine
- Data Flow - Understand how caching integrates into the request processing pipeline
- Architecture Overview - Learn about the caching system's role in AI Controller's architecture
Updated: 2025-05-15