Cost Management
AI Controller provides cost management capabilities to help organizations control, monitor, and optimize their LLM expenditures.
Cost Management Overview
LLM services typically charge based on token usage, with different rates for different models and providers. AI Controller helps you manage these costs through:
- Usage Tracking: Detailed monitoring of requests
- Reporting: Basic usage metrics including number of requests, providers, models, and request lengths
Understanding LLM Costs
Token-Based Pricing
Most LLM providers use token-based pricing models. For a deeper understanding of different models and their pricing structures, see Models and Providers.
| Provider | Model | Input Price (per 1K tokens) | Output Price (per 1K tokens) |
|---|---|---|---|
| OpenAI | GPT-4 | $0.03 | $0.06 |
| OpenAI | GPT-3.5-Turbo | $0.0015 | $0.002 |
| Anthropic | Claude-3-Opus | $0.015 | $0.075 |
| Anthropic | Claude-3-Sonnet | $0.003 | $0.015 |
| Google | Gemini Pro | $0.00025 | $0.0005 |
Note: Prices are subject to change; always verify current pricing with providers.
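As a worked example, per-request cost is the input token count times the input rate plus the output token count times the output rate. A minimal sketch using the GPT-4 rates from the table above (the token counts are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of a single request under token-based pricing."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# GPT-4 rates from the table: $0.03 input / $0.06 output per 1K tokens
cost = request_cost(1_500, 500, 0.03, 0.06)
print(f"${cost:.3f}")  # $0.075 = $0.045 input + $0.030 output
```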
Cost Factors
Several factors affect your overall LLM costs:
- Model selection: More capable models cost more
- Request volume: Higher usage means higher costs
- Prompt length: Longer prompts consume more input tokens
- Response length: Longer responses consume more output tokens
- Caching efficiency: Higher cache hit rates reduce costs
- Provider selection: Different providers have different pricing
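To see how these factors compound, here is a sketch projecting monthly spend for the same workload across models. The request volume and average token counts are illustrative assumptions; the rates come from the pricing table above:

```python
# Illustrative workload: 100,000 requests/month, ~800 input and ~300 output tokens each
REQUESTS_PER_MONTH = 100_000
AVG_INPUT_TOKENS = 800
AVG_OUTPUT_TOKENS = 300

# (input $/1K, output $/1K) from the pricing table above
MODELS = {
    "GPT-4": (0.03, 0.06),
    "GPT-3.5-Turbo": (0.0015, 0.002),
    "Claude-3-Sonnet": (0.003, 0.015),
    "Gemini Pro": (0.00025, 0.0005),
}

for model, (in_price, out_price) in MODELS.items():
    monthly = REQUESTS_PER_MONTH * (
        AVG_INPUT_TOKENS / 1000 * in_price + AVG_OUTPUT_TOKENS / 1000 * out_price
    )
    print(f"{model:>15}: ${monthly:,.2f}/month")
    # GPT-4 works out to $4,200/month vs. $180/month for GPT-3.5-Turbo
```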
Usage Logs and Cache Entries
AI Controller tracks usage through two related systems:
- Log Entries:
  - Include metadata such as timestamp, provider used, and user
  - Each log entry has a unique CorrelationId
- Cache Entries:
  - Include the Request (input) and Response (output)
  - Model information is indirectly available within the Request
  - Can be linked to Log entries via the CorrelationId
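The CorrelationId is what ties the two systems together. A minimal sketch of the join (the field names below mirror the descriptions above; AI Controller's actual schema may differ):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LogEntry:
    correlation_id: str   # unique per request
    timestamp: datetime
    provider: str
    user: str

@dataclass
class CacheEntry:
    correlation_id: str
    request: str          # input; model information is embedded here
    response: str         # output

def join_usage(logs: list[LogEntry], cache: list[CacheEntry]):
    """Pair each log entry with its cached request/response via CorrelationId."""
    by_id = {c.correlation_id: c for c in cache}
    return [(log, by_id.get(log.correlation_id)) for log in logs]
```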
Cost Optimization Strategies
AI Controller offers several features to reduce unnecessary spending. These strategies form part of AI Controller's overall cost governance framework.
- Response Caching: Implement caching to avoid redundant API calls. A cache hit eliminates the provider call entirely, so a 50% hit rate cuts API spend roughly in half (see the sketch after this list). For detailed configuration, see Response Caching.
- Model Access Control: Control which models can be used by different user groups:
  - Configure default models based on cost/quality requirements
  - Use the Rules Engine to restrict access to expensive models
  - Monitor model usage patterns
  - Consider which use cases require more expensive models
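A sketch of how these two levers translate into numbers. The allowlist below is a hypothetical stand-in illustrating the Rules Engine idea, not its actual API; the baseline spend figure is illustrative:

```python
def spend_with_cache(baseline_spend: float, cache_hit_rate: float) -> float:
    """Each cache hit avoids an API call, so spend scales with the miss rate."""
    return baseline_spend * (1 - cache_hit_rate)

print(spend_with_cache(4_200.00, 0.50))  # 2100.0 -- a 50% hit rate halves spend

# Hypothetical group-based model allowlist, illustrating the access-control idea
ALLOWED_MODELS = {
    "engineering": {"GPT-4", "GPT-3.5-Turbo"},
    "support": {"GPT-3.5-Turbo", "Gemini Pro"},
}

def model_allowed(group: str, model: str) -> bool:
    return model in ALLOWED_MODELS.get(group, set())

assert not model_allowed("support", "GPT-4")  # expensive model blocked
```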
Related Documentation
- Caching System
- Rules Engine
- Logging and Monitoring
- API Key Management
- Performance Optimization
- Governance - Learn about AI Controller's cost governance framework
- Models and Providers - Understand cost differences between models
Updated: 2025-05-15