Cost Management

AI Controller provides cost management capabilities to help organizations control, monitor, and optimize their LLM expenditures.

Cost Management Overview

LLM services typically charge based on token usage, with different rates for different models and providers. AI Controller helps you manage these costs through:

  • Usage Tracking: Detailed monitoring of requests
  • Reporting: Basic usage metrics including number of requests, providers, models, and request lengths

Understanding LLM Costs

Token-Based Pricing

Most LLM providers use token-based pricing models. For a deeper understanding of different models and their pricing structures, see Models and Providers.

Provider    Model            Input Price (per 1K tokens)   Output Price (per 1K tokens)
OpenAI      GPT-4            $0.03                         $0.06
OpenAI      GPT-3.5-Turbo    $0.0015                       $0.002
Anthropic   Claude-3-Opus    $0.015                        $0.075
Anthropic   Claude-3-Sonnet  $0.003                        $0.015
Google      Gemini Pro       $0.00025                      $0.0005

Note: Prices are subject to change; always verify current pricing with providers.
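
To make the arithmetic concrete, here is a minimal Python sketch of token-based billing. The rates are copied from the illustrative table above (and will drift over time); the function and table names are ours, not part of AI Controller.

```python
# Illustrative rates from the table above; verify current provider pricing.
PRICES_PER_1K = {
    ("OpenAI", "GPT-4"): (0.03, 0.06),
    ("OpenAI", "GPT-3.5-Turbo"): (0.0015, 0.002),
    ("Anthropic", "Claude-3-Opus"): (0.015, 0.075),
}

def request_cost(provider: str, model: str,
                 input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under token-based pricing."""
    input_rate, output_rate = PRICES_PER_1K[(provider, model)]
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate

# A 1,500-token prompt with a 500-token completion:
print(request_cost("OpenAI", "GPT-4", 1500, 500))          # ~0.075
print(request_cost("OpenAI", "GPT-3.5-Turbo", 1500, 500))  # ~0.00325 (~23x cheaper)
```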

Cost Factors

Several factors affect your overall LLM costs:

  • Model selection: More capable models cost more
  • Request volume: Higher usage means higher costs
  • Prompt length: Longer prompts consume more input tokens
  • Response length: Longer responses consume more output tokens
  • Caching efficiency: Higher cache hit rates reduce costs
  • Provider selection: Different providers have different pricing
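
The sketch below combines several of these factors into a rough monthly estimate. It is a hypothetical helper, not an AI Controller API; the key idea is that only cache misses reach the provider and get billed.

```python
def estimated_monthly_spend(
    requests_per_month: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_rate_per_1k: float,
    output_rate_per_1k: float,
    cache_hit_rate: float = 0.0,
) -> float:
    """Rough monthly spend in dollars: only cache misses are billed."""
    per_request = (avg_input_tokens / 1000) * input_rate_per_1k \
                + (avg_output_tokens / 1000) * output_rate_per_1k
    billable_requests = requests_per_month * (1 - cache_hit_rate)
    return billable_requests * per_request

# 100k requests/month on GPT-3.5-Turbo, without and with a 40% cache hit rate:
print(estimated_monthly_spend(100_000, 800, 300, 0.0015, 0.002))       # ~180.0
print(estimated_monthly_spend(100_000, 800, 300, 0.0015, 0.002, 0.4))  # ~108.0
```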

Usage Logs and Cache Entries

AI Controller tracks usage through two related systems:

  1. Log Entries:
    • Include metadata such as timestamp, provider used, and user
    • Each log entry has a unique CorrelationId
  2. Cache Entries:
    • Include the Request (input) and Response (output)
    • Model information is indirectly available within the Request
    • Can be linked to Log entries via the CorrelationId
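
For example, the two systems can be joined on CorrelationId to see which user sent what to which model. The record shapes below are hypothetical; consult your AI Controller export format for the actual field names.

```python
# Hypothetical record shapes, for illustration only.
log_entries = [
    {"correlation_id": "abc-123", "timestamp": "2025-05-01T10:00:00Z",
     "provider": "openai", "user": "alice"},
]
cache_entries = [
    {"correlation_id": "abc-123",
     "request": {"model": "gpt-4", "messages": ["..."]},
     "response": {"content": "..."}},
]

# Join on CorrelationId; the model is read from inside the cached Request.
cache_by_id = {c["correlation_id"]: c for c in cache_entries}
for log in log_entries:
    cached = cache_by_id.get(log["correlation_id"])
    if cached:
        model = cached["request"].get("model")
        print(log["user"], log["provider"], model)
```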

Cost Optimization Strategies

AI Controller offers several features to reduce unnecessary spending. These strategies form part of its overall cost governance approach.

  • Response Caching: Serve repeated requests from cache instead of sending them to the provider. Savings scale with the hit rate: at a 50% hit rate, roughly half of all requests incur no provider cost, assuming requests of similar size (a minimal cache sketch follows this list). For detailed configuration, see Response Caching.

  • Model Access Control: Restrict which models each user group can use (an illustrative policy sketch follows this list):

    1. Configure default models based on cost/quality requirements
    2. Use the Rules Engine to restrict access to expensive models
    3. Monitor model usage patterns
    4. Consider which use cases require more expensive models
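
To illustrate why caching pays off, here is a minimal in-memory response cache keyed on a hash of the normalized request. This is a sketch of the general technique, not AI Controller's actual implementation (see Response Caching for real configuration); call_provider is a stand-in for a billable API call.

```python
import hashlib
import json

_cache: dict[str, str] = {}

def _cache_key(provider: str, model: str, prompt: str) -> str:
    # Normalize so trivially different spellings of a request share an entry.
    payload = json.dumps(
        {"provider": provider, "model": model, "prompt": prompt.strip()},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def call_provider(provider: str, model: str, prompt: str) -> str:
    # Stand-in for a real, billable LLM API call.
    return f"response from {model} to {prompt!r}"

def complete(provider: str, model: str, prompt: str) -> str:
    key = _cache_key(provider, model, prompt)
    if key in _cache:
        return _cache[key]  # cache hit: no tokens billed
    response = call_provider(provider, model, prompt)
    _cache[key] = response
    return response
```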
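And here is a sketch of the model-restriction logic behind steps 1 and 2. The Rules Engine has its own configuration format; this Python version, with made-up group and model names, only shows the shape of such a policy.

```python
# Hypothetical policy table: which models each user group may call.
ALLOWED_MODELS = {
    "engineering": {"gpt-4", "gpt-3.5-turbo"},
    "support":     {"gpt-3.5-turbo"},  # cheaper tier only
}
DEFAULT_MODEL = "gpt-3.5-turbo"

def resolve_model(group: str, requested: str | None) -> str:
    """Fall back to a cost-effective default; reject disallowed models."""
    model = requested or DEFAULT_MODEL
    if model not in ALLOWED_MODELS.get(group, set()):
        raise PermissionError(f"group {group!r} may not use model {model!r}")
    return model

print(resolve_model("support", None))        # gpt-3.5-turbo (default)
print(resolve_model("engineering", "gpt-4")) # gpt-4
```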

Updated: 2025-05-15