Documentation

AgentCache is a Virtual Context Memory layer for AI agents. It sits between your agent and the LLM provider, automatically caching responses and managing long-term memory.

Why use AgentCache?

Near-Zero Latency

Cache hits return in < 10ms, making your agents feel instant.

Reduce Costs

Save up to 80% on API bills by never paying for the same prompt twice.

Quick Start

AgentCache is drop-in compatible with the OpenAI SDK. You just need to change the base URL to AgentCache's endpoint, use your AgentCache key as the API key, and pass your real OpenAI key in a header (shown below).

Python
import openai

client = openai.OpenAI(
    base_url="https://agentcache.ai/api/v1",
    api_key="ac_live_..."  # Your AgentCache key
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "X-OpenAI-Key": "sk-..." # Your real OpenAI key
    }
)
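
Because identical requests are served from the cache, you can see a cache hit by sending the same request twice and timing it. This is a minimal sketch that reuses the client from the snippet above; the sub-10ms figure is the cache-hit latency quoted earlier, not a guarantee for your network:

import time

def timed_completion(messages):
    """Send a chat completion through AgentCache and report the round-trip time."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
        extra_headers={"X-OpenAI-Key": "sk-..."}  # Your real OpenAI key
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{elapsed_ms:.1f} ms")
    return response

messages = [{"role": "user", "content": "Hello!"}]
timed_completion(messages)  # First call: forwarded to OpenAI (full provider latency)
timed_completion(messages)  # Identical call: should be served from the cache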

Smart Routing

AgentCache isn't just about caching; it's about intelligent optimization. With our Smart Routing feature (powered by RouteLLM), you can automatically route queries to the most cost-effective model that can handle the task.

How it works

  • Cache Hit: Served instantly from the L1/L2 cache (no provider cost, near-zero latency).
  • Cache Miss: The router analyzes the prompt's complexity and picks a model (see the sketch below).
  • Simple Query: Routed to a cheaper model (e.g., GPT-3.5, Claude 3 Haiku).
  • Complex Query: Routed to a stronger model (e.g., GPT-4o, Claude 3.5 Sonnet).
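
In pseudocode, the decision flow looks roughly like this. It is an illustrative sketch, not AgentCache's implementation; the cache dictionary and the complexity scorer are hypothetical stand-ins (the real router classifies prompts with RouteLLM, not prompt length):

cache = {}  # Stands in for the L1/L2 cache

def estimate_complexity(prompt):
    # Hypothetical scorer used only for this sketch
    return len(prompt)

def handle_request(prompt):
    if prompt in cache:
        return cache[prompt]                      # Cache hit: no provider call
    # Cache miss: pick a model based on complexity
    model = "gpt-3.5-turbo" if estimate_complexity(prompt) < 200 else "gpt-4o"
    response = f"<answer from {model}>"           # Placeholder for the provider call
    cache[prompt] = response                      # Store for future hits
    return response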

Usage

Simply set the model parameter to route-llm and provide your Abacus API key.

curl https://agentcache.ai/api/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ac_live_..." \
    -H "X-Abacus-Key: your_abacus_api_key" \
    -d '{
      "model": "route-llm",
      "messages": [
        {"role": "user", "content": "Explain quantum computing"}
      ]
    }'
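
The same request through the OpenAI SDK, assuming the X-Abacus-Key header is passed via extra_headers, mirroring the curl example above:

Python
import openai

client = openai.OpenAI(
    base_url="https://agentcache.ai/api/v1",
    api_key="ac_live_..."  # Your AgentCache key
)

response = client.chat.completions.create(
    model="route-llm",  # Let Smart Routing pick the underlying model
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    extra_headers={
        "X-Abacus-Key": "your_abacus_api_key"
    }
)
print(response.choices[0].message.content)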

API Reference

Full API reference coming soon. For now, use the OpenAI-compatible endpoint documented above.