Documentation
AgentCache is a Virtual Context Memory layer for AI agents. It sits between your agent and the LLM provider, automatically caching responses and managing long-term memory.
Why use AgentCache?
Near-Zero Latency
Cache hits return in under 10 ms, making your agents feel instant.
Reduce Costs
Save up to 80% on API bills by never paying for the same prompt twice.
Quick Start
AgentCache is drop-in compatible with the OpenAI SDK. You only need to change the base_url.
import openai

client = openai.OpenAI(
    base_url="https://agentcache.ai/api/v1",
    api_key="ac_live_...",  # Your AgentCache key
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "X-OpenAI-Key": "sk-...",  # Your real OpenAI key
    },
)
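To check that caching is working, send the same request twice and compare the round-trip time; the second, identical call should be served from the cache. The snippet below is a minimal sketch that reuses the client configured above (the timing helper and prompt are illustrative, not part of the AgentCache API).

import time

def timed_completion(prompt):
    # Send one chat completion through AgentCache and measure wall-clock time.
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"X-OpenAI-Key": "sk-..."},  # Your real OpenAI key
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    return resp.choices[0].message.content, elapsed_ms

# The first call goes to OpenAI; the identical second call should hit the cache.
_, cold_ms = timed_completion("Hello!")
_, warm_ms = timed_completion("Hello!")
print(f"cold: {cold_ms:.0f} ms, warm (cached): {warm_ms:.0f} ms")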
Smart Routing
AgentCache isn't just about caching; it's about intelligent optimization. With our Smart Routing feature (powered by RouteLLM), you can automatically route queries to the most cost-effective model that can handle the task.
How it works
- Cache Hit: Served instantly from the L1/L2 cache (0 cost, 0ms latency).
- Cache Miss: The router analyzes the prompt complexity and routes accordingly:
  - Simple query: routed to a cheaper model (e.g., GPT-3.5, Haiku).
  - Complex query: routed to a stronger model (e.g., GPT-4o, Sonnet 3.5).
Usage
Set the model parameter to route-llm and provide your Abacus API key in the X-Abacus-Key header.
curl https://agentcache.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ac_live_..." \
  -H "X-Abacus-Key: your_abacus_api_key" \
  -d '{
    "model": "route-llm",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ]
  }'
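If you are using the OpenAI Python SDK from the Quick Start, the same routed request looks roughly like this. This is a sketch that assumes Smart Routing is reachable through the same OpenAI-compatible endpoint, with the Abacus key passed via the X-Abacus-Key header shown above.

import openai

# Same AgentCache endpoint and key as in the Quick Start.
client = openai.OpenAI(
    base_url="https://agentcache.ai/api/v1",
    api_key="ac_live_...",  # Your AgentCache key
)

response = client.chat.completions.create(
    model="route-llm",  # Let AgentCache pick the model via Smart Routing
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    extra_headers={
        "X-Abacus-Key": "your_abacus_api_key",  # Your Abacus API key
    },
)
print(response.choices[0].message.content)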
API Reference
Full API reference coming soon. For now, use the OpenAI-compatible endpoint documented above.