Documentation
AgentCache is a Virtual Context Memory layer for AI agents. It sits between your agent and the LLM provider, automatically caching responses and managing long-term memory.
Why use AgentCache?
- Zero Latency: Cache hits return in under 10 ms, making your agents feel instant.
- Reduced Costs: Save up to 80% on API bills by never paying for the same prompt twice.
Quick Start
AgentCache is drop-in compatible with the OpenAI SDK; you only need to change the `base_url`.
```python
import openai

client = openai.OpenAI(
    base_url="https://agentcache.ai/api/v1",
    api_key="ac_live_...",  # Your AgentCache key
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "X-OpenAI-Key": "sk-..."  # Your real OpenAI key
    },
)
```
Authentication
AgentCache uses a dual-key authentication system to ensure isolation and security.
- Agent Key (Bearer): Authenticates your agent with AgentCache.
- Provider Key (Header): Your real LLM key (OpenAI/Anthropic), passed as `X-OpenAI-Key` or `X-Anthropic-Key`.
- Tenant ID: Use `X-Tenant-Id` to isolate vector indices in production.
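The three credentials can be assembled into a single header set. A minimal sketch (the helper name and signature are illustrative, not part of the AgentCache SDK):

```python
def build_headers(agent_key, provider_key, tenant_id=None):
    """Assemble dual-key (plus optional tenant) headers for an AgentCache request."""
    headers = {
        "Authorization": f"Bearer {agent_key}",  # Agent Key: authenticates with AgentCache
        "X-OpenAI-Key": provider_key,            # Provider Key: your real LLM key
    }
    if tenant_id:
        headers["X-Tenant-Id"] = tenant_id       # Tenant isolation (production)
    return headers
```

Pass the result as `headers` (or `extra_headers` in the OpenAI SDK) on every call.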
The Immune System
Beyond passive caching, AgentCache implements an active Immune System for your agents. It detects "Cognitive Rot" and adversarial injections in real time, then:

- Dynamically lowers drift thresholds (0.15 → 0.05) for flagged sessions to quarantine suspicious reasoning.
- Bypasses the cache and forces re-validation if semantic intent drifts outside allowed parameters.
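The threshold-tightening behavior can be sketched as follows; the function name and signature are illustrative, not AgentCache's API:

```python
# Drift thresholds from the documentation above.
DEFAULT_DRIFT_THRESHOLD = 0.15
QUARANTINE_DRIFT_THRESHOLD = 0.05

def allow_cache_hit(drift_score, session_flagged):
    """Serve from cache only while semantic drift stays inside the allowed band.

    Flagged sessions get the tighter quarantine threshold, so the same
    drift score that passes normally now forces re-validation.
    """
    threshold = QUARANTINE_DRIFT_THRESHOLD if session_flagged else DEFAULT_DRIFT_THRESHOLD
    return drift_score <= threshold
```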
Smart Routing
AgentCache isn't just about caching; it's about intelligent optimization. With our Smart Routing feature (powered by RouteLLM), you can automatically route queries to the most cost-effective model that can handle the task.
How it works
- Cache Hit: Served instantly from the L1/L2 cache (zero cost, sub-10 ms latency).
- Cache Miss: The router analyzes prompt complexity.
- Simple Query: Routed to a cheaper model (e.g., GPT-3.5 Turbo, Claude 3 Haiku).
- Complex Query: Routed to a stronger model (e.g., GPT-4o, Claude 3.5 Sonnet).
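For intuition, the miss path behaves roughly like the toy heuristic below. RouteLLM itself uses a learned router, not this rule of thumb; the marker list and length cutoff are invented for illustration:

```python
def route_model(prompt):
    """Toy complexity heuristic: long prompts or analytical keywords go to the strong model."""
    hard_markers = ("prove", "derive", "step by step", "analyze", "debug")
    is_complex = len(prompt.split()) > 60 or any(m in prompt.lower() for m in hard_markers)
    return "gpt-4o" if is_complex else "gpt-3.5-turbo"
```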
Implementation Example
Set the `model` parameter to `route-llm` and provide your Abacus API key in the `X-Abacus-Key` header.
```shell
curl https://agentcache.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ac_live_..." \
  -H "X-Abacus-Key: your_abacus_api_key" \
  -d '{
    "model": "route-llm",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ]
  }'
```
Semantic Caching
AgentCache fingerprints request intent, not just raw text. This allows for:

- Identical prompts: served from exact-match history cached in Redis.
- Similar prompts: matched via similarity search in the Cognitive Vector Service.
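The two-tier lookup can be sketched in miniature. A real deployment uses Redis for the exact tier and the Cognitive Vector Service for similarity; here a dict and a brute-force cosine scan stand in, and `embed()` is a toy stub, not a real embedding model:

```python
import hashlib
import math

def embed(text):
    # Toy embedding: 26-bin character histogram (stand-in for a real model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.exact = {}       # prompt hash -> response  (Redis tier)
        self.vectors = []     # (embedding, response)    (vector tier)
        self.threshold = threshold

    def put(self, prompt, response):
        self.exact[hashlib.sha256(prompt.encode()).hexdigest()] = response
        self.vectors.append((embed(prompt), response))

    def get(self, prompt):
        # Tier 1: exact match on the prompt hash.
        hit = self.exact.get(hashlib.sha256(prompt.encode()).hexdigest())
        if hit is not None:
            return hit
        # Tier 2: nearest stored embedding above the similarity threshold.
        query = embed(prompt)
        best = max(self.vectors, key=lambda v: cosine(query, v[0]), default=None)
        if best and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None
```

The design point: hashing catches verbatim repeats cheaply, while the vector tier catches rephrasings that a plain key-value cache would miss.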
Cognitive Vector Service (CVS)
A highly optimized FAISS-based vector engine (standalone C# service) providing sub-millisecond similarity search over HNSW indices. Indices are isolated per tenant via the `X-Tenant-Id: your_org_id` header.
Resonance Circles
Lateral knowledge sharing across agents. Join a circle to automatically benefit from the memory of other authorized agents in the same sector.
Topological Healing
Agent memory isn't static. Topological Healing allows the system to actively repair "drifted" reasoning by re-embedding nodes or issuing corrective Antibody Pulses (re-prompting the agent with verified truth).
Healing Strategies
Pathology Sandbox
Stress-test your agent against Pathological Personas. The sandbox simulates "Logic Duels" where adversarial agents attempt to corrupt your agent's core axioms (SOUL.md).
Predictive Synapse
Sequence learning that pre-fetches the next 3 potential cache hits before the agent even submits the query.
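The idea can be illustrated with a first-order transition table that surfaces the top-3 likeliest next queries to warm in the cache. AgentCache's actual sequence model is not documented here; the class below is a hypothetical sketch:

```python
from collections import Counter, defaultdict

class PredictiveSynapse:
    """Toy next-query predictor: counts observed query transitions."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # query -> Counter of next queries

    def observe(self, previous_query, next_query):
        """Record that next_query followed previous_query in a session."""
        self.transitions[previous_query][next_query] += 1

    def prefetch_candidates(self, current_query, k=3):
        """Top-k likeliest next queries, in descending frequency."""
        return [q for q, _ in self.transitions[current_query].most_common(k)]
```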
API Reference
Core endpoints are fully OpenAI-compatible.
SDKs
agentcache-node

```shell
npm install agentcache-node
```

Full lifecycle support for Node.js, including L1 local caching.

agentcache-python

```shell
pip install agentcache
```

Official Python client for LangChain and LlamaIndex integration.