Documentation
AgentCache is a Virtual Context Memory layer for AI agents. It sits between your agent and the LLM provider, automatically caching responses and managing long-term memory.
Why use AgentCache?
Near-Zero Latency
Cache hits return in under 10 ms, making your agents feel instant.
Reduce Costs
Save up to 80% on API bills by never paying for the same prompt twice.
Quick Start
AgentCache is drop-in compatible with the OpenAI SDK. You only need to change the base_url.
import openai

client = openai.OpenAI(
    base_url="https://agentcache.ai/api/v1",
    api_key="ac_live_...",  # Your AgentCache key
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "X-OpenAI-Key": "sk-...",  # Your real OpenAI key
    },
)
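To check that caching is working, send the same request twice and compare the round-trip time; the second, identical call should be served from the cache. The snippet below is a minimal sketch that reuses the client configured above (the timing helper and prompt are illustrative, not part of the AgentCache API).

import time

def timed_completion(prompt):
    # Send one chat completion through AgentCache and measure wall-clock time.
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"X-OpenAI-Key": "sk-..."},  # Your real OpenAI key
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    return resp.choices[0].message.content, elapsed_ms

# The first call goes to OpenAI; the identical second call should hit the cache.
_, cold_ms = timed_completion("Hello!")
_, warm_ms = timed_completion("Hello!")
print(f"cold: {cold_ms:.0f} ms, warm (cached): {warm_ms:.0f} ms")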
Smart Routing
AgentCache isn't just about caching; it's about intelligent optimization. With our Smart Routing feature (powered by RouteLLM), you can automatically route queries to the most cost-effective model that can handle the task.
How it works
- Cache Hit: Served instantly from the L1/L2 cache (0 cost, 0ms latency).
- Cache Miss: The router analyzes the prompt complexity and routes accordingly:
  - Simple query: routed to a cheaper model (e.g., GPT-3.5, Haiku).
  - Complex query: routed to a stronger model (e.g., GPT-4o, Sonnet 3.5).
Usage
Set the model parameter to route-llm and provide your Abacus API key in the X-Abacus-Key header.
curl https://agentcache.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ac_live_..." \
  -H "X-Abacus-Key: your_abacus_api_key" \
  -d '{
    "model": "route-llm",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ]
  }'
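If you are using the OpenAI Python SDK from the Quick Start, the same routed request looks roughly like this. This is a sketch that assumes Smart Routing is reachable through the same OpenAI-compatible endpoint, with the Abacus key passed via the X-Abacus-Key header shown above.

import openai

# Same AgentCache endpoint and key as in the Quick Start.
client = openai.OpenAI(
    base_url="https://agentcache.ai/api/v1",
    api_key="ac_live_...",  # Your AgentCache key
)

response = client.chat.completions.create(
    model="route-llm",  # Let AgentCache pick the model via Smart Routing
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    extra_headers={
        "X-Abacus-Key": "your_abacus_api_key",  # Your Abacus API key
    },
)
print(response.choices[0].message.content)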
API Reference
Full API reference coming soon. For now, use the OpenAI-compatible endpoint documented above.