Open Source · Apache 2.0

Give your AI a memory

Context Quilt is a persistent cognitive memory layer that sits between your app and its LLM. It learns from every conversation and injects relevant context into the next one — with zero latency impact.

View on GitHub Interactive API Docs
<10ms
Hot path latency
3
Memory tiers
0
Blocked user requests
Any
LLM provider
AI has a goldfish memory
Every LLM call starts from zero. Your users repeat themselves. Your AI forgets everything. Context is scattered across apps and lost between sessions.
User
What did we decide about the transcription vendor?
AI (stateless)
I don't have any information about previous meetings or decisions. Could you provide more details about which transcription vendor you're referring to?
Every conversation is a first date.
  • 🔄

    The Goldfish Problem

    LLMs are stateless by design. Every request starts with zero memory of the last one. Users re-explain the same context over and over.

  • 🧩

    Memory Fragmentation

    Users interact with AI across Slack, email, meetings, coding tools. Each platform's AI forgets everything when the session ends. No unified picture exists.

  • 🚫

    Proxying Kills Performance

    Most memory solutions force all traffic through their servers, adding latency and creating a single point of failure on your critical path.

Two paths. Zero compromise.
Context Quilt separates reading and writing completely. The hot path never waits for the cold path. Your users never wait for anything.
Hot Path · Synchronous
< 10ms

Read: Get Context

When your app needs context for a new LLM call, it asks Context Quilt. Pre-computed context lives in Redis. Your app injects it into its own prompt.

  • Pre-computed in Redis (<1ms lookup)
  • No LLM calls on the read path, ever
  • App stays in control of its own LLM calls
  • Cache miss falls back to Postgres (<50ms)
Cold Path · Asynchronous
Background

Write: Learn & Extract

After the user gets their response, your app sends the conversation to CQ. The extraction pipeline runs in the background — the user never waits.

  • Extracts facts, entities, relationships
  • Builds user profile and communication style
  • Connects knowledge into a graph
  • Rebuilds Redis cache for next read

Context Quilt is not a proxy. Your app sends LLM requests directly to the provider. CQ enhances your app with memory — it doesn't sit in the critical path.

Knowledge that's connected, not just collected
Like a physical quilt, the value isn't in individual pieces — it's in how they're stitched together.
Knowledge Graph
Sarah Chen
Widget 2.0
Direct, concise
Use Nova 3
Acme Corp
Deliver samples
David (CTO)
Q2 Roadmap
🧩

Patches — Factual Memory

Individual pieces of knowledge: who someone is, what they prefer, decisions they've made, commitments they've taken on. Stored in PostgreSQL, editable by the user.

Identity · Preferences · Traits · Experiences
🔧

Stitching — Episodic Memory

The graph layer that connects patches. "Sarah works on Widget 2.0. Widget 2.0 has a deadline from Acme Corp. Acme's CTO is David." Follow any thread to surface connected context.

Entities · Relationships · Graph Traversal

Working Memory — Redis Cache

Pre-computed context blocks ready to inject. The hot path only reads from here — no computation, no LLM calls, just a cache lookup. Rebuilt automatically when new knowledge arrives.

<1ms reads · 1-hour TTL · Auto-rebuilds
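The stitching layer above can be pictured as a plain graph traversal: patches are nodes, relationships are edges, and recall follows threads out to a hop limit (the `max_hops` parameter on `/v1/recall`). The graph data and the `connected` helper below are illustrative, not the real storage model.

```python
# Toy stitching layer: a hop-limited breadth-first traversal over patch
# connections, mirroring the max_hops behavior of /v1/recall. Example data
# comes from the quilt shown above; names are illustrative.
from collections import deque

edges = {
    "Sarah": ["Widget 2.0"],
    "Widget 2.0": ["Acme Corp", "Use Nova 3"],
    "Acme Corp": ["David (CTO)"],
}

def connected(start: str, max_hops: int) -> set[str]:
    """Return every node reachable from `start` within max_hops edges."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # don't expand past the hop budget
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen
```

This is what "follow any thread" means in practice: two hops from Sarah already surfaces the Acme Corp deadline and the Nova 3 decision.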
Three API calls. That's it.
Your app stays in control. CQ adds memory without changing how you talk to your LLM.
1

Send events as they happen

After your user gets their response, fire-and-forget the conversation to CQ. It returns immediately (202) and processes in the background.

2

CQ extracts & connects

The extraction pipeline picks out facts, entities, relationships, and communication patterns. It organizes them into the user's quilt and rebuilds the cache.

3

Recall context before LLM calls

Before your next LLM call, ask CQ for relevant context. Inject it into your prompt. Your LLM now knows the user's history, preferences, and active projects.
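The three steps can be sketched end to end. The endpoint paths (`/v1/memory`, `/v1/recall`) and payload fields are the documented ones; the `transport` callable is a stand-in for an HTTP client (e.g. `requests.post`) so the flow can be shown without a live server, and the helper names are ours.

```python
# Illustrative three-call flow. `transport(method, path, body)` stands in
# for a real HTTP client; endpoint paths and fields match the API reference.

def remember(transport, user_id: str, content: str, response: str) -> None:
    """Step 1: fire-and-forget the exchange to POST /v1/memory (202)."""
    transport("POST", "/v1/memory", {
        "user_id": user_id,
        "interaction_type": "chat_log",
        "content": content,
        "response": response,
    })  # returns immediately; extraction (step 2) runs in the background

def recall(transport, user_id: str, text: str) -> str:
    """Step 3: fetch pre-computed context from POST /v1/recall."""
    body = transport("POST", "/v1/recall", {"user_id": user_id, "text": text})
    return body.get("context", "")

def build_prompt(context: str, question: str) -> str:
    """Inject recalled context ahead of the user's question."""
    parts = []
    if context:
        parts.append(f"Known about this user:\n{context}")
    parts.append(question)
    return "\n\n".join(parts)
```

Your app still makes its own LLM call with the built prompt — CQ never sits between you and the provider.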

Memory for every kind of AI app
Context Quilt isn't tied to one platform. Any app that talks to an LLM can benefit from persistent memory.
🎙

Meeting Copilots

Your AI remembers past decisions, action items, and who committed to what. Next meeting picks up where the last one left off.

"We decided to go with Nova 3 for transcription last Tuesday. Travis committed to uploading the sample files by Thursday."
💻

Coding Assistants

Remember the user's codebase architecture, preferred patterns, active branches, and recent debugging sessions across IDE restarts.

"You prefer functional React patterns and you've been refactoring the auth module to use JWT since last week."
🏢

Customer Support

Agents know the customer's history, open tickets, product tier, and past interactions without searching through systems.

"This is an enterprise customer on the Pro plan. They reported the same sync issue 3 weeks ago — it was resolved by clearing the cache."
📈

Sales & CRM Tools

Track deal context, stakeholder relationships, competitive intel, and next steps across every interaction with a prospect.

"Acme Corp's CTO David prefers technical deep-dives. Their evaluation deadline is end of Q2. They're also talking to Competitor X."
From stateless to stateful AI

Without Context Quilt

"What did we decide last week?" gets a blank stare from your AI

Users re-explain their role, project, and preferences every session

Context scattered across platforms with no unified picture

Generic, one-size-fits-all responses regardless of who's asking

Wasted tokens re-establishing context on every call

With Context Quilt

"You decided to go with Nova 3 for transcription last Tuesday"

AI already knows who the user is and adapts to their communication style

Knowledge graph connects facts across apps into a unified picture

Responses tailored to the user's style, history, and active projects

30-50% cost reduction from intelligent context injection

Add memory in minutes, not months
Register your app, send events, recall context. Full Swagger / OpenAPI docs are linked below.

Quick Start

1
Deploy the Stack
docker compose up -d
2
Register Your App
POST /v1/auth/register
3
Start Sending Events
POST /v1/memory

API Reference

POST /v1/auth/register Register a new application

No authentication required. Returns a client_secret (shown only once).

# Request
{ "app_name": "my-coding-assistant" }

# Response 200
{
  "app_id": "uuid",
  "app_name": "my-coding-assistant",
  "client_secret": "sk-...",
  "created_at": "2025-01-15T..."
}
POST /v1/auth/token Get JWT access token

Exchange app_id + client_secret for a JWT. Token expires in 60 minutes.

# Request (form-encoded)
username={app_id}&password={client_secret}

# Response 200
{
  "access_token": "eyJ...",
  "token_type": "bearer",
  "expires_in": 3600
}
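As a quick sketch, the form-encoded body above can be built with the standard library. The `username`/`password` field names are the documented ones; the helper function is ours.

```python
# Build the form-encoded body for POST /v1/auth/token, per the docs:
# username carries the app_id, password carries the client_secret.
from urllib.parse import urlencode

def token_request_body(app_id: str, client_secret: str) -> str:
    return urlencode({"username": app_id, "password": client_secret})
```

POST this body with `Content-Type: application/x-www-form-urlencoded`, then send the returned token as `Authorization: Bearer <access_token>` for the next 60 minutes.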
GET /v1/auth/apps List registered applications

Returns all registered apps with their auth enforcement settings.

PATCH /v1/auth/apps/{app_id} Update app settings

Toggle enforce_auth on/off for an application.

POST /v1/memory Queue content for memory processing

The write path entry point. Accepts conversations, summaries, queries, sentiment, and tool calls. Returns immediately — processing happens asynchronously.

# Request — Bearer or X-App-ID auth
{
  "user_id": "user-123",
  "interaction_type": "chat_log",
  "content": "User discussed project timeline...",
  "response": "AI suggested Q2 deadline...",
  "metadata": {
    "meeting_id": "mtg-456",
    "project": "Widget 2.0"
  }
}

# Response 202
{
  "status": "queued",
  "message": "Memory update received for async processing"
}
Interaction Types:
summary · query · sentiment · tool_call · trace · chat_log · meeting_summary
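A client-side helper can validate against the documented interaction types before queuing an event — a hypothetical convenience, not part of the API itself (the server may accept or reject values differently).

```python
# Hypothetical client-side event builder. The type list mirrors the
# documented interaction types; metadata is arbitrary key-value pairs.
INTERACTION_TYPES = {"summary", "query", "sentiment", "tool_call",
                     "trace", "chat_log", "meeting_summary"}

def memory_event(user_id: str, interaction_type: str,
                 content: str, **metadata) -> dict:
    """Build a POST /v1/memory payload, rejecting unknown types early."""
    if interaction_type not in INTERACTION_TYPES:
        raise ValueError(f"unknown interaction_type: {interaction_type!r}")
    event = {"user_id": user_id,
             "interaction_type": interaction_type,
             "content": content}
    if metadata:
        event["metadata"] = metadata  # e.g. meeting_id, ticket_id, repo
    return event
```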
POST /v1/recall Get relevant context for a query

The hot path. Matches entities in the user's text against their knowledge graph and returns a pre-formatted context block. Target: <10ms. No LLM call involved.

# Request
{
  "user_id": "user-123",
  "text": "What's the status of Widget 2.0?",
  "max_hops": 2
}

# Response 200
{
  "context": "Sarah is a PM at Acme Corp. Prefers direct answers. Working on Widget 2.0. Decided on Nova 3 for transcription. Samples due Thursday. David (CTO) is sponsor.",
  "matched_entities": ["Widget 2.0", "Sarah"],
  "patch_count": 7
}
GET /v1/quilt/{user_id} Get user's complete quilt

Returns all facts, action items, and patch connections for a user. Supports filtering by category and incremental sync via since timestamp.

# Response 200
{
  "user_id": "user-123",
  "facts": [
    {
      "patch_id": "uuid",
      "fact": "Product Manager at Acme",
      "category": "identity",
      "patch_type": "identity",
      "source": "inferred",
      "connections": [
        {
          "to_patch_id": "uuid",
          "role": "works_on",
          "label": "Widget 2.0"
        }
      ]
    }
  ],
  "action_items": [...],
  "server_time": "2025-01-15T..."
}
GET /v1/quilt/{user_id}/graph Visual knowledge graph

Generates a force-directed graph visualization of the user's quilt. Returns SVG, PNG, or interactive HTML. Color-coded by patch type with edge coloring by relationship role.

PATCH /v1/quilt/{user_id}/patches/{patch_id} Update a patch

Let users correct extracted facts. Accepts fact and category fields. Marks the patch as user-declared.

DELETE /v1/quilt/{user_id}/patches/{patch_id} Delete a patch

Permanently remove a single patch from the user's quilt.

DELETE /v1/quilt/{user_id} Delete all user data (GDPR)

Complete data deletion. Removes all patches, entities, relationships, and cached data for a user.

POST /v1/quilt/{user_id}/rename-speaker Rename an entity

Rename a speaker or entity across all patches, relationships, and the entity index. Useful when the system labels someone as "Speaker 4" and the user knows their real name.

GET /v1/projects/{user_id} List user's projects

Returns all projects for a user with status and patch counts.

POST /v1/projects/{user_id} Create a project

Create a named project. Patches and entities can be scoped to projects for organized context retrieval.

PATCH /v1/projects/{user_id}/{project_id} Update or archive a project

Rename a project or change its status to archived. Archiving cascades to child patches.

POST /v1/enrich Template-based context injection

Pass a prompt template with [[placeholder]] syntax. CQ substitutes values from the user's profile. Supports defaults via [[key|fallback]].

# Request
{
  "user_id": "user-123",
  "template": "You are helping [[name|a user]], a [[role]]. They prefer [[communication_style]]. Active project: [[current_project|none]]."
}

# Response 200
{
  "enriched_prompt": "You are helping Sarah, a Product Manager. They prefer concise, direct answers. Active project: Widget 2.0.",
  "used_variables": ["name", "role", "communication_style", "current_project"],
  "missing_variables": []
}
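To make the `[[key|fallback]]` behavior concrete, here is a local re-implementation for illustration only — the real `/v1/enrich` logic may differ in edge cases (this sketch leaves a placeholder untouched when the key is missing and no fallback is given).

```python
# Local sketch of [[key|fallback]] substitution. Keys present in the
# profile are substituted; missing keys use the fallback if one exists.
import re

PLACEHOLDER = re.compile(r"\[\[(\w+)(?:\|([^\]]*))?\]\]")

def enrich(template: str, profile: dict) -> dict:
    used, missing = [], []
    def sub(match):
        key, fallback = match.group(1), match.group(2)
        if key in profile:
            used.append(key)
            return str(profile[key])
        missing.append(key)
        return fallback if fallback is not None else match.group(0)
    return {
        "enriched_prompt": PLACEHOLDER.sub(sub, template),
        "used_variables": used,
        "missing_variables": missing,
    }
```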
GET /v1/profile/{user_id} Get user's hydrated profile

Returns the cached user profile from Redis. Supports ?keys=key1,key2 to filter specific fields. Useful for building personalized UIs.

POST /v1/prewarm Warm user cache

Hydrate a user's profile and entity index from PostgreSQL into Redis. Call at session start to ensure the first /v1/recall is a cache hit. Completes in <50ms.

GET /health Health check

Returns service status and version. No authentication required.

Admin Dashboard API

A full admin API is available under /api/dashboard/ (requires X-Admin-Key header). Includes stats, user management, patch history, extraction metrics, cost tracking, pipeline testing, prompt management, and system health monitoring.

GET /stats · GET /users · GET /patches/recent · GET /metrics/cost · POST /test-pipeline · GET /health-check · GET /config
📖

Full Swagger / OpenAPI Docs

Interactive API documentation with request/response schemas, try-it-out capability, and full endpoint reference. Explore every endpoint live.

Open Swagger UI View Source & Docs on GitHub

Any LLM Provider

OpenRouter, OpenAI, Anthropic, Google Gemini, Ollama, vLLM, LiteLLM — anything OpenAI-compatible. Default extraction: Mistral Small 3.1 at $0.00009/call.

Smart Queue Consolidation

Events group by meeting_id and consolidate before extraction. Time-based (60 min) or context-budget (80%) triggers. One LLM call per batch.
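The two triggers above reduce to a simple check, sketched here with illustrative names and the thresholds stated in the text — the real implementation will differ in detail.

```python
# Hypothetical consolidation check: flush a batch when 60 minutes have
# elapsed since its first event, or when it reaches 80% of the context
# budget. Thresholds mirror the text; names are ours.
WINDOW_SECONDS = 60 * 60   # time-based trigger
BUDGET_FRACTION = 0.80     # context-budget trigger

def should_consolidate(first_event_at: float, now: float,
                       used_tokens: int, context_limit: int) -> bool:
    if now - first_event_at >= WINDOW_SECONDS:
        return True
    return used_tokens >= BUDGET_FRACTION * context_limit
```

Either trigger firing means one LLM extraction call for the whole batch, instead of one per event.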

Four Cognitive Roles

Extraction pipeline: Picker (facts & entities), Stitcher (organization), Designer (communication profile), Cataloger (summary). Single-call or multi-role mode.

Flexible Metadata

Send arbitrary key-value metadata with events. Group by meeting_id, ticket_id, repo, deal_id — whatever your app needs.

App-Scoped ACLs

Apps can only read/write patches they created. Built-in access control ensures multi-app environments stay isolated.

Deploy Anywhere

Docker Compose for the full stack (API + PostgreSQL + Redis + optional pgAdmin). Self-host on your infra. GPU-optional.

Not another memory wrapper
Most AI memory tools proxy your traffic, add latency, or lock you in. Context Quilt was designed differently from the ground up.

Non-Proxy Architecture

Your app talks to your LLM directly. CQ is a side-channel for context, not a gateway. No single point of failure on your critical path.

🕸

Graph Memory

Facts are connected into a knowledge graph, not dumped in a vector store. Traverse relationships to surface context that keyword search would miss.

🧬

Communication Profiling

CQ learns how users communicate — formality, directness, technical depth, tone. Your AI adapts its personality, not just its knowledge.

🔓

Open Source Core

Apache 2.0 licensed. Self-host the full stack. No cloud dependency, no usage-based surprises. Your data stays on your infrastructure.

🔄

Provider Neutral

Works with any OpenAI-compatible API. Switch LLM providers without changing your memory layer. No vendor lock-in.

🛡

User Control Built In

Users can see, edit, and delete everything CQ knows about them. Transparency isn't an afterthought — it's a core API endpoint.

Memory that respects its owner
We believe your users' memory should belong to them.
👁

Fully Transparent

Everything the AI remembers is visible in the user's quilt. No hidden data, no black boxes.

User-Controlled

Full CRUD on all patches. Users edit incorrect facts, delete anything they want. It's their data.

🔒

Private by Design

Quilts are isolated per user. Never shared, never used for training. Self-host for complete control.

Auto-Archiving

Old facts fade naturally. Completed tasks and stale context auto-archive after configurable TTLs. The quilt stays clean.

Ready to give your AI a memory?

Context Quilt is open source and ready to deploy. Add persistent, graph-connected memory to your AI application today.

View on GitHub Interactive API Docs