Context Quilt is a persistent cognitive memory layer that sits between your app and its LLM. It learns from every conversation and injects relevant context into the next one — with zero latency impact.
LLMs are stateless by design. Every request starts with zero memory of the last one. Users re-explain the same context over and over.
Users interact with AI across Slack, email, meetings, and coding tools. Each platform's AI forgets everything when the session ends. No unified picture exists.
Most memory solutions force all traffic through their servers, adding latency and creating a single point of failure on your critical path.
When your app needs context for a new LLM call, it asks Context Quilt. Pre-computed context lives in Redis. Your app injects it into its own prompt.
After the user gets their response, your app sends the conversation to CQ. The extraction pipeline runs in the background — the user never waits.
Context Quilt is not a proxy. Your app sends LLM requests directly to the provider. CQ enhances your app with memory — it doesn't sit in the critical path.
Individual pieces of knowledge: who someone is, what they prefer, decisions they've made, commitments they've taken on. Stored in PostgreSQL, editable by the user.
The graph layer that connects patches. "Sarah works on Widget 2.0. Widget 2.0 has a deadline from Acme Corp. Acme's CTO is David." Follow any thread to surface connected context.
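Thread-following like this can be sketched as a bounded breadth-first walk over patch connections. This is a simplified model for illustration only — the patch shapes and names below are not CQ's internal schema, and the real traversal (plus entity matching and the `max_hops` cap used by `/v1/recall`) lives inside CQ:

```python
from collections import deque

# Toy patch graph: patch_id -> (fact, [connected patch_ids]).
# Illustrative data mirroring the example above; not CQ's storage format.
PATCHES = {
    "sarah":   ("Sarah works on Widget 2.0", ["widget2"]),
    "widget2": ("Widget 2.0 has a deadline from Acme Corp", ["acme"]),
    "acme":    ("Acme's CTO is David", []),
}

def surface_context(start_id: str, max_hops: int = 2) -> list[str]:
    """Collect facts reachable within max_hops of the starting patch."""
    seen = {start_id}
    facts = []
    queue = deque([(start_id, 0)])
    while queue:
        pid, depth = queue.popleft()
        fact, neighbors = PATCHES[pid]
        facts.append(fact)
        if depth < max_hops:  # stop following threads past the hop budget
            for nxt in neighbors:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return facts
```

With `max_hops=2`, starting from Sarah surfaces all three connected facts; with `max_hops=1`, the Acme fact stays out of scope.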
Pre-computed context blocks ready to inject. The hot path only reads from here — no computation, no LLM calls, just a cache lookup. Rebuilt automatically when new knowledge arrives.
After your user gets their response, fire-and-forget the conversation to CQ. It returns immediately (202) and processes in the background.
The extraction pipeline picks out facts, entities, relationships, and communication patterns. It organizes them into the user's quilt and rebuilds the cache.
Before your next LLM call, ask CQ for relevant context. Inject it into your prompt. Your LLM now knows the user's history, preferences, and active projects.
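The loop above can be sketched with two small helpers — one that builds the fire-and-forget `POST /v1/memory` body, one that splices the `/v1/recall` context block into your own system prompt. The helper names and the prompt framing are illustrative, not part of CQ's API; the request fields match the endpoint documentation below:

```python
def memory_payload(user_id: str, content: str, response: str, **metadata) -> dict:
    """Body for the fire-and-forget POST /v1/memory call (step 1)."""
    return {
        "user_id": user_id,
        "interaction_type": "chat_log",
        "content": content,
        "response": response,
        "metadata": metadata,
    }

def inject_context(system_prompt: str, context: str) -> str:
    """Prepend the context block from /v1/recall to your prompt (step 3).

    The "What you know..." framing is an illustrative choice, not
    something CQ prescribes.
    """
    if not context:
        return system_prompt
    return f"{system_prompt}\n\nWhat you know about this user:\n{context}"
```

Send `memory_payload(...)` to CQ after the user already has their answer — since CQ returns 202 immediately, the call never blocks the user-facing path.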
Your AI remembers past decisions, action items, and who committed to what. Next meeting picks up where the last one left off.
Remember the user's codebase architecture, preferred patterns, active branches, and recent debugging sessions across IDE restarts.
Agents know the customer's history, open tickets, product tier, and past interactions without searching through systems.
Track deal context, stakeholder relationships, competitive intel, and next steps across every interaction with a prospect.
"What did we decide last week?" gets a blank stare from your AI
Users re-explain their role, project, and preferences every session
Context scattered across platforms with no unified picture
Generic, one-size-fits-all responses regardless of who's asking
Wasted tokens re-establishing context on every call
"You decided to go with Nova 3 for transcription last Tuesday"
AI already knows who the user is and adapts to their communication style
Knowledge graph connects facts across apps into a unified picture
Responses tailored to the user's style, history, and active projects
30-50% cost reduction from intelligent context injection
docker compose up -d
POST /v1/auth/register
POST /v1/memory
/v1/auth/register
Register a new application
No authentication required. Returns a client_secret (shown only once).
```
# Request
{ "app_name": "my-coding-assistant" }

# Response 200
{
  "app_id": "uuid",
  "app_name": "my-coding-assistant",
  "client_secret": "sk-...",
  "created_at": "2025-01-15T..."
}
```
/v1/auth/token
Get JWT access token
Exchange app_id + client_secret for a JWT. Token expires in 60 minutes.
```
# Request (form-encoded)
username={app_id}&password={client_secret}

# Response 200
{
  "access_token": "eyJ...",
  "token_type": "bearer",
  "expires_in": 3600
}
```
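Building that form-encoded exchange with the Python standard library might look like this — note the OAuth2-style field names, where the `app_id` goes in `username` and the `client_secret` in `password`. The base URL is an assumed self-hosted address:

```python
from urllib.parse import urlencode
from urllib.request import Request

def token_request(app_id: str, client_secret: str) -> Request:
    """Build the form-encoded POST /v1/auth/token request."""
    body = urlencode({"username": app_id, "password": client_secret})
    return Request(
        "http://localhost:8000/v1/auth/token",  # assumed self-hosted address
        data=body.encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Send the resulting request with `urllib.request.urlopen` (or your HTTP client of choice), and cache the returned token — it stays valid for 60 minutes.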
/v1/auth/apps
List registered applications
Returns all registered apps with their auth enforcement settings.
/v1/auth/apps/{app_id}
Update app settings
Toggle enforce_auth on/off for an application.
/v1/memory
Queue content for memory processing
The write path entry point. Accepts conversations, summaries, queries, sentiment, and tool calls. Returns immediately — processing happens asynchronously.
```
# Request — Bearer or X-App-ID auth
{
  "user_id": "user-123",
  "interaction_type": "chat_log",
  "content": "User discussed project timeline...",
  "response": "AI suggested Q2 deadline...",
  "metadata": { "meeting_id": "mtg-456", "project": "Widget 2.0" }
}

# Response 200
{
  "status": "queued",
  "message": "Memory update received for async processing"
}
```
summary
query
sentiment
tool_call
trace
chat_log
meeting_summary
/v1/recall
Get relevant context for a query
The hot path. Matches entities in the user's text against their knowledge graph and returns a pre-formatted context block. Target: <10ms. No LLM call involved.
```
# Request
{
  "user_id": "user-123",
  "text": "What's the status of Widget 2.0?",
  "max_hops": 2
}

# Response 200
{
  "context": "Sarah is a PM at Acme Corp. Prefers direct answers. Working on Widget 2.0. Decided on Nova 3 for transcription. Samples due Thursday. David (CTO) is sponsor.",
  "matched_entities": ["Widget 2.0", "Sarah"],
  "patch_count": 7
}
```
/v1/quilt/{user_id}
Get user's complete quilt
Returns all facts, action items, and patch connections for a user. Supports filtering by category and incremental sync via since timestamp.
```
# Response 200
{
  "user_id": "user-123",
  "facts": [
    {
      "patch_id": "uuid",
      "fact": "Product Manager at Acme",
      "category": "identity",
      "patch_type": "identity",
      "source": "inferred",
      "connections": [
        { "to_patch_id": "uuid", "role": "works_on", "label": "Widget 2.0" }
      ]
    }
  ],
  "action_items": [...],
  "server_time": "2025-01-15T..."
}
```
/v1/quilt/{user_id}/graph
Visual knowledge graph
Generates a force-directed graph visualization of the user's quilt. Returns SVG, PNG, or interactive HTML. Color-coded by patch type with edge coloring by relationship role.
/v1/quilt/{user_id}/patches/{patch_id}
Update a patch
Let users correct extracted facts. Accepts fact and category fields. Marks the patch as user-declared.
/v1/quilt/{user_id}/patches/{patch_id}
Delete a patch
Permanently remove a single patch from the user's quilt.
/v1/quilt/{user_id}
Delete all user data (GDPR)
Complete data deletion. Removes all patches, entities, relationships, and cached data for a user.
/v1/quilt/{user_id}/rename-speaker
Rename an entity
Rename a speaker or entity across all patches, relationships, and the entity index. Useful when the system labels someone as "Speaker 4" and the user knows their real name.
/v1/projects/{user_id}
List user's projects
Returns all projects for a user with status and patch counts.
/v1/projects/{user_id}
Create a project
Create a named project. Patches and entities can be scoped to projects for organized context retrieval.
/v1/projects/{user_id}/{project_id}
Update or archive a project
Rename a project or change its status to archived. Archiving cascades to child patches.
/v1/enrich
Template-based context injection
Pass a prompt template with [[placeholder]] syntax. CQ substitutes values from the user's profile. Supports defaults via [[key|fallback]].
```
# Request
{
  "user_id": "user-123",
  "template": "You are helping [[name|a user]], a [[role]]. They prefer [[communication_style]]. Active project: [[current_project|none]]."
}

# Response 200
{
  "enriched_prompt": "You are helping Sarah, a Product Manager. They prefer concise, direct answers. Active project: Widget 2.0.",
  "used_variables": ["name", "role", "communication_style", "current_project"],
  "missing_variables": []
}
```
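A simplified re-implementation of the `[[key|fallback]]` substitution shows the mechanics. This is not CQ's code, and the exact `used_variables`/`missing_variables` semantics here (a key counts as missing whenever it is absent from the profile, even if a fallback covers it) are an assumption:

```python
import re

# Matches [[key]] or [[key|fallback]].
PLACEHOLDER = re.compile(r"\[\[(\w+)(?:\|([^\]]*))?\]\]")

def enrich(template: str, profile: dict) -> dict:
    """Substitute placeholders from a profile dict, tracking usage."""
    used, missing = [], []

    def replace(m: re.Match) -> str:
        key, fallback = m.group(1), m.group(2)
        if key in profile:
            used.append(key)
            return str(profile[key])
        missing.append(key)
        # Use the fallback if one was given; otherwise leave the
        # placeholder in place so the gap is visible.
        return fallback if fallback is not None else m.group(0)

    return {
        "enriched_prompt": PLACEHOLDER.sub(replace, template),
        "used_variables": used,
        "missing_variables": missing,
    }
```

For example, `enrich("Hi [[name|friend]]", {})` falls back to `"Hi friend"`, while a populated profile substitutes the real value.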
/v1/profile/{user_id}
Get user's hydrated profile
Returns the cached user profile from Redis. Supports ?keys=key1,key2 to filter specific fields. Useful for building personalized UIs.
/v1/prewarm
Warm user cache
Hydrate a user's profile and entity index from PostgreSQL into Redis. Call at session start to ensure the first /v1/recall is a cache hit. Completes in <50ms.
/health
Health check
Returns service status and version. No authentication required.
A full admin API is available under /api/dashboard/ (requires X-Admin-Key header). Includes stats, user management, patch history, extraction metrics, cost tracking, pipeline testing, prompt management, and system health monitoring.
GET /stats
GET /users
GET /patches/recent
GET /metrics/cost
POST /test-pipeline
GET /health-check
GET /config
Interactive API documentation with request/response schemas, try-it-out capability, and full endpoint reference. Explore every endpoint live.
OpenRouter, OpenAI, Anthropic, Google Gemini, Ollama, vLLM, LiteLLM — anything OpenAI-compatible. Default extraction: Mistral Small 3.1 at $0.00009/call.
Events are grouped by meeting_id and consolidated before extraction. Batches flush on a time trigger (60 min) or a context-budget trigger (80%). One LLM call per batch.
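The two flush triggers can be expressed as a single predicate. The threshold values mirror the defaults described above (60 minutes, 80% of the extraction model's context budget); the function and parameter names are illustrative, not CQ configuration keys:

```python
def should_flush(batch_tokens: int, budget_tokens: int,
                 batch_age_seconds: float,
                 max_age_seconds: float = 60 * 60,
                 budget_fraction: float = 0.80) -> bool:
    """Flush a meeting_id batch when it is old enough or big enough.

    Either trigger alone is sufficient: a quiet meeting still flushes
    after an hour, and a busy one flushes before overflowing the
    extraction model's context window.
    """
    too_old = batch_age_seconds >= max_age_seconds
    too_big = batch_tokens >= budget_fraction * budget_tokens
    return too_old or too_big
```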
Extraction pipeline: Picker (facts & entities), Stitcher (organization), Designer (communication profile), Cataloger (summary). Single-call or multi-role mode.
Send arbitrary key-value metadata with events. Group by meeting_id, ticket_id, repo, deal_id — whatever your app needs.
Apps can only read/write patches they created. Built-in access control ensures multi-app environments stay isolated.
Docker Compose for the full stack (API + PostgreSQL + Redis + optional pgAdmin). Self-host on your infra. GPU-optional.
Your app talks to your LLM directly. CQ is a side-channel for context, not a gateway. No single point of failure on your critical path.
Facts are connected into a knowledge graph, not dumped in a vector store. Traverse relationships to surface context that keyword search would miss.
CQ learns how users communicate — formality, directness, technical depth, tone. Your AI adapts its personality, not just its knowledge.
Apache 2.0 licensed. Self-host the full stack. No cloud dependency, no usage-based surprises. Your data stays on your infrastructure.
Works with any OpenAI-compatible API. Switch LLM providers without changing your memory layer. No vendor lock-in.
Users can see, edit, and delete everything CQ knows about them. Transparency isn't an afterthought — it's a core API endpoint.
Everything the AI remembers is visible in the user's quilt. No hidden data, no black boxes.
Full CRUD on all patches. Users edit incorrect facts, delete anything they want. It's their data.
Quilts are isolated per user. Never shared, never used for training. Self-host for complete control.
Old facts fade naturally. Completed tasks and stale context auto-archive after configurable TTLs. The quilt stays clean.
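The auto-archive decision reduces to comparing a patch's age against its type's TTL. Everything below is illustrative — the patch shape, the per-type TTL values, and the rule that untyped patches never expire are assumptions standing in for CQ's configurable behavior:

```python
from datetime import datetime, timedelta, timezone

# Illustrative TTLs per patch type (CQ's actual values are configurable).
DEFAULT_TTL_DAYS = {"action_item": 30, "context": 90}

def should_archive(patch: dict, now: datetime) -> bool:
    """Return True once a patch has outlived its type's TTL."""
    ttl = DEFAULT_TTL_DAYS.get(patch["patch_type"])
    if ttl is None:
        return False  # e.g. identity patches: no TTL, never auto-archive
    return now - patch["updated_at"] > timedelta(days=ttl)
```

A periodic sweep applying this predicate is enough to keep completed tasks and stale context out of the active quilt while identity facts persist.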
Context Quilt is open source and ready to deploy. Add persistent, graph-connected memory to your AI application today.