Self-Evolving Knowledge
How goClaw's knowledge base works — Markdown files, FTS5 indexing, curiosity items, and the nightly research pipeline.
The self-evolving knowledge system is goClaw's core differentiator. It's not just a static knowledge base the agent queries — it's a system that actively identifies its own gaps and fills them overnight.
How it works
The knowledge system has three components:
- Knowledge base — Markdown files indexed with SQLite FTS5
- Curiosity queue — a list of gaps the agent has identified during execution
- Research pipeline — a nightly job that processes the curiosity queue and writes new knowledge
After 30 days of operation, an active agent typically has 60+ knowledge files it created itself, without any human intervention.
Knowledge base
Knowledge is stored as plain Markdown files in the ./knowledge/ directory.
knowledge/
├── company.md              # Core company facts
├── products/
│   ├── platform.md
│   └── pricing.md
├── prospects/              # Agent-created: company research
│   ├── acme-corp.md
│   └── widget-co.md
├── objections/             # Agent-created: objection handling
│   ├── price-objection.md
│   └── competitor-comparison.md
└── procedures/             # Agent-created: learned procedures
    └── outreach-sequence.md
Files the agent created are indistinguishable from files you created — they're all Markdown. You can edit, delete, or reorganize agent-generated files at any time.
File format
---
title: Acme Corp Research
created: 2026-03-01
updated: 2026-03-05
source: agent_research
tags: [prospect, saas, series-b]
confidence: 0.85
---
# Acme Corp
## Company Overview
Acme Corp is a B2B SaaS company focused on...
The YAML frontmatter is optional but helps the agent reason about file age, confidence, and source.
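As an illustration of how a file in this format could be read, here is a minimal sketch. The parseFrontmatter helper and its types are hypothetical, not part of goClaw. It assumes flat key: value lines and inline [a, b] lists like the example above; a real implementation would more likely use a full YAML parser.

```typescript
// Hypothetical helper, not goClaw's API: splits optional frontmatter from the body.
// Handles only flat `key: value` lines and inline `[a, b]` lists.
type FrontmatterValue = string | number | string[];

interface ParsedKnowledgeFile {
  meta: Record<string, FrontmatterValue>;
  body: string;
}

function parseFrontmatter(text: string): ParsedKnowledgeFile {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(text);
  if (!match) {
    // Frontmatter is optional: treat the whole file as body.
    return { meta: {}, body: text };
  }
  const block = match[1] ?? "";
  const meta: Record<string, FrontmatterValue> = {};
  for (const line of block.split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue; // skip lines that are not `key: value`
    const key = line.slice(0, sep).trim();
    const raw = line.slice(sep + 1).trim();
    if (raw.startsWith("[") && raw.endsWith("]")) {
      // Inline list such as `[prospect, saas, series-b]`
      meta[key] = raw
        .slice(1, -1)
        .split(",")
        .map((s) => s.trim())
        .filter((s) => s.length > 0);
    } else if (/^-?\d+(\.\d+)?$/.test(raw)) {
      meta[key] = Number(raw); // plain numbers such as `0.85`
    } else {
      meta[key] = raw; // everything else (titles, dates) stays a string
    }
  }
  return { meta, body: text.slice(match[0].length) };
}
```

Dates such as 2026-03-01 are kept as strings here, so the caller decides how to interpret file age.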
SQLite FTS5 indexing
All knowledge files are indexed into SQLite using FTS5 (Full-Text Search version 5). The agent retrieves knowledge using the knowledge_search MCP tool:
// Agent tool call
const results = await mcp.tool("knowledge_search", {
  query: "how to handle price objections",
  limit: 5,
});
FTS5 supports:
- Phrase search ("price objection")
- Boolean operators (pricing AND competitor)
- Proximity search
- Stemming and tokenization
Results are ranked by BM25 relevance. The top-N results are injected into the agent's context window.
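The query features listed above map onto FTS5 MATCH expressions. The sketch below builds such expressions and shows one way a BM25-ranked lookup could be written. The helper names, the knowledge_fts table and the path column are assumptions for illustration, not goClaw's actual schema.

```typescript
// Hypothetical helpers for building FTS5 MATCH expressions.
// Table and column names in SEARCH_SQL are assumptions, not goClaw's schema.

// Quote a term or phrase as an FTS5 string: wrap it in double quotes and
// double any embedded double quotes.
function ftsQuote(text: string): string {
  return `"${text.replace(/"/g, '""')}"`;
}

// Phrase search: the words must appear adjacent and in order.
function phrase(text: string): string {
  return ftsQuote(text);
}

// Boolean AND: every term must appear somewhere in the document.
function allOf(terms: string[]): string {
  return terms.map(ftsQuote).join(" AND ");
}

// Proximity search: the terms must appear at most `distance` tokens apart.
function near(terms: string[], distance: number): string {
  return `NEAR(${terms.map(ftsQuote).join(" ")}, ${distance})`;
}

// bm25() returns numerically smaller values for better matches,
// so ascending order puts the most relevant rows first.
const SEARCH_SQL = `
  SELECT path, bm25(knowledge_fts) AS score
  FROM knowledge_fts
  WHERE knowledge_fts MATCH ?
  ORDER BY score
  LIMIT ?`;
```

For example, phrase("price objection") yields "price objection" in quotes, and allOf(["pricing", "competitor"]) yields "pricing" AND "competitor". Either string would be bound to the first placeholder of SEARCH_SQL, with the result limit bound to the second.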
Re-indexing
The knowledge index is rebuilt:
- On startup (full scan for new/modified files)
- After any research pipeline run
- On manual trigger via admin dashboard
An index rebuild takes roughly 200 ms for 100 files and roughly 2 s for 1,000 files.
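The startup scan for new or modified files can be pictured as a comparison between modification times on disk and the times recorded at the last index run. The sketch below is one possible way to do this. The types and the planReindex function are hypothetical, and dropping index entries for deleted files is an assumption the text above does not state.

```typescript
// Hypothetical types: a file found on disk, and the plan the scan produces.
interface FileStat {
  path: string;
  mtimeMs: number; // last modification time in milliseconds
}

interface ReindexPlan {
  toIndex: string[];  // new or modified files
  toRemove: string[]; // indexed files that no longer exist (assumed behavior)
}

// `indexed` maps each indexed path to the mtime recorded when it was indexed.
function planReindex(onDisk: FileStat[], indexed: Map<string, number>): ReindexPlan {
  const toIndex: string[] = [];
  const seen = new Set<string>();
  for (const file of onDisk) {
    seen.add(file.path);
    const indexedAt = indexed.get(file.path);
    // Index files never seen before, or changed since they were indexed.
    if (indexedAt === undefined || file.mtimeMs > indexedAt) {
      toIndex.push(file.path);
    }
  }
  const toRemove = Array.from(indexed.keys()).filter((p) => !seen.has(p));
  return { toIndex, toRemove };
}
```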
Curiosity queue
When the agent encounters a gap during execution, it files a curiosity item using the knowledge_file_curiosity MCP tool:
await mcp.tool("knowledge_file_curiosity", {
  question: "What are Acme Corp's main product features?",
  context: "Contact is CTO at Acme Corp. Need company context before outreach.",
  priority: "high",
});
Curiosity items are stored in data/curiosity.db:
interface CuriosityItem {
  id: string;
  question: string;
  context: string;
  priority: "low" | "medium" | "high";
  status: "pending" | "processing" | "completed" | "failed";
  result_file: string | null;
  created_at: number;
  processed_at: number | null;
  attempts: number;
}
You can view pending curiosity items in the admin dashboard under Research Queue.
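To connect the tool call to the stored record, the sketch below shows how the three tool arguments might be turned into a fresh CuriosityItem. The createCuriosityItem function and the CuriosityInput type are hypothetical. Treating created_at as epoch milliseconds is also an assumption, since the unit is not specified here.

```typescript
type Priority = "low" | "medium" | "high";
type Status = "pending" | "processing" | "completed" | "failed";

// Same shape as the CuriosityItem interface shown above.
interface CuriosityItem {
  id: string;
  question: string;
  context: string;
  priority: Priority;
  status: Status;
  result_file: string | null;
  created_at: number;
  processed_at: number | null;
  attempts: number;
}

// Hypothetical: the arguments passed to knowledge_file_curiosity.
interface CuriosityInput {
  question: string;
  context: string;
  priority: Priority;
}

// Hypothetical helper: a new item starts pending, unprocessed, with no attempts.
// `now` is assumed to be epoch milliseconds.
function createCuriosityItem(input: CuriosityInput, id: string, now: number): CuriosityItem {
  return {
    id,
    question: input.question,
    context: input.context,
    priority: input.priority,
    status: "pending",
    result_file: null,
    created_at: now,
    processed_at: null,
    attempts: 0,
  };
}
```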
Nightly research pipeline
The research pipeline runs at 2am local time (configurable). For each pending curiosity item:
- Web search — query GPT-4o-mini with a web search prompt to find relevant sources
- Content retrieval — fetch top 3 source URLs
- LLM synthesis — Claude Sonnet summarizes and synthesizes the content
- Categorization — determine the appropriate knowledge directory
- Deduplication — check if similar content already exists (FTS5 similarity check)
- File creation — write a new Markdown knowledge file
- Generate follow-up questions — the agent identifies 1–3 new curiosity items from the research (curiosity cascades)
- Mark item complete
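The steps above can be sketched as a single function that runs each stage in order. Every name here is hypothetical. The ResearchSteps interface stands in for the real search, fetch and LLM calls so the control flow is visible. Reusing an existing file when the deduplication check finds a match is an assumption, since the text does not say what happens in that case.

```typescript
// Hypothetical stage interface; real implementations would call the search
// model, HTTP fetches, the synthesis model and the FTS5 index.
interface Source {
  url: string;
  content: string;
}

interface ResearchSteps {
  webSearch(question: string): Promise<string[]>;                    // candidate URLs
  fetchContent(url: string): Promise<string>;                        // page text
  synthesize(question: string, sources: Source[]): Promise<string>;  // Markdown
  categorize(markdown: string): Promise<string>;                     // target directory
  findDuplicate(markdown: string): Promise<string | null>;           // existing file path
  writeFile(dir: string, markdown: string): Promise<string>;         // new file path
  followUps(markdown: string): Promise<string[]>;                    // new questions
}

interface ResearchOutcome {
  resultFile: string;
  followUps: string[];
}

async function researchItem(
  question: string,
  steps: ResearchSteps,
  sourcesPerQuery = 3
): Promise<ResearchOutcome> {
  // Web search, then fetch only the top N source URLs.
  const urls = (await steps.webSearch(question)).slice(0, sourcesPerQuery);
  const sources: Source[] = [];
  for (const url of urls) {
    sources.push({ url, content: await steps.fetchContent(url) });
  }
  const markdown = await steps.synthesize(question, sources);
  const dir = await steps.categorize(markdown);
  // Assumption: if similar content already exists, point at it instead of writing.
  const existing = await steps.findDuplicate(markdown);
  const resultFile = existing ?? (await steps.writeFile(dir, markdown));
  // Keep at most 3 follow-up questions per research session.
  const followUps = (await steps.followUps(markdown)).slice(0, 3);
  return { resultFile, followUps };
}
```

The caller would then mark the item completed and store resultFile in its result_file field.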
Pipeline configuration
research:
  schedule: "0 2 * * *"   # cron: 2am daily
  batch_size: 20          # items per nightly run
  max_attempts: 3         # retry failed items up to 3 times
  cascade_depth: 2        # max levels of curiosity cascades
  sources_per_query: 3    # URLs to fetch per query
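To show how batch_size and max_attempts might shape a nightly run, here is a small sketch. The function names are hypothetical. Two behaviors are assumptions rather than documented: the batch is ordered by priority and then by age, and max_attempts counts total attempts per item.

```typescript
// Mirrors the `research` block above.
interface ResearchConfig {
  schedule: string;
  batch_size: number;
  max_attempts: number;
  cascade_depth: number;
  sources_per_query: number;
}

const config: ResearchConfig = {
  schedule: "0 2 * * *",
  batch_size: 20,
  max_attempts: 3,
  cascade_depth: 2,
  sources_per_query: 3,
};

type Priority = "low" | "medium" | "high";
type Status = "pending" | "processing" | "completed" | "failed";

// Only the queue fields this sketch needs.
interface QueueItem {
  id: string;
  priority: Priority;
  status: Status;
  created_at: number;
  attempts: number;
}

const PRIORITY_RANK: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

// Assumed ordering: highest priority first, oldest first within a priority.
function nextBatch(items: QueueItem[], cfg: ResearchConfig): QueueItem[] {
  return items
    .filter((item) => item.status === "pending")
    .sort(
      (a, b) =>
        PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
        a.created_at - b.created_at
    )
    .slice(0, cfg.batch_size);
}

// Assumed retry rule: requeue until the attempt count reaches max_attempts.
function recordFailure(item: QueueItem, cfg: ResearchConfig): QueueItem {
  const attempts = item.attempts + 1;
  return {
    ...item,
    attempts,
    status: attempts >= cfg.max_attempts ? "failed" : "pending",
  };
}
```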
Curiosity cascades
Each research session generates new questions. These are filed as new curiosity items one cascade level deeper than the item that produced them (parent depth + 1). The cascade_depth setting prevents infinite loops: with the default of 2, items at depth 2 don't generate further questions.
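The depth limit can be sketched as a guard on follow-up creation. The depth field and the spawnFollowUps function are hypothetical, since the CuriosityItem interface shown earlier does not list a depth column. Seed items are assumed to start at depth 0.

```typescript
// Hypothetical: a curiosity item annotated with its cascade level.
interface DepthItem {
  question: string;
  depth: number; // assumed: 0 for seed items filed during conversations
}

// Returns the follow-up items to file, or none once the limit is reached.
function spawnFollowUps(
  parent: DepthItem,
  questions: string[],
  cascadeDepth: number
): DepthItem[] {
  // Items at the maximum depth do not generate further questions.
  if (parent.depth >= cascadeDepth) {
    return [];
  }
  // Each research session contributes at most 3 new items, one level deeper.
  return questions
    .slice(0, 3)
    .map((question) => ({ question, depth: parent.depth + 1 }));
}
```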
A 30-day agent lifecycle with cascade_depth=2:
- Day 1: 5 seed curiosity items from initial conversations
- Day 7: 18 knowledge files (5 × 3 cascades + 3 direct)
- Day 30: 60+ knowledge files across company, product, prospect, objection, procedure categories
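As a rough cross-check on these numbers, the sketch below computes an upper bound on the items a fixed set of seeds can produce. It is not the arithmetic used for the figures above. It assumes every item yields the maximum of 3 follow-ups, that nothing is dropped by deduplication, and that no further seed items arrive. The function name is hypothetical.

```typescript
// Upper bound: seeds * (1 + k + k^2 + ... + k^cascadeDepth),
// where k is the number of follow-ups each item produces.
function maxCuriosityItems(
  seeds: number,
  followUpsPerItem: number,
  cascadeDepth: number
): number {
  let total = 0;
  let atLevel = seeds; // items at the current cascade level
  for (let depth = 0; depth <= cascadeDepth; depth++) {
    total += atLevel;
    atLevel *= followUpsPerItem; // each item spawns k items one level deeper
  }
  return total;
}

// 5 seeds, 3 follow-ups each, cascade_depth = 2: 5 + 15 + 45 = 65
const upperBound = maxCuriosityItems(5, 3, 2);
```

In practice the count depends on how many follow-ups each session actually produces and on new seed items filed during ongoing conversations.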
Manual knowledge management
You don't have to wait for the agent to discover knowledge gaps. You can:
Add files directly:
# Drop a Markdown file in the knowledge directory
cp your-docs.md ./knowledge/products/pricing-deep-dive.md
# Trigger a manual re-index
npx @clawrm/cli knowledge reindex
Import from external sources:
npx @clawrm/cli knowledge import --url https://docs.acme.com
npx @clawrm/cli knowledge import --dir ./existing-docs
Edit agent-generated files: Open any file in the knowledge directory and edit it. Changes are picked up at the next re-index.
Knowledge visibility in admin dashboard
The admin dashboard's Knowledge tab shows:
- All knowledge files with creation date, source (agent vs. human), and tag filter
- Search interface over the indexed knowledge base
- File preview and inline edit
- Research Queue tab showing pending curiosity items with estimated completion time
