Self-Evolving Knowledge

How goClaw's knowledge base works — Markdown files, FTS5 indexing, curiosity items, and the nightly research pipeline.

The self-evolving knowledge system is goClaw's core differentiator. It's not just a static knowledge base the agent queries — it's a system that actively identifies its own gaps and fills them overnight.

How it works

The knowledge system has three components:

Knowledge base — Markdown files indexed with SQLite FTS5
Curiosity queue — a list of gaps the agent has identified during execution
Research pipeline — a nightly job that processes the curiosity queue and writes new knowledge

After 30 days of operation, an active agent typically has 60+ knowledge files it created itself, without any human intervention.

Knowledge base

Knowledge is stored as plain Markdown files in the ./knowledge/ directory.

knowledge/
├── company.md              # Core company facts
├── products/
│   ├── platform.md
│   └── pricing.md
├── prospects/              # Agent-created: company research
│   ├── acme-corp.md
│   └── widget-co.md
├── objections/             # Agent-created: objection handling
│   ├── price-objection.md
│   └── competitor-comparison.md
└── procedures/             # Agent-created: learned procedures
    └── outreach-sequence.md

Files the agent created have the same format as files you created: they're all plain Markdown. You can edit, delete, or reorganize agent-generated files at any time.

File format

---
title: Acme Corp Research
created: 2026-03-01
updated: 2026-03-05
source: agent_research
tags: [prospect, saas, series-b]
confidence: 0.85
---

# Acme Corp

## Company Overview
Acme Corp is a B2B SaaS company focused on...

The YAML frontmatter is optional but helps the agent reason about file age, confidence, and source.
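To make the optional frontmatter concrete, here is a minimal sketch of how a file in this format could be split into metadata and body. The function and type names are hypothetical, and the parser handles only the flat `key: value` subset of YAML used in the example; goClaw's own parser is not shown in this document and may behave differently.

```typescript
// Hypothetical helper, not goClaw's actual parser.
interface Frontmatter {
  [key: string]: string | number | string[];
}

interface KnowledgeFile {
  meta: Frontmatter; // empty when the file has no frontmatter
  body: string;
}

// Handles only flat "key: value" lines and inline lists like [a, b, c].
function parseKnowledgeFile(text: string): KnowledgeFile {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(text);
  if (!match) {
    return { meta: {}, body: text }; // frontmatter is optional
  }
  const meta: Frontmatter = {};
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue; // skip lines that are not key/value pairs
    const key = line.slice(0, sep).trim();
    const raw = line.slice(sep + 1).trim();
    if (raw.startsWith("[") && raw.endsWith("]")) {
      // Inline list such as [prospect, saas, series-b]
      meta[key] = raw
        .slice(1, -1)
        .split(",")
        .map((s) => s.trim())
        .filter((s) => s.length > 0);
    } else if (raw !== "" && !Number.isNaN(Number(raw))) {
      meta[key] = Number(raw); // e.g. confidence: 0.85
    } else {
      meta[key] = raw; // dates and titles stay as strings
    }
  }
  return { meta, body: text.slice(match[0].length) };
}
```

A file without a leading `---` block comes back with an empty `meta` object and its full text as `body`.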

SQLite FTS5 indexing

All knowledge files are indexed into SQLite using FTS5 (Full-Text Search version 5). The agent retrieves knowledge using the knowledge_search MCP tool:

// Agent tool call
const results = await mcp.tool("knowledge_search", {
  query: "how to handle price objections",
  limit: 5,
});

FTS5 supports:

Phrase search ("price objection")
Boolean operators (pricing AND competitor)
Proximity search
Stemming and tokenization

Results are ranked by BM25 relevance. The top-N results are injected into the agent's context window.
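As an illustration of what a BM25-ranked FTS5 lookup can look like, the sketch below pairs a SQL statement with a helper that turns free text into a safe `MATCH` expression. The table name `knowledge_fts` and its columns are assumptions made for this example, as is the helper `toMatchExpression`; none of them are taken from goClaw's schema.

```typescript
// Assumed schema, not taken from goClaw: an FTS5 virtual table named
// knowledge_fts with columns (path, title, body).
const SEARCH_SQL = `
  SELECT path, title, bm25(knowledge_fts) AS score
  FROM knowledge_fts
  WHERE knowledge_fts MATCH ?
  ORDER BY score   -- bm25() returns smaller values for better matches
  LIMIT ?`;

// Quotes every word as an FTS5 string so that input such as AND, NEAR or
// punctuation is treated as search text rather than query syntax.
// Embedded double quotes are escaped by doubling them.
function toMatchExpression(query: string, mode: "all" | "any" = "any"): string {
  const terms = query
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map((t) => `"${t.replace(/"/g, '""')}"`);
  return terms.join(mode === "all" ? " AND " : " OR ");
}
```

The default `"any"` mode joins terms with `OR` and leaves it to BM25 to push files matching more terms to the top, which tends to suit natural-language questions. An empty query produces an empty expression, which a caller should reject before running the statement.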

Re-indexing

The knowledge index is rebuilt:

On startup (full scan for new/modified files)
After any research pipeline run
On manual trigger via admin dashboard

An index rebuild takes roughly 200 ms for 100 files and roughly 2 s for 1,000 files.
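The startup scan for new or modified files can be pictured as a comparison between modification times on disk and those recorded at the last index. The sketch below is one way to do that; the type names and the handling of deleted files are assumptions, not a description of goClaw's indexer.

```typescript
// Hypothetical shape: file path -> last-modified time in milliseconds.
type MtimeMap = Record<string, number>;

interface ReindexPlan {
  upsert: string[]; // new or modified files to (re)index
  remove: string[]; // files indexed earlier but no longer on disk (assumed behaviour)
}

function planReindex(onDisk: MtimeMap, indexed: MtimeMap): ReindexPlan {
  const upsert: string[] = [];
  const remove: string[] = [];
  for (const path of Object.keys(onDisk)) {
    const seen = indexed[path];
    // New file, or modified since it was last indexed
    if (seen === undefined || onDisk[path] > seen) upsert.push(path);
  }
  for (const path of Object.keys(indexed)) {
    if (!(path in onDisk)) remove.push(path);
  }
  return { upsert: upsert.sort(), remove: remove.sort() };
}
```

Comparing timestamps keeps the scan cheap because unchanged files are never re-read.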

Curiosity queue

When the agent encounters a gap during execution, it files a curiosity item using the knowledge_file_curiosity MCP tool:

await mcp.tool("knowledge_file_curiosity", {
  question: "What are Acme Corp's main product features?",
  context: "Contact is CTO at Acme Corp. Need company context before outreach.",
  priority: "high",
});

Curiosity items are stored in data/curiosity.db:

interface CuriosityItem {
  id: string;
  question: string;
  context: string;
  priority: "low" | "medium" | "high";
  status: "pending" | "processing" | "completed" | "failed";
  result_file: string | null;
  created_at: number;
  processed_at: number | null;
  attempts: number;
}

You can view pending curiosity items in the admin dashboard under Research Queue.
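The document does not say in which order pending items are processed. As a sketch of one plausible policy, the function below picks a batch by priority and then by age. The ordering rule and the names `QueueEntry` and `nextBatch` are assumptions; the fields are a subset of the `CuriosityItem` interface above.

```typescript
type Priority = "low" | "medium" | "high";

// Subset of the CuriosityItem fields shown above.
interface QueueEntry {
  id: string;
  priority: Priority;
  status: "pending" | "processing" | "completed" | "failed";
  created_at: number;
}

const PRIORITY_RANK: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

// Assumed ordering: highest priority first, oldest first within a priority.
function nextBatch(items: QueueEntry[], batchSize: number): QueueEntry[] {
  return items
    .filter((item) => item.status === "pending") // filter() copies, so sort() leaves the input intact
    .sort(
      (a, b) =>
        PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
        a.created_at - b.created_at,
    )
    .slice(0, batchSize);
}
```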

Nightly research pipeline

The research pipeline runs at 2 a.m. local time by default (configurable through the schedule setting). It performs these steps for each pending curiosity item:

Web search — query GPT-4o-mini with a web search prompt to find relevant sources
Content retrieval — fetch top 3 source URLs
LLM synthesis — Claude Sonnet summarizes and synthesizes the content
Categorization — determine the appropriate knowledge directory
Deduplication — check if similar content already exists (FTS5 similarity check)
File creation — write a new Markdown knowledge file
Generate follow-up questions — the agent identifies 1–3 new curiosity items from the research (curiosity cascades)
Mark item complete
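The steps above can be sketched as a single function with each stage injected as a dependency. Every name here is hypothetical and the stage implementations (web search, LLM calls, file writes) are deliberately left out; the point is only to show how the stages chain together and where a failure would surface.

```typescript
// Hypothetical interface: each member stands in for one pipeline stage.
interface PipelineDeps {
  search(question: string): Promise<string[]>; // source URLs
  fetchPage(url: string): Promise<string>; // page text
  synthesize(question: string, pages: string[]): Promise<string>; // Markdown
  categorize(markdown: string): Promise<string>; // e.g. "prospects"
  findDuplicate(markdown: string): Promise<string | null>; // existing file path
  writeFile(dir: string, markdown: string): Promise<string>; // new file path
  followUps(markdown: string): Promise<string[]>; // new questions
}

interface PipelineResult {
  status: "completed" | "failed";
  resultFile: string | null;
  newQuestions: string[];
}

async function processItem(
  question: string,
  deps: PipelineDeps,
  sourcesPerQuery = 3,
): Promise<PipelineResult> {
  try {
    const urls = (await deps.search(question)).slice(0, sourcesPerQuery);
    const pages = await Promise.all(urls.map((u) => deps.fetchPage(u)));
    const markdown = await deps.synthesize(question, pages);
    const dir = await deps.categorize(markdown);
    // Assumed behaviour: if similar content exists, reuse that file
    // instead of writing a new one.
    const existing = await deps.findDuplicate(markdown);
    const resultFile = existing ?? (await deps.writeFile(dir, markdown));
    const newQuestions = await deps.followUps(markdown);
    return { status: "completed", resultFile, newQuestions };
  } catch {
    // Any stage failure marks the run as failed; the caller decides on retries.
    return { status: "failed", resultFile: null, newQuestions: [] };
  }
}
```

Injecting the stages keeps the control flow testable with stubs, without network access or model calls.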

Pipeline configuration

research:
  schedule: "0 2 * * *"    # cron: 2am daily
  batch_size: 20            # items per nightly run
  max_attempts: 3           # retry failed items up to 3 times
  cascade_depth: 2          # max levels of curiosity cascades
  sources_per_query: 3      # URLs to fetch per query
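For readers wiring this configuration into code, the sketch below mirrors the YAML keys in a TypeScript type and shows one possible reading of `max_attempts`. The type, the `defaults` object, and the retry rule (three attempts in total, after which the item stays failed) are assumptions; the document does not specify whether the limit counts attempts or retries.

```typescript
// Mirrors the YAML keys above; the TypeScript shape itself is an assumption.
interface ResearchConfig {
  schedule: string; // cron expression
  batch_size: number;
  max_attempts: number;
  cascade_depth: number;
  sources_per_query: number;
}

const defaults: ResearchConfig = {
  schedule: "0 2 * * *",
  batch_size: 20,
  max_attempts: 3,
  cascade_depth: 2,
  sources_per_query: 3,
};

type Status = "pending" | "processing" | "completed" | "failed";

// Assumed retry rule: a failed run sends the item back to "pending" until
// its attempt count reaches max_attempts, after which it stays "failed".
function afterFailure(
  attempts: number,
  config: ResearchConfig,
): { attempts: number; status: Status } {
  const next = attempts + 1;
  return {
    attempts: next,
    status: next >= config.max_attempts ? "failed" : "pending",
  };
}
```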

Curiosity cascades

Each research session generates new questions. These are filed as new curiosity items one cascade level deeper than the item that produced them. The cascade_depth setting caps that depth, which prevents infinite loops: with the default cascade_depth of 2, items at depth 2 don't generate further questions.
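The depth limit can be expressed as a small guard. The `CuriosityItem` interface shown earlier has no depth field, so this sketch tracks depth on its own hypothetical type; how goClaw actually stores an item's depth is not covered in this document.

```typescript
// Hypothetical type: depth is 0 for seed items filed during normal execution.
interface DepthTagged {
  question: string;
  depth: number;
}

// Returns the follow-up items to file, or nothing once the parent already
// sits at the configured cascade depth.
function fileFollowUps(
  parent: DepthTagged,
  questions: string[],
  cascadeDepth: number,
): DepthTagged[] {
  if (parent.depth >= cascadeDepth) return []; // cascade stops here
  return questions.map((question) => ({ question, depth: parent.depth + 1 }));
}
```

With `cascadeDepth` set to 2, a seed item can produce depth-1 items, those can produce depth-2 items, and depth-2 items produce nothing further.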

A 30-day agent lifecycle with cascade_depth=2:

Day 1: 5 seed curiosity items from initial conversations
Day 7: 18 knowledge files (5 × 3 cascades + 3 direct)
Day 30: 60+ knowledge files across company, product, prospect, objection, procedure categories
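As a rough check on these numbers, the sketch below computes lower and upper bounds on the files that a set of seed items could yield, given 1 to 3 follow-up questions per item. It assumes every item produces exactly one file, with no failures and no deduplication, so it illustrates the growth pattern rather than goClaw's measured behaviour.

```typescript
// Bounds on files traceable to the seeds, assuming one file per item.
function cascadeBounds(
  seeds: number,
  cascadeDepth: number,
  minFollowUps = 1,
  maxFollowUps = 3,
): { min: number; max: number } {
  let min = 0;
  let max = 0;
  let levelMin = seeds; // items at the current depth, fewest case
  let levelMax = seeds; // items at the current depth, most case
  for (let depth = 0; depth <= cascadeDepth; depth++) {
    min += levelMin;
    max += levelMax;
    levelMin *= minFollowUps;
    levelMax *= maxFollowUps;
  }
  return { min, max };
}

// Five seeds at cascade_depth 2: between 5 + 5 + 5 and 5 + 15 + 45 files.
const bounds = cascadeBounds(5, 2); // { min: 15, max: 65 }
```

The day-7 figure of 18 lies inside that range. Totals continue to grow afterwards because ongoing conversations keep adding new seed items.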

Manual knowledge management

You don't have to wait for the agent to discover knowledge gaps. You can:

Add files directly:

# Drop a Markdown file in the knowledge directory
cp your-docs.md ./knowledge/products/pricing-deep-dive.md

# Trigger a manual re-index
npx @clawrm/cli knowledge reindex

Import from external sources:

npx @clawrm/cli knowledge import --url https://docs.acme.com
npx @clawrm/cli knowledge import --dir ./existing-docs

Edit agent-generated files: Open any file in the knowledge directory and edit it. Changes are picked up at the next re-index.

Knowledge visibility in admin dashboard

The admin dashboard's Knowledge tab shows:

All knowledge files with creation date, source (agent vs. human), and tag filter
Search interface over the indexed knowledge base
File preview and inline edit
Research Queue tab showing pending curiosity items with estimated completion time
