wiki-embedded

Compile a markdown wiki into a portable bundle your AI agent queries over MCP: pre-computed, source-cited answers in ~2 s, with 78% fewer input tokens than a grep-over-files baseline. Plug-and-play in Claude Desktop, Cursor, Cline.

Open source · MIT · Python 3.10+ · pip install wiki-embedded-mcp · instant boot from a bundle · ~2 s per answered query

What it does

wiki-embedded turns a folder of markdown wiki pages into a knowledge base your AI agent can query as if it were an API. The compiler parses pages + [[wikilinks]], derives a thesis (the wiki's purpose lens), pre-computes answer chunks with an LLM teacher, and embeds everything with multilingual E5 — emitting one portable compiled_wiki.zip (~5 MB).

The MCP server (pip install wiki-embedded-mcp) loads that bundle and exposes 9 tools to Claude Desktop, Cursor, or Cline: semantic search, ready-made cited answers, page reads, backlinks, and crossref-graph traversal. Pre-computed chunks mean query_wiki_answer returns a source-cited answer at embedder speed — no per-query LLM call.

Extracted from the internal toolchain at 42rows.com, an AI sales-intelligence platform that ships agents grounded on customer-specific wikis. Three components in one repo: the compiler (Apify actor + Docker), the MCP server (pip + Docker), and an optional polars-runner analytics side-channel with a property-based-testing oracle.

What's different

01 Pre-computed answer chunks, not just passages. An offline LLM teacher synthesizes thesis-conditional "ideal answers" that cite their evidence by [[slug]]. At query time query_wiki_answer returns one ready, cited answer at embedder speed — the LLM is only in the loop for optional synthesis, not every question.
02 One portable bundle, MCP-native. The whole wiki compiles to a single ~5 MB compiled_wiki.zip you ship to colleagues or production; the server serves it over the Model Context Protocol so Claude Desktop / Cursor / Cline query it directly. Incremental recompiles re-embed only what changed.
03 Structure preserved, not shredded. One chunk per author-curated page (no token-window splitting that amplifies source bias), every YAML frontmatter field kept as filterable metadata, and [[wikilinks]] parsed into a crossref graph exposed via get_wiki_backlinks / get_wiki_graph.
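As a rough illustration of the crossref graph in point 03, here is a minimal sketch of extracting [[wikilinks]] and deriving backlinks. The regex, the `[[slug|label]]` alias form, and the function name `build_crossref_graph` are assumptions for illustration; the project's actual parser may differ.

```python
import re
from collections import defaultdict

# Matches [[slug]] and [[slug|label]]. The alias form is an assumption
# borrowed from common wiki syntax, not confirmed by the project.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def build_crossref_graph(pages: dict[str, str]) -> tuple[dict, dict]:
    """Return (outgoing links, backlinks), both keyed by page slug."""
    links_out: dict[str, list[str]] = {}
    backlinks: dict[str, list[str]] = defaultdict(list)
    for slug, text in pages.items():
        # dict.fromkeys de-duplicates while keeping first-seen order.
        targets = list(dict.fromkeys(m.strip() for m in WIKILINK.findall(text)))
        links_out[slug] = targets
        for target in targets:
            backlinks[target].append(slug)
    return links_out, dict(backlinks)


# Toy pages, made up for this example.
pages = {
    "companies/acme-logistics": "Sells to [[segments/retail-warehouse]] using [[products/wms-suite]].",
    "segments/retail-warehouse": "Key account: [[companies/acme-logistics]].",
    "products/wms-suite": "Used by [[companies/acme-logistics|Acme]].",
}
links_out, backlinks = build_crossref_graph(pages)
```

Here `links_out` corresponds to the kind of data a `wikilinks_out` metadata field would hold, and `backlinks` is the inverse mapping a backlinks tool would serve.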

Inputs

Markdown wiki — local dir, .zip, or GitHub archive URL
…or a pre-compiled compiled_wiki.zip (URL or path)

Outputs

compiled_wiki.zip (~5 MB portable bundle)
MCP server — 9 tools over stdio JSON-RPC
Docker image (ghcr.io)

Example commands

pip install 'wiki-embedded-mcp[cloud]'

wiki-embedded-mcp --wiki ./my-wiki # compile + serve a local dir

wiki-embedded-mcp --compiled-wiki https://.../compiled_wiki.zip # instant boot

Install

Three install paths. pip is the default; Docker ships a non-root image with the MCP server bundled; clone-from-source uses uv workspaces for contributors.

# MCP server, cloud embeddings (Pinecone Inference)
pip install 'wiki-embedded-mcp[cloud]'

# …or bundle local embeddings (sentence-transformers + torch)
pip install 'wiki-embedded-mcp[all]'

# Serve a pre-compiled bundle — instant boot, no re-embedding
export PINECONE_API_KEY="..."
wiki-embedded-mcp --compiled-wiki https://.../compiled_wiki.zip

# …or point at a local markdown dir (compiles on first boot)
wiki-embedded-mcp --wiki ./my-wiki

Quick start

Wire the compiled bundle into your AI client. Drop this into claude_desktop_config.json (or Cursor / Cline), restart, and your agent queries the wiki over MCP — pre-computed cited answers, no glue code.

claude_desktop_config.json (or ~/.cursor/mcp.json, cline_mcp_settings.json):
{
  "mcpServers": {
    "wiki-embedded": {
      "command": "wiki-embedded-mcp",
      "args": ["--compiled-wiki", "https://.../compiled_wiki.zip"],
      "env": { "PINECONE_API_KEY": "..." }
    }
  }
}

Restart the client, then ask your agent, for example: "use wiki-embedded to find the top prospects". It calls query_wiki_answer(...) and gets a pre-computed, source-cited answer.
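If you would rather script the config change than edit the file by hand, a small helper along these lines could merge the entry without clobbering other servers. This is a sketch: `add_wiki_server` is a hypothetical helper, not part of the package, and the local `./compiled_wiki.zip` path is a placeholder.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def add_wiki_server(config_path: Path, bundle: str, api_key: Optional[str] = None) -> dict:
    """Merge a "wiki-embedded" entry into an MCP client config file."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    entry = {"command": "wiki-embedded-mcp", "args": ["--compiled-wiki", bundle]}
    if api_key:
        # Only needed when the bundle was built with the cloud embedder.
        entry["env"] = {"PINECONE_API_KEY": api_key}
    # Keep any servers that are already configured.
    config.setdefault("mcpServers", {})["wiki-embedded"] = entry
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Demo against a throwaway file rather than a real client config.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
    merged = add_wiki_server(path, "./compiled_wiki.zip")
```

Writing strict JSON through `json.dumps` also avoids stray comments, which a strict JSON parser would reject.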

Benchmark

Measured on a real 1,851-page Italian sales wiki: 20 strategic queries (IT + EN), same LLM (Gemini 2.5 Flash), answers graded against author-written ground truth. Scripts and raw results live in benchmarks/ — reproduce them or run your own.

Tokens & latency — 20 queries on a 1,851-page wiki

Setup: wiki-embedded = top-5 E5-cosine retrieval + Gemini 2.5 Flash synthesis; baseline = grep -rli on the raw markdown, then the same LLM on every matched file (capped at 20).

Metric	Filesystem grep	wiki-embedded	Δ
Input tokens to LLM	9,255 mean	2,035 mean	−78%
End-to-end latency (at LLM speed)	~9.1 s	~2.3 s	~4× faster
Retrieval latency (median)	69 ms	322 ms	grep is ~4.7× faster here
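The headline deltas follow directly from the means in the table. A quick arithmetic check, using only the numbers reported above (variable names are ours):

```python
grep_tokens, wiki_tokens = 9_255, 2_035    # mean input tokens per query
grep_latency, wiki_latency = 9.1, 2.3      # end-to-end seconds (approximate)

token_saving = 1 - wiki_tokens / grep_tokens   # fraction of tokens saved
text_ratio = grep_tokens / wiki_tokens         # how much more text grep feeds
speedup = grep_latency / wiki_latency          # end-to-end speedup

print(f"token saving: {token_saving:.0%}")         # token saving: 78%
print(f"grep feeds {text_ratio:.1f}x the tokens")  # grep feeds 4.5x the tokens
print(f"end-to-end speedup: {speedup:.1f}x")       # end-to-end speedup: 4.0x
```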

Answer quality — parity at a fraction of the tokens

Feeding about 4.5× more text (9,255 vs 2,035 mean input tokens) does not improve answers: across 20 queries, an order-randomised LLM-as-judge called it 5 wins for wiki-embedded, 3 for grep, and 12 ties. That is roughly parity, while wiki-embedded used 78% fewer tokens. The honest takeaway is the token/latency saving, not a large quality gap.

Metric	Filesystem grep	wiki-embedded	Δ
LLM-as-judge verdict (n=20)	3 wins	5 wins	12 ties (parity)
"Not enough information" punts	more often	rare	wiki-embedded
Source citations ([[slug]])	none	inline	wiki-embedded

When wiki-embedded does not help. If the wiki has 30 pages or fewer and your agent already has filesystem access (for example, Claude Code running locally), grep -rli is faster and cheaper. wiki-embedded earns its keep when (a) the wiki ships to consumers without filesystem access (Claude Desktop, Cursor, Cline, hosted apps), (b) queries are semantic rather than keyword-matched, or (c) the same wiki is queried many times.
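For a small wiki you can try the baseline yourself. The following is a simplified Python stand-in for the `grep -rli` step with the benchmark's 20-file cap. It is a sketch, not the benchmark script: it does plain substring matching on `*.md` files only, whereas grep matches patterns across all files, and `grep_files` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def grep_files(root: Path, term: str, cap: int = 20) -> list[Path]:
    """Case-insensitive substring match over *.md files, capped at `cap` hits."""
    needle = term.lower()
    hits: list[Path] = []
    for path in sorted(root.rglob("*.md")):
        if needle in path.read_text(encoding="utf-8", errors="ignore").lower():
            hits.append(path)
            if len(hits) == cap:
                break
    return hits


# Demo on three throwaway pages.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("Acme Logistics S.p.A.", encoding="utf-8")
    (root / "b.md").write_text("retail warehouse", encoding="utf-8")
    (root / "c.md").write_text("logistics partners", encoding="utf-8")
    matched = [p.name for p in grep_files(root, "logistics")]
    capped = [p.name for p in grep_files(root, "logistics", cap=1)]
```

Every matched file is then handed to the LLM in full, which is where the baseline's higher token count comes from.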

What's in each chunk

Stable IDs across recompiles. Idempotent upserts. Every YAML frontmatter field on the page lands in metadata as-is — your vector DB's metadata filter can target any of them.

{
  "id":       "companies/acme-logistics#0",       // stable across recompiles
  "text":     "passage: Acme Logistics S.p.A. ...", // E5 expects this prefix
  "embedding": [0.012, -0.089, /* ... */],        // 1024-dim float, or null
  "metadata": {
    "slug":          "companies/acme-logistics",
    "title":         "Acme Logistics S.p.A.",
    "kind":          "company",                   // any frontmatter field
    "confidence":    0.88,
    "wikilinks_out": ["segments/retail-warehouse", "products/wms-suite"],
    "char_count":    2147
  }
}
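To make the record shape concrete, here is a minimal sketch of assembling one such chunk from a page. `build_chunk` is a hypothetical helper; how the real compiler computes `char_count` or orders `wikilinks_out` is an assumption here. The `passage: ` prefix is the document-side convention for E5 models, with `query: ` used on the query side.

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def build_chunk(slug: str, body: str, frontmatter: dict, index: int = 0) -> dict:
    """Assemble one chunk record in the shape shown above."""
    metadata = dict(frontmatter)  # every frontmatter field is kept as-is
    metadata.update(
        slug=slug,
        wikilinks_out=list(dict.fromkeys(WIKILINK.findall(body))),
        char_count=len(body),  # assumption: counts the page body only
    )
    return {
        "id": f"{slug}#{index}",     # derived from the slug, so it does not change on recompile
        "text": f"passage: {body}",  # E5 document-side prefix
        "embedding": None,           # filled in later by the embedder
        "metadata": metadata,
    }


chunk = build_chunk(
    "companies/acme-logistics",
    "Acme sells [[products/wms-suite]] to [[segments/retail-warehouse]].",
    {"title": "Acme Logistics S.p.A.", "kind": "company", "confidence": 0.88},
)

# Any frontmatter field can then drive a metadata filter.
companies = [c for c in [chunk] if c["metadata"].get("kind") == "company"]
```

Because the ID depends only on the slug and chunk index, re-upserting the same page overwrites its previous record instead of creating a duplicate.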

Why we built this

wiki-embedded was extracted from the internal toolchain at 42rows.com, an AI sales-intelligence platform that ships agents grounded on customer-specific markdown wikis. We needed our agents to query a large wiki without feeding the whole thing to the LLM every turn — so we built a compiler that pre-computes cited answers and an MCP server that serves them. We open-sourced both. If it is useful to you, a star on GitHub helps people find it; a look at what 42rows does helps us.

FAQ

Is this another LangChain or LlamaIndex?

No. wiki-embedded is two focused pieces — a compiler (markdown wiki → portable bundle) and an MCP server that serves it — not an orchestration framework. It exposes its wiki to any MCP client (Claude Desktop, Cursor, Cline); use whatever orchestrator you like alongside it.

What are "pre-computed answer chunks"?

At compile time an LLM teacher writes thesis-conditional "ideal answers" for likely query archetypes, each citing the evidence pages by [[slug]]. At query time query_wiki_answer returns the best one at embedder speed — common questions are answered without a per-query LLM call. query_wiki and query_wiki_pages still return raw passages when you want to synthesize yourself.
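Conceptually, the query-time step is a nearest-neighbour lookup over the pre-computed answers. The sketch below shows that idea with toy 3-dimensional vectors. The function names, the `min_score` threshold, and the fallback-to-None behaviour are assumptions for illustration, not the server's actual logic.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_precomputed_answer(query_vec, answers, min_score=0.8):
    """Return the closest pre-computed answer, or None below the threshold."""
    scored = [(cosine(query_vec, a["embedding"]), a) for a in answers]
    score, answer = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return answer if score >= min_score else None


# Toy answers with made-up embeddings; real ones would be 1024-dim E5 vectors.
answers = [
    {"text": "Answer A, citing [[companies/acme-logistics]]", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Answer B, citing [[products/wms-suite]]", "embedding": [0.0, 0.2, 0.9]},
]
hit = best_precomputed_answer([1.0, 0.0, 0.0], answers)
miss = best_precomputed_answer([0.0, 1.0, 0.0], answers)
```

A miss like the second query is the case where falling back to raw passages and synthesizing an answer yourself makes sense.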

Do I have to use Pinecone?

No. Pinecone Inference is the default cloud embedder (fast, no local model), but the [all] extra bundles sentence-transformers so the E5 multilingual model runs locally on CPU — no API key, no network. The bundle records which embedder it used so the server matches it.

Does it work with non-English wikis?

Yes. The default model is E5 multilingual large (1024-dim), which covers Italian, English, German, French, Spanish, Portuguese and 90+ other languages, cross-lingual out of the box. The published benchmark wiki is Italian, queried in both Italian and English.

How does it connect to my AI assistant?

It is an MCP server. Add a small "wiki-embedded" entry to claude_desktop_config.json (or the Cursor / Cline config) pointing at your compiled_wiki.zip, restart, and the agent gets 9 tools: query_wiki, query_wiki_answer, read_wiki_page, get_wiki_backlinks, get_wiki_graph, get_thesis, and more.

Install it. Star it. Tell us what breaks.

Open source, MIT, Python 3.10+. Issues and pull requests are open — we read all of them.
