Compile a markdown wiki into a portable bundle your AI agent queries over MCP — pre-computed, source-cited answers in ~2 s, 78% fewer tokens. Plug-and-play in Claude Desktop, Cursor, Cline.
wiki-embedded turns a folder of markdown wiki pages into a knowledge base your AI agent can query as if it were an API. The compiler parses pages + [[wikilinks]], derives a thesis (the wiki's purpose lens), pre-computes answer chunks with an LLM teacher, and embeds everything with multilingual E5 — emitting one portable compiled_wiki.zip (~5 MB).
The MCP server (pip install wiki-embedded-mcp) loads that bundle and exposes 9 tools to Claude Desktop, Cursor, or Cline: semantic search, ready-made cited answers, page reads, backlinks, and crossref-graph traversal. Pre-computed chunks mean query_wiki_answer returns a source-cited answer at embedder speed — no per-query LLM call.
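To make the backlink and crossref-graph tools concrete, here is a minimal sketch of how outgoing `[[wikilinks]]` could be parsed and inverted into a backlink index. It is an illustration only, not the compiler's actual code: the function names `extract_wikilinks` and `build_backlinks`, the sample pages, and support for the `[[slug|label]]` form are all assumptions.

```python
import re
from collections import defaultdict

# Matches [[slug]] and [[slug|label]]. The label form is an assumption
# borrowed from common wiki syntax, not confirmed for wiki-embedded.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_wikilinks(markdown: str) -> list[str]:
    """Return outgoing link targets in order of first appearance, deduplicated."""
    seen: dict[str, None] = {}
    for match in WIKILINK.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def build_backlinks(pages: dict[str, str]) -> dict[str, list[str]]:
    """Invert the outgoing-link graph: target slug -> sorted list of linking slugs."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    for slug, body in pages.items():
        for target in extract_wikilinks(body):
            backlinks[target].add(slug)
    return {target: sorted(sources) for target, sources in backlinks.items()}


# Hypothetical three-page wiki used only for illustration.
pages = {
    "companies/acme-logistics": "Uses [[products/wms-suite]] in [[segments/retail-warehouse]].",
    "segments/retail-warehouse": "Typical buyer of [[products/wms-suite|the WMS suite]].",
    "products/wms-suite": "No outgoing links.",
}
backlinks = build_backlinks(pages)
```

With an index like this, a backlink query is a single dictionary lookup, and graph traversal can walk the outgoing and incoming edges in turn.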
Extracted from the internal toolchain at 42rows.com, an AI sales-intelligence platform that ships agents grounded on customer-specific wikis. Three components in one repo: the compiler (Apify actor + Docker), the MCP server (pip + Docker), and an optional polars-runner analytics side-channel with a property-based-testing oracle.
pip install 'wiki-embedded-mcp[cloud]'
wiki-embedded-mcp --wiki ./my-wiki  # compile + serve a local dir
wiki-embedded-mcp --compiled-wiki https://.../compiled_wiki.zip  # instant boot

Three install paths. pip is the default; Docker ships a non-root image with the MCP server bundled; clone-from-source uses uv workspaces for contributors.
# MCP server, cloud embeddings (Pinecone Inference)
pip install 'wiki-embedded-mcp[cloud]'
# …or bundle local embeddings (sentence-transformers + torch)
pip install 'wiki-embedded-mcp[all]'
# Serve a pre-compiled bundle — instant boot, no re-embedding
export PINECONE_API_KEY="..."
wiki-embedded-mcp --compiled-wiki https://.../compiled_wiki.zip
# …or point at a local markdown dir (compiles on first boot)
wiki-embedded-mcp --wiki ./my-wiki

Wire the compiled bundle into your AI client. Drop this into claude_desktop_config.json (or Cursor / Cline), restart, and your agent queries the wiki over MCP: pre-computed cited answers, no glue code.
// claude_desktop_config.json (or ~/.cursor/mcp.json, cline_mcp_settings.json)
{
  "mcpServers": {
    "wiki-embedded": {
      "command": "wiki-embedded-mcp",
      "args": ["--compiled-wiki", "https://.../compiled_wiki.zip"],
      "env": { "PINECONE_API_KEY": "..." }
    }
  }
}
// Restart the client, then ask your agent:
// "use wiki-embedded to find the top prospects"
// → it calls query_wiki_answer(...) and gets a pre-computed, source-cited answer.

Measured on a real 1,851-page Italian sales wiki: 20 strategic queries (IT + EN), same LLM (Gemini 2.5 Flash), answers graded against author-written ground truth. Scripts and raw results live in benchmarks/; reproduce them or run your own.
20 strategic queries (Italian + English) on a real 1,851-page sales wiki. wiki-embedded = top-5 E5-cosine retrieval + Gemini 2.5 Flash synthesis; baseline = grep -rli on the raw markdown, then the same LLM on every matched file (capped at 20).
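The retrieval half of that setup, top-5 by cosine similarity, can be sketched in a few lines. This is a hedged illustration rather than the project's implementation: `cosine` and `top_k` are hypothetical helpers, and the toy 3-dimensional vectors stand in for real 1024-dimensional E5 embeddings. E5 models also expect a `query: ` prefix on queries to mirror the `passage: ` prefix on indexed text; embedding itself is left out here.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 5):
    """Score every (chunk_id, vector) pair and keep the k best as (score, chunk_id)."""
    scored = ((cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunks)
    return heapq.nlargest(k, scored)


# Toy vectors for illustration only.
chunks = [
    ("companies/acme-logistics#0", [1.0, 0.0, 0.0]),
    ("segments/retail-warehouse#0", [0.6, 0.8, 0.0]),
    ("products/wms-suite#0", [0.0, 0.0, 1.0]),
]
hits = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

Only the selected chunks would then be passed to the LLM, which is where the input-token difference against the grep baseline comes from.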
| Metric | Filesystem grep | wiki-embedded | Δ |
|---|---|---|---|
| Input tokens to LLM | 9,255 mean | 2,035 mean | −78% |
| End-to-end latency (at LLM speed) | ~9.1 s | ~2.3 s | ~4× faster |
| Retrieval latency (median) | 69 ms | 322 ms | grep ~4.7× faster |
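
The Δ column follows directly from the measured means. As a quick check, the arithmetic can be reproduced from the figures in the table; the variable names are illustrative, and the latencies are the approximate values reported above.

```python
grep_tokens, wiki_tokens = 9_255, 2_035    # mean input tokens per query
grep_latency, wiki_latency = 9.1, 2.3      # end-to-end seconds (approximate)
grep_retrieval, wiki_retrieval = 69, 322   # median retrieval latency, ms

token_reduction = 1 - wiki_tokens / grep_tokens    # fraction of tokens saved
token_ratio = grep_tokens / wiki_tokens            # how much more text grep feeds the LLM
speedup = grep_latency / wiki_latency              # end-to-end speedup
retrieval_ratio = wiki_retrieval / grep_retrieval  # how much slower retrieval alone is

print(f"tokens: -{token_reduction:.0%} ({token_ratio:.1f}x less input)")
print(f"end-to-end: {speedup:.1f}x faster")
print(f"retrieval only: grep is {retrieval_ratio:.1f}x faster")
```

Retrieval alone is slower than grep, but the end-to-end time is dominated by the LLM call, so the smaller prompt wins overall.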
Feeding roughly 4.5× more text does not help: the baseline LLM gets lost in noise and more often replies "not enough information". Graded by LLM-as-judge against author-written ground truth on the valid samples.
| Metric | Filesystem grep | wiki-embedded | Δ |
|---|---|---|---|
| Answer quality (LLM-as-judge wins) | 3 | 5 | wiki-embedded |
| "Not enough information" punts | frequent | rare | wiki-embedded |
| Source citations ([[slug]]) | none | inline | wiki-embedded |
When wiki-embedded does not help. If the wiki has 30 pages or fewer and your agent already
has filesystem access (for example, Claude Code running locally), grep -rli is faster and
cheaper. wiki-embedded earns its keep when (a) the wiki ships to consumers without filesystem
access (Claude Desktop, Cursor, Cline, hosted apps), (b) queries are semantic rather than
keyword-matched, or (c) the same wiki is queried many times.
Stable IDs across recompiles. Idempotent upserts. Every YAML frontmatter field on the page
lands in metadata as-is — your vector DB's metadata filter can target any of them.
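A minimal sketch of why stable IDs make upserts idempotent, and how frontmatter carried into metadata can be filtered on. It assumes the `slug#index` ID shape seen in this project's records; the helpers `chunk_id`, `compile_page`, and `upsert`, and the in-memory dict standing in for a vector DB, are hypothetical.

```python
def chunk_id(slug: str, index: int) -> str:
    """Deterministic ID: the same page and chunk position always give the same ID."""
    return f"{slug}#{index}"


def compile_page(slug: str, frontmatter: dict, chunks: list[str]) -> list[dict]:
    """Build one record per chunk, copying every frontmatter field into metadata."""
    return [
        {
            "id": chunk_id(slug, i),
            "text": f"passage: {text}",  # E5 passage prefix
            "metadata": {**frontmatter, "slug": slug},
        }
        for i, text in enumerate(chunks)
    ]


def upsert(store: dict[str, dict], records: list[dict]) -> None:
    """Insert-or-overwrite keyed by ID, so replaying a compile adds no duplicates."""
    for record in records:
        store[record["id"]] = record


store: dict[str, dict] = {}  # stand-in for a vector DB namespace
records = compile_page(
    "companies/acme-logistics",
    {"kind": "company", "confidence": 0.88},
    ["Acme Logistics S.p.A. ..."],
)
upsert(store, records)
upsert(store, records)  # a recompile overwrites in place

# Metadata filter on a frontmatter field.
companies = [r["id"] for r in store.values() if r["metadata"].get("kind") == "company"]
```

Because the ID depends only on the slug and chunk position, rerunning the compiler replaces records rather than accumulating stale copies.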
{
  "id": "companies/acme-logistics#0",             // stable across recompiles
  "text": "passage: Acme Logistics S.p.A. ...",   // E5 expects this prefix
  "embedding": [0.012, -0.089, /* ... */],        // 1024-dim float, or null
  "metadata": {
    "slug": "companies/acme-logistics",
    "title": "Acme Logistics S.p.A.",
    "kind": "company",                            // any frontmatter field
    "confidence": 0.88,
    "wikilinks_out": ["segments/retail-warehouse", "products/wms-suite"],
    "char_count": 2147
  }
}

wiki-embedded was extracted from the internal toolchain at 42rows.com, an AI sales-intelligence platform that ships agents grounded on customer-specific markdown wikis. We needed our agents to query a large wiki without feeding the whole thing to the LLM every turn, so we built a compiler that pre-computes cited answers and an MCP server that serves them. We open-sourced both. If it is useful to you, a star on GitHub helps people find it; a look at what 42rows does helps us.
wiki-embedded is not an orchestration framework. It is two focused pieces: a compiler (markdown wiki → portable bundle) and an MCP server that serves the bundle. It exposes the wiki to any MCP client (Claude Desktop, Cursor, Cline); use whatever orchestrator you like alongside it.
At compile time an LLM teacher writes thesis-conditional "ideal answers" for likely query archetypes, each citing the evidence pages by [[slug]]. At query time query_wiki_answer returns the best one at embedder speed — common questions are answered without a per-query LLM call. query_wiki and query_wiki_pages still return raw passages when you want to synthesize yourself.
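The compile-time versus query-time split can be sketched as follows. This is an assumption-laden illustration, not the server's code: `PRECOMPUTED`, `lookup_answer`, the sample questions and answers, and the toy 2-dimensional embeddings are all invented for the example, and the real tool's parameters are not shown in this document.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Built once at compile time by the LLM teacher (hypothetical sample data).
PRECOMPUTED = [
    {
        "question": "Who are the top prospects?",
        "answer": "Placeholder answer citing [[companies/acme-logistics]].",
        "embedding": [1.0, 0.0],
    },
    {
        "question": "Which segments buy the WMS suite?",
        "answer": "Placeholder answer citing [[segments/retail-warehouse]].",
        "embedding": [0.0, 1.0],
    },
]


def lookup_answer(query_embedding: list[float]) -> dict:
    """Query-time path: one similarity scan over stored answers, no LLM call."""
    best = max(PRECOMPUTED, key=lambda item: cosine(query_embedding, item["embedding"]))
    citations = re.findall(r"\[\[([^\]]+)\]\]", best["answer"])
    return {"answer": best["answer"], "citations": citations}


result = lookup_answer([0.9, 0.1])
```

The only per-query cost in this sketch is embedding the question and scanning the stored vectors, which is why the answer comes back at embedder speed.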
Pinecone is not required. Pinecone Inference is the default cloud embedder (fast, no local model), but the [all] extra bundles sentence-transformers so the E5 multilingual model runs locally on CPU, with no API key and no network. The bundle records which embedder it used so the server matches it.
Multilingual wikis are supported. The default model is E5 multilingual large (1024-dim), which covers Italian, English, German, French, Spanish, Portuguese and 90+ languages, cross-lingual out of the box. The published benchmark wiki is Italian, queried in both Italian and English.
wiki-embedded plugs into AI clients as an MCP server. Add a small "wiki-embedded" entry to claude_desktop_config.json (or the Cursor / Cline config) pointing at your compiled_wiki.zip, restart, and the agent gets 9 tools: query_wiki, query_wiki_answer, read_wiki_page, get_wiki_backlinks, get_wiki_graph, get_thesis, and more.
Open source, MIT, Python 3.10+. Issues and pull requests are open — we read all of them.