Map get Truth. Vectors find Scraps.

A RAG Sitemap is a structured, AI-readable knowledge index built for retrieval. It is not sitemap.xml and it is not a vector database. It is a traceable, top-level navigation document that marks the boundaries and relationships between content groups and describes the site's whole knowledge landscape, so the AI walks the structure toward the answer instead of gambling on semantic similarity.

Use an AI Chatbot to Train Your Site for SEO

You think you are training the AI to understand your site? It should be the other way around: use an AI chatbot to train your site. Every time the AI answers wrong, it is telling you which article's title or description is unclear, which exposes your site's SEO blind spots. RAG Sitemap drops the black-box vector store and reads the titles, categories, and descriptions you already wrote in WordPress, then generates a plain-text site map for the AI. You fix problems in the WordPress admin the same way you already do SEO, with no new tool to learn and no algorithm to second-guess.
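To make the idea concrete, here is a minimal sketch of turning titles, categories, and descriptions into a plain-text map. The `Entry` class, the `build_site_map` function, and the output layout are assumptions made for illustration; they are not the plugin's actual code or file format.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One published item; the fields mirror what an editor already fills in."""
    title: str
    category: str
    description: str
    link: str


def build_site_map(entries: list[Entry]) -> str:
    """Group entries by category and render one plain-text map (hypothetical layout)."""
    by_category: dict[str, list[Entry]] = {}
    for entry in entries:
        by_category.setdefault(entry.category, []).append(entry)

    lines: list[str] = []
    for category in sorted(by_category):
        lines.append(f"Category: {category}")
        for entry in by_category[category]:
            lines.append(f"  Title: {entry.title}")
            lines.append(f"  Link: {entry.link}")
            lines.append(f"  Description: {entry.description}")
        lines.append("")  # blank line between categories
    return "\n".join(lines).rstrip() + "\n"


entries = [
    Entry("iPhone Review", "Reviews",
          "Complete review of iPhone's core features", "example.com/iphone"),
    Entry("About Us", "Pages",
          "Who runs this site and how to reach us", "example.com/about"),
]
print(build_site_map(entries))
```

Because the map is built only from fields the editor controls, a wrong answer can be traced back to a specific title or description and fixed there.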

RAG Harness Engineering

RAG Harness Engineering means that every visitor question triggers three independent AI API calls (vision, retrieval, answer) rather than a single prompt. Chaining multiple sub-agents normally risks each stage poisoning the next. The Harness architecture instead hands every stage the visitor's original question and a clear view of the initial task goal, so noise accumulated in one hop is stopped at the entrance of the next instead of compounding down the chain.
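The shape of such a harness can be sketched in a few lines. This is an illustration under assumptions: `run_harness`, the `Stage` signature, and the three stub functions are hypothetical, and the stubs return placeholder strings where a real system would call a model API. What each stage does internally is not specified here.

```python
from typing import Callable

# Hypothetical stage signature: (original_question, previous_stage_output) -> text
Stage = Callable[[str, str], str]


def run_harness(question: str, stages: list[tuple[str, Stage]]) -> dict[str, str]:
    """Run stages in order; every stage receives the original question verbatim."""
    outputs: dict[str, str] = {}
    previous = ""
    for name, stage in stages:
        # The question is passed unchanged, never a paraphrase from an earlier stage.
        previous = stage(question, previous)
        outputs[name] = previous
    return outputs


# Stub stages standing in for three separate model API calls.
def vision(question: str, previous: str) -> str:
    return f"vision notes for: {question}"


def retrieval(question: str, previous: str) -> str:
    return f"entries for: {question}"


def answer(question: str, previous: str) -> str:
    return f"answer to '{question}' using [{previous}]"


result = run_harness(
    "What is the battery life?",
    [("vision", vision), ("retrieval", retrieval), ("answer", answer)],
)
for name, text in result.items():
    print(f"{name}: {text}")
```

The design point is the first argument: each hop can check the previous stage's output against the question the visitor actually asked.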

A Small Language Model (SLM) Actually Understood an Entire Website

Llama 3.2 3B is a language model with only 3 billion parameters, small by current standards. Ask it a question and it can answer only from what it absorbed during training. It does not know what your site says, does not know which articles you published, and knows nothing about the content you have built up recently. Using it to run website Q&A should have been a fantasy.

The Good-Faith Limits of llms.txt

llms.txt is a sitemap designed for AI to read, but it has only one layer, which is not enough for an organized, structured site. Its standard format is the site name as an H1, a short summary, and below that a list in which each line is a title and description pointing to one page. It cannot tell whether a line is a category page, a standalone page, an article, or a product page; every line is treated as the same kind of thing. The intention is sound: the goal is to make a site easier for AI to read. The problem is not the descriptions themselves but that the format flattens the site into a single layer, destroying the site's original narrative power and content context.
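A short sketch shows what that flattening looks like to a program. It assumes the markdown link-list form of llms.txt, `- [Title](url): description`; the sample file and the `parse_llms_txt` helper are made up for illustration.

```python
import re

# Assumed link-line form: "- [Title](url): description"
LINK_LINE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<description>.*))?$"
)

# Hypothetical sample file: one category page and one article.
SAMPLE = """\
# Example Store

> Reviews and guides for phones.

## Pages

- [Reviews](https://example.com/reviews): All product reviews
- [iPhone Review](https://example.com/iphone): Complete review of iPhone's core features
"""


def parse_llms_txt(text: str) -> list[dict]:
    """Collect every link line as a flat record; headings and the summary are skipped."""
    records = []
    for line in text.splitlines():
        match = LINK_LINE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


pages = parse_llms_txt(SAMPLE)
for page in pages:
    print(page["title"], "->", page["url"])
```

Both records come back with the same three fields. Nothing in the data says that the first is a category page and the second is an article inside it.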

Progressive Disclosure: An Indexing Philosophy That Has Never Changed

JSON-LD is read by search engines, Claude's SKILL.md is read by AI agents, and a RAG Sitemap is read by the on-site AI chatbot. The reader changes all the way from crawler to LLM, yet all three systems face the same challenge of quickly deciding what is most relevant to the current need, and all three converge on the same three-layer structure: title, description, content.

Layers run from lightweight scan (1) to precise deep dive (3).

| System | Role | 1. Title | 2. Description | 3. Content |
|---|---|---|---|---|
| Google Search | Web-wide crawling and ranking | Schema.org name | Schema.org description | Schema.org articleBody |
| Claude SKILL.md | Agent task context | YAML name | YAML description | SKILL.md |
| RAG Sitemap | On-site knowledge retrieval | Entry Title | Entry Description | the entry's corresponding chunk |

Google Search, JSON-LD:

{
  "@context": "https://schema.org",
  "@type": "Article",

  "name": "iPhone Review",

  "url": "example.com/iphone",

  "description":
    "In-depth review...",

  "articleBody":
    "Overall performance..."
}

Claude SKILL.md:

---
name: pdf-processing

description: Extract text and
  tables from PDF files
---

# PDF Processing

Call process_pdf(filepath)
to start processing...

RAG Sitemap entry:

======
Title: iPhone Review

Link: example.com/iphone

Description: Complete review of
  iPhone's core features
======

Content:
  Overall performance,
  good battery life...
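
The RAG Sitemap entry above is simple enough to read with a few lines of code. The following is a sketch, assuming the layout shown: `Key: value` fields, indented continuation lines, and `======` separators. The `parse_entry` and `disclose` helpers are hypothetical names, not part of any published API.

```python
KNOWN_KEYS = {"Title", "Link", "Description", "Content"}

SAMPLE = """\
======
Title: iPhone Review

Link: example.com/iphone

Description: Complete review of
  iPhone's core features
======

Content:
  Overall performance,
  good battery life...
"""


def parse_entry(text: str) -> dict[str, str]:
    """Parse 'Key: value' fields; indented lines continue the previous field."""
    fields: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"="}:
            continue  # skip blank lines and ====== separators
        key, sep, value = line.partition(":")
        if sep and not raw[0].isspace() and key in KNOWN_KEYS:
            current = key
            fields[current] = value.strip()
        elif current is not None:
            fields[current] = f"{fields[current]} {line}".strip()
    return fields


def disclose(entry: dict[str, str], depth: int) -> dict[str, str]:
    """Return only the first `depth` layers: title, then description, then content."""
    layers = ["Title", "Description", "Content"]
    return {key: entry[key] for key in layers[:depth]}


entry = parse_entry(SAMPLE)
print(disclose(entry, 1))  # lightweight scan
print(disclose(entry, 3))  # precise deep dive
```

A reader that stops at depth 1 or 2 never pays for the content layer, which is the point of progressive disclosure.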

The three-layer structure above (title, description, content) holds inside a single unit of content. Pull the lens back to the whole file system and three layers appear again: index, category, file. Progressive disclosure does not happen only inside an article; it also decides how the entire site is organized.

Layers run from top level (1) to leaf nodes (3).

| System | Scope | 1. Index | 2. Category | 3. File |
|---|---|---|---|---|
| XML Sitemap | Site directory structure | sitemap.xml | /products/, /blog/ and other subdirectories | Individual HTML pages |
| Claude SKILL.md | Skill folder | SKILL.md | scripts/ | scripts/*.py executables |
| RAG Sitemap | RAG directory structure | master-sitemap.txt | category-list/ | post_*.txt, page_*.txt |

XML Sitemap:

example.com/
├── sitemap.xml     # index
├── products/       # category
│   ├── cat-a/
│   │   └── item.html
│   └── cat-b/
└── blog/           # category
    └── post.html

Claude SKILL.md:

pdf/
├── SKILL.md        # index
├── FORMS.md
├── reference.md
├── examples.md
└── scripts/        # category
    ├── analyze.py
    ├── fill.py
    └── validate.py

RAG Sitemap:

rag-sitemap/default/
├── master-sitemap.txt # index
├── category-list/  # category
│   ├── cat_x.txt
│   └── cat_y.txt
├── post-chunks/    # category
│   ├── post_x.txt
│   └── post_y.txt
└── page-chunks/    # category
    └── page_x.txt
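
The RAG Sitemap directory can be walked layer by layer with the standard library. This sketch rebuilds the tree shown above in a temporary folder with empty placeholder files, then lists the index, category, and file layers following the table's mapping. `build_demo_tree` and `list_layers` are hypothetical helpers, and real files would of course carry content.

```python
import tempfile
from pathlib import Path

# Relative paths copied from the directory tree above.
DEMO_FILES = (
    "master-sitemap.txt",
    "category-list/cat_x.txt",
    "category-list/cat_y.txt",
    "post-chunks/post_x.txt",
    "post-chunks/post_y.txt",
    "page-chunks/page_x.txt",
)


def build_demo_tree(root: Path) -> None:
    """Create empty placeholder files so the walk below has something to find."""
    for rel in DEMO_FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("", encoding="utf-8")


def list_layers(root: Path) -> dict[str, list[str]]:
    """Return the three layers: index file, category files, chunk files."""
    return {
        "index": [p.name for p in root.glob("master-sitemap.txt")],
        "category": sorted(p.name for p in (root / "category-list").glob("*.txt")),
        "file": sorted(
            p.name
            for folder in ("post-chunks", "page-chunks")
            for p in (root / folder).glob("*.txt")
        ),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "rag-sitemap" / "default"
    build_demo_tree(root)
    layers = list_layers(root)

for layer, names in layers.items():
    print(f"{layer}: {names}")
```

A retriever built this way opens one index file first, then only the category files that look relevant, and only then the individual chunks.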