/rag-sitemap/
│
├── default/                         ← Main Language
│   │
│   ├── master-sitemap.txt           ← sitemap
│   │   ├─ ======
│   │   ├─ Page Title / Link / Description
│   │   ├─ Rag_Item_Link: (page_aaa.txt URL)
│   │   ├─ ======
│   │   ├─ Page Title / Link / Description
│   │   ├─ Rag_Category_Link: (category_list_z.txt URL)
│   │   └─ ======
│   │
│   ├── category-list/
│   │   ├── category_list_x.txt
│   │   ├── category_list_y.txt
│   │   └── category_list_z.txt
│   │       │
│   │       └── Each Category_List.txt
│   │           ├─ ======
│   │           ├─ Category Title / Link / Description
│   │           ├─ Rag_Item_Link: (post_xxx.txt URL)
│   │           └─ ======
│   │
│   ├── post-chunks/
│   │   ├── post_2993.txt
│   │   ├── post_2999.txt
│   │   └── post_3105.txt
│   │       │
│   │       └── Each Single_Post.txt
│   │           ├─ Title: …
│   │           ├─ Link: …
│   │           ├─ Date: …
│   │           └─ Content: …
│   │
│   └── page-chunks/
│       ├── page_aaaa.txt
│       └── page_bbbb.txt
│           │
│           └── Each Page_Post.txt
│               ├─ Title: …
│               ├─ Link: …
│               ├─ Date: …
│               └─ Content: …
│
├── jp/
│   ├── master-sitemap.txt
│   ├── category-list/
│   ├── post-chunks/
│   └── page-chunks/
│
└── ko/
    ├── master-sitemap.txt
    ├── category-list/
    ├── post-chunks/
    └── page-chunks/
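The record layout in the tree above (blocks delimited by `======`, with `Rag_Item_Link:` and `Rag_Category_Link:` lines) can be read with a few lines of Python. This is a minimal sketch, not the site's actual parser: it assumes every field is written as a `Key: value` line, and the labels `Title`, `Link`, and `Description` as well as the `example.com` URLs in the usage note are hypothetical, since the tree only names the fields and does not show their exact syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SEPARATOR = "======"  # record delimiter shown in the tree


@dataclass
class SitemapEntry:
    # Free-form "Key: value" fields (assumed labels such as Title or Link).
    fields: Dict[str, str] = field(default_factory=dict)
    item_link: Optional[str] = None      # value of Rag_Item_Link, if present
    category_link: Optional[str] = None  # value of Rag_Category_Link, if present


def _parse_block(lines: List[str]) -> SitemapEntry:
    entry = SitemapEntry()
    for line in lines:
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        key, value = key.strip(), value.strip()
        if key == "Rag_Item_Link":
            entry.item_link = value
        elif key == "Rag_Category_Link":
            entry.category_link = value
        else:
            entry.fields[key] = value
    return entry


def parse_sitemap(text: str) -> List[SitemapEntry]:
    """Split a sitemap file into records at each separator line."""
    entries: List[SitemapEntry] = []
    block: List[str] = []
    # A trailing separator is appended so the last record is always flushed.
    for raw in text.splitlines() + [SEPARATOR]:
        line = raw.strip()
        if line == SEPARATOR:
            if block:
                entries.append(_parse_block(block))
            block = []
        elif line:
            block.append(line)
    return entries
```

Calling `parse_sitemap` on a master sitemap would return one entry per page or category, each carrying either an item link (a chunk file) or a category link (the next list to open).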

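A tiny humanoid robot in a plain Greek tunic, carrying a small lamp, walks confidently into the depths of a vast, majestic stone colonnade whose columns stretch endlessly into the distance. The tiny robot is a garbage-tier small model such as Llama 3B; the small lamp lights only the ground at its feet, which is its limited world knowledge. But its stride is confident, because what actually leads the way is the order of the surrounding columns, not the lamp in its hand. The columns extending into the depths correspond to the progressive disclosure of master → category → post. A small model is no obstacle: when the order is clear enough, every hop converges into a multiple-choice question. The protagonist of this picture is not the robot but the colonnade itself; capability is not the key, correct structure is.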

A Small Language Model (SLM) Actually Understood an Entire Website

The small model Llama 3.2 3B is a language model with only 3 billion parameters, small by current standards. Ask it a question and it can only answer from what it absorbed during training. It does not know what your site says, which articles you have published, or what content you have added recently. Using it to run website Q&A should have been a fantasy.

A Renaissance horseman hauls hard on the reins at the edge of a cliff, the horse's front hooves skidding to a halt a moment before the fall, solid ground behind and a deep dark gorge ahead. The horse represents the LLM's natural generative force, a high-entropy gallop in itself. The reins are the harness, a set of deliberately applied engineering constraints. The decisive instant happens at the entrance of each hop: if any one Sub Agent gives way, the whole chain of reasoning plunges into the gorge of accumulated error. The rider does not suppress the horse but takes the direction back into his own hands, which is also the role of the Diving Agent, holding the Master Sitemap to decide the dive point for the whole team. The solid ground behind corresponds to the static cache segment, where the prompt, Master Sitemap, and chunks stand unmoving.

RAG Harness Engineering

RAG Harness Engineering means every visitor question triggers not a single AI prompt, but three independent AI API calls: vision, retrieval, answer. Chaining multiple sub-agents normally risks each stage poisoning the next, but the Harness architecture hands every stage the visitor's original question and a clear view of the initial task goal, making it fundamentally immune to contamination. Accumulated noise is stopped at the entrance of every hop.
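The idea of handing every stage the visitor's original question can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: `run_harness`, `call_model`, and the prompt wording are hypothetical, and `call_model` stands in for whatever AI API is actually used. Only the three stage names come from the text.

```python
from typing import Callable

STAGES = ("vision", "retrieval", "answer")  # stage names taken from the text


def run_harness(question: str, call_model: Callable[[str], str]) -> str:
    """Run three independent model calls, one per stage.

    Every prompt repeats the original question, so a stage never has to
    reconstruct the task from the previous stage's output alone.
    """
    previous = ""
    for stage in STAGES:
        prompt = (
            f"Stage: {stage}\n"
            f"Original visitor question: {question}\n"
            f"Output of previous stage: {previous or '(none)'}\n"
        )
        # Hypothetical API wrapper: takes a prompt, returns the model's text.
        previous = call_model(prompt)
    return previous
```

The design choice worth noting is that the previous stage's output is passed along only as supporting material; the question itself is re-injected verbatim at every hop rather than paraphrased by an earlier stage.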

A humanoid robot in a Greek tunic climbs a pre-carved stone spiral staircase inside a grand old library, moving toward the light above, with crumpled and ignored scraps of paper scattered on the floor. The spiral staircase is WordPress's existing categories and hierarchy; the carving was already there, not cut by this traveler. The robot climbing on foot matches RAG Sitemap retrieving directly along a ready-made path. The crumpled scraps on the floor are the reverse work of vectorization, tearing organized content back into fragments and reassembling them with cosine similarity. The orderly shelves are the low-entropy sediment humans lay down article by article, category by category, while running a site. The light above is the direction of the answer: the structure itself leads the way, and the model only has to understand and choose.

Why RAG Doesn't Need a Vector Database

A vector database is not a requirement for RAG; it is only one way to feed data to an AI. When data is inherently messy and lacks clear boundaries, vectorization helps a model guess semantic relevance from large amounts of text, and that has its value. But when content already has order, the question is no longer how to force relevance out of chaos, but how to let the AI see the most important interpretive clues first. Effective RAG does not have to slice the full text, compress it into vectors, and then guess the answer; it can instead organize content into a path the AI understands layer by layer, lowering contextual uncertainty first and then expanding the detail.
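The layer-by-layer path can be sketched as a short walk over the sitemap files: at each hop the model sees only the current file and picks one of its links, until it reaches a chunk with no further links. This is a minimal sketch under assumptions. `walk`, `choose`, and the in-memory `files` mapping are hypothetical names, and in a real setup `choose` would be a model call and `files` would be fetched over HTTP. Only the `Rag_Item_Link` and `Rag_Category_Link` labels come from the file layout.

```python
import re
from typing import Callable, Dict, List

# Matches the link lines used in master-sitemap.txt and the category lists.
LINK_RE = re.compile(r"^[ \t]*Rag_(?:Item|Category)_Link:\s*(\S+)", re.MULTILINE)


def walk(
    question: str,
    files: Dict[str, str],
    choose: Callable[[str, str, List[str]], str],
    start: str = "master-sitemap.txt",
    max_hops: int = 3,
) -> str:
    """Follow links from the master sitemap down to a content chunk."""
    current = files[start]
    for _ in range(max_hops):
        links = LINK_RE.findall(current)
        if not links:
            return current  # no links left: this is a post or page chunk
        # Each hop is a multiple-choice question over a short list of links.
        picked = choose(question, current, links)
        current = files[picked]
    return current
```

Because each hop only exposes the titles, descriptions, and links of one file, the chooser never has to weigh the whole site at once, which is the sense in which uncertainty is reduced before detail is expanded.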

llms.txt

The Good-Faith Limits of llms.txt: Why a Link List Isn't Enough

From 2D to 3D: How RAG Sitemap Rebuilds a Site's Knowledge Skeleton

A File Architecture Designed for AI Retrieval: Transparent, Layered, Multilingual
