blog

Why Can't AI Read Your Website? | The RAG Sitemap Concept Explained

Your site's content is complete, yet AI cannot read it, because it lacks a site map designed for LLMs. From why RAG does not have to depend on a vector database, to the entropy-reduction perspective, to how a RAG Harness lets even a 3B small model do the job, every angle lands on the same point: a well-structured website is already a low-entropy knowledge base, and RAG Sitemap simply reveals it, letting AI intuitively follow the existing order to the answer.

A humanoid robot in a scholar's robe points to a line in an open book, showing it to a human author at the desk who holds a quill, and the author leans in to look closely at that spot, with drafts and notes spread across the desk. The reversal of the two figures' roles is the key to this painting: the robot is not a student being examined but a copy editor giving the site a check-up. Its finger points to an unclear line of description, and that spot is the gap itself. The quill stays in the human's hand; the right to revise has not been handed over, and the AI only lets the person see where the writing is not clear enough. The book pulled half-out on the desk hints that this is not a large-scale rewrite but a point-by-point fine-tuning of titles, category descriptions, and article placement, all the moves a site owner makes in ordinary SEO.

Use an AI Chatbot to Train Your Site for SEO

You think you are training the AI to understand your site? It should be the other way around: use an AI chatbot to train your site. Every time the AI answers wrong, it is telling you which article's title or description is unclear, helping you check your site's SEO blind spots. RAG Sitemap drops the black-box vector store and reads the titles, categories, and descriptions you already wrote in WordPress, generating a plain-text site map for the AI. You only fix things in the admin the way you already do SEO, with no new tool to learn and no algorithm to read minds.

A Renaissance horseman hauls hard on the reins at the edge of a cliff, the horse's front hooves skidding to a halt a moment before the fall, solid ground behind and a deep dark gorge ahead. The horse represents the LLM's natural generative force, a high-entropy gallop in itself. The reins are the harness, a set of deliberately applied engineering constraints. The decisive instant happens at the entrance of each hop: if any one Sub Agent gives way, the whole chain of reasoning plunges into the gorge of accumulated error. The rider does not suppress the horse but takes the direction back into his own hands, which is also the role of the Diving Agent, holding the Master Sitemap to decide the dive point for the whole team. The solid ground behind corresponds to the static cache segment, where the prompt, Master Sitemap, and chunks stand unmoving.

RAG Harness Engineering

RAG Harness Engineering means every visitor question triggers not a single AI prompt, but three independent AI API calls: vision, retrieval, answer. Chaining multiple sub-agents normally risks each stage poisoning the next, but the Harness architecture hands every stage the visitor's original question and a clear view of the initial task goal, making it fundamentally immune to contamination. Accumulated noise is stopped at the entrance of every hop.

一個渺小、穿著樸素希臘長袍的人形機器人，手提一盞小燈，在一座龐大宏偉的石柱長廊中自信地往深處走去，廊柱朝遠方無止盡延伸。渺小的機器人是 Llama 3B 這種垃圾級小模型，手中的小燈只照亮自己腳下，是它有限的世界知識。但牡步伐自信，因為真正在引路的是周圍的石柱秩序，不是手裡的燈。柱列朝深處延伸，對應 master → category → post 的漸進式披露。模型小不要緊，秩序夠清楚的時候，每一 hop 都收斂成一道選擇題。這幅畫的主角不是機器人，是廊柱本身，能力強弱不是關鍵，結構正確才是。

A Small Language Model (SLM) Actually Understood an Entire Website

The small model Llama 3.2 3B is a language model with only 3B parameters, about as small as they come. Ask it a question and it can only answer from its 3B of training data. It does not know what your site says, does not know which articles you published, and knows nothing about the content you have built up recently. Using it to run website Q&A should have been a fantasy.

A small figure holds only a single flat paper map, trapped inside a huge multi-level labyrinth of stone arches and staircases, endless stairs extending up, down, and in every direction. The flat paper map is the essence of llms.txt: a catalog with only titles and no body, meant for flat reading. The three-dimensional maze is the knowledge structure of a real site, with floors, circulation routes, and interconnected depth. The figure is tiny, and the mismatch in scale between tool and environment is exactly the situation of an AI holding a 2D list against 3D content. Light enters from one side, but without the right map, light cannot replace structure. What RAG Sitemap sets out to solve is making this flat plan three-dimensional, walking down the three layers of master, category, and post.

The Good-Faith Limits of llms.txt

llms.txt is a sitemap designed for AI to read, but its limit is that it has only one layer, which is not enough for an organized, structured site. Its standard format is the site name as an H1, a short summary, and below that each line is a title: description pointing to one page. But it cannot tell whether a line is a category page, a standalone page, an article, or a product page; every line is treated as the same kind of thing. The intention is not wrong, and the goal is to make a site easier for AI to read. The problem is not the description but that it flattens the site into a single layer, destroying the site's original narrative power and content context.

A humanoid robot in a Greek tunic climbs a pre-carved stone spiral staircase inside a grand old library, moving toward the light above, with crumpled and ignored scraps of paper scattered on the floor. The spiral staircase is WordPress's existing categories and hierarchy; the carving was already there, not cut by this traveler. The robot climbing on foot matches RAG Sitemap retrieving directly along a ready-made path. The crumpled scraps on the floor are the reverse work of vectorization, tearing organized content back into fragments and reassembling them with cosine similarity. The orderly shelves are the low-entropy sediment humans lay down article by article, category by category, while running a site. The light above is the direction of the answer: the structure itself leads the way, and the model only has to understand and choose.

Why RAG Doesn't Need a Vector Database

A vector database is not a requirement for RAG; it is only one way to feed data to an AI. When data is inherently messy and lacks clear boundaries, vectorization helps a model guess semantic relevance from large amounts of text, and that has its value. But when content already has order, the question is no longer how to force relevance out of chaos, but how to let the AI see the most important interpretive clues first. Effective RAG does not have to slice the full text, compress it into vectors, and then guess the answer; it can instead organize content into a path the AI understands layer by layer, lowering contextual uncertainty first and then expanding the detail.

A scholar's hands are calibrating the brass rings of an armillary sphere, while behind them a turbulent black cloud condenses into the precise geometric order of the sphere itself. The black cloud is the LLM's original state, a naturally high-entropy string generator that knows every possibility, with every possibility existing at once. The brass rings are the layers of entropy reduction, tightening outward ring by ring from prompt to context to agent to harness, each layer compressing conditional entropy once. The hands represent external work: order does not appear on its own; it is the result of humans imposing structure. The armillary sphere is a finite, knowable model of the cosmos, and an LLM constrained by engineering is the same, no longer a boundless language space but a predictable instrument.

AI Entropy-Reduction

AI entropy-reduction engineering is the umbrella term for every design that gets an LLM moving. From prompt to context to agent to harness, all of them narrow the range of prediction and lower the uncertainty of an answer; only the scope of their influence differs. This is because an LLM operates on a naturally high-entropy linguistic medium, and the core engineering of an AI application is to perform entropy reduction through structured input and external knowledge, lowering uncertainty and improving output quality.

A group of people circle a central light, each receiving and cupping a flame of their own, light spreading from one place into many separate palms. The central light is the cloud API of the past decade, where every inference had to come back and pay the bill. The light passed to each pair of hands corresponds to the trajectory of the NPU, chip is model, and the Chrome Prompt API, with inference moved back onto the visitor's own device. Each flame is close in size, meaning the edge small model is already capable enough to carry a site's navigation task. The posture of hands cupping a flame is privacy and non-disclosure; privacy holds naturally under this architecture. The distances between people are even: this is not a new center replacing the old one but the center dissolving entirely.

The End Goal: Moving Compute onto the User's Device

"AI-on-Chip" means that when every device has a small AI model carved into a chip, the model is no longer software that must be loaded but a compute chip always on standby. The LLM inference an application needs can run locally on the visitor's device, bringing the site owner's AI compute cost to zero. This is the end goal of RAG Chatbot.

vLLM

高並發、低延遲的生產級 LLM 推理引擎

透過 vLLM 也能在自己的家用電腦上部署本地 LLM，這裡收錄可直接複製貼上的一鍵啟動範本，以及為什麼選 vLLM 作為自建首選、硬體規格建議、Gemma / Qwen / Mistral 等主流模型的部署實作，徹底擺脫 OpenAI、Gemini、Claude 的 API 成本。

Use an AI Chatbot to Train Your Site for SEO

你以為是你在訓練 AI 讀懂網站？其實應該反過來用 AI Chatbot 訓練你的網站，因為 AI 每一次答錯，都是在告訴你哪一篇文章的標題或描述沒寫清楚，幫你檢查網站的 SEO...

RAG Harness Engineering

RAG Harness Engineering 讓訪客的每個提問背後不只是單純的一次 AI 提示詞呼叫，而是看圖、檢索、回答，三段獨立的 AI API。本來多個 Sub...

A Small Language Model (SLM) Actually Understood an Entire Website

The Good-Faith Limits of llms.txt

Why RAG Doesn't Need a Vector Database

向量資料庫不是 RAG 的必要條件，它只是其中一種把資料餵給 AI 的方式。當資料本來是混亂的、缺乏清楚邊界的，向量化可以幫助模型從大量文字中猜測語意相關性，這種做法有它的價值。但如果內容本來就有秩序，問題就不再是「怎麼從混亂中硬算相關」，而是「怎麼讓 AI 先看到最重要的判讀線索」。真正有效的 RAG，不一定是先把全文切碎、壓成向量再回頭猜答案；也可以是先把內容整理成 AI...

AI Entropy-Reduction

AI 熵減工程是所有讓 LLM 動起來的設計的總稱，從 prompt、context、agent 到 harness 都是在收窄預測的可能性、降低回答的不確定性，只是影響範圍的大小不同。這是因為 LLM...

The End Goal: Moving Compute onto the User's Device

「晶片即模型」的意思是，當每台裝置都內建一顆刻進晶片的 AI 小模型，模型不再是需要載入的軟體，而是隨時待命的運算晶片，應用程式所需的 LLM 推理可直接在訪客裝置端就地完成，讓網站主的 AI 運算成本歸零，這正是 RAG Chatbot...

Wordpress

程式碼片段庫

有些問題，不值得為它裝一整個外掛。很多時候你只是想關掉一個預設行為、加一個小功能，這些幾行程式碼就能解決的事，不需要動用一個繁重的外掛。WordPress 外掛生態雖然龐大，但每多裝一個，就多一層維護、一點拖累、一個風險。