Why RAG Doesn't Need a Vector Database

A vector database is not a requirement for RAG; it is only one way to feed data to an AI. When data is inherently messy and lacks clear boundaries, vectorization helps a model guess semantic relevance from large amounts of text, and that has its value. But when content already has order, the question is no longer how to force relevance out of chaos, but how to let the AI see the most important interpretive clues first. Effective RAG does not have to slice the full text, compress it into vectors, and then guess the answer; it can instead organize content into a path the AI understands layer by layer, lowering contextual uncertainty first and then expanding the detail.

WordPress is exactly the kind of site architecture that already has order. The categories, titles, and summaries you create while running your site organized your content into a clear hierarchy long ago. Breaking it apart, compressing it into vectors, and reassembling it by similarity only redoes that finished organization in mathematical form. The truth is you do not need to build a vector database for an AI to understand your site. All you need is to let the AI see the library you already built.

A Three-Dimensional Library Built on the RAG Sitemap Blueprint

A RAG agent does not have to read the whole library, nor break the collection into fragments and compare them by similarity. It walks in following the signs: it reads the hanging sign over a section to find the broad category its target sits in, reads the labels on the side of the shelf to tell the subcategories apart, scans the title and description of each book on the shelf, and only then pulls out the one it actually needs. This progressive-disclosure path is the RAG Sitemap, and even a small model can walk it.

RAG Sitemap · Progressive Disclosure
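The walk described above can be sketched in a few lines. This is a minimal illustration, not the plugin's actual implementation: the `Post`, `Category`, `walk`, and `SITE` names are hypothetical, and the model's decision at each hop is replaced by a pluggable `choose` function.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Post:
    title: str
    summary: str
    url: str


@dataclass
class Category:
    name: str
    description: str
    posts: List[Post] = field(default_factory=list)


# A chooser sees the question plus a short list of labels and returns an index.
# In a real agent this would be a model call; here it is a plain function.
Chooser = Callable[[str, List[str]], int]


def walk(question: str, categories: List[Category], choose: Chooser) -> Post:
    """Follow master -> category -> post, reading only labels until the last hop."""
    # Hop 1 (master): category names and descriptions only.
    category_labels = [f"{c.name}: {c.description}" for c in categories]
    category = categories[choose(question, category_labels)]

    # Hop 2 (category): post titles and summaries only.
    post_labels = [f"{p.title}: {p.summary}" for p in category.posts]
    post = category.posts[choose(question, post_labels)]

    # Hop 3 (post): the caller fetches the full article from post.url.
    return post


# Hypothetical sample site, invented for illustration.
SITE = [
    Category("Design", "Themes, layout, and typography", [
        Post("Choosing a Theme", "What to check before installing a theme",
             "/design/choosing-a-theme/"),
    ]),
    Category("Performance", "Speed, caching, and hosting", [
        Post("Caching Basics", "How page caching works",
             "/performance/caching-basics/"),
        Post("Image Compression", "Shrinking images without visible loss",
             "/performance/image-compression/"),
    ]),
]

# A scripted chooser standing in for the model: pick option 1, then option 0.
scripted = iter([1, 0])
post = walk("How do I speed up my site?", SITE, lambda q, labels: next(scripted))
print(post.url)  # prints /performance/caching-basics/
```

At no point does the chooser see article bodies. It only ever sees one short list of labels per hop, which is what keeps the task within reach of a small model.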

The Vector Database Does One Extra Thing

Most WordPress AI chatbot plugins on the market take the same route: break the entire site into pieces, turn them into vectors stored in a database, and at question time pull back a few segments by similarity to hand to the model. This route works, and similarity does find relevant passages. But the vector database does one redundant thing from the very start: it takes what you had already organized, breaks it apart, and does the work over.
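For contrast, the chunk, embed, and retrieve route described above looks roughly like this. It is a toy sketch only: word counts stand in for a real embedding model, and the `chunk`, `embed`, `cosine`, and `top_k` names are made up for illustration.

```python
import math
from collections import Counter
from typing import List


def chunk(text: str, size: int) -> List[str]:
    """Cut text into pieces of `size` words, ignoring any existing structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    query = embed(question)
    return sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)[:k]


pieces = ["install a caching plugin", "choose a theme", "write a summary"]
print(top_k("caching plugin", pieces, k=1))
```

Note that `chunk` never looks at titles, categories, or summaries. Whatever order the site had is discarded before retrieval starts, and similarity has to rebuild it.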

Vector retrieval is good at computing relevance out of chaos. But when a site already has clear titles, summaries, and topic groupings, tearing it into fragments, compressing it into vectors, and reassembling by similarity is taking the long way around to redo editorial work that was directly usable. RAG Sitemap does not break the content apart. It walks along the shelves you arranged and takes the book directly. You are not handing your site to the AI and hoping for the best; you are organizing it into a knowledge map designed for machine understanding, so the AI understands first and answers second.

What AI Needs Is a Catalog, Not Another Fog

Like humans, an AI finds it easier to judge whether an article is worth reading from progressively disclosed titles, descriptions, and hierarchy paths than from full text. Even if an LLM can hold a million tokens in a single task, giving it the smallest, most decisive information first and reading deeper only as needed still spares the model detours and noise. RAG Sitemap translates a site's existing categories, tags, hierarchy, and summaries directly into a plain-text map an AI can read, so the model understands the catalog and the path first instead of falling into the fog of full text.
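As a rough sketch of that translation step, the functions below turn category and post metadata into two layers of plain text. The line format, the dictionary keys, and the `render_master` and `render_category` names are assumptions made for illustration; the source does not specify the actual map format.

```python
from typing import Dict, List


def render_master(site_name: str, categories: List[Dict]) -> str:
    """Top layer: one line per category, with no post-level detail."""
    lines = [f"# {site_name}"]
    for cat in categories:
        lines.append(f"- {cat['name']} ({cat['path']}): {cat['description']}")
    return "\n".join(lines)


def render_category(cat: Dict) -> str:
    """Second layer: one line per post inside a single category."""
    lines = [f"# {cat['name']}"]
    for post in cat["posts"]:
        lines.append(f"- {post['title']} ({post['url']}): {post['summary']}")
    return "\n".join(lines)


# Hypothetical metadata, shaped like what a site already stores per post.
performance = {
    "name": "Performance",
    "path": "/performance/",
    "description": "Speed, caching, and hosting",
    "posts": [
        {
            "title": "Caching Basics",
            "url": "/performance/caching-basics/",
            "summary": "How page caching works",
        },
    ],
}

print(render_master("Example Site", [performance]))
print(render_category(performance))
```

The master map stays short no matter how many posts the site has, because post lines only appear once a category has been chosen.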

With no similarity computation, retrieval becomes a multiple-choice question: the model reads the catalog, judges which category the question belongs to and which path to take, then walks to the article it should read. It is not computing which segment is most similar; it is making a choice after understanding the catalog. The categorization and descriptions you already produce while running your site are its index. There is no need to worry about the vector-database routine of how large to cut each chunk, which embedding model to pick, or what similarity threshold to set.
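One way to picture a single hop as a multiple-choice question is shown below. The prompt wording and the `build_prompt` and `parse_choice` helpers are illustrative assumptions, and the model call itself is left out.

```python
import re
from typing import List, Optional


def build_prompt(question: str, options: List[str]) -> str:
    """Present catalog entries as numbered choices for the model."""
    numbered = "\n".join(f"{i}. {opt}" for i, opt in enumerate(options, start=1))
    return (
        f"Question: {question}\n"
        "Which entry is most likely to contain the answer?\n"
        f"{numbered}\n"
        "Reply with the number only."
    )


def parse_choice(reply: str, n_options: int) -> Optional[int]:
    """Return the zero-based index the model picked, or None if the reply is unusable."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    number = int(match.group())
    return number - 1 if 1 <= number <= n_options else None


options = [
    "Design: Themes, layout, and typography",
    "Performance: Speed, caching, and hosting",
]
print(build_prompt("How do I speed up my site?", options))
print(parse_choice("2", len(options)))  # prints 1
```

Because the answer space is a handful of numbers, a bad reply is easy to detect and retry, which is harder to do with a free-form similarity score.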

WordPress Entropy Has Already Been Lowered, Step by Step, While You Run the Site

To an LLM, a piece of unorganized plain text has extremely high entropy. Every time an article is published in WordPress, the author picks a category, writes a title, writes a summary, and publishes to a hierarchical URL. These four SEO actions are four small reductions in entropy. A WordPress site that has run for five years has repeated these four actions thousands of times, leaving behind a low-entropy corpus that has already been organized.

The vector database is built on the logic of "I face high-entropy unstructured text, so I must recompute order with embeddings." RAG Sitemap is built on the logic of "the entropy of this corpus was lowered long ago for SEO, so I only need to reveal the already-categorized text and descriptions, not recompute them." For the same content, the vector database chooses to recompute; RAG Sitemap chooses to reveal.

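A small humanoid robot in a plain Greek robe, carrying a small lamp, walks confidently into the depths of a vast stone colonnade whose columns stretch endlessly into the distance. The small robot is a garbage-tier small model such as Llama 3B, and the lamp in its hand, which lights only the ground at its feet, is its limited world knowledge. Yet its stride is confident, because what actually guides it is the order of the surrounding columns, not the lamp in its hand. The rows of columns receding into the depths correspond to the progressive disclosure of master → category → post. It does not matter that the model is small: when the order is clear enough, every hop converges into a multiple-choice question. The protagonist of this picture is not the robot but the colonnade itself; capability is not the key, correct structure is.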

A Small Language Model (SLM) Actually Understood an Entire Website

Llama 3.2 3B is a language model with only 3 billion parameters, about as small as they come. Ask it a question and it can only answer from what it absorbed during training. It does not know what your site says, does not know which articles you published, and knows nothing about the content you have built up recently. Using it to run website Q&A should have been a fantasy.

A small figure holds only a single flat paper map, trapped inside a huge multi-level labyrinth of stone arches and staircases, endless stairs extending up, down, and in every direction. The flat paper map is the essence of llms.txt: a catalog with only titles and no body, meant for flat reading. The three-dimensional maze is the knowledge structure of a real site, with floors, circulation routes, and interconnected depth. The figure is tiny, and the mismatch in scale between tool and environment is exactly the situation of an AI holding a 2D list against 3D content. Light enters from one side, but without the right map, light cannot replace structure. What RAG Sitemap sets out to solve is making this flat plan three-dimensional, walking down the three layers of master, category, and post.

The Good-Faith Limits of llms.txt

llms.txt is a sitemap designed for AI to read, but it has only one layer, which is not enough for an organized, structured site. Its standard format is the site name as an H1, a short summary, and below that a list in which each line is a title and description pointing to one page. It cannot tell whether a line is a category page, a standalone page, an article, or a product page; every line is treated as the same kind of thing. The intention is sound: the goal is to make a site easier for AI to read. The problem is not the descriptions but that the format flattens the site into a single layer, destroying the site's original narrative power and content context.
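The flattening is easy to see when such a file is parsed. The sketch below assumes link lines of the form `- [title](url): description`; the sample file, the URLs, and the `parse_link_lines` helper are hypothetical.

```python
import re
from typing import Dict, List

# Assumed link-line shape: "- [title](url): description"
LINK_LINE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<description>.*))?$"
)


def parse_link_lines(text: str) -> List[Dict[str, str]]:
    """Collect every link line as a flat record; other lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINK_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


SAMPLE = """\
# Example Site
> A short summary of the site.
- [Performance](https://example.com/performance/): Articles on speed
- [Caching Basics](https://example.com/performance/caching-basics/): How page caching works
"""

for entry in parse_link_lines(SAMPLE):
    print(sorted(entry))  # every record has the same three fields
```

Both records come back with the same fields: title, URL, and description. Nothing in the parsed data says that the first is a category page and the second is an article inside it.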

A group of people circle a central light, each receiving and cupping a flame of their own, light spreading from one place into many separate palms. The central light is the cloud API of the past decade, where every inference had to come back and pay the bill. The light passed to each pair of hands corresponds to the trajectory of NPUs, the chip-as-model idea, and the Chrome Prompt API, all of which move inference back onto the visitor's own device. The flames are close in size, meaning the small model at the edge is already capable enough to carry a site's navigation task. The posture of hands cupping a flame stands for privacy and non-disclosure; privacy holds naturally under this architecture. The distances between people are even: this is not a new center replacing the old one but the center dissolving entirely.

The End Goal: Moving Compute onto the User's Device

"AI-on-Chip" means that when every device has a small AI model carved into a chip, the model is no longer software that must be loaded but a compute chip always on standby. The LLM inference an application needs can run locally on the visitor's device, bringing the site owner's AI compute cost to zero. This is the end goal of RAG Chatbot.