The Café's AI Barista | An AI Rep That Recommends Beans and Talks Cupping

The AI barista on a café's site is for the customers who want to understand bean selection and brewing a little better but never ask. What to eat, what to drink, which flavor: these are the small daily decisions people get stuck on most easily. A coffee lover can read the whole bean list, with its origins, altitudes, processing methods, and flavor notes, and still not be sure which one to pick, yet feel awkward walking up to the bar to ask you during the rush.

A café needs an AI barista with taste, one that recommends beans based on the customer's mood

From the first message, this AI barista knows every bean's origin and flavor, and knows why you chose these blends. While you are busy with a pour-over, it explains the care behind your bean selection and roasting to the people who want to understand it. For a visitor who wants to know your coffee better, talking to an AI carries none of the awkwardness of interrupting a busy bar, so they can say how they feel right now and let the AI barista pick the bean that fits best. It does not pour latte art or ring up the bill for you; it is an extra seat at the bar that your site gains. Whatever a visitor wants to know in the moment, it searches the RAG Sitemap for the bean list and cupping notes you wrote, weighs them against the visitor's current mood and message, and then recommends.

The barista's taste comes from the bean selection and notes you already care about

Your café's site already says a lot: the altitude of this single origin's region, the ratio in that blend, why only one sack of a certain season's Guatemala came in. The bean-list page has origin, processing, and roast level; the blog has cupping notes and the reasons for each selection; the FAQ has the take-home-bean rules and how to order; the events page has the month's new single origins. Every cup has a story, but when a visitor opens the site, they mostly see the bean list and the prices, and the story stops there. The people who read most carefully are usually the ones who want to understand most deeply yet never ask. A question forms in their mind and they just keep reading, until the question goes unspoken and they quietly pass you by.

The reason for choosing each bean, the roasting judgment, the memory of a season's flavor, and the pairing logic behind a blend are what a coffee lover most needs to know, and what most helps them pick the right bean. Yet it is hard to ask every barista to memorize all of it. An AI rep can recall it more completely than a barista working from memory: its memory is your site itself, and it can find every bean note and every selection reason you have written.

One question, and the bean's story falls into place

Someone flipping through the bean list asks, "This Ethiopia, the 'floral' you describe, which flower is it close to?" It picks a more concrete description from the cupping notes you wrote. Someone about to buy beans asks, "I use a moka pot at home; which would you recommend?" Based on the brewing suggestions you wrote, it gives a recommendation.

Someone curious about your selection asks, "Why did you choose this blend this season?" From the selection notes you wrote, it tells them about the season and the considerations behind the choice. Someone asks about shop details, "Do you have take-home beans? How much per bag?" From your bean list and take-home rules, it answers both at once.

By the time a visitor sees these few exchanges, your site has quietly risen to another level. Where a visitor opening the bean list once saw only names and prices, now one sentence brings out the bean's story; where a customer who wanted to know more about your selection but never asked could only buy the few they already knew, now they can ask freely before deciding which to buy; and where so many cupping notes you wrote went unread, now they are called up to speak, one cup at a time.

Can you afford this AI barista?

AI token cost can start from $0. Cloud AI providers commonly offer free tiers, and those tiers are counted per model: each model comes with its own separate quota. RAG Chatbot reserves one routing slot each for retrieval, response, and vision, and you can assign a different model to each slot, so multiple quotas work for you in parallel. Some providers offer only text models; in that case vision is handed to another provider that has an image model. Multi-model routing is the headroom this architecture builds in for you.
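The three routing slots can be pictured as a small lookup table. This is a minimal sketch for illustration only: the `Slot` class, the `routing` dictionary, and the provider and model names are hypothetical placeholders, not RAG Chatbot's actual settings or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One routing slot: which provider and model handle a kind of task."""
    provider: str
    model: str


# Hypothetical assignment. The vision slot goes to a second provider,
# as it would when the first provider offers only text models.
routing = {
    "retrieval": Slot("provider-a", "text-model-small"),
    "response": Slot("provider-a", "text-model-large"),
    "vision": Slot("provider-b", "image-model"),
}


def pick_slot(task: str) -> Slot:
    """Return the provider/model pair assigned to a task slot."""
    if task not in routing:
        raise KeyError(f"unknown slot: {task}")
    return routing[task]


def distinct_quotas(table: dict) -> int:
    """Count distinct (provider, model) pairs; each has its own free quota."""
    return len({(slot.provider, slot.model) for slot in table.values()})


print(pick_slot("vision"))
print(distinct_quotas(routing))
```

Because free tiers are counted per model, assigning three different models means three separate quotas are drawn on, one per slot.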

Cerebras's free tier gives a single text model 1M tokens a day. With one model applied to both retrieval and response, and each question using about 22k tokens (roughly 20k input plus 2k output), that is roughly 45 questions answered per day. For a small site, this quota is enough to keep running on the free tier over the long term.
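The 45-questions figure can be checked with a few lines of arithmetic. This sketch assumes the per-question token usage this page quotes in its pricing figures (about 20k input plus 2k output); the variable and function names are illustrative.

```python
# Figures taken from this page; names are illustrative.
DAILY_FREE_TOKENS = 1_000_000   # free-tier quota for one text model, per day
INPUT_TOKENS_PER_QA = 20_000    # approximate input tokens per question
OUTPUT_TOKENS_PER_QA = 2_000    # approximate output tokens per question


def questions_per_day(daily_tokens: int, tokens_per_qa: int) -> int:
    """Whole questions that fit into the daily quota."""
    return daily_tokens // tokens_per_qa


tokens_per_qa = INPUT_TOKENS_PER_QA + OUTPUT_TOKENS_PER_QA
print(questions_per_day(DAILY_FREE_TOKENS, tokens_per_qa))  # 45
```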

Even if traffic one day grows beyond the free tier, switching to an entry-level paid API costs about the price of a soda each month. We developed the plugin on Llama-3.2-3B-Instruct, which is a step smaller than Llama-3-8B-Instruct, itself the cheapest paid model that cloud API providers commonly offer. So any paid AI model on the market only puts you on firmer ground.

RAG Chatbot itself adds no markup on any token.

Model: Llama 3 8B Instruct
Provider: OpenRouter / Groq
Input price: $0.05 / 1M tokens
Output price: $0.08 / 1M tokens
Tokens per Q&A: ≈ 20k input + 2k output
Cost per answer: ≈ $0.0012
Monthly cost at 1,000 questions: ≈ $1.20
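The per-answer and monthly figures follow directly from the listed prices and token counts. A minimal sketch of that arithmetic, using only the numbers above (the function and constant names are illustrative):

```python
# Prices and token counts as listed above; names are illustrative.
INPUT_PRICE_PER_M = 0.05    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.08   # USD per 1M output tokens


def cost_per_answer(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one answer at the listed per-million-token prices."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)


per_answer = cost_per_answer(20_000, 2_000)
monthly = per_answer * 1_000  # 1,000 questions in a month

print(f"{per_answer:.4f}")  # 0.0012
print(f"{monthly:.2f}")     # 1.16
```

The unrounded values are $0.00116 per answer and $1.16 per month, which round to the figures shown.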