// the groundwork

Data as the groundwork

Most AI projects do not fail on the technology. They fail because tools get installed with no groundwork, and the agents end up reading from a mess. This is the unglamorous work that decides whether any of it holds: getting your knowledge into a shape an agent can actually use.

An agent is only as good as what it is allowed to read. You can buy the best model on the market, wire it into your helpdesk, and watch it confidently invent a refund policy you never had, because nobody gave it the real one in a form it could find. The model was never the weak link. The groundwork was.

Groundwork is not a big-data project, and it is not a year of cleanup before you are allowed to start. It is two decisions made well: what your agents need to read, and where that knowledge should live so they can reach it without guessing. Get those right and a small model goes far. Get them wrong and no model saves you.

Route knowledge to the right store

The first decision is routing. Different kinds of knowledge are queried in different ways, and each kind wants a different home. Lump them together and every answer gets worse.

Your knowledge
Prose (policies, FAQs, guides) → Vector DB (RAG)
Records (orders, inventory, contracts) → SQL store
Assets (templates, snippets, files) → Key-value store
Three kinds of knowledge, three kinds of store. Match the store to the question you will ask.

Prose you search by meaning belongs in a vector database. Policies, help articles, how-to guides: the things where the answer is "find me the passage that covers this". This is what retrieval-augmented generation, or RAG, is for, and it is the only bucket most people have heard of.
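To make "search by meaning" concrete, here is a minimal sketch of the retrieval step in RAG. It assumes a few hypothetical help passages and uses lowercase word counts with cosine similarity as a crude stand-in for the learned embeddings and vector database a real setup would use. The shape is the same: turn the question into a vector, rank passages by similarity, and hand the best ones to the model.

```python
import math
import re
from collections import Counter

# Hypothetical help-centre passages; in practice these come from your own docs.
PASSAGES = {
    "refunds": "Refunds are issued within 14 days of purchase if the item is unused.",
    "shipping": "Orders ship within two business days from our warehouse.",
    "returns": "To return an item, request a return label from your account page.",
}

def vectorise(text: str) -> Counter:
    """Crude stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, k: int = 1) -> list[str]:
    """Return the keys of the k passages most similar to the query."""
    q = vectorise(query)
    ranked = sorted(PASSAGES, key=lambda key: cosine(q, vectorise(PASSAGES[key])), reverse=True)
    return ranked[:k]

print(search("when are refunds issued"))  # ['refunds']
```

The retrieved passage, not the model's memory, is what the agent should quote when it answers.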

Records you query by field belong in a SQL database. Orders, inventory, contracts: anything where the question is "what is the status of order 4471" or "how many units are left". You want an exact answer from a table, not the closest-sounding paragraph.
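A minimal sketch of what "an exact answer from a table" looks like, using Python's built-in `sqlite3` and an in-memory database. The table names, columns and rows are hypothetical sample data. The point is that a lookup by field either returns the record or returns nothing; it never substitutes a similar one.

```python
import sqlite3
from typing import Optional

# In-memory database with hypothetical sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, units INTEGER NOT NULL);
    INSERT INTO orders VALUES (4470, 'delivered'), (4471, 'shipped'), (4472, 'pending');
    INSERT INTO inventory VALUES ('MUG-01', 12), ('TEE-02', 0);
""")

def order_status(order_id: int) -> Optional[str]:
    """Exact lookup: the order's status, or None if the order does not exist."""
    row = conn.execute("SELECT status FROM orders WHERE id = ?", (order_id,)).fetchone()
    return row[0] if row else None

def units_left(sku: str) -> Optional[int]:
    """Exact lookup: units in stock for a SKU, or None if the SKU is unknown."""
    row = conn.execute("SELECT units FROM inventory WHERE sku = ?", (sku,)).fetchone()
    return row[0] if row else None

print(order_status(4471))    # shipped
print(units_left("MUG-01"))  # 12
```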

Assets you fetch by name belong in plain key-value or file storage. Email templates, contract snippets, brand assets: things you retrieve whole, by their name, not by searching their contents.
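Fetching by name can be as simple as a dictionary or a folder of files. This minimal sketch uses a Python dict as a stand-in for key-value or file storage and `string.Template` for placeholders; the asset names and wording are hypothetical. An unknown name raises an error instead of returning a "close enough" asset.

```python
from string import Template

# Hypothetical asset store: each name maps to one whole asset, retrieved verbatim.
ASSETS = {
    "email/refund-approved": Template("Hi $name, your refund for order $order_id has been approved."),
    "snippet/payment-terms": Template("Payment is due within $days days of the invoice date."),
}

def get_asset(name: str) -> Template:
    """Fetch by exact name; a missing name is an error, not a near match."""
    try:
        return ASSETS[name]
    except KeyError:
        raise KeyError(f"no asset named {name!r}") from None

email = get_asset("email/refund-approved").substitute(name="Ana", order_id=4471)
print(email)  # Hi Ana, your refund for order 4471 has been approved.
```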

The classic mistake

Putting everything in a vector database because it is the part people read about. Your order records do not belong in a search index; they belong in a table you can query exactly. When a customer asks about order 4471, "the most similar-sounding order" is a wrong answer dressed as a smart one.
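The failure mode is easy to reproduce. In this minimal sketch, order 4471 does not exist in a hypothetical set of order IDs. An exact SQL lookup returns nothing, so the agent can say so. A similarity lookup (here `difflib.get_close_matches` with no cutoff, standing in for nearest-neighbour search over an index) always returns some other order and presents it as the answer.

```python
import difflib
import sqlite3
from typing import Optional

# Hypothetical order IDs; note that 4471 is NOT among them.
ORDER_IDS = ["4417", "4470", "4712"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in ORDER_IDS])

def exact_lookup(order_id: str) -> Optional[str]:
    """Returns the order ID only if that exact order exists."""
    row = conn.execute("SELECT id FROM orders WHERE id = ?", (order_id,)).fetchone()
    return row[0] if row else None

def nearest_lookup(order_id: str) -> Optional[str]:
    """Similarity stand-in: returns the closest-looking ID, whether or not it is right."""
    matches = difflib.get_close_matches(order_id, ORDER_IDS, n=1, cutoff=0.0)
    return matches[0] if matches else None

print(exact_lookup("4471"))    # None: the order does not exist, and you find out
print(nearest_lookup("4471"))  # a different order's ID, presented as an answer
```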

You probably need less than you think

The second decision is how much machinery you actually need, and the honest answer is usually "less". The right setup depends on two things: how big your knowledge is, and how often it changes. Most teams reach for a vector database on day one when their entire knowledge base would fit in a single prompt.

Stage 1: A handful of documents. Load them into context. No database.
Stage 2: A real wiki that still fits the window. Organise it. Still no vector DB.
Stage 3: Thousands of docs, or changing fast. Now a vector DB and RAG earn their place.
Stage 4: Prose, records, and assets at scale. A full hybrid setup with several stores.
Where knowledge should live as it grows. Most teams sit at Stage 1 or 2 and do not know it.

Climb this ladder only when the work pushes you up it, never because a tool vendor said you should. A vector database has a real cost in setup, in maintenance, and in the new ways it can quietly return the wrong passage. If your knowledge fits in the model's context window, putting it there is faster, cheaper, and easier to trust. The skill is knowing which stage you are actually at.
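Knowing your stage starts with a rough size check. The sketch below is one way to encode the ladder, under loudly stated assumptions: about four characters per token (a common rule of thumb for English), a hypothetical 128,000-token context window with half reserved for instructions and the answer, and made-up thresholds for "a handful of documents" and "changing fast". Swap in your model's real limits and your own judgement.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4            # rough rule of thumb for English text; an assumption
CONTEXT_TOKENS = 128_000       # hypothetical window; check your model's actual limit
PROMPT_BUDGET = 0.5            # share of the window you let knowledge take up
HANDFUL_OF_DOCS = 10           # hypothetical threshold for Stage 1
FAST_CHANGES_PER_WEEK = 50     # hypothetical threshold for "changing fast"

@dataclass
class KnowledgeBase:
    total_chars: int        # size of all the prose you want the agent to read
    doc_count: int
    changes_per_week: int   # documents added or edited per week
    has_records: bool       # orders, inventory, contracts...
    has_assets: bool        # templates, snippets, files...

def stage(kb: KnowledgeBase) -> int:
    """Map a knowledge base onto the four-stage ladder (a heuristic, not a rule)."""
    fits = kb.total_chars / CHARS_PER_TOKEN <= CONTEXT_TOKENS * PROMPT_BUDGET
    if not fits and kb.has_records and kb.has_assets:
        return 4  # prose, records and assets at scale: several stores
    if not fits or kb.changes_per_week > FAST_CHANGES_PER_WEEK:
        return 3  # too big for the window, or changing fast: vector DB and RAG
    return 1 if kb.doc_count <= HANDFUL_OF_DOCS else 2

print(stage(KnowledgeBase(60_000, 6, 1, False, False)))  # 1
```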

The model is rarely the problem. What you let it read, and where that lives, almost always is.

Build from where you already are

None of this means a rip-and-replace. The best groundwork anchors to the tools you already run. Your prose probably already lives in Notion or Google Drive. Your records already live in a database, a spreadsheet, or your CRM. The job is to organise what is there and connect it, not to migrate your whole company onto something new before an agent can read a single file.
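Connecting what is already there can be small. This minimal sketch assumes your records sit in a spreadsheet or CRM that can export CSV; the column names and rows are hypothetical. It copies the export into a SQL table an agent can query exactly, without touching the source system, and `INSERT OR REPLACE` makes re-running the load after each new export safe.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV export from a spreadsheet or CRM; the columns are hypothetical.
EXPORT = """order_id,customer,status
4471,Acme Ltd,shipped
4472,Birch & Co,pending
"""

def load_export(conn: sqlite3.Connection, csv_text: str) -> int:
    """Copy an export into a queryable table; returns the number of rows loaded."""
    rows = [
        {**row, "order_id": int(row["order_id"])}  # store IDs as integers
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO orders VALUES (:order_id, :customer, :status)", rows
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
print(load_export(conn, EXPORT))  # 2
print(conn.execute("SELECT status FROM orders WHERE order_id = 4471").fetchone()[0])  # shipped
```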

The groundwork you can do for free, today

Here is the part nobody sells you, because there is nothing to sell. The highest-leverage groundwork costs nothing and needs no tools. You can start it this week, before you buy anything, and it is the same work the assessment will tell you to do first.

3 kinds of store, not one: prose, records, assets
Stage 1: where most teams actually start, no database needed
0 EUR: the groundwork you can do before buying any tool

This is also why so much AI spending disappoints. When the MIT NANDA study found that 95 percent of enterprise AI pilots showed no measurable P&L impact in 2025, the pattern underneath was familiar: tools dropped on top of knowledge that was never made legible to them. The groundwork is boring, it is cheap, and it is the difference between an agent that helps and one that hallucinates.

Find out how to structure your data first.

The free assessment shows what your agents will need to read and where it should live, before you spend on anything. It takes about ten minutes and needs no card; the full report costs 450 EUR, and only if you choose to go further.
