Beyond the Prompt: Why Your RAG System May Be Underperforming


Faced with the question “What is the capital of the Netherlands?” you have a few possible responses:

1. Answer confidently, if you know it.
2. Look it up, if you're uncertain.
3. Take a guess, and risk being wrong.

Large Language Models (LLMs) face the same challenge. They excel when a question falls inside their training data, but when it doesn’t, they may “hallucinate,” producing an answer that sounds plausible but is wrong. 

The key difference is that LLMs don’t have direct access to your enterprise data or knowledge bases without additional retrieval methods. That’s where Retrieval-Augmented Generation (RAG) comes in.

RAG in a Nutshell

RAG is the process of giving an LLM access to relevant, external information so it can answer queries more accurately. The typical workflow has four steps: the user's question is embedded, the most relevant documents are retrieved from a knowledge base, the retrieved content is added to the prompt, and the model generates an answer grounded in that context.
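As an illustration, here is a minimal sketch of that loop in Python. The `embed`, `vector_store.search`, and `llm.generate` calls are placeholders for whatever embedding model, vector database, and LLM you actually use, not a specific library's API:

```python
# Minimal RAG sketch; embed(), vector_store.search(), and llm.generate()
# are hypothetical stand-ins for your embedding model, vector database, and LLM.

def answer_with_rag(question: str, vector_store, llm, top_k: int = 5) -> str:
    # 1. Embed the user's question into the same vector space as the documents.
    query_vector = embed(question)

    # 2. Retrieve the most relevant chunks from the knowledge base.
    chunks = vector_store.search(query_vector, top_k=top_k)

    # 3. Augment the prompt with the retrieved context.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # 4. Generate an answer grounded in that context.
    return llm.generate(prompt)
```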

The value of RAG is that it allows models of any size to deliver high-quality, context-aware answers, whether it’s the latest company policy, current product details, or niche industry knowledge. But RAG doesn’t operate in isolation. For RAG to deliver consistently, it needs to be part of a well-designed information environment, also known as context engineering.

 

The Shift from Prompt to Context Engineering

In the early days, “prompt engineering” was the art of crafting the right wording to get the right answer. But as AI systems have grown more complex, the industry has realized that the quality of the context matters more than the cleverness of the prompt.

Context engineering builds the full information environment around the LLM: not just the immediate instruction, but also system settings, past conversation history, retrieved documents, tools, and output formats.
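To make that concrete, the sketch below shows how retrieved documents are only one layer of the context handed to the model, alongside system instructions, conversation history, and output-format expectations. The names here (`knowledge_base.retrieve`, the invented Acme Corp assistant) are illustrative assumptions, not a specific framework's API:

```python
# Sketch of context assembly: the user's prompt is only one ingredient.

def build_context(question: str, history: list[dict], knowledge_base) -> list[dict]:
    # System settings: role, grounding rules, and output-format expectations.
    system = (
        "You are a support assistant for Acme Corp. "
        "Answer only from the provided documents; say you don't know if they don't cover the question. "
        "Respond in plain English with a short summary."
    )

    # Retrieved documents: the RAG contribution to the context.
    docs = knowledge_base.retrieve(question, top_k=5)
    doc_block = "\n\n".join(d.text for d in docs)

    # Past conversation history plus the current turn, with documents attached.
    return (
        [{"role": "system", "content": system}]
        + history
        + [{"role": "user", "content": f"Documents:\n{doc_block}\n\nQuestion: {question}"}]
    )
```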


Prompt engineering: shaping single-turn prompts to get a good answer.

Context engineering: shaping the full context for multi-step tasks.


RAG is a critical part of context engineering, ensuring that the model’s “world” includes the exact information needed for the task.

It’s Not Your RAG, It’s Your Context

In real-world deployments, many RAG systems disappoint, and the issue is almost never the model; it's bad context engineering.

Imagine an AI system reviewing legal contracts that confidently reports a key clause is missing. In reality, the clause exists, but the retrieval process never pulled it into the model’s context. This kind of gap shows why careful retrieval design is essential.

Engineering Retrieval for Success

Preventing these failures starts with designing retrieval around the business use case, as in the sketch below.
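The specifics vary by use case, but a contract-review scenario might look like the following hedged sketch. The configuration values, metadata fields, and the `store.search` / `rerank` helpers are assumptions for illustration, not a particular product's API:

```python
# Illustrative retrieval design for a contract-review use case.

RETRIEVAL_CONFIG = {
    "chunk_size": 800,     # large enough to keep whole clauses together
    "chunk_overlap": 100,  # avoid splitting a clause across chunk boundaries
    "top_k": 20,           # over-retrieve, then rerank down to what fits the context
    "final_k": 5,
}

def retrieve_clauses(query: str, store, client_id: str, doc_type: str = "contract"):
    # Metadata filters keep retrieval scoped to the right client and document type.
    candidates = store.search(
        query,
        top_k=RETRIEVAL_CONFIG["top_k"],
        filters={"client_id": client_id, "doc_type": doc_type},
    )
    # A reranking step (cross-encoder or LLM-based) keeps only the chunks
    # that actually answer the query before they reach the model.
    return rerank(query, candidates)[: RETRIEVAL_CONFIG["final_k"]]
```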

Done well, RAG produces grounded, fresh, scalable, and personalized AI outputs. But in many real-world environments, not all the information you need is text. From images and videos to audio clips and charts, handling different content formats introduces new retrieval challenges — and that’s where multi-modal context comes in.

Handling Multi-Modal Context

Most embedding models are optimized for a single type of data, and text models usually outperform others. Multi-modal embeddings (for example, image plus text models) often underdeliver in production.

A surprisingly effective solution is to convert all content to text before retrieval.

For example, images and charts can be described by a vision model and indexed as captions, while audio and video can be transcribed into text.

By indexing text representations, retrieval accuracy for non-text content improves dramatically.
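As a rough sketch, a text-first indexing pipeline might look like the following; `caption_image`, `transcribe_audio`, and `index.add` are stand-ins for whatever vision, speech-to-text, and vector-store tooling you use:

```python
# Sketch of text-first indexing for non-text content.

def index_asset(index, asset_path: str, asset_type: str) -> None:
    if asset_type == "image":
        # A vision model produces a text description of the image or chart.
        text = caption_image(asset_path)
    elif asset_type == "audio":
        # A speech-to-text model produces a transcript.
        text = transcribe_audio(asset_path)
    else:
        with open(asset_path, encoding="utf-8") as f:
            text = f.read()

    # Only the text representation is embedded and indexed; the original file
    # is kept as metadata so answers can still link back to the source.
    index.add(text=text, metadata={"source": asset_path, "type": asset_type})
```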

RAG in the Real World

OneSix built an AI-powered chatbot for a higher education client to help students get answers faster.


By applying RAG, the chatbot summarized thousands of unstructured documents, giving students accurate answers instantly and helping the university better serve its community.

Real-world RAG success comes from context engineering, feeding models the right information to deliver accurate, reliable, business-ready answers.

Ready to unlock the full potential of RAG?

At OneSix, we design and deploy Retrieval-Augmented Generation systems built for the real world. We engineer context, optimize retrieval, and integrate AI into your workflows, so your models deliver accurate, reliable, and measurable results.


Let’s talk about how we can turn your AI ideas into measurable results.

Contact Us
Co-written by

Matt Altberg, Lead ML Engineer
Francisco Gonzalez, Sr. Architect

Published

August 19, 2025