Beyond the Prompt: Why Your RAG System May Be Underperforming
Faced with the question “What is the capital of the Netherlands?” you have a few possible responses: answer from memory if you know it, look it up, or take a guess.
Large Language Models (LLMs) face the same challenge. They excel when a question falls inside their training data, but when it doesn’t, they may “hallucinate,” producing an answer that sounds plausible but is wrong.
The key difference is that LLMs don’t have direct access to your enterprise data or knowledge bases without additional retrieval methods. That’s where Retrieval-Augmented Generation (RAG) comes in.
RAG in a Nutshell
RAG is the process of giving an LLM access to relevant, external information so it can answer queries more accurately. The typical RAG workflow looks like this:
- User query: A user asks a question.
- Retrieval: A separate system searches a knowledge base for relevant documents or data.
- Augmentation: The retrieved content is combined with the query and sent to the LLM, which generates a response.
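The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the retriever is a toy keyword-overlap scorer standing in for vector search over embeddings, and the final prompt is what would be sent to the LLM.

```python
# Minimal sketch of the retrieve-augment-generate loop.
# The keyword-overlap retriever is a stand-in for real vector search.

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def augment(query: str, docs: list[str]) -> str:
    """Combine retrieved documents with the user query into one prompt."""
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

kb = [
    "The capital of the Netherlands is Amsterdam.",
    "The Hague is the seat of the Dutch government.",
    "Rotterdam has the largest port in Europe.",
]
query = "What is the capital of the Netherlands?"
prompt = augment(query, retrieve(query, kb))
# `prompt` would then be passed to the LLM for generation.
```

Because the retrieved facts travel inside the prompt, the model can answer accurately even if the knowledge was never in its training data.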
The value of RAG is that it allows models of any size to deliver high-quality, context-aware answers, whether it’s the latest company policy, current product details, or niche industry knowledge. But RAG doesn’t operate in isolation. For RAG to deliver consistently, it needs to be part of a well-designed information environment, also known as context engineering.
The Shift from Prompt to Context Engineering
In the early days, “prompt engineering” was the art of crafting the right wording to get the right answer. But as AI systems have grown more complex, the industry has realized that the quality of the context matters more than the cleverness of the prompt.
Context engineering builds the full information environment around the LLM, not just the immediate instruction, but also system settings, past conversation history, retrieved documents, tools, and output formats.
RAG is a critical part of context engineering, ensuring that the model’s “world” includes the exact information needed for the task.
It’s Not Your RAG, It’s Your Context
In real-world deployments, many RAG systems disappoint, and the issue is almost never the model. It’s bad context engineering. Common pitfalls include:
- Irrelevant retrieval: Pulling the wrong documents wastes tokens and distracts the model.
- Excessive retrieval: Overloading the context window with too much data.
- Token limits and truncation: Cutting off content can cause the model to miss critical context.
- Incomplete context: Missing critical information like user profiles or prior steps.
Imagine an AI system reviewing legal contracts that confidently reports a key clause is missing. In reality, the clause exists, but the retrieval process never pulled it into the model’s context. This kind of gap shows why careful retrieval design is essential.
Engineering Retrieval for Success
Preventing these failures starts with designing retrieval around the business use case:
- Score for relevance: Don’t just match keywords; ensure retrieved content truly answers the question.
- Chunk intelligently: Break documents into logical, searchable segments.
- Compress when needed: Summarize or strip redundancy to avoid token waste.
- Preserve essentials: Keep high-priority context like instructions and user state intact.
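Two of these practices can be sketched concretely: paragraph-aware chunking and a context budget that trims lower-ranked retrieved chunks while preserving high-priority content intact. This is an illustrative simplification in which word counts stand in for tokens; a real system would use the model's tokenizer.

```python
# Sketch: chunk on paragraph boundaries, then fit retrieved chunks into a
# fixed budget without ever truncating essential context (instructions,
# user state). Word counts stand in for tokens for simplicity.

def chunk_by_paragraph(text: str, max_words: int = 120) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_words words."""
    chunks, current = [], []
    for para in text.split("\n\n"):
        if current and len(" ".join(current + [para]).split()) > max_words:
            chunks.append("\n\n".join(current))
            current = []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def fit_to_budget(essential: str, retrieved: list[str], budget: int) -> str:
    """Keep essential context intact; add retrieved chunks until full."""
    used = len(essential.split())
    kept = []
    for chunk in retrieved:  # assumed already sorted by relevance score
        cost = len(chunk.split())
        if used + cost > budget:
            break  # drop lower-ranked chunks rather than truncate essentials
        kept.append(chunk)
        used += cost
    return "\n\n".join([essential] + kept)
```

The key design choice is the order of operations: relevance ranking happens before budgeting, so when space runs out it is the least relevant material that gets dropped, never the instructions or user state.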
Done well, RAG produces grounded, fresh, scalable, and personalized AI outputs. But in many real-world environments, not all the information you need is text. From images and videos to audio clips and charts, handling different content formats introduces new retrieval challenges — and that’s where multi-modal context comes in.
Handling Multi-Modal Context
Most embedding models are optimized for a single type of data, and text models usually outperform others. Multi-modal embeddings (for example, image plus text models) often underdeliver in production.
A surprisingly effective solution is to convert all content to text before retrieval.
For example:
- Images: Use a vision-language model to generate captions.
- Videos with speech: Transcribe audio using a tool like Whisper.
- Videos without speech: Extract keyframes and caption them.
By indexing text representations, retrieval accuracy for non-text content improves dramatically.
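The convert-to-text strategy above amounts to a simple dispatch on content type. In this sketch, the captioning and transcription functions are hypothetical placeholders passed in as callables; in practice they would wrap a vision-language model and a speech model such as Whisper.

```python
# Sketch: normalize every content type to text before indexing.
# caption_image and transcribe_audio are hypothetical stand-ins for
# real vision-language and speech-to-text models.

from typing import Callable

def to_text(
    item: dict,
    caption_image: Callable[[str], str],
    transcribe_audio: Callable[[str], str],
) -> str:
    """Convert a content item to a text representation for indexing."""
    kind = item["type"]
    if kind == "text":
        return item["content"]
    if kind == "image":
        return caption_image(item["path"])
    if kind == "video":
        if item.get("has_speech"):
            return transcribe_audio(item["path"])
        # Videos without speech: caption extracted keyframes instead.
        return " ".join(caption_image(f) for f in item["keyframes"])
    raise ValueError(f"unsupported content type: {kind}")

# Usage with stand-in models:
doc = to_text(
    {"type": "image", "path": "chart.png"},
    caption_image=lambda p: f"caption of {p}",
    transcribe_audio=lambda p: f"transcript of {p}",
)
```

Once every item is text, the same embedding model and index serve all content types, which is what makes retrieval over mixed-media corpora tractable.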
RAG in the Real World
OneSix built an AI-powered chatbot for a higher education client to help students get answers faster.
By applying RAG, the chatbot summarized thousands of unstructured documents, giving students accurate answers instantly and helping the university better serve its community.
Real-world RAG success comes from context engineering, feeding models the right information to deliver accurate, reliable, business-ready answers.
Ready to unlock the full potential of RAG?
At OneSix, we design and deploy Retrieval-Augmented Generation systems built for the real world. We engineer context, optimize retrieval, and integrate AI into your workflows so your models deliver accurate, reliable results.
Let’s talk about how we can turn your AI ideas into measurable results.
Co-written by
Matt Altberg, Lead ML Engineer
Francisco Gonzalez, Sr. Architect
Published
August 19, 2025