RAG Pipeline

Last updated: 2026-02-18

This document details the Retrieval-Augmented Generation (RAG) pipeline, focusing on how and why it is integrated into the agent’s decision-making process.

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by giving them access to an external knowledge base. Instead of relying solely on the model’s training data, a RAG system first retrieves relevant information and then uses it to generate a more accurate, evidence-based, and contextually relevant response.
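The retrieve-then-generate idea can be sketched without any libraries. This toy example is not the project’s implementation; the word-overlap scorer and both function names are illustrative stand-ins (a real system uses vector similarity and an LLM):

```python
# A minimal, library-free sketch of retrieve-then-generate:
# find the most relevant passages first, then condition the answer on them.
def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Rank passages by naive word overlap with the query (toy scorer)."""
    words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda p: len(words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]


def generate(query: str, passages: list[str]) -> str:
    # In a real system an LLM produces this; here we only show the grounding.
    return f"Answer to {query!r}, grounded in: " + " | ".join(passages)
```

In the real pipeline, retrieval is a vector-store similarity search and generation is an LLM call, but the two-phase shape is the same.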

RAG in the Monstermessenger Agent

In this project, RAG is used to ensure the agent’s advice is grounded in specific, trusted strategies for dealing with cyberviolence. It allows the agent to provide more than just generic support by referencing concrete information from our curated knowledge base.

The entire RAG workflow is orchestrated between the give_advice and research_strategies nodes in the agent graph.

1. Triggering the RAG Query

The RAG process is initiated within the give_advice node under specific conditions:

  • First Advice: When the agent is about to give its first piece of advice to the user after collecting context, it forces a RAG query to ensure the initial response is well-informed.
  • Explicit Need: The give_advice node can also decide to trigger a RAG query if it determines more information is needed to answer a user’s follow-up question.

When a RAG query is triggered, the give_advice node formulates a research_query based on the user’s situation and passes it to the research_strategies node via the agent’s state.

2. Executing the RAG Query

The research_strategies node executes the RAG query, using a multi-query approach for better coverage:

  1. Instead of a single query, it uses the LLM to generate multiple diverse research queries (based on the ResearchQueries model in api/agents/service1/nodes/advice.py) from the user’s situation and context.
  2. It calls the async RAGService to perform parallelized searches against the vector store using these queries.
  3. The service retrieves relevant text chunks from the knowledge base, deduplicating results across the different queries.
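The fan-out-and-deduplicate pattern in steps 2–3 can be sketched as follows. `FakeRAGService` is a stand-in for the real async RAGService, and only the `abatch_search` method name is taken from the document; its body here is an assumption:

```python
# Hedged sketch of the multi-query retrieval step. The ResearchQueries
# dataclass mirrors the Pydantic model mentioned above; FakeRAGService is
# a simplified stand-in for the real async RAGService.
import asyncio
from dataclasses import dataclass


@dataclass
class ResearchQueries:
    queries: list[str]  # diverse queries generated by the LLM


class FakeRAGService:
    """Stand-in vector store: maps each query to a list of chunks."""

    def __init__(self, index: dict[str, list[str]]):
        self.index = index

    async def search(self, query: str) -> list[str]:
        return self.index.get(query, [])

    async def abatch_search(self, queries: list[str]) -> list[str]:
        # Run all searches concurrently, then deduplicate while
        # preserving first-seen order across queries.
        results = await asyncio.gather(*(self.search(q) for q in queries))
        seen, merged = set(), []
        for chunks in results:
            for chunk in chunks:
                if chunk not in seen:
                    seen.add(chunk)
                    merged.append(chunk)
        return merged
```

Because the queries overlap in topic, deduplication matters: without it, the same chunk retrieved by two queries would be scored and synthesized twice.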

3. Relevance Assessment

Before synthesis, a dedicated relevance assessment step is performed:

  1. The retrieved documents are passed back to the LLM.
  2. The LLM evaluates each chunk against the situational context and the research queries.
  3. It assigns a relevance score (“LOW”, “MEDIUM”, or “HIGH”).
  4. Only chunks with “HIGH” relevance are passed to the final synthesis step, ensuring that the advice is grounded only in the most pertinent information.
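The relevance gate reduces to a simple filter once scores are assigned. In the pipeline the scores come from an LLM call; the `assess` heuristic below is a trivial stand-in so the sketch is runnable:

```python
# Illustrative sketch of the relevance gate. In the real pipeline the
# grades come from an LLM; assess() here is a toy word-overlap stand-in.
from typing import Literal

Relevance = Literal["LOW", "MEDIUM", "HIGH"]


def assess(chunk: str, context: str) -> Relevance:
    # Stand-in heuristic; the real step asks the LLM to grade each chunk
    # against the situational context and the research queries.
    words = context.lower().split()
    return "HIGH" if any(w in chunk.lower() for w in words) else "LOW"


def filter_relevant(chunks: list[str], context: str) -> list[str]:
    """Keep only chunks the assessor grades as HIGH."""
    return [c for c in chunks if assess(c, context) == "HIGH"]
```

Discarding MEDIUM as well as LOW is a deliberately strict policy: a shorter, highly pertinent context tends to produce better-grounded advice than a longer, noisier one.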

4. Using the Retrieved Context

The HIGH-relevance chunks are synthesized into a single research_result. This result is not sent directly to the user; instead, it is passed back to the give_advice node.

  1. The give_advice node runs for a second time, now aware that research results are ready.
  2. It enriches the main system prompt, instructing the LLM to use the provided research results in its final answer.
  3. The research_result is appended to the conversation history.
  4. The LLM generates the final, user-facing advice, which is now grounded in the information retrieved from the knowledge base.

This four-step process (Trigger -> Query -> Assess -> Advise) ensures that the RAG output is seamlessly integrated into the agent’s conversational flow and empathetic tone, rather than being presented as a raw data dump.

The Retrieval Process (High-Level)

  • async RAGService: The retrieval logic is encapsulated in api/services/rag.py. This service is fully asynchronous and supports parallelized document retrieval through its abatch_search method.
  • Variant-Specific Knowledge: The service is variant-aware. It automatically queries the correct knowledge base (docs_youth or docs_adult) based on the CHATBOT_VARIANT environment variable, ensuring the retrieved information is appropriate for the target audience.
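The variant-to-collection mapping can be sketched in a few lines. The environment variable and the docs_youth/docs_adult names come from the description above; the function name and the default are assumptions:

```python
# Minimal sketch of variant-aware collection selection, based on the
# CHATBOT_VARIANT environment variable and the docs_youth / docs_adult
# collection names described above. Falling back to "youth" is an
# assumption, not documented behaviour.
import os


def collection_for_variant() -> str:
    """Map CHATBOT_VARIANT to the matching knowledge-base collection."""
    variant = os.environ.get("CHATBOT_VARIANT", "youth").lower()
    if variant not in {"youth", "adult"}:
        raise ValueError(f"Unknown CHATBOT_VARIANT: {variant!r}")
    return f"docs_{variant}"
```

Failing fast on an unknown variant is safer than silently defaulting, since serving adult-oriented material to the youth audience (or vice versa) would defeat the purpose of the split.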

Configuration

The RAG pipeline behavior can be tuned using the following environment variables:

  • NO_RAG_QUERIES: Configures the number of diverse search queries generated by the LLM (default: 1, max: 5). Increasing this value improves coverage but adds latency.
  • MAX_OUTPUT_TOKENS: Limits the length of synthesized research results (configured via settings).
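Reading NO_RAG_QUERIES with the documented bounds might look like this; the default of 1 and the maximum of 5 come from the list above, while the helper name and the clamping-on-bad-input behaviour are assumptions:

```python
# Hedged sketch of reading the NO_RAG_QUERIES tunable. The clamp to
# [1, 5] mirrors the documented default and maximum; the parsing helper
# itself is an assumption, not the project's actual settings code.
import os


def get_no_rag_queries(default: int = 1, maximum: int = 5) -> int:
    """Read NO_RAG_QUERIES from the environment, clamped to [1, maximum]."""
    raw = os.environ.get("NO_RAG_QUERIES", str(default))
    try:
        value = int(raw)
    except ValueError:
        return default
    return max(1, min(value, maximum))
```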

Note on the Knowledge Base

The RAG system is fed by a knowledge base of .pdf and .docx documents. The process for indexing these documents (converting them into a searchable format) is currently under review and will be detailed in a future version of this documentation.