RAG Pipeline
Last updated: 2026-01-30
This document details the Retrieval-Augmented Generation (RAG) pipeline, focusing on how and why it is integrated into the agent’s decision-making process.
What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by giving them access to an external knowledge base. Instead of relying solely on the model's training data, a RAG system first retrieves relevant information and then uses it to generate a more accurate, evidence-based, and contextually relevant response.
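To make the retrieve-then-generate pattern concrete, here is a minimal, self-contained sketch. All names in it are illustrative: the word-overlap scoring stands in for real embedding-based vector search, and the generate step only assembles the augmented prompt an LLM would receive.

```python
# Minimal retrieve-then-generate sketch (illustrative only; not this project's code).

def retrieve(query: str, knowledge_base: list[str], top_k: int = 3) -> list[str]:
    """Return the passages sharing the most words with the query (a stand-in for vector search)."""
    query_terms = set(query.lower().split())
    scored = sorted(knowledge_base, key=lambda p: len(query_terms & set(p.lower().split())), reverse=True)
    return scored[:top_k]

def generate(query: str, passages: list[str]) -> str:
    """Build the augmented prompt an LLM would receive; the model call itself is omitted."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

knowledge_base = ["Document the harassment with screenshots.", "Block the sender and report the account."]
print(generate("How do I report harassment?", retrieve("report harassment", knowledge_base)))
```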
RAG in the Monstermessenger Agent
In this project, RAG is used to ensure the agent’s advice is grounded in specific, trusted strategies for dealing with cyberviolence. It allows the agent to provide more than just generic support by referencing concrete information from our curated knowledge base.
The entire RAG workflow is orchestrated between the give_advice and research_strategies nodes in the agent graph.
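As a rough illustration of that orchestration, the sketch below wires the two nodes with a LangGraph-style StateGraph. The state schema, routing condition, and entry point are assumptions for illustration; only the node names (give_advice, research_strategies) and the research_query/research_result fields come from this document.

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict, total=False):
    messages: list            # conversation history
    research_query: str       # set by give_advice when it needs grounding
    research_result: str      # filled in by research_strategies


def give_advice(state: AgentState) -> dict:
    ...  # see the per-node sketches in the sections below


def research_strategies(state: AgentState) -> dict:
    ...


def needs_research(state: AgentState) -> str:
    # Route to research_strategies only when a query is pending and unanswered.
    if state.get("research_query") and not state.get("research_result"):
        return "research_strategies"
    return END


builder = StateGraph(AgentState)
builder.add_node("give_advice", give_advice)
builder.add_node("research_strategies", research_strategies)
builder.set_entry_point("give_advice")
builder.add_conditional_edges("give_advice", needs_research)
builder.add_edge("research_strategies", "give_advice")  # results flow back for the second pass
graph = builder.compile()
```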
1. Triggering the RAG Query
The RAG process is initiated within the give_advice node under specific conditions:
- First Advice: When the agent is about to give its first piece of advice to the user after collecting context, it forces a RAG query to ensure the initial response is well-informed.
- Explicit Need: The give_advice node can also decide to trigger a RAG query if it determines that more information is needed to answer a user’s follow-up question.
When a RAG query is triggered, the give_advice node formulates a research_query based on the user’s situation and passes it to the research_strategies node via the agent’s state.
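A hypothetical sketch of this trigger logic follows. The state keys advice_given, needs_more_info, and situation and the helper formulate_research_query are invented for illustration; only research_query and research_result correspond to fields named in this document.

```python
def formulate_research_query(state: dict) -> str:
    # Stand-in for the LLM call that turns the user's situation into a search query.
    return f"strategies for dealing with: {state.get('situation', 'cyberviolence')}"


def give_advice(state: dict) -> dict:
    first_advice = not state.get("advice_given", False)    # forced query before the first piece of advice
    needs_more_info = state.get("needs_more_info", False)  # model decided it lacks information for a follow-up

    if not state.get("research_result") and (first_advice or needs_more_info):
        # Hand off to research_strategies by writing the query into the shared state.
        return {"research_query": formulate_research_query(state)}

    # Otherwise answer directly (second pass, or no research needed).
    return {"advice": "(final advice composed here)", "advice_given": True}


print(give_advice({"situation": "harassment in a group chat"}))
```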
2. Executing the RAG Query
The research_strategies node acts as the executor of the RAG process:
- It receives the research_query from the state.
- It calls the RAGService to perform the actual search against the vector store.
- The RAGService retrieves relevant text chunks from the knowledge base.
- The research_strategies node then prompts the LLM with these retrieved chunks and the original query, asking it to synthesize the information into a concise research_result (a sketch of this node follows below).
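The following sketch mirrors those steps under stated assumptions: rag_search and synthesize_with_llm are stand-ins for the RAGService call and the LLM synthesis step, and the dictionary-based state is illustrative.

```python
# Hypothetical sketch of the research_strategies node; helper names are stand-ins.

def rag_search(query: str, top_k: int = 4) -> list[str]:
    # Stand-in for the RAGService lookup against the vector store.
    return [f"chunk {i} relevant to '{query}'" for i in range(1, top_k + 1)]


def synthesize_with_llm(query: str, chunks: list[str]) -> str:
    # Stand-in for the LLM call that condenses the chunks into a concise result.
    return f"Key strategies for '{query}': " + "; ".join(chunks)


def research_strategies(state: dict) -> dict:
    query = state["research_query"]          # written earlier by give_advice
    chunks = rag_search(query)               # retrieve relevant text chunks
    return {"research_result": synthesize_with_llm(query, chunks)}


print(research_strategies({"research_query": "how to document online harassment"}))
```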
3. Using the Retrieved Context
This synthesized research_result is not sent directly to the user. Instead, it is passed back to the give_advice node.
- The give_advice node runs a second time, now aware that research results are ready.
- It enriches the main system prompt, instructing the LLM to use the provided research results in its final answer.
- The research_result is appended to the conversation history.
- The LLM generates the final, user-facing advice, which is now grounded in the information retrieved from the knowledge base (see the sketch after this list).
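A hypothetical sketch of this second pass is shown below. The prompt wording, message structure, and call_llm stub are illustrative; only the flow (enrich the system prompt, append the research_result, generate grounded advice) reflects the description above.

```python
BASE_SYSTEM_PROMPT = "You are a supportive assistant helping users deal with cyberviolence."


def call_llm(system_prompt: str, messages: list[dict]) -> str:
    # Stand-in for the real model call.
    return f"(advice generated from {len(messages)} messages under an enriched system prompt)"


def give_advice_second_pass(state: dict) -> dict:
    # Enrich the system prompt so the model is instructed to use the research.
    system_prompt = (
        BASE_SYSTEM_PROMPT
        + "\nGround your advice in the research results provided in the conversation."
    )

    # Append the synthesized research_result to the conversation history the LLM sees.
    messages = list(state.get("messages", []))
    messages.append({"role": "system", "content": f"Research results: {state['research_result']}"})

    advice = call_llm(system_prompt, messages)
    return {"messages": messages + [{"role": "assistant", "content": advice}], "advice_given": True}


print(give_advice_second_pass({"messages": [], "research_result": "Save evidence; report; block."}))
```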
This staged process (Query -> Synthesize -> Advise) ensures that the RAG output is seamlessly integrated into the agent’s conversational flow and empathetic tone, rather than being presented as a raw data dump.
The Retrieval Process (High-Level)
- RAGService: The retrieval logic is encapsulated in api/services/rag.py. This service handles communication with the vector store.
- Variant-Specific Knowledge: The service is variant-aware. It automatically queries the correct knowledge base (docs_youth or docs_adult) based on the CHATBOT_VARIANT environment variable, ensuring the retrieved information is appropriate for the target audience. A sketch of this selection logic follows below.
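The sketch below shows one way such variant-aware selection could look. The collection names and the CHATBOT_VARIANT variable come from the bullet above; the store client and the search method are generic stand-ins, not the actual api/services/rag.py code.

```python
import os


class InMemoryStore:
    """Stand-in for the real vector store client."""

    def __init__(self, collections: dict[str, list[str]]) -> None:
        self.collections = collections

    def query(self, collection: str, query: str, top_k: int) -> list[str]:
        return self.collections.get(collection, [])[:top_k]


def knowledge_base_collection() -> str:
    # Pick the collection that matches the configured audience.
    variant = os.getenv("CHATBOT_VARIANT", "youth").lower()
    return "docs_adult" if variant == "adult" else "docs_youth"


class RAGService:
    def __init__(self, store: InMemoryStore) -> None:
        self.collection = knowledge_base_collection()   # chosen once per service instance
        self.store = store

    def search(self, query: str, top_k: int = 4) -> list[str]:
        # Query only the collection that matches the configured audience.
        return self.store.query(self.collection, query, top_k)


store = InMemoryStore({"docs_youth": ["Tell a trusted adult ..."], "docs_adult": ["Document incidents ..."]})
print(RAGService(store).search("reporting harassment"))
```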
Note on the Knowledge Base
The RAG system is fed by a knowledge base of .pdf and .docx documents. The process for indexing these documents (converting them into a searchable format) is currently under review and will be detailed in a future version of this documentation.