Data Layer

Last updated: 2026-01-30

Overview

The project uses a single PostgreSQL database to manage all persistent data, which is divided into three distinct logical domains:

  1. Application Data: Custom tables for storing user feedback and conversation analytics. The schema is managed by our application using SQLAlchemy.
  2. LangGraph State: Conversation state managed by the langgraph library. The full turn-by-turn state is held in memory, while long-term conversation summaries are persisted via PostgresStore (see below).
  3. RAG Vectors: The vector embeddings for the RAG knowledge base. This is managed by the PGVector library.

This approach allows for a unified database backend while keeping the different data concerns cleanly separated.

Application Data Schema (api/models.py)

The schema for our core application data is defined using SQLAlchemy ORM models in api/models.py. The init_db() function, called on application startup, ensures that these tables are created.
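
As a rough illustration, init_db() can be little more than a create_all() call over a shared declarative base. This is a minimal sketch, not the actual contents of api/models.py; the Base class and the engine argument are assumptions about how that module is organized.

  from sqlalchemy.engine import Engine
  from sqlalchemy.orm import DeclarativeBase


  class Base(DeclarativeBase):
      """Shared declarative base that all application models inherit from."""


  def init_db(engine: Engine) -> None:
      # create_all() is idempotent: it creates any missing tables and
      # leaves existing ones untouched.
      Base.metadata.create_all(bind=engine)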

The key models are:

Feedback

This table stores user feedback submitted through the UI.

  • It includes fields for a rating value, a comment, whether the feedback was positive (is_positive), and the type of incident being reported.
  • Decoupling: Crucially, the session_id in this table is a UUID that corresponds to the thread_id from the LangGraph store, but there is no hard database foreign key constraint. This decouples the application data from the LangGraph persistence layer, allowing them to evolve independently.
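
For illustration, a Feedback model along these lines might look as follows. The exact column names and types beyond those mentioned above are assumptions, and Base stands in for the project's shared declarative base.

  import uuid
  from typing import Optional

  from sqlalchemy import Boolean, Integer, String, Text
  from sqlalchemy.dialects.postgresql import UUID
  from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


  class Base(DeclarativeBase):
      """Stand-in for the shared declarative base in api/models.py."""


  class Feedback(Base):
      __tablename__ = "feedback"

      id: Mapped[int] = mapped_column(Integer, primary_key=True)
      # Mirrors the LangGraph thread_id, but deliberately has no ForeignKey,
      # so the application schema and the LangGraph store evolve independently.
      session_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), index=True)
      rating: Mapped[int] = mapped_column(Integer)            # rating value
      is_positive: Mapped[bool] = mapped_column(Boolean)      # whether the feedback was positive
      incident_type: Mapped[str] = mapped_column(String(64))  # type of incident being reported
      comment: Mapped[Optional[str]] = mapped_column(Text)    # free-text comment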

ConversationAnalytics

This is the primary table for storing the results of our analytics pipeline (see LLMOps & Observability for details).

  • It stores the high-level extracted data, such as the initial bullying message, a summary of the conversation, the user’s mood, and the strategy suggested by the bot.
  • It also stores metadata like the chatbot_version and language.

Categorization Tables

To allow for more detailed analysis, the ConversationAnalytics table is linked to several child tables that store categorized data:

  • ConversationMessagesCategorized
  • ConversationMoodsCategorized
  • ConversationStrategiesCategorized
  • ConversationAnswerTypesCategorized

These tables have a many-to-one relationship with ConversationAnalytics, allowing a single conversation to be associated with multiple categories (e.g., a message can be both an “insult” and a “threat”). This normalized structure is much more powerful for querying than storing categories in a simple list.
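
A sketch of the parent table and one of its categorization children is shown below. Table, column, and relationship names are assumptions chosen to mirror the description above, not the real schema.

  from typing import List

  from sqlalchemy import ForeignKey, Integer, String, Text
  from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


  class Base(DeclarativeBase):
      """Stand-in for the shared declarative base in api/models.py."""


  class ConversationAnalytics(Base):
      __tablename__ = "conversation_analytics"

      id: Mapped[int] = mapped_column(Integer, primary_key=True)
      initial_message: Mapped[str] = mapped_column(Text)           # the initial bullying message
      summary: Mapped[str] = mapped_column(Text)                   # conversation summary
      mood: Mapped[str] = mapped_column(String(64))                # user's mood
      suggested_strategy: Mapped[str] = mapped_column(String(128)) # strategy suggested by the bot
      chatbot_version: Mapped[str] = mapped_column(String(32))
      language: Mapped[str] = mapped_column(String(8))

      # One analytics row can carry several message categories.
      message_categories: Mapped[List["ConversationMessagesCategorized"]] = relationship(
          back_populates="analytics"
      )


  class ConversationMessagesCategorized(Base):
      __tablename__ = "conversation_messages_categorized"

      id: Mapped[int] = mapped_column(Integer, primary_key=True)
      analytics_id: Mapped[int] = mapped_column(ForeignKey("conversation_analytics.id"))
      category: Mapped[str] = mapped_column(String(64))  # e.g. "insult", "threat"

      analytics: Mapped["ConversationAnalytics"] = relationship(
          back_populates="message_categories"
      )

With this structure, a question like "how many conversations involved a threat?" becomes a simple join and GROUP BY on the child table.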

Database Connection (api/database.py)

The connection to the PostgreSQL database is managed in api/database.py.

  • An SQLAlchemy engine is created using the database URL from the application settings. It is configured with a connection pool for efficient access in a concurrent environment.
  • A SessionLocal factory is created for producing new database sessions.
  • A FastAPI dependency, get_db(), is defined. Endpoints that need to interact with the database include this dependency, which provides them with a session and ensures it is properly closed after the request is complete.
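
A minimal sketch of this module, assuming a psycopg driver URL and illustrative pool sizes (in the real module these values come from the application settings):

  from sqlalchemy import create_engine
  from sqlalchemy.orm import sessionmaker

  # In the real module the URL comes from the application settings.
  DATABASE_URL = "postgresql+psycopg://app:secret@localhost:5432/app"

  # Pooled engine for concurrent request handling (sizes are illustrative).
  engine = create_engine(
      DATABASE_URL,
      pool_size=5,
      max_overflow=10,
      pool_pre_ping=True,  # discard stale connections before handing them out
  )

  # Factory for request-scoped sessions.
  SessionLocal = sessionmaker(bind=engine, autoflush=False, expire_on_commit=False)


  def get_db():
      """FastAPI dependency: yield a session and always close it after the request."""
      db = SessionLocal()
      try:
          yield db
      finally:
          db.close()

An endpoint then declares a parameter such as db: Session = Depends(get_db), and FastAPI opens and closes the session around each request.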

LangGraph In-Memory State

The full state of an active conversation—the message history, current action, context data, etc.—is managed by LangGraph’s MemorySaver.

  • In-Memory Only: The complete, turn-by-turn state of a conversation is held in memory on the server.
  • Session Lifetime: This detailed state persists only for the duration of a user’s session on a live server instance. If the server process restarts, the full conversation history is lost.
  • Performance: This trade-off is made for performance, as it avoids writing to the database on every single turn of the conversation.
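
In LangGraph terms, this means the graph is compiled with an in-memory checkpointer and each conversation is addressed by its thread_id. The toy state schema and node below are placeholders, not the project's actual graph.

  from typing import TypedDict

  from langgraph.checkpoint.memory import MemorySaver
  from langgraph.graph import END, START, StateGraph


  class ChatState(TypedDict):
      messages: list


  def respond(state: ChatState) -> dict:
      # Placeholder node standing in for the real agent graph.
      return {"messages": state["messages"] + ["(bot reply)"]}


  builder = StateGraph(ChatState)
  builder.add_node("respond", respond)
  builder.add_edge(START, "respond")
  builder.add_edge("respond", END)

  # MemorySaver keeps every checkpoint in process memory only, so all
  # turn-by-turn state is lost when the server restarts.
  graph = builder.compile(checkpointer=MemorySaver())

  # Each conversation is addressed by a thread_id in the run config.
  graph.invoke(
      {"messages": ["hello"]},
      config={"configurable": {"thread_id": "some-session-uuid"}},
  )

The thread_id used here is the same UUID that the application records as session_id in the Feedback table.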

Conversation Summary Persistence (PostgresStore)

While the full conversation history is in-memory, long-term memory for returning users is achieved by persisting conversation summaries to the database.

  • PostgresStore as a Key-Value Store: A langgraph.store.postgres.PostgresStore instance is connected to our PostgreSQL database, but it is used as a simple key-value store, not as the primary checkpointer for the graph state.
  • Exclusive Role: Its only role in the current implementation is to persist the conversation summaries generated by the summarize_conversation node. The store.put(("conversation_summaries",), user_id, ...) method is used to save the summary.
  • This summary is then retrieved via store.get() for returning users to give the agent high-level context of past interactions, even if the server has restarted.
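
The round trip looks roughly like this; the connection string, user id, and summary payload are placeholders.

  from langgraph.store.postgres import PostgresStore

  conn_string = "postgresql://app:secret@localhost:5432/app"

  with PostgresStore.from_conn_string(conn_string) as store:
      store.setup()  # creates the store's tables on first use

      user_id = "user-1234"

      # Inside the summarize_conversation node: persist the latest summary.
      store.put(
          ("conversation_summaries",),
          user_id,
          {"summary": "Discussed repeated name-calling; suggested talking to a trusted adult."},
      )

      # On a later visit (possibly after a server restart), fetch it back.
      item = store.get(("conversation_summaries",), user_id)
      if item is not None:
          print(item.value["summary"])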

PGVector for RAG

Finally, the vector embeddings for the RAG knowledge base are also stored in the same PostgreSQL database.

  • The langchain_postgres.PGVector library manages its own tables within the database to store the document chunks and their embeddings.
  • It creates separate collections for each variant (e.g., docs_youth, docs_adult), keeping the knowledge bases isolated.
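
A sketch of how one of these collections might be created and queried; the embedding model and connection string are assumptions, not the project's actual configuration.

  from langchain_core.documents import Document
  from langchain_openai import OpenAIEmbeddings
  from langchain_postgres import PGVector

  connection = "postgresql+psycopg://app:secret@localhost:5432/app"

  # One collection per variant keeps the knowledge bases isolated.
  youth_store = PGVector(
      embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
      collection_name="docs_youth",
      connection=connection,
  )

  youth_store.add_documents(
      [Document(page_content="If someone keeps insulting you online, block and report them.")]
  )

  hits = youth_store.similarity_search("what should I do about online insults?", k=3)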