Onboarding & Contributor Tutorials

Last updated: 2026-04-08

This page is the starting point for new contributors. Work through the checklist below in order, then use the task-specific tutorials to make your first meaningful change.


New Contributor Checklist


Tutorial 1 — Run the Test Suite

Before making any change, verify that the existing tests pass.

cd api
pytest

To run a focused subset (faster):

pytest tests/i18n/ tests/agents/ -v

Expected output: all tests pass. If any fail, check the Testing Guide for the fixture setup requirements (SQLite in-memory DB, GOOGLE_API_KEY dummy value).
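If the suite complains about missing environment values, the fixtures only need placeholders to exist. A minimal sketch of the kind of stub involved (the names `ensure_dummy_env` and the exact variable set are illustrative; see the Testing Guide for the real fixtures):

```python
import os

def ensure_dummy_env() -> None:
    """Provide the dummy values the test fixtures expect.

    GOOGLE_API_KEY only needs to be present, not valid, because the
    tests never call the real API.
    """
    os.environ.setdefault("GOOGLE_API_KEY", "dummy-key-for-tests")
    # Point SQLAlchemy at an in-memory SQLite database.
    os.environ.setdefault("DATABASE_URL", "sqlite+aiosqlite:///:memory:")

ensure_dummy_env()
```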


Tutorial 2 — Add or Update a Translation

All user-facing strings live in api/i18n/ as YAML files. No Python changes are needed.

2a — Add a context question

Context questions are in api/i18n/context_questions/<LANG>.yaml. Each file has a teenager and a parent section.

Example — adding a new question to FR.yaml:

# api/i18n/context_questions/FR.yaml
teenager:
  # … existing questions …
  platform:                          # new question key
    question: Sur quelle plateforme cela s'est-il passé ?
    dropdown_options:
      - value: instagram
        label: 📸 Instagram
      - value: tiktok
        label: 🎵 TikTok
      - value: other
        label: ❓ Autre
parent:
  platform:
    question: Sur quelle plateforme l'incident a-t-il eu lieu ?
    dropdown_options:
      - value: instagram
        label: 📸 Instagram
      - value: tiktok
        label: 🎵 TikTok
      - value: other
        label: ❓ Autre

Repeat the same key in every language file (EN.yaml, ES.yaml, IT.yaml, DE.yaml, PT.yaml). The i18n manager will fall back to French if a translation is missing, but a complete set is expected.

Verify with:

cd api && pytest tests/i18n/ -v
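The completeness test boils down to a key comparison across language files. A hypothetical helper (not part of the repo) that shows the idea, using French as the reference and operating on already-parsed YAML content:

```python
def missing_question_keys(files: dict[str, dict]) -> dict[str, set[str]]:
    """Return, per language, the question keys present in FR but absent there.

    `files` maps a language code to the parsed YAML content of its
    context_questions file ({"teenager": {...}, "parent": {...}}).
    """
    reference = files["FR"]
    missing: dict[str, set[str]] = {}
    for lang, content in files.items():
        if lang == "FR":
            continue
        gaps: set[str] = set()
        for section in ("teenager", "parent"):
            gaps |= set(reference.get(section, {})) - set(content.get(section, {}))
        if gaps:
            missing[lang] = gaps
    return missing

fr = {"teenager": {"platform": {}}, "parent": {"platform": {}}}
en = {"teenager": {}, "parent": {"platform": {}}}
print(missing_question_keys({"FR": fr, "EN": en}))  # {'EN': {'platform'}}
```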

2b — Update a system prompt

System prompts live in api/i18n/prompts/<LANG>/ as YAML files named after the node (e.g., give_advice.yaml). Edit the relevant file directly; no Python restart is needed in development, because prompts are reloaded from disk on each invocation. The two-layer cache keeps warm copies for speed, but after any restart a cold start re-reads the underlying files.

2c — Add a new language

  1. Create api/i18n/context_questions/XX.yaml (copy FR.yaml as a template).
  2. Create api/i18n/ui_messages/XX.yaml.
  3. Create api/i18n/prompts/XX/ and add one YAML file per node (copy and translate the FR versions).
  4. Create api/i18n/prompt_snippets/XX.yaml.
  5. Create api/i18n/tools/XX.yaml.
  6. Add "XX" to I18nSettings.supported_languages in api/config.py.
  7. Add the language to frontend/src/utils/i18n.ts.
  8. Run pytest tests/i18n/ — the placeholder and completeness tests will catch missing keys.

The api/i18n/autotranslate/ tooling can produce a first draft automatically from French using the Gemini translation model.


Tutorial 3 — Node Prompt Management

This tutorial covers the full lifecycle of a node’s prompt: understanding the manifest format, modifying an existing prompt, adding new injected content, and creating a brand-new node.

How NodeEnv compiles a prompt

When a node is invoked, it calls NodeEnv.compile(node_id, state). The compiler:

  1. Loads nodes/manifests/<node_id>.yaml.
  2. Resolves the user’s language and user_type from state.
  3. Iterates the system_prompt.parts list. Each entry is either:
    • A string (e.g., "identity", "role") — resolved from api/i18n/prompts/<LANG>/<node>.yaml.
    • A dict — a dynamic part that is only included if its condition evaluates to True against state, and whose content is either injected from state or loaded from a prompt snippet.
  4. Appends any active tools from the tools.catalog, filtered by their own optional condition.
  5. Wraps every piece in XML-style tags (<identity>…</identity>) for structural clarity.
  6. Binds the resulting SystemMessage to the LLM, attaching tools (bind_tools) or a structured output schema (with_structured_output).
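The part-resolution steps above can be sketched in miniature. This is an illustrative reimplementation of steps 3 and 5 only, not the real NodeEnv: the actual compiler also handles formatters, tool filtering, and schema binding.

```python
def compile_prompt(parts, state, texts):
    """Resolve each part, skip parts whose condition is False,
    and wrap every piece in XML-style tags.

    `parts` mirrors system_prompt.parts from a manifest; `texts` stands
    in for the per-language YAML files; conditions are evaluated
    against `state`, as in the manifest format.
    """
    chunks = []
    for part in parts:
        if isinstance(part, str):            # static part → i18n text
            chunks.append(f"<{part}>{texts[part]}</{part}>")
            continue
        cond = part.get("condition")
        if cond is not None and not eval(cond, {"state": state}):
            continue                         # condition False → part omitted
        # Content comes either from state (inject) or from a snippet.
        body = state[part["inject"]] if "inject" in part else texts[part["snippet"]]
        chunks.append(f"<{part['id']}>{body}</{part['id']}>")
    return "\n".join(chunks)

parts = [
    "identity",
    {"id": "hint", "condition": "state.get('first_advice', False)",
     "snippet": "first_advice_prompt_extension"},
]
texts = {"identity": "You are a helpful guide.",
         "first_advice_prompt_extension": "This is the first advice turn."}
print(compile_prompt(parts, {"first_advice": True}, texts))
```

With `first_advice` unset, the `<hint>` block disappears from the compiled prompt, which is exactly how conditional snippets behave in the manifests below.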

Manifest format reference

# yaml-language-server: $schema=./node_manifest.schema.json
name: "my_node"                        # must match the node's key in graph.py

system_prompt:
  parts:
    # ── Static parts ──────────────────────────────────────────────────────
    # Each string is a key that must exist in api/i18n/prompts/<LANG>/my_node.yaml
    - identity
    - role

    # ── Dynamic injected parts ────────────────────────────────────────────
    - id: situational_context          # tag name in the compiled prompt
      condition: "state.get('context_complete', False)"
      inject: "state.context_data"    # dotted path into state
      formatter: dump_situation_context  # function name in Registry.get_formatter()

    # ── Snippet parts (load text from i18n prompt_snippets) ───────────────
    - id: language_instruction
      snippet: language_setting_prompt
      format:                          # evaluated against state at runtime
        language: "state.get('language', 'FR')"

    # ── Conditional snippet ───────────────────────────────────────────────
    - id: first_advice_hint
      condition: "state.get('first_advice', False)"
      snippet: "first_advice_prompt_extension"

tools:
  preamble:
    snippet: tool_preamble             # loaded from i18n/prompt_snippets/<LANG>.yaml
  catalog:
    - research_educational_strategies  # always active (no condition)
    - tool: lookup_contacts_by_country # conditional
      condition: "state.get('first_advice', False)"

answer_schema: MyNodeAnswer            # Pydantic class name registered in Registry

Validate the manifest

The schema file nodes/manifests/node_manifest.schema.json lets your IDE flag invalid keys immediately. After adding a new node, regenerate the schema so it recognises the new manifest name:

cd api
python agents/service1/nodes/manifests/refresh_manifest_schema.py

Modifying an existing node’s prompt — step by step

Goal: add an extra reminder to give_advice only when the conversation is already in ongoing-support mode.

  1. Add a snippet key to every language’s prompt_snippets/<LANG>.yaml:
# api/i18n/prompt_snippets/FR.yaml (and EN/ES/IT/DE/PT)
ongoing_support_reminder: |
  Tu es en mode de soutien continu. Sois particulièrement attentif(ve) à
  la cohérence avec le résumé de la conversation précédente.
  2. Add the conditional part to nodes/manifests/give_advice.yaml:
system_prompt:
  parts:
    # … existing parts …
    - id: ongoing_support_reminder
      condition: "state.get('ongoing_support_mode', False)"
      snippet: "ongoing_support_reminder"
  3. Restart the backend (cold start re-reads the manifest). No Python changes required.

  4. Test by sending a message while ongoing_support_mode is True and checking Langfuse for the new <ongoing_support_reminder> tag in the trace.

Adding a new node — step by step

  1. Create the node function in api/agents/service1/nodes/my_node.py:
import logging
from agents.service1.core.state import Service1State
from agents.service1.utils.node_env import NodeEnv

logger = logging.getLogger("chatbot.agent")

async def my_node(state: Service1State, store=None) -> dict:
    prompt, llm = NodeEnv.compile("my_node", state)
    response = await llm.ainvoke([prompt] + state["messages"])
    return {"action": response.action, "messages": [response]}
  2. Create the manifest nodes/manifests/my_node.yaml (see format reference above). Validate it against the schema.

  3. Register any new response schema in api/agents/service1/core/registry.py:

from agents.service1.nodes.schemas import MyNodeAnswer
Registry.register_schema("MyNodeAnswer", MyNodeAnswer)
  4. Wire the node into the graph (graph.py):
from agents.service1.nodes.my_node import my_node
graph.add_node("my_node", my_node)
graph.add_edge("agent1", "my_node")   # or use add_conditional_edges
  5. Add i18n prompts for the new node in api/i18n/prompts/<LANG>/my_node.yaml for every supported language.

  6. Write a sociable test in tests/agents/test_nodes_injection.py (see Testing Guide).


Tutorial 4 — Extending the Database Schema

This tutorial walks through adding a new column to an existing table and creating an entirely new table, both with Alembic migrations.

How the schema is organised

All ORM models for the app schema live in api/database/models/__init__.py. The Base class attaches all these models to the app schema:

class Base(DeclarativeBase):
    metadata = MetaData(schema=os.getenv("DB_SCHEMA") or Schema.APP.value)

Enum types used in column definitions are in api/database/enums.py. The Alembic env.py uses a monkeypatched Enum.__init__ that forces inherit_schema=True on every enum, which causes PostgreSQL to create the ENUM type in the schema of each table that references it (rather than always in public).
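The patch follows the standard wrap-the-constructor pattern: save the original `__init__`, replace it with a wrapper that forces the keyword argument. A stand-in sketch using a dummy class (SQLAlchemy's real `Enum.__init__` has a richer signature; this only shows the shape of the patch):

```python
class FakeEnum:
    """Stand-in for sqlalchemy.Enum, just to illustrate the wrapping pattern."""
    def __init__(self, *values, inherit_schema=False):
        self.values = values
        self.inherit_schema = inherit_schema

_original_init = FakeEnum.__init__

def _patched_init(self, *args, **kwargs):
    kwargs["inherit_schema"] = True        # force the flag on every instantiation
    _original_init(self, *args, **kwargs)

FakeEnum.__init__ = _patched_init           # applied once, at import time

e = FakeEnum("mock-idp", "google")
print(e.inherit_schema)  # True
```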

Adding a column to an existing table

Goal: add a preferred_monster column to the User model.

  1. Update the ORM model in api/database/models/__init__.py:
class User(Base):
    # … existing columns …
    preferred_monster = Column(String(100), nullable=True)
  2. Generate the migration using Alembic’s autogenerate:
cd api
DATABASE_URL="postgresql+psycopg://user:pass@host/db" \
  alembic revision --autogenerate -m "add preferred_monster to users"
  3. Review the generated file in api/migrations/versions/. Autogenerate is a good starting point but should always be inspected — check that the schema='app' argument is present on every DDL operation:
def upgrade() -> None:
    op.add_column(
        'users',
        sa.Column('preferred_monster', sa.String(length=100), nullable=True),
        schema='app'           # ← required for multi-schema setups
    )

def downgrade() -> None:
    op.drop_column('users', 'preferred_monster', schema='app')
  4. Apply the migration:
alembic upgrade head
  5. Update any related Pydantic schemas in api/schemas/app_data.py if the new column should appear in API responses.

Adding a new table

Goal: create a UserPreference table that stores key/value preferences per user.

  1. Add a new model to api/database/models/__init__.py:
class UserPreference(Base):
    __tablename__ = "user_preferences"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    user_id = Column(
        UUID(as_uuid=True),
        ForeignKey("users.id", ondelete="CASCADE"),
        nullable=False,
    )
    key = Column(String(100), nullable=False)
    value = Column(Text, nullable=True)
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))

    user = relationship("User", back_populates="preferences", lazy="joined")
  2. Add the reverse relationship on User:
class User(Base):
    # … existing relationships …
    preferences = relationship(
        "UserPreference", back_populates="user",
        cascade="all, delete-orphan", lazy="selectin",
    )
  3. Generate and review the migration (same steps as above).

  4. Add a new repository in api/services/app_data/repositories/ following the pattern of existing ones (UserRepository, etc.) and register it on AppDataService:

# In AppDataService.__init__:
self.preferences = UserPreferenceRepository()

Adding a new enum value

If you need to extend an existing enum (e.g., add a new IDProvider):

  1. Add the value to api/database/enums.py:
class IDProvider(enum.Enum):
    MOCK_IDP = 'mock-idp'
    GOOGLE   = 'google'    # new
  2. Alembic cannot autogenerate enum alterations — write the migration manually:
from alembic import op

def upgrade() -> None:
    op.execute("ALTER TYPE app.idprovider ADD VALUE IF NOT EXISTS 'google'")

def downgrade() -> None:
    pass  # PostgreSQL does not support removing enum values without recreating the type

Running migrations in development

cd api

# Apply all pending migrations
alembic upgrade head

# Check current revision
alembic current

# Roll back one step
alembic downgrade -1

# Show migration history
alembic history --verbose

Tutorial 5 — Understanding and Extending the Auth Flow

How authentication works end-to-end

The auth system is a two-step token exchange pattern:

sequenceDiagram
    autonumber
    participant C  as Client
    participant IDP as Mock IDP<br/>/mock-idp/*
    participant A  as Auth Router<br/>/auth/login
    participant ADS as AppDataService
    participant DB  as PostgreSQL<br/>(app schema)

    note over C,IDP: Step 1 — Obtain IDP token (development only)

    C  ->>  IDP: POST /mock-idp/authenticate<br/>{ idp_user_id }
    IDP -->> C:  { access_token: JWT(sub=idp_user_id) }

    note over C,DB: Step 2 — Exchange IDP token for App Session

    C  ->>  A: POST /auth/login<br/>{ idp_token, id_provider, user_type,<br/>language, origin, capabilities }
    A  ->>  IDP: GET /.well-known/jwks.json
    IDP -->> A:  { keys: [{ alg: HS256, k: secret_key }] }
    A  ->>  A: jwt.decode(idp_token, secret_key)<br/>→ extract sub claim (login_id)
    A  ->>  ADS: initialize_app_session(context)

    ADS ->> DB: SELECT IDPLogin<br/>WHERE (idp, login_id)

    alt Returning user
        DB  -->> ADS: IDPLogin found
        ADS ->>  DB: SELECT User WHERE id = idp_login.user_id
        DB  -->> ADS: User record
        ADS ->>  DB: UPDATE User SET last_seen_at = now()
    else New user
        DB  -->> ADS: No IDPLogin found
        ADS ->>  DB: INSERT User (user_type, language, age_class)
        DB  -->> ADS: New User record
        ADS ->>  DB: INSERT IDPLogin (idp, login_id, user_id)
    end

    ADS ->>  DB: UPSERT AppSession (user_id, origin, capabilities)
    DB  -->> ADS: AppSession record
    ADS ->>  DB: COMMIT

    opt Returning user
        ADS ->> DB: SELECT Conversations<br/>WHERE user_id ORDER BY updated_at DESC
        DB  -->> ADS: UserHistory
    end

    ADS -->> A: AuthResponse
    A   -->> C: { bearer_token, app_session, user_history }

    note over C: Store bearer_token → Authorization: Bearer token;

Step 1 — Mock IDP (POST /api/v1/mock-idp/authenticate)
The client sends an idp_user_id string. The mock IDP wraps it in a JWT (signed with "encryption_key") and returns { access_token }. In production this step is replaced by a real OAuth 2.0 / OIDC provider.

Step 2 — Token exchange (POST /api/v1/auth/login)
The backend:

  1. Fetches the IDP’s public key from GET /mock-idp/.well-known/jwks.json.
  2. Decodes the JWT and extracts sub (the external user ID).
  3. Calls AppDataService.initialize_app_session() which:
    • Looks up IDPLogin by (id_provider, login_id) — a composite primary key.
    • If found: loads the existing User and increments last_seen_at.
    • If not found: creates a new User and a new IDPLogin row.
    • Upserts an AppSession for the (user_id, origin) pair (one session per client origin per user).
    • Returns AuthResponse containing bearer_token (currently the app_session_id UUID), app_session, and user_history for returning users.

The bearer_token is stored by the client and sent as Authorization: Bearer <token> on subsequent requests.

Key data models

IDPLogin (composite PK: idp + login_id)
  └── user_id ──► User
                    └── app_sessions ──► AppSession (PK: id)
                    └── conversations ──► Conversation

IDPLogin decouples the external identity from the internal user profile. One user can have multiple IDP logins (e.g., both a mock IDP and a future Google IDP).
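The decoupling can be illustrated with plain dataclasses (a sketch of the relationships only, not the ORM models):

```python
from dataclasses import dataclass, field

@dataclass
class User:
    id: int
    logins: list["IDPLogin"] = field(default_factory=list)

@dataclass
class IDPLogin:
    idp: str        # half of the composite primary key
    login_id: str   # the other half (the IDP's sub claim)
    user: User

# One internal user, reachable through two external identities:
u = User(id=1)
u.logins.append(IDPLogin("mock-idp", "test-user-42", u))
u.logins.append(IDPLogin("google", "google-sub-123", u))
print(len(u.logins))  # 2
```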

Testing the auth flow locally

# Step 1 — get a mock IDP token
curl -s -X POST http://localhost:8000/api/v1/mock-idp/authenticate \
  -H "Content-Type: application/json" \
  -d '{"idp_user_id": "test-user-42"}' | python3 -m json.tool

# Step 2 — exchange for an app session
curl -s -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "idp_token": "<token-from-step-1>",
    "id_provider": "mock-idp",
    "user_type": "teenager",
    "language": "FR",
    "origin": "web_react",
    "capabilities": {"supports_sse": true}
  }' | python3 -m json.tool

A successful response returns bearer_token (use as Authorization: Bearer <token>), the app_session object, and — for returning users — user_history with a list of past conversations.

Adding a new Identity Provider

  1. Add the enum value to IDProvider in api/database/enums.py:
class IDProvider(enum.Enum):
    MOCK_IDP = 'mock-idp'
    GOOGLE   = 'google'    # new
  2. Write a migration to extend the PostgreSQL enum (see Tutorial 4 — Adding a new enum value).

  3. Add public-key retrieval logic in api/routers/auth.py. Replace the call to mock_idp_pk_url with a lookup that selects the correct JWKS endpoint based on auth_idp_request.id_provider:

IDP_JWKS_URLS = {
    IDProvider.MOCK_IDP: f"http://localhost:{settings.server.port}/api/v1/mock-idp/.well-known/jwks.json",
    IDProvider.GOOGLE:   "https://www.googleapis.com/oauth2/v3/certs",
}

async def get_public_key(id_provider: IDProvider) -> str:
    url = IDP_JWKS_URLS[id_provider]
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            data = await resp.json()
            return data["keys"][0]["k"]
  4. Update the AuthIDPRequest schema and login_with_idp handler to pass id_provider through to get_public_key.

  5. Write auth tests in tests/auth/test_auth_flow.py that cover the new provider path.


Tutorial 6 — Monkeypatches Reference

The codebase applies five runtime patches to SQLAlchemy and the test engine. They fall into two groups with distinct concerns, documented in the pages where they are most useful:

| Patch | Lives in | Why | Full docs |
|-------|----------|-----|-----------|
| Enum inherit_schema | run.py, migrations/env.py | PostgreSQL ENUM types are created in the schema of each table that references them, not public | Data Layer → Database Migrations |
| Document.__hash__ | services/rag/__init__.py | LangChain Document is unhashable by default; the RAG cache needs documents as dict/set keys | RAG Pipeline → Known Workaround |
| create_engine kwargs strip | tests/conftest.py | Remove PG-only pool args that break SQLite | Testing → database.py fixtures |
| create_async_engine URL redirect | tests/conftest.py | Force all async engines to the in-memory test DB | Testing → database.py fixtures |
| Base.metadata.schema reset | tests/fixtures/database.py | Remove the app. schema prefix, which is incompatible with SQLite | Testing → database.py fixtures |

All patches are applied at module import time with no teardown, which is safe because each process has exactly one schema target (production) or one database target (tests).

Known duplication: the Enum inherit_schema patch is copy-pasted identically into run.py and migrations/env.py. A future clean-up would centralise it in api/core/patches.py.


Tutorial 7 — Add Documents to the RAG Knowledge Base

  1. Place the new PDF(s) in the correct sub-directory:

    • rag/raw_files/youth/ for teenager content
    • rag/raw_files/adult/ for parent content
  2. Index the new documents:

    cd rag
    
    # Index teenager documents only
    python index_documents.py --variant teenager
    
    # Index parent documents only
    python index_documents.py --variant parent
    
    # Index both
    python index_documents.py --variant all
  3. Verify retrieval by sending a chat message that should trigger the relevant content and checking the Langfuse trace for the research_educational_strategies tool call.


Tutorial 8 — Add or Toggle a Feature Flag

Toggle an existing flag in development

Flags are seeded into the database on startup. Call GET /api/v1/features to inspect current resolved values.

Add a new feature flag

  1. Seed it — add a new FeatureFlag entry in FeatureFlagService._seed_default_flags():
FeatureFlag(
    name="my_new_flag",
    description="Enable the new widget",
    is_enabled=False,
    environment="all",
    visibility="frontend",   # "frontend", "backend", or "all"
    variant="all",           # "teenager", "parent", or "all"
)
  2. Use it on the backend (if visibility includes "backend"):
from core.dependencies import get_feature_flag_service
svc = get_feature_flag_service()
enabled = await svc.get_flag_value("my_new_flag", default=False)
  3. Use it on the frontend (if visibility includes "frontend"):
import { useBooleanFlagValue } from '@openfeature/react-sdk';
const enabled = useBooleanFlagValue('my_new_flag', false);
  4. Document it — add a row to the feature-flag table in Backend API.
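How the three seeded dimensions (environment, visibility, variant) resolve can be sketched as a filter. This is an illustrative function, not the actual FeatureFlagService logic, which may differ:

```python
def flag_applies(flag: dict, environment: str, caller: str, variant: str) -> bool:
    """Return True if `flag` should be served in this context.

    Each seeded field accepts "all" as a wildcard; `caller` is
    "frontend" or "backend". Only the matching rule is shown here.
    """
    def matches(field_value: str, actual: str) -> bool:
        return field_value == "all" or field_value == actual

    return (matches(flag["environment"], environment)
            and matches(flag["visibility"], caller)
            and matches(flag["variant"], variant))

flag = {"name": "my_new_flag", "environment": "all",
        "visibility": "frontend", "variant": "all"}
print(flag_applies(flag, "development", "frontend", "teenager"))  # True
print(flag_applies(flag, "development", "backend", "teenager"))   # False
```

A backend caller never sees a frontend-only flag, which is why step 2 above only applies when visibility includes "backend".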

Tutorial 9 — Write a Test

All tests live under api/tests/. Use the modular fixtures from tests/fixtures/ — do not redefine infrastructure in individual test files.

Minimal example — testing a service method:

# tests/chat/test_my_feature.py
import pytest
from tests.fixtures.services import app_data_service   # imported via conftest

@pytest.mark.asyncio
async def test_create_user(app_data_service, db_session):
    user = await app_data_service.create_user(
        user_type="teenager", language="FR"
    )
    assert user.id is not None
    assert user.language == "FR"

See Testing Guide for the full fixture reference and test-category conventions.


Tutorial 10 — Deploy to Staging

Once your change is reviewed and merged to dev, the CI pipeline deploys automatically. For manual hot-stage deployments before merging:

# From the repo root — requires GCP credentials
python deployment/deploy.py --variant teenager --suffix my-feature

# Frontend only
python deployment/deploy.py --frontend-only --variant teenager --suffix my-feature

# Both variants in parallel
python deployment/deploy.py --suffix my-feature

See Deployment for the full option reference.


Tutorial 11 — Contribute to the Documentation

Documentation uses Quarto .qmd files (a superset of Markdown). No local Quarto installation is needed — VS Code and JupyterLab both have Quarto extensions that render previews.

  1. Edit or create a .qmd file in doc/.
  2. If creating a new page, register it in doc/_quarto.yaml:
    • Add the filename to the project.render list.
    • Add a sidebar entry under the appropriate section in website.sidebar.contents.
  3. Open a pull request to dev. The build_quarto_docs.yaml workflow will render the site and deploy it automatically on merge.

See Extending the Documentation for the Quarto syntax reference.