Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is the mechanism that connects an agent's knowledge base to its language model at conversation time, ensuring responses are grounded in specific, up-to-date information rather than relying solely on the model's training data.
What is RAG?
RAG is a three-stage pipeline used across the Newo platform:
- Retrieve -- Given the current conversation, search a document corpus for the most relevant articles.
- Augment -- Inject the retrieved content into the prompt as additional context.
- Generate -- The language model produces a response informed by both the conversation history and the retrieved documents.
This pattern allows agents to answer questions about business-specific topics (menus, pricing, policies, service details) without encoding all of that information directly into the system prompt. Instead, the agent dynamically selects the right context for each conversation turn.
```mermaid
flowchart LR
    UM["User message"] --> R1
    subgraph Retrieve["① Retrieve"]
        R1["Load all rag_context<br/>topics from AKB"] --> R2["Search LLM selects<br/>1–3 article IDs"] --> R3["Fetch full content<br/>by source ID"]
    end
    subgraph Augment["② Augment"]
        A1["Store in rag<br/>persona attribute"] --> A2["Inject into<br/>agent prompt"]
    end
    subgraph Generate["③ Generate"]
        G1["LLM produces<br/>grounded response"]
    end
    R3 --> A1
    A2 --> G1
```
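The three stages can be sketched in Python. This is an illustrative toy, not platform code: the word-overlap `retrieve` is a stand-in for the search LLM, and the placeholder token mirrors the one used later on this page.

```python
def retrieve(corpus, query, top_k=3):
    """Toy stand-in for the search LLM: rank topics by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(t["name"].lower().split())), t) for t in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored if score > 0][:top_k]

def augment(prompt_template, topics):
    """Join the selected topic summaries and inject them as prompt context."""
    context = "\n\n---\n\n".join(f"{t['name']}\n{t['summary']}" for t in topics)
    return prompt_template.replace("<||rag_placeholder||>", context or "No additional information")

corpus = [
    {"name": "lunch menu and hours", "summary": "Lunch is served 11am-3pm daily."},
    {"name": "parking policy", "summary": "Free parking after 6pm."},
]
selected = retrieve(corpus, "what time is lunch served")
prompt = augment("<AdditionalInformation>\n<||rag_placeholder||>\n</AdditionalInformation>", selected)
# The generate stage would now pass `prompt`, plus the history, to the main LLM.
```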
The Active Knowledge Base as document store
The document corpus for RAG is the Active Knowledge Base (AKB). AKB topics tagged with the label rag_context form the retrieval pool. Each topic has four fields that participate in the RAG pipeline:
| AKB field | Role in RAG |
|---|---|
| `name` | The topic title. Used during retrieval to match against user intent. |
| `facts` | A short summary or keyword set. Presented to the search LLM alongside the name for relevance scoring. |
| `summary` | The full content body. Injected into the agent prompt when the topic is selected. |
| `source` | A unique identifier (e.g., `R1`, `aft_001`). Used to deduplicate and look up topics after the search LLM returns its selection. |
AKB topics are scoped to the GeneralManagerAgent persona. The RAG retrieval skills filter by this persona ID and the rag_context label to ensure only knowledge base content -- not task data or other AKB entries -- enters the retrieval pool.
::: 🗒️ NOTE Smaller, well-named topics produce better retrieval results. The search LLM evaluates topic names and facts to find semantic matches, so a topic named "Security protocols for company-owned laptops" will match more precisely than one named "Security protocols." :::
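As a concrete illustration, one rag_context topic might look like the record below. The field values are invented; only the field names come from the table above.

```python
# Hypothetical rag_context topic; only the four RAG-relevant fields are shown.
topic = {
    "name": "Security protocols for company-owned laptops",  # matched against user intent
    "facts": ["full-disk encryption", "VPN requirement"],    # scored by the search LLM with the name
    "summary": "All company-owned laptops must run full-disk encryption and connect via VPN.",
    "source": "R7",                                          # unique ID for exact-match lookup
}
```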
CARagFlow: the RAG pipeline
CARagFlow is the dedicated flow within ConvoAgent that implements the RAG pipeline. It is triggered by the prepare_rag_context_command event and runs four skills in sequence:
```yaml
title: CARagFlow
idn: CARagFlow
default_runner_type: guidance
default_provider_idn: google
default_model_idn: gemini25_flash
skills:
  - idn: PrepareRagContext            # Orchestrates the full pipeline
    runner_type: nsl
  - idn: get_memory                   # Retrieves conversation history
    runner_type: nsl
    parameters:
      - name: user_id
      - name: count
        default_value: "15"
  - idn: prepare_rag_context_schema   # Defines the structured output schema
    runner_type: nsl
  - idn: structured_generation        # Runs constrained LLM generation
    runner_type: nsl
    parameters:
      - name: prompt
      - name: schema
events:
  - idn: prepare_rag_context_command
    skill_selector: skill_idn
    skill_idn: PrepareRagContext
    interrupt_mode: queue
```

When CARagFlow runs
The prepare_rag_context_command event is fired at two key moments:
- Conversation start -- The `ConversationStartedSkill` sends the event immediately after initializing the session, so RAG context is ready before the agent's first response.
- Each user message -- The user message handler sends the event before generating a reply, ensuring the RAG context reflects the latest user input.
For voice channels (non-chat), the event is dispatched asynchronously to CARagFlow via SendSystemEvent. For chat channels, the same RAG logic runs synchronously within CAMainFlow using the prepare_rag_context skill (an inline copy of the same algorithm) to avoid latency from cross-flow event dispatch.
Step 1: Retrieve candidate articles
The PrepareRagContext skill begins by fetching all AKB topics labeled rag_context:
```
{% set general_manager_agent_persona = GetAgent(idn="GeneralManagerAgent").persona_id %}
{% set rag_context_topics = SearchFuzzyAkb(
    query="",
    searchFields=["name"],
    numberTopics=300,
    scoreThreshold=0,
    filterByPersonaIds=[general_manager_agent_persona],
    labels=["rag_context"]
) %}
```

This retrieves up to 300 topics with no minimum score threshold -- it loads the entire rag_context pool. The topics are then sorted by `updatedAt` (newest first) and formatted into a string representation:
```
{% set rag_context_topics = rag_context_topics | sort(attribute='updatedAt', reverse=True) %}
{% for topic in rag_context_topics %}
  {% set rag_context_topics_string = rag_context_topics_string
      + "article_title: " + topic.topic + "\n"
      + "article_facts: [" + ", ".join(topic.facts) + "]\n"
      + "article_last_updated: " + topic.updatedAt|string + "\n"
      + "article_id: " + topic.source + "\n\n---\n\n" %}
{% endfor %}
```

Step 2: LLM-based semantic search
The formatted articles, the conversation history, and conversation metadata are assembled into a prompt for a "search assistant" LLM call:
```
{% set prompt %}You are the search assistant.

**Task:**
Find the 1-3 articles from <KnowledgeBaseArticles> that are the best
semantic match for the user's topics in the **entire** <ConversationHistory>.

**CRITICAL RULES:**
1. Matches for the **last user message** are the most important and
   MUST be listed first.
2. If multiple articles cover the same topic, select only the most
   recent one (check `article_last_updated`).
3. Provide up to three relevant article IDs.

<ConversationMeta>
{{GetPersonaAttribute(id=user_id, field="conversation_meta")}}
</ConversationMeta>

<ConversationHistory>
{{get_memory(count=20, user_id=user_id)}}
</ConversationHistory>

<KnowledgeBaseArticles>
{{rag_context_topics_string}}
</KnowledgeBaseArticles>
{% endset %}
```

The LLM is constrained to return structured JSON using the `structured_generation` skill:
```
{% set schema = json.loads(prepare_rag_context_schema()) %}
{% set search_result_str = structured_generation(prompt=prompt, schema=schema) %}
```

The schema enforces a strict output format:
```json
{
  "search_result": {
    "article_id_1": "string (article ID or 'none')",
    "article_id_2": "string (article ID or 'none')",
    "article_id_3": "string (article ID or 'none')"
  }
}
```

Step 3: Fetch full article content
The returned article IDs are used to retrieve the full AKB topics by their source field:
{% set search_results = json.loads(search_result_str).search_result.values() %}
{% set rag_topics = [] %}
{% for index in search_results %}
{% if (index != "none") and (not index in used_indexes) %}
{% set rag_topics = rag_topics + SearchFuzzyAkb(
query=index,
searchFields=["source"],
numberTopics=1,
scoreThreshold=1.0,
filterByPersonaIds=[general_manager_agent_persona],
labels=["rag_context"]
) %}
{% endif %}
{% endfor %}The scoreThreshold=1.0 ensures an exact match on the source field. This two-phase approach (broad retrieval then targeted lookup) means the search LLM only sees lightweight metadata (titles, facts, IDs), while the full article summaries are fetched only for the selected topics.
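The selection-and-lookup step can be sketched as follows. Here `lookup` is a hypothetical in-memory stand-in for the exact-match SearchFuzzyAkb call, and the slot names mirror the schema shown above.

```python
def fetch_selected(search_result, lookup):
    """Resolve the search LLM's slots to full topics, skipping 'none' and duplicates."""
    topics, used = [], set()
    for article_id in search_result.values():
        if article_id != "none" and article_id not in used:
            used.add(article_id)                 # deduplicate repeated IDs
            if article_id in lookup:             # exact match on the source field
                topics.append(lookup[article_id])
    return topics

lookup = {
    "R1": {"topic": "Menu", "summary": "Seasonal menu details."},
    "R2": {"topic": "Hours", "summary": "Open 9am-9pm."},
}
result = {"article_id_1": "R2", "article_id_2": "none", "article_id_3": "R2"}
selected = fetch_selected(result, lookup)   # the duplicate R2 is fetched only once
```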
Step 4: Store RAG context
The selected topics' titles and summaries are concatenated and stored as the rag persona attribute:
```
{% set rag = "" %}
{% for topic in rag_topics %}
  {% set rag = rag + topic.topic + "\n" + topic.summary + "\n\n---\n\n" %}
{% endfor %}
{{SetPersonaAttribute(field="rag", value=rag, id=user_id)}}
```

This persona attribute is later read during prompt assembly and injected into the `<AdditionalInformation>` section of the agent prompt.
How RAG content enters the prompt
The agent prompt is assembled by the prompt_build_base skill in CAMainFlow. It contains a placeholder for RAG content:
```
<AdditionalInformation alt="additional information" role="context">
  <||rag_placeholder||>
</AdditionalInformation>
```

At generation time, the placeholder is replaced with the value of the `rag` persona attribute:
```
{% set compiled_prompt = compiled_prompt.replace(
    "<||rag_placeholder||>",
    GetPersonaAttribute(id=user_id, field="rag").strip() or "No additional information"
) %}
```

If no RAG topics were matched (or the knowledge base is empty), the placeholder resolves to "No additional information", ensuring the prompt remains well-formed.
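The fallback behavior can be sketched in Python (an illustration of the substitution logic, not the NSL runtime):

```python
def inject_rag(compiled_prompt, rag_value):
    """Swap the placeholder for the stored context, falling back when it is empty."""
    return compiled_prompt.replace(
        "<||rag_placeholder||>",
        (rag_value or "").strip() or "No additional information",
    )

template = "<AdditionalInformation>\n<||rag_placeholder||>\n</AdditionalInformation>"
```

Both an empty string and whitespace-only context resolve to the fallback, so the tag pair is never left empty.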
Conversation context integration
RAG retrieval depends on conversation context from two sources:
Memory retrieval via get_memory
The get_memory skill collects conversation history by aggregating messages across all active channel actors (Newo Voice, Newo Chat, Telegram, SMS, API, Sandbox, and others). It retrieves the most recent messages (default: 15, increased to 20 for RAG) up to a 10,000-character limit:
```
{{Return(val=GetMemory(count=count, maxLen=10000, filterByActorIds=actors))}}
```

The skill supports filtering by date range, actor list, and optional inclusion of system messages and agent thoughts. For RAG, it typically retrieves only user-facing messages to give the search LLM a clean view of the conversation.
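A rough sketch of the count-and-length capping, with an in-memory message list standing in for the channel actors. Whether the real character cap keeps the head or the tail of the history is an assumption here; the sketch keeps the tail (most recent text).

```python
def get_memory_sketch(messages, count=15, max_len=10000):
    """Return the newest `count` messages joined, capped at `max_len` characters."""
    recent = messages[-count:]          # most recent messages across all channels
    text = "\n".join(recent)
    return text[-max_len:]              # keep the tail if the character cap is exceeded

history = [f"msg {i:02d}" for i in range(30)]
window = get_memory_sketch(history, count=20)
```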
Conversation meta
The conversation_meta persona attribute accumulates extracted information about the user and conversation context (such as detected intent, user name, and session details). This metadata is injected into the search prompt inside <ConversationMeta> tags, giving the search LLM additional signals beyond the raw message history.
Structured generation
The structured_generation skill enforces that the search LLM returns valid JSON conforming to a provided schema. It uses constrained generation parameters:
```
{{system}}{{prompt.strip()}}{{end}}

{{assistant}}
{% set result_json = Gen(
    jsonSchema=schema,
    validateSchema="True",
    temperature=0.2,
    topP=0,
    maxTokens=4000,
    skipFilter=True,
    thinkingBudget=85
) %}
{{end}}

{{Return(val=result_json)}}
```

Key parameters:
| Parameter | Value | Purpose |
|---|---|---|
| `jsonSchema` | The output schema | Forces output to match the defined JSON structure |
| `validateSchema` | `"True"` | Validates the output against the schema before returning |
| `temperature` | 0.2 | Low temperature for deterministic article selection |
| `topP` | 0 | Greedy decoding to maximize consistency |
| `thinkingBudget` | 85 | Allocates a small reasoning budget for the selection task |
This pattern is reusable beyond RAG -- any skill that needs structured LLM output can call structured_generation with a custom prompt and schema.
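Outside the platform, the validate-before-return idea can be sketched with a hand-rolled check of the search-result shape (illustrative only; the real `validateSchema` flag does this inside Gen):

```python
import json

EXPECTED_KEYS = ("article_id_1", "article_id_2", "article_id_3")

def validate_search_result(raw_json):
    """Parse the LLM output and verify it matches the search-result schema shape."""
    data = json.loads(raw_json)
    result = data.get("search_result")
    if not isinstance(result, dict):
        raise ValueError("missing search_result object")
    for key in EXPECTED_KEYS:
        if not isinstance(result.get(key), str):
            raise ValueError(f"{key} must be a string (article ID or 'none')")
    return result

ok = validate_search_result(
    '{"search_result": {"article_id_1": "R1", "article_id_2": "none", "article_id_3": "none"}}'
)
```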
CADataInjectionFlow: static and dynamic data injection
While CARagFlow handles retrieval from the AKB, CADataInjectionFlow handles injection of external and time-sensitive data into the agent prompt. These two flows complement each other: RAG provides knowledge base content, while data injection provides live operational data.
FetchData: timer-based external retrieval
The FetchData skill runs on a configurable timer to pull data from external sources:
```
{{Set(name="payload", value=GetCustomerAttribute(
    field="project_attributes_private_data_injection_payload"
))}}
{{SendCommand(
    commandIdn="data_injection",
    integrationIdn="api",
    connectorIdn="webhook",
    payload=payload
)}}
```

It reads the webhook payload configuration from a customer attribute, sends a command to the API webhook integration, and resets the timer for the next fetch cycle. The update interval is controlled by the `project_attributes_setting_data_injection_update_period` customer attribute.
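One fetch cycle can be sketched as pure data: read the stored payload, build the command, and compute when to run next. The dict shape is an illustrative stand-in for SendCommand, and treating the update period as seconds is an assumption.

```python
import json

def build_fetch_cycle(payload_attr, update_period_attr):
    """Sketch of one FetchData pass: payload attribute in, webhook command and next delay out."""
    payload = json.loads(payload_attr) if payload_attr else {}
    command = {
        "commandIdn": "data_injection",   # mirrors the SendCommand call above
        "integrationIdn": "api",
        "connectorIdn": "webhook",
        "payload": payload,
    }
    next_run_seconds = int(update_period_attr)  # timer reset for the next cycle
    return command, next_run_seconds

command, delay = build_fetch_cycle('{"endpoint": "inventory"}', "3600")
```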
Data processing pipeline
When the webhook responds with new data, the RetrieveDataSkill processes it:
- Extracts the data payload from the triggered event.
- Optionally summarizes the data if `project_attributes_setting_data_injection_needs_summary` is `"True"` (using the `_summarizeInjectedData` skill to condense content within a token budget).
- Stores the processed result in the `project_attributes_private_data_injection_retrieved_data` customer attribute.
Custom prompt sections
The InjectCustomSectionSkill and RemoveCustomSectionSkill manage named sections that are injected into the agent prompt via the custom_prompt_section persona attribute. Each section is wrapped in XML-style tags:
```
{{#system~}}
<{{sectionName}}>
{{sectionValue}}
</{{sectionName}}>
---
{{~/system}}
```

These sections appear in the prompt inside the `<ConversationMeta>` block. The `RemoveCustomSectionSkill` uses regex to strip a named section when it is no longer needed:
```
{% set cleaned_custom_prompt_section = re.sub(
    '<' ~ section_name ~ '>.*?</' ~ section_name ~ '>\\s*---\\s*',
    '',
    current_custom_prompt_section,
    flags=re.DOTALL
) %}
```

Calendar injection
The calendar skill generates a 45-day calendar (from today forward) with human-readable date labels ("today Tuesday 24th of February 2026") and stores it in the `project_attributes_private_dynamic_prompt_calendar` customer attribute. This calendar is injected into the `<ConvoAgentCalendar>` prompt section, giving the agent awareness of dates for scheduling conversations.
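A minimal sketch of generating those labels, assuming the format shown above:

```python
from datetime import date, timedelta

def build_calendar(start, days=45):
    """Build `days` human-readable date labels starting from `start` ('today')."""
    def ordinal(n):
        if 11 <= n % 100 <= 13:
            return f"{n}th"
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
        return f"{n}{suffix}"

    lines = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        prefix = "today " if offset == 0 else ""
        lines.append(f"{prefix}{d.strftime('%A')} {ordinal(d.day)} of {d.strftime('%B %Y')}")
    return "\n".join(lines)

calendar = build_calendar(date(2026, 2, 24))
```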
Populating the knowledge base
AKB topics labeled rag_context can be populated through three methods:
Manual topic creation
Create individual AKB topics through the Builder interface. Each topic requires a name, facts, summary, source (unique ID), and the rag_context label. See Manually add context to your AI Employee for the step-by-step process.
Bulk text append
Set the project_attributes_rag_knowledge_base_append_text customer attribute with unstructured text. The GeneralManagerAgent processes this text through several stages:
- Converts unstructured text into structured Markdown with topic separators.
- Parses each topic block to extract the title (H1), facts (first H2), labels (second H2), and body content.
- Creates AKB topics with the `rag_autogenerated_from_text` label and auto-incremented source IDs (e.g., `aft_001`, `aft_002`).
::: ❗❗ IMPORTANT The input limit for text append is approximately 100,000 characters. Exceeding this limit may cause processing errors. The system clears the input field after processing to signal readiness for new input. :::
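The parsing stage might look roughly like this toy parser. The exact block layout is an assumption based on the description above (H1 title, first H2 facts); the `aft_` prefix follows the auto-generated source IDs.

```python
def parse_topic_block(block, index):
    """Parse one structured-Markdown topic block into an AKB-shaped record."""
    title, facts, body_lines = "", [], []
    for line in block.strip().splitlines():
        if line.startswith("# "):
            title = line[2:].strip()                          # H1 -> topic title
        elif line.startswith("## ") and not facts:
            facts = [f.strip() for f in line[3:].split(",")]  # first H2 -> facts
        elif not line.startswith("## "):
            body_lines.append(line)                           # everything else -> body
    return {
        "name": title,
        "facts": facts,
        "summary": "\n".join(body_lines).strip(),
        "source": f"aft_{index:03d}",                         # e.g. aft_001, aft_002
        "labels": ["rag_autogenerated_from_text"],
    }

topic = parse_topic_block("# Opening hours\n## hours, schedule\nWe open at 9am.", 1)
```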
Website scraping append
Set the project_attributes_rag_knowledge_base_append_websites customer attribute with a URL and extraction instructions. The system visits the URL, scrapes the content, and converts it into AKB topics labeled rag_autogenerated_from_websites.
RAG-related attributes
The following attributes participate in the RAG pipeline:
Persona attributes
| Attribute idn | Type | Purpose |
|---|---|---|
| `rag` | string | The assembled RAG context (topic titles + summaries) for the current session. Updated before each agent response. Cleared at session end. |
| `conversation_meta` | string | Accumulated conversation metadata. Fed into the RAG search prompt as additional context for article matching. |
| `custom_prompt_section` | string | Injected prompt sections from CADataInjectionFlow. Appears in the `<ConversationMeta>` block alongside RAG content. |
Customer attributes
| Attribute idn | Type | Purpose |
|---|---|---|
| `project_attributes_setting_data_injection_is_enabled` | bool | Enables or disables the external data injection timer. |
| `project_attributes_setting_data_injection_update_period` | string | Timer interval for FetchData executions. |
| `project_attributes_setting_data_injection_needs_summary` | bool | Whether fetched data should be summarized before injection. |
| `project_attributes_private_data_injection_payload` | string | The webhook payload template for external data fetching. |
| `project_attributes_private_data_injection_retrieved_data` | string | The most recently fetched and processed external data. |
| `project_attributes_rag_knowledge_base_append_text` | string | Input field for bulk text-to-AKB conversion. |
| `project_attributes_rag_knowledge_base_append_text_reset` | bool | When `"True"`, deletes all rag_autogenerated_from_text topics. |
| `project_attributes_rag_knowledge_base_append_websites` | string | Input field for website URL scraping instructions. |
| `project_attributes_rag_knowledge_base_append_websites_reset` | bool | When `"True"`, deletes all rag_autogenerated_from_websites topics. |
End-to-end RAG flow
The following sequence shows how RAG operates during a typical conversation turn:
- A user sends a message on any channel.
- The user message handler fires the `prepare_rag_context_command` event.
- `CARagFlow` receives the event and runs `PrepareRagContext`.
- `PrepareRagContext` calls `SearchFuzzyAkb` to load all `rag_context` topics (up to 300).
- `PrepareRagContext` calls `get_memory` to retrieve the last 20 conversation messages.
- `PrepareRagContext` reads the `conversation_meta` persona attribute.
- A search prompt is assembled with the conversation history, metadata, and article list.
- `structured_generation` runs the search LLM with the JSON schema, returning 1-3 article IDs.
- The selected articles are fetched by `source` field via exact-match `SearchFuzzyAkb` calls.
- The topic titles and summaries are concatenated and stored in the `rag` persona attribute.
- When the main prompt is compiled, the `<||rag_placeholder||>` is replaced with the `rag` attribute value.
- The LLM generates a response grounded in the selected knowledge base content.
```mermaid
sequenceDiagram
    participant User
    participant ConvoAgent
    participant CARagFlow
    participant AKB
    participant SearchLLM
    participant MainLLM
    User->>ConvoAgent: Message on any channel
    ConvoAgent->>CARagFlow: prepare_rag_context_command event
    CARagFlow->>AKB: SearchFuzzyAkb (all rag_context topics, up to 300)
    AKB-->>CARagFlow: Topic metadata (names, facts, source IDs)
    CARagFlow->>CARagFlow: Fetch last 20 messages + conversation_meta
    CARagFlow->>SearchLLM: Prompt with topic list + conversation context
    SearchLLM-->>CARagFlow: 1–3 matching article IDs
    CARagFlow->>AKB: Exact-match fetch by source ID
    AKB-->>CARagFlow: Full topic summaries
    CARagFlow->>CARagFlow: Store in rag persona attribute
    ConvoAgent->>MainLLM: Prompt with rag_placeholder replaced
    MainLLM-->>ConvoAgent: Grounded response
    ConvoAgent->>User: Response
```
::: ⚠️ CAUTION The RAG context is refreshed on every user message. If the knowledge base is updated mid-conversation, the agent will pick up the new content on the next turn. However, there may be a brief window where the agent's RAG context reflects the previous state. :::
How RAG relates to other concepts
- Active Knowledge Base -- The AKB is the document store that RAG retrieves from. Topics must have the `rag_context` label to be included in the retrieval pool. See Manually add context to your AI Employee for details on creating AKB topics.
- Flows -- `CARagFlow` and `CADataInjectionFlow` are the two flows that handle knowledge retrieval and data injection, respectively. See Flows for details on flow structure and event routing.
- Events -- The `prepare_rag_context_command` event triggers RAG retrieval. The `prepare_injecting_data` event triggers data injection. See Events and the event system for the full event lifecycle.
- Attributes system -- RAG reads and writes persona attributes (`rag`, `conversation_meta`) and customer attributes (knowledge base settings, data injection configuration). See Attributes system for the full attribute taxonomy.
- Skills -- Each step in the RAG pipeline is implemented as a skill with its own prompt script. See Skills for details on NSL and guidance runner types.