Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is the mechanism that connects an agent's knowledge base to its language model at conversation time, ensuring responses are grounded in specific, up-to-date information rather than relying solely on the model's training data.
What is RAG?
RAG is a three-stage pipeline used across the Newo platform:
- Retrieve -- Given the current conversation, search a document corpus for the most relevant articles.
- Augment -- Inject the retrieved content into the prompt as additional context.
- Generate -- The language model produces a response informed by both the conversation history and the retrieved documents.
This pattern allows agents to answer questions about business-specific topics (menus, pricing, policies, service details) without encoding all of that information directly into the system prompt. Instead, the agent dynamically selects the right context for each conversation turn.
```mermaid
flowchart LR
    UM["User message"] --> R1
    subgraph Retrieve["① Retrieve"]
        R1["Load all rag_context<br/>topics from AKB"] --> R2["Search LLM selects<br/>1–3 article IDs"] --> R3["Fetch full content<br/>by source ID"]
    end
    subgraph Augment["② Augment"]
        A1["Store in rag<br/>persona attribute"] --> A2["Inject into<br/>agent prompt"]
    end
    subgraph Generate["③ Generate"]
        G1["LLM produces<br/>grounded response"]
    end
    R3 --> A1
    A2 --> G1
```
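The three stages can be sketched in Python. This is an illustrative toy, not platform code: the word-overlap `retrieve` is a stand-in for the search LLM, and the placeholder token mirrors the one used later on this page.

```python
def retrieve(corpus, query, top_k=3):
    """Toy stand-in for the search LLM: rank topics by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(t["name"].lower().split())), t) for t in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored if score > 0][:top_k]

def augment(prompt_template, topics):
    """Join the selected topic summaries and inject them as prompt context."""
    context = "\n\n---\n\n".join(f"{t['name']}\n{t['summary']}" for t in topics)
    return prompt_template.replace("<||rag_placeholder||>", context or "No additional information")

corpus = [
    {"name": "lunch menu and hours", "summary": "Lunch is served 11am-3pm daily."},
    {"name": "parking policy", "summary": "Free parking after 6pm."},
]
selected = retrieve(corpus, "what time is lunch served")
prompt = augment("<AdditionalInformation>\n<||rag_placeholder||>\n</AdditionalInformation>", selected)
# The generate stage would now pass `prompt`, plus the history, to the main LLM.
```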
The Active Knowledge Base as document store
The document corpus for RAG is the Active Knowledge Base (AKB). AKB topics tagged with the label rag_context form the retrieval pool. Each topic has four fields that participate in the RAG pipeline:
| AKB field | Role in RAG |
|---|---|
| `name` | The topic title. Used during retrieval to match against user intent. |
| `facts` | A short summary or keyword set. Presented to the search LLM alongside the name for relevance scoring. |
| `summary` | The full content body. Injected into the agent prompt when the topic is selected. |
| `source` | A unique identifier (e.g., `R1`, `aft_001`). Used to deduplicate and look up topics after the search LLM returns its selection. |
AKB topics are scoped to the GeneralManagerAgent persona. The RAG retrieval skills filter by this persona ID and the rag_context label to ensure only knowledge base content -- not task data or other AKB entries -- enters the retrieval pool.
::: 🗒️ NOTE Smaller, well-named topics produce better retrieval results. The search LLM evaluates topic names and facts to find semantic matches, so a topic named "Security protocols for company-owned laptops" will match more precisely than one named "Security protocols." :::
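As a concrete illustration, one rag_context topic might look like the record below. The field values are invented; only the field names come from the table above.

```python
# Hypothetical rag_context topic; only the four RAG-relevant fields are shown.
topic = {
    "name": "Security protocols for company-owned laptops",  # matched against user intent
    "facts": ["full-disk encryption", "VPN requirement"],    # scored by the search LLM with the name
    "summary": "All company-owned laptops must run full-disk encryption and connect via VPN.",
    "source": "R7",                                          # unique ID for exact-match lookup
}
```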
CARagFlow: the RAG pipeline
CARagFlow is the dedicated flow within ConvoAgent that implements the RAG pipeline. It is triggered by the prepare_rag_context_command event and runs four skills in sequence:
```yaml
title: CARagFlow
idn: CARagFlow
default_runner_type: guidance
default_provider_idn: google
default_model_idn: gemini25_flash
skills:
  - idn: PrepareRagContext            # Orchestrates the full pipeline
    runner_type: nsl
  - idn: get_memory                   # Retrieves conversation history
    runner_type: nsl
    parameters:
      - name: user_id
      - name: count
        default_value: "15"
  - idn: prepare_rag_context_schema   # Defines the structured output schema
    runner_type: nsl
  - idn: structured_generation        # Runs constrained LLM generation
    runner_type: nsl
    parameters:
      - name: prompt
      - name: schema
events:
  - idn: prepare_rag_context_command
    skill_selector: skill_idn
    skill_idn: PrepareRagContext
    interrupt_mode: queue
```

When CARagFlow runs
The prepare_rag_context_command event is fired at two key moments:
- Conversation start -- The `ConversationStartedSkill` sends the event immediately after initializing the session, so RAG context is ready before the agent's first response.
- Each user message -- The user message handler sends the event before generating a reply, ensuring the RAG context reflects the latest user input.
For voice channels (non-chat), the event is dispatched asynchronously to CARagFlow via SendSystemEvent. For chat channels, the same RAG logic runs synchronously within CAMainFlow using the prepare_rag_context skill (an inline copy of the same algorithm) to avoid latency from cross-flow event dispatch.
Step 1: Retrieve candidate articles
The PrepareRagContext skill begins by fetching all AKB topics labeled rag_context:
```
{% set general_manager_agent_persona = GetAgent(idn="GeneralManagerAgent").persona_id %}
{% set rag_context_topics = SearchFuzzyAkb(
    query="",
    searchFields=["name"],
    numberTopics=300,
    scoreThreshold=0,
    filterByPersonaIds=[general_manager_agent_persona],
    labels=["rag_context"]
) %}
```

This retrieves up to 300 topics with no minimum score threshold -- it loads the entire rag_context pool. The topics are then sorted by `updatedAt` (newest first) and formatted into a string representation:
```
{% set rag_context_topics = rag_context_topics | sort(attribute='updatedAt', reverse=True) %}
{% for topic in rag_context_topics %}
  {% set rag_context_topics_string = rag_context_topics_string
      + "article_title: " + topic.topic + "\n"
      + "article_facts: [" + ", ".join(topic.facts) + "]\n"
      + "article_last_updated: " + topic.updatedAt|string + "\n"
      + "article_id: " + topic.source + "\n\n---\n\n" %}
{% endfor %}
```

Step 2: LLM-based semantic search
The formatted articles, the conversation history, and conversation metadata are assembled into a prompt for a "search assistant" LLM call:
```
{% set prompt %}You are the search assistant.

**Task:**
Find the 1-3 articles from <KnowledgeBaseArticles> that are the best
semantic match for the user's topics in the **entire** <ConversationHistory>.

**CRITICAL RULES:**
1. Matches for the **last user message** are the most important and
   MUST be listed first.
2. If multiple articles cover the same topic, select only the most
   recent one (check `article_last_updated`).
3. Provide up to three relevant article IDs.

<ConversationMeta>
{{GetPersonaAttribute(id=user_id, field="conversation_meta")}}
</ConversationMeta>

<ConversationHistory>
{{get_memory(count=20, user_id=user_id)}}
</ConversationHistory>

<KnowledgeBaseArticles>
{{rag_context_topics_string}}
</KnowledgeBaseArticles>
{% endset %}
```

The LLM is constrained to return structured JSON using the `structured_generation` skill:
```
{% set schema = json.loads(prepare_rag_context_schema()) %}
{% set search_result_str = structured_generation(prompt=prompt, schema=schema) %}
```

The schema enforces a strict output format:
```json
{
  "search_result": {
    "article_id_1": "string (article ID or 'none')",
    "article_id_2": "string (article ID or 'none')",
    "article_id_3": "string (article ID or 'none')"
  }
}
```

Step 3: Fetch full article content
The returned article IDs are used to retrieve the full AKB topics by their source field:
{% set search_results = json.loads(search_result_str).search_result.values() %}
{% set rag_topics = [] %}
{% for index in search_results %}
{% if (index != "none") and (not index in used_indexes) %}
{% set rag_topics = rag_topics + SearchFuzzyAkb(
query=index,
searchFields=["source"],
numberTopics=1,
scoreThreshold=1.0,
filterByPersonaIds=[general_manager_agent_persona],
labels=["rag_context"]
) %}
{% endif %}
{% endfor %}The scoreThreshold=1.0 ensures an exact match on the source field. This two-phase approach (broad retrieval then targeted lookup) means the search LLM only sees lightweight metadata (titles, facts, IDs), while the full article summaries are fetched only for the selected topics.
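The selection-and-lookup step can be sketched as follows. Here `lookup` is a hypothetical in-memory stand-in for the exact-match SearchFuzzyAkb call, and the slot names mirror the schema shown above.

```python
def fetch_selected(search_result, lookup):
    """Resolve the search LLM's slots to full topics, skipping 'none' and duplicates."""
    topics, used = [], set()
    for article_id in search_result.values():
        if article_id != "none" and article_id not in used:
            used.add(article_id)                 # deduplicate repeated IDs
            if article_id in lookup:             # exact match on the source field
                topics.append(lookup[article_id])
    return topics

lookup = {
    "R1": {"topic": "Menu", "summary": "Seasonal menu details."},
    "R2": {"topic": "Hours", "summary": "Open 9am-9pm."},
}
result = {"article_id_1": "R2", "article_id_2": "none", "article_id_3": "R2"}
selected = fetch_selected(result, lookup)   # the duplicate R2 is fetched only once
```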
Step 4: Store RAG context
The selected topics' titles and summaries are concatenated and stored as the rag persona attribute:
```
{% set rag = "" %}
{% for topic in rag_topics %}
  {% set rag = rag + topic.topic + "\n" + topic.summary + "\n\n---\n\n" %}
{% endfor %}
{{SetPersonaAttribute(field="rag", value=rag, id=user_id)}}
```

This persona attribute is later read during prompt assembly and injected into the `<AdditionalInformation>` section of the agent prompt.
How RAG content enters the prompt
The agent prompt is assembled by the prompt_build_base skill in CAMainFlow. It contains a placeholder for RAG content:
```
<AdditionalInformation alt="additional information" role="context">
  <||rag_placeholder||>
</AdditionalInformation>
```

At generation time, the placeholder is replaced with the value of the `rag` persona attribute:
```
{% set compiled_prompt = compiled_prompt.replace(
    "<||rag_placeholder||>",
    GetPersonaAttribute(id=user_id, field="rag").strip() or "No additional information"
) %}
```

If no RAG topics were matched (or the knowledge base is empty), the placeholder resolves to "No additional information", ensuring the prompt remains well-formed.
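The fallback behavior can be sketched in Python (an illustration of the substitution logic, not the NSL runtime):

```python
def inject_rag(compiled_prompt, rag_value):
    """Swap the placeholder for the stored context, falling back when it is empty."""
    return compiled_prompt.replace(
        "<||rag_placeholder||>",
        (rag_value or "").strip() or "No additional information",
    )

template = "<AdditionalInformation>\n<||rag_placeholder||>\n</AdditionalInformation>"
```

Both an empty string and whitespace-only context resolve to the fallback, so the tag pair is never left empty.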
Conversation context integration
RAG retrieval depends on conversation context from two sources:
Memory retrieval via get_memory
The get_memory skill collects conversation history by aggregating messages across all active channel actors (Newo Voice, Newo Chat, Telegram, SMS, API, Sandbox, and others). It retrieves the most recent messages (default: 15, increased to 20 for RAG) up to a 10,000-character limit:
```
{{Return(val=GetMemory(count=count, maxLen=10000, filterByActorIds=actors))}}
```

The skill supports filtering by date range, actor list, and optional inclusion of system messages and agent thoughts. For RAG, it typically retrieves only user-facing messages to give the search LLM a clean view of the conversation.
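A rough sketch of the count-and-length capping, with an in-memory message list standing in for the channel actors. Whether the real character cap keeps the head or the tail of the history is an assumption here; the sketch keeps the tail (most recent text).

```python
def get_memory_sketch(messages, count=15, max_len=10000):
    """Return the newest `count` messages joined, capped at `max_len` characters."""
    recent = messages[-count:]          # most recent messages across all channels
    text = "\n".join(recent)
    return text[-max_len:]              # keep the tail if the character cap is exceeded

history = [f"msg {i:02d}" for i in range(30)]
window = get_memory_sketch(history, count=20)
```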
Conversation meta
The conversation_meta persona attribute accumulates extracted information about the user and conversation context (such as detected intent, user name, and session details). This metadata is injected into the search prompt inside <ConversationMeta> tags, giving the search LLM additional signals beyond the raw message history.
Structured generation
The structured_generation skill enforces that the search LLM returns valid JSON conforming to a provided schema. It uses constrained generation parameters:
```
{{system}}{{prompt.strip()}}{{end}}

{{assistant}}
{% set result_json = Gen(
    jsonSchema=schema,
    validateSchema="True",
    temperature=0.2,
    topP=0,
    maxTokens=4000,
    skipFilter=True,
    thinkingBudget=85
) %}
{{end}}

{{Return(val=result_json)}}
```

Key parameters:
| Parameter | Value | Purpose |
|---|---|---|
| `jsonSchema` | The output schema | Forces output to match the defined JSON structure |
| `validateSchema` | `"True"` | Validates the output against the schema before returning |
| `temperature` | 0.2 | Low temperature for deterministic article selection |
| `topP` | 0 | Greedy decoding to maximize consistency |
| `thinkingBudget` | 85 | Allocates a small reasoning budget for the selection task |
This pattern is reusable beyond RAG -- any skill that needs structured LLM output can call structured_generation with a custom prompt and schema.
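Outside the platform, the validate-before-return idea can be sketched with a hand-rolled check of the search-result shape (illustrative only; the real `validateSchema` flag does this inside Gen):

```python
import json

EXPECTED_KEYS = ("article_id_1", "article_id_2", "article_id_3")

def validate_search_result(raw_json):
    """Parse the LLM output and verify it matches the search-result schema shape."""
    data = json.loads(raw_json)
    result = data.get("search_result")
    if not isinstance(result, dict):
        raise ValueError("missing search_result object")
    for key in EXPECTED_KEYS:
        if not isinstance(result.get(key), str):
            raise ValueError(f"{key} must be a string (article ID or 'none')")
    return result

ok = validate_search_result(
    '{"search_result": {"article_id_1": "R1", "article_id_2": "none", "article_id_3": "none"}}'
)
```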
CADataInjectionFlow: static and dynamic data injection
While CARagFlow handles retrieval from the AKB, CADataInjectionFlow handles injection of external and time-sensitive data into the agent prompt. These two flows complement each other: RAG provides knowledge base content, while data injection provides live operational data.
FetchData: timer-based external retrieval
The FetchData skill runs on a configurable timer to pull data from external sources:
```
{{Set(name="payload", value=GetCustomerAttribute(
    field="project_attributes_private_data_injection_payload"
))}}
{{SendCommand(
    commandIdn="data_injection",
    integrationIdn="api",
    connectorIdn="webhook",
    payload=payload
)}}
```

It reads the webhook payload configuration from a customer attribute, sends a command to the API webhook integration, and resets the timer for the next fetch cycle. The update interval is controlled by the `project_attributes_setting_data_injection_update_period` customer attribute.
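One fetch cycle can be sketched as pure data: read the stored payload, build the command, and compute when to run next. The dict shape is an illustrative stand-in for SendCommand, and treating the update period as seconds is an assumption.

```python
import json

def build_fetch_cycle(payload_attr, update_period_attr):
    """Sketch of one FetchData pass: payload attribute in, webhook command and next delay out."""
    payload = json.loads(payload_attr) if payload_attr else {}
    command = {
        "commandIdn": "data_injection",   # mirrors the SendCommand call above
        "integrationIdn": "api",
        "connectorIdn": "webhook",
        "payload": payload,
    }
    next_run_seconds = int(update_period_attr)  # timer reset for the next cycle
    return command, next_run_seconds

command, delay = build_fetch_cycle('{"endpoint": "inventory"}', "3600")
```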
Data processing pipeline
When the webhook responds with new data, the RetrieveDataSkill processes it:
- Extracts the data payload from the triggered event.
- Optionally summarizes the data if `project_attributes_setting_data_injection_needs_summary` is `"True"` (using the `_summarizeInjectedData` skill to condense content within a token budget).
- Stores the processed result in the `project_attributes_private_data_injection_retrieved_data` customer attribute.
Custom prompt sections
The InjectCustomSectionSkill and RemoveCustomSectionSkill manage named sections that are injected into the agent prompt via the custom_prompt_section persona attribute. Each section is wrapped in XML-style tags:
```
{{#system~}}
<{{sectionName}}>
{{sectionValue}}
</{{sectionName}}>
---
{{~/system}}
```

These sections appear in the prompt inside the `<ConversationMeta>` block. The `RemoveCustomSectionSkill` uses regex to strip a named section when it is no longer needed:
```
{% set cleaned_custom_prompt_section = re.sub(
    '<' ~ section_name ~ '>.*?</' ~ section_name ~ '>\\s*---\\s*',
    '',
    current_custom_prompt_section,
    flags=re.DOTALL
) %}
```

Calendar injection
The calendar skill generates a 45-day calendar (from today forward) with human-readable date labels ("today Tuesday 24th of February 2026") and stores it in the `project_attributes_private_dynamic_prompt_calendar` customer attribute. This calendar is injected into the `<ConvoAgentCalendar>` prompt section, giving the agent awareness of dates for scheduling conversations.
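A minimal sketch of generating those labels, assuming the format shown above:

```python
from datetime import date, timedelta

def build_calendar(start, days=45):
    """Build `days` human-readable date labels starting from `start` ('today')."""
    def ordinal(n):
        if 11 <= n % 100 <= 13:
            return f"{n}th"
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
        return f"{n}{suffix}"

    lines = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        prefix = "today " if offset == 0 else ""
        lines.append(f"{prefix}{d.strftime('%A')} {ordinal(d.day)} of {d.strftime('%B %Y')}")
    return "\n".join(lines)

calendar = build_calendar(date(2026, 2, 24))
```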
Populating the knowledge base
AKB topics labeled rag_context can be populated through three methods:
Manual topic creation
Create individual AKB topics through the Builder interface. Each topic requires a name, facts, summary, source (unique ID), and the rag_context label. See Manually add context to your AI Employee for the step-by-step process.
Bulk text append
Set the project_attributes_rag_knowledge_base_append_text customer attribute with unstructured text. The GeneralManagerAgent processes this text through several stages:
- Converts unstructured text into structured Markdown with topic separators.
- Parses each topic block to extract the title (H1), facts (first H2), labels (second H2), and body content.
- Creates AKB topics with the `rag_autogenerated_from_text` label and auto-incremented source IDs (e.g., `aft_001`, `aft_002`).
::: ❗❗ IMPORTANT The input limit for text append is approximately 100,000 characters. Exceeding this limit may cause processing errors. The system clears the input field after processing to signal readiness for new input. :::
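The parsing stage might look roughly like this toy parser. The exact block layout is an assumption based on the description above (H1 title, first H2 facts); the `aft_` prefix follows the auto-generated source IDs.

```python
def parse_topic_block(block, index):
    """Parse one structured-Markdown topic block into an AKB-shaped record."""
    title, facts, body_lines = "", [], []
    for line in block.strip().splitlines():
        if line.startswith("# "):
            title = line[2:].strip()                          # H1 -> topic title
        elif line.startswith("## ") and not facts:
            facts = [f.strip() for f in line[3:].split(",")]  # first H2 -> facts
        elif not line.startswith("## "):
            body_lines.append(line)                           # everything else -> body
    return {
        "name": title,
        "facts": facts,
        "summary": "\n".join(body_lines).strip(),
        "source": f"aft_{index:03d}",                         # e.g. aft_001, aft_002
        "labels": ["rag_autogenerated_from_text"],
    }

topic = parse_topic_block("# Opening hours\n## hours, schedule\nWe open at 9am.", 1)
```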
Website scraping append
Set the project_attributes_rag_knowledge_base_append_websites customer attribute with a URL and extraction instructions. The system visits the URL, scrapes the content, and converts it into AKB topics labeled rag_autogenerated_from_websites.
RAG-related attributes
The following attributes participate in the RAG pipeline:
Persona attributes
| Attribute idn | Type | Purpose |
|---|---|---|
| `rag` | string | The assembled RAG context (topic titles + summaries) for the current session. Updated before each agent response. Cleared at session end. |
| `conversation_meta` | string | Accumulated conversation metadata. Fed into the RAG search prompt as additional context for article matching. |
| `custom_prompt_section` | string | Injected prompt sections from CADataInjectionFlow. Appears in the `<ConversationMeta>` block alongside RAG content. |
Customer attributes
| Attribute idn | Type | Purpose |
|---|---|---|
| `project_attributes_setting_data_injection_is_enabled` | bool | Enables or disables the external data injection timer. |
| `project_attributes_setting_data_injection_update_period` | string | Timer interval for FetchData executions. |
| `project_attributes_setting_data_injection_needs_summary` | bool | Whether fetched data should be summarized before injection. |
| `project_attributes_private_data_injection_payload` | string | The webhook payload template for external data fetching. |
| `project_attributes_private_data_injection_retrieved_data` | string | The most recently fetched and processed external data. |
| `project_attributes_rag_knowledge_base_append_text` | string | Input field for bulk text-to-AKB conversion. |
| `project_attributes_rag_knowledge_base_append_text_reset` | bool | When `"True"`, deletes all rag_autogenerated_from_text topics. |
| `project_attributes_rag_knowledge_base_append_websites` | string | Input field for website URL scraping instructions. |
| `project_attributes_rag_knowledge_base_append_websites_reset` | bool | When `"True"`, deletes all rag_autogenerated_from_websites topics. |
End-to-end RAG flow
The following sequence shows how RAG operates during a typical conversation turn:
- A user sends a message on any channel.
- The user message handler fires the `prepare_rag_context_command` event.
- `CARagFlow` receives the event and runs `PrepareRagContext`.
- `PrepareRagContext` calls `SearchFuzzyAkb` to load all `rag_context` topics (up to 300).
- `PrepareRagContext` calls `get_memory` to retrieve the last 20 conversation messages.
- `PrepareRagContext` reads the `conversation_meta` persona attribute.
- A search prompt is assembled with the conversation history, metadata, and article list.
- `structured_generation` runs the search LLM with the JSON schema, returning 1-3 article IDs.
- The selected articles are fetched by `source` field via exact-match `SearchFuzzyAkb` calls.
- The topic titles and summaries are concatenated and stored in the `rag` persona attribute.
- When the main prompt is compiled, the `<||rag_placeholder||>` is replaced with the `rag` attribute value.
- The LLM generates a response grounded in the selected knowledge base content.
```mermaid
sequenceDiagram
    participant User
    participant ConvoAgent
    participant CARagFlow
    participant AKB
    participant SearchLLM
    participant MainLLM
    User->>ConvoAgent: Message on any channel
    ConvoAgent->>CARagFlow: prepare_rag_context_command event
    CARagFlow->>AKB: SearchFuzzyAkb (all rag_context topics, up to 300)
    AKB-->>CARagFlow: Topic metadata (names, facts, source IDs)
    CARagFlow->>CARagFlow: Fetch last 20 messages + conversation_meta
    CARagFlow->>SearchLLM: Prompt with topic list + conversation context
    SearchLLM-->>CARagFlow: 1–3 matching article IDs
    CARagFlow->>AKB: Exact-match fetch by source ID
    AKB-->>CARagFlow: Full topic summaries
    CARagFlow->>CARagFlow: Store in rag persona attribute
    ConvoAgent->>MainLLM: Prompt with rag_placeholder replaced
    MainLLM-->>ConvoAgent: Grounded response
    ConvoAgent->>User: Response
```
::: ⚠️ CAUTION The RAG context is refreshed on every user message. If the knowledge base is updated mid-conversation, the agent will pick up the new content on the next turn. However, there may be a brief window where the agent's RAG context reflects the previous state. :::
How RAG relates to other concepts
- Active Knowledge Base -- The AKB is the document store that RAG retrieves from. Topics must have the `rag_context` label to be included in the retrieval pool. See Manually add context to your AI Employee for details on creating AKB topics.
- Flows -- `CARagFlow` and `CADataInjectionFlow` are the two flows that handle knowledge retrieval and data injection, respectively. See Flows for details on flow structure and event routing.
- Events -- The `prepare_rag_context_command` event triggers RAG retrieval. The `prepare_injecting_data` event triggers data injection. See Events and the event system for the full event lifecycle.
- Attributes system -- RAG reads and writes persona attributes (`rag`, `conversation_meta`) and customer attributes (knowledge base settings, data injection configuration). See Attributes system for the full attribute taxonomy.
- Skills -- Each step in the RAG pipeline is implemented as a skill with its own prompt script. See Skills for details on NSL and guidance runner types.