The Observer system

The Observer is a parallel reasoning system that runs alongside the ConvoAgent after each conversational turn. It acts as a "coach" that analyzes the conversation, extracts user information, evaluates quality, and generates structured directives (thoughts) to prevent hallucinations and missed steps. This article is the definitive developer reference for the Observer's architecture, skills, thought generation pipeline, field tracking, and quality scoring.

How the Observer works

After every conversational turn, the platform fires a broadcast_analyze_conversation event. Two flows subscribe to this event in parallel:

  1. CAObserverFlow — Extracts user information (phone, email, language, opt-in preferences) and evaluates conversation quality.
  2. CAThoughtsFlow — Generates structured "thoughts" (KEY DIRECTIVES) that guide the ConvoAgent's next response.

Together, these flows ensure the ConvoAgent always knows where it is in the conversation, what information has been collected, and what to do next.

User message
    │
    ▼
ConvoAgent generates response
    │
    ▼
broadcast_analyze_conversation event
    │
    ├──► CAObserverFlow (queue mode)
    │       ├── Extract user information
    │       ├── Evaluate conversation quality
    │       └── Determine working hours status
    │
    └──► CAThoughtsFlow (interrupt mode)
            ├── Assemble supervisor context
            ├── Generate KEY DIRECTIVES
            └── Inject thoughts into V2V (if voice)

::: 🗒️ NOTE CAObserverFlow subscribes to broadcast_analyze_conversation in queue mode (waits for current skill to finish), while CAThoughtsFlow subscribes in interrupt mode (interrupts previous thought generation with updated context). This ensures thoughts always reflect the latest conversation state. :::
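The difference between the two subscription modes can be sketched with asyncio. This is an illustrative model only: the `Subscriber` class, its `dispatch` method, and the `demo` helper are hypothetical names, not the platform's API. It assumes queue mode serializes handlers and interrupt mode cancels an unfinished handler before starting a new one.

```python
import asyncio


class Subscriber:
    """Hypothetical event subscriber modelling the two modes described above."""

    def __init__(self, mode, handler):
        self.mode = mode              # "queue" or "interrupt"
        self.handler = handler
        self._task = None
        self._lock = asyncio.Lock()   # serializes handlers in queue mode

    async def _run_queued(self, turn):
        # Queue mode: wait until the previous run has finished.
        async with self._lock:
            await self.handler(turn)

    def dispatch(self, turn):
        if self.mode == "interrupt":
            # Interrupt mode: drop the stale run, start over with fresh context.
            if self._task is not None and not self._task.done():
                self._task.cancel()
            self._task = asyncio.create_task(self.handler(turn))
        else:
            self._task = asyncio.create_task(self._run_queued(turn))
        return self._task


async def demo():
    completed = []

    def make_handler(label):
        async def handler(turn):
            await asyncio.sleep(0.01)          # stand-in for real analysis work
            completed.append((label, turn))
        return handler

    observer = Subscriber("queue", make_handler("observer"))
    thoughts = Subscriber("interrupt", make_handler("thoughts"))
    tasks = []
    for turn in (1, 2):                        # two events fired back to back
        tasks.append(observer.dispatch(turn))
        tasks.append(thoughts.dispatch(turn))
    await asyncio.gather(*tasks, return_exceptions=True)
    return completed


if __name__ == "__main__":
    print(sorted(asyncio.run(demo())))
```

In this model the queued subscriber processes both turns in order, while the interrupting subscriber only completes the latest turn.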

Observer flow architecture

The CAObserverFlow contains 10 skills that handle conversation analysis and user information extraction.

User information extraction

The _extractUserInformationDuringConversationSkill uses structured JSON generation to extract the following fields from each conversation turn:

| Field | Description |
| --- | --- |
| user_name | Full name as spoken or typed by the user. |
| user_email | Email address. |
| user_preferred_language | Language detected from the user's last message. Defaults to the project's primary language. |
| user_phone_number_with_country_code | Full phone number including country code. |
| user_phone_number_country_code | Country code extracted via greedy matching against 200+ international codes. |
| user_phone_number_without_country_code | Local phone number without the country code prefix. |
| sms_opt_in | Defaults to true. Set to false only if the user explicitly declines SMS. |
| email_opt_in | Defaults to true. Set to false only if the user explicitly declines email. |
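The three phone fields can be derived from one raw number. The sketch below assumes that "greedy matching" means trying the longest candidate prefix first. The `split_phone` function and the five-entry `COUNTRY_CODES` set are hypothetical stand-ins for the real skill and its 200+ code list, and the digits-only output format is also an assumption.

```python
# Tiny illustrative subset; the real list is described as 200+ codes.
COUNTRY_CODES = {"1", "44", "49", "351", "380"}


def split_phone(raw):
    """Split a raw phone string into the three phone fields (sketch)."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    # Greedy: try 3-digit codes first, then 2-digit, then 1-digit.
    for length in (3, 2, 1):
        prefix = digits[:length]
        if prefix in COUNTRY_CODES and len(digits) > length:
            return {
                "user_phone_number_with_country_code": digits,
                "user_phone_number_country_code": prefix,
                "user_phone_number_without_country_code": digits[length:],
            }
    # No known code found: keep only the local number (assumed behaviour).
    return {
        "user_phone_number_with_country_code": None,
        "user_phone_number_country_code": None,
        "user_phone_number_without_country_code": digits or None,
    }


if __name__ == "__main__":
    print(split_phone("+44 20 7946 0958"))
```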

Thought generation

The CAThoughtsFlow is the core reasoning engine. It assembles a comprehensive supervisor context and generates structured directives that tell the ConvoAgent exactly what to do next.

Supervisor context assembly

The _getCompletePromptSkill assembles the full supervisor prompt from multiple data sources. Each section of the prompt is injected by a dedicated skill:

| Section | Data source |
| --- | --- |
| BusinessContext | Static business context + data injection |
| ExplicitConstraints | Channel-specific rules (phone vs chat), anti-jailbreak, language consistency |
| WorkingSchedule | Working days, hours, exceptions from customer attributes |
| IntentTypeMap | Compiled intent type map (working hours vs non-working hours variant) |
| ConvoAgentScenariosAndProcedures | Compiled scenarios and procedures |
| AvailabilityForTheUserRequestedDateTime | Booking slots availability (if booking enabled) |
| ActionsStates | Booking and calendar action states |
| AdditionalInformation | RAG retrieval context |
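A minimal sketch of this assembly step follows. It assumes each section is wrapped in a tag named after the section, in the spirit of the `<UserInformation>` and `<Conversation>` blocks referenced elsewhere in this article. The `assemble_supervisor_prompt` function, the section order, and the rule for skipping empty sections are illustrative assumptions, not the actual skill.

```python
# Section names come from the table above; the ordering is an assumption.
SECTION_ORDER = [
    "BusinessContext",
    "ExplicitConstraints",
    "WorkingSchedule",
    "IntentTypeMap",
    "ConvoAgentScenariosAndProcedures",
    "AvailabilityForTheUserRequestedDateTime",
    "ActionsStates",
    "AdditionalInformation",
]


def assemble_supervisor_prompt(sections):
    """Join the non-empty sections into one tagged prompt string (sketch)."""
    parts = []
    for name in SECTION_ORDER:
        body = sections.get(name, "").strip()
        if not body:
            continue  # e.g. no availability section when booking is disabled
        parts.append(f"<{name}>\n{body}\n</{name}>")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(assemble_supervisor_prompt({
        "WorkingSchedule": "Mon-Fri 09:00-18:00",
        "BusinessContext": "Family restaurant with 20 tables.",
    }))
```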

Thought generation skill

The _genThoughtsSkill is the core of the Observer system. It takes the assembled supervisor context, conversation history, and previous thoughts, then generates a structured CurrentTaskAnalysis block.

The output follows a strict format with these sections:

FIELDS SUCCESSFULLY COLLECTED

Lists fields the agent was expected to collect that have been gathered and reconfirmed (if necessary). Always includes the user language.

**User Language:** English
**First Name:** David (gathered, reconfirmation is not required)
**Last Name:** Yang (gathered, reconfirmation is not required)
**Phone Number:** 12312312323 (reconfirmed)

FIELDS TO BE COLLECTED

Lists fields the agent was expected to collect but has not yet collected — possibly skipped or missed.

**Email:** not known

FIELDS TO BE RECONFIRMED

Lists fields where a value is known (including detected values from <UserInformation>) but requires explicit reconfirmation per the procedure.

**Phone number:** 12312312323 (to be confirmed according to **Step 1.1:
    Reconfirming or Gathering Phone Number** procedure).

KEY DIRECTIVES

The most critical section. Contains the next three sequential steps the ConvoAgent must take, including scenario names, step numbers, procedure references, and code-phrases.

## Scenario 1: "Make a Regular Table Booking"

- 1.5: According to **Reconfirming or Gathering Preferred Date and Time**
    procedure you must say the special **code-phrase**: "Let me check
    availability for [requested day] at [requested time]"

- 1.6: According to **Reconfirming or Gathering Email Address** procedure,
    since it's a phone channel and conversation does not have a valid email
    string, you must tell following **code-phrase**: ...

- 1.7: ...

Mandatory language translation

The thought generation includes a critical global override rule: all code-phrases, quoted phrases, and template phrases must be fully translated into the user's detected language. If the user speaks Spanish, a code-phrase like "Let me check availability" must appear in the KEY DIRECTIVES as "Déjame verificar la disponibilidad." Emitting phrases in the wrong language is classified as a "catastrophic failure" in the system prompt.

Thought generation instruction rules

The _genThoughtsSkill enforces 11 rules for generating KEY DIRECTIVES. Beyond basic step tracking, the most important rules are:

| Rule | Description |
| --- | --- |
| Three-step window | Always generate directives for the next three sequential uncompleted steps. If fewer than 3 remain, list all remaining. |
| Skip completed steps | If a step's field has already been collected, skip it and advance to the next step instead of including it in directives. |
| Procedure expansion | When a step references a procedure, study the procedure content and highlight the specific sub-step within it that the agent should perform next. |
| Branching scenarios | When a step describes branching, include a brief summary of the two most important branches (prioritize [L] lead scenarios from the IntentTypeMap). Include the first step and code-phrase of each referenced scenario. |
| Markers | Lines starting with #MARKER: in the conversation override normal scenarios. If a marker indicates an action was already done (e.g., "call was transferred"), skip that action. For <CallEndedCase> markers showing a new call, restart at Scenario 0. |
| Uninterruptible phases | Steps marked as Uninterruptible Scenario Phase mean the conversation topic cannot change. Include an example response for when the user deviates. |
| Notice blocks | Scenarios may include > [!TIP], > [!NOTE], > [!IMPORTANT] blocks. Include important information from these blocks in directives. |
| Background actions | Steps with background actions (e.g., "Tell: Give me a moment to check availability...") should stop directive generation at that step until the action result is visible. |
| SMS/Email section conditions | Steps with conditional branching based on <SMSInformationAllowedToBeSent> must resolve the condition and include only the matching branch. |
| Channel conditions | Steps with conversation.channel conditions (phone vs chat) must resolve to the current channel and include only the applicable branch. |
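In practice these rules are applied by the language model through the prompt, not by code. Still, the interaction of the three-step window, skip-completed-steps, and background-action rules can be illustrated deterministically. The `Step` dataclass and `next_directive_steps` function below are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    number: str
    field: Optional[str] = None      # field this step collects, if any
    background_action: bool = False  # e.g. "Give me a moment to check..."


def next_directive_steps(steps, collected, window=3):
    """Pick the steps that the next KEY DIRECTIVES block would cover (sketch)."""
    selected = []
    for step in steps:
        # Skip completed steps: the field is already collected.
        if step.field is not None and step.field in collected:
            continue
        selected.append(step)
        # Stop at a background action, or once the window is full.
        if step.background_action or len(selected) == window:
            break
    return selected


if __name__ == "__main__":
    scenario = [
        Step("1.1", field="phone"),
        Step("1.2", field="name"),
        Step("1.3", field="date"),
        Step("1.4", field="email"),
        Step("1.5", background_action=True),
    ]
    picked = next_directive_steps(scenario, collected={"phone"})
    print([s.number for s in picked])
```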

Prompt caching

The complete supervisor prompt is compiled once at the start of each session and cached for subsequent turns, avoiding redundant re-assembly within the same conversation. A fresh prompt is generated when a new session begins or if the cache is empty.
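The caching behaviour described above amounts to a per-session memo. In the sketch below, the `SupervisorPromptCache` class and the `build_prompt` callable are hypothetical; the source does not say how the platform stores the cache.

```python
class SupervisorPromptCache:
    """Per-session cache for the compiled supervisor prompt (sketch)."""

    def __init__(self, build_prompt):
        self._build_prompt = build_prompt  # callable: session_id -> prompt
        self._cache = {}

    def get(self, session_id):
        prompt = self._cache.get(session_id)
        if not prompt:
            # New session or empty cache entry: compile a fresh prompt.
            prompt = self._build_prompt(session_id)
            self._cache[session_id] = prompt
        return prompt


if __name__ == "__main__":
    calls = []

    def build(session_id):
        calls.append(session_id)
        return f"prompt for {session_id}"

    cache = SupervisorPromptCache(build)
    cache.get("session-a")
    cache.get("session-a")   # served from the cache
    cache.get("session-b")   # new session, compiled again
    print(calls)
```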

Field tracking states

Every field the agent needs to collect progresses through a state machine:

| State | Definition |
| --- | --- |
| not known | The field has an unknown, empty, or not-provided value. |
| detected | The value was taken from the <UserInformation> block, which collects information implicitly (e.g., caller ID phone number). |
| gathered | The value was explicitly collected from the <Conversation> block according to the corresponding scenario step. |
| reconfirmed | The value was additionally and explicitly reconfirmed in the conversation according to the corresponding procedure. |

A field is considered successfully collected when:

  • It has been gathered and does not require reconfirmation, OR
  • It has been gathered and reconfirmed (when the procedure title contains the Reconfirm keyword)

A field appears in FIELDS TO BE RECONFIRMED when:

  • A value is known (detected or gathered) but the corresponding step's procedure name contains "Reconfirm" and the user has not yet explicitly confirmed the value
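The mapping from field state to output section can be written as a small decision function. The `FieldState` enum and `classify` function are hypothetical names. One case is not covered by the rules above: a detected value whose procedure does not require reconfirmation. This sketch assumes such a field still needs to be gathered.

```python
from enum import Enum


class FieldState(Enum):
    NOT_KNOWN = "not known"
    DETECTED = "detected"
    GATHERED = "gathered"
    RECONFIRMED = "reconfirmed"


def classify(state, procedure_title):
    """Return the output section a field belongs to (sketch)."""
    needs_reconfirmation = "Reconfirm" in procedure_title
    if state is FieldState.NOT_KNOWN:
        return "FIELDS TO BE COLLECTED"
    if state is FieldState.RECONFIRMED:
        return "FIELDS SUCCESSFULLY COLLECTED"
    if needs_reconfirmation:
        # Value is known (detected or gathered) but not yet confirmed.
        return "FIELDS TO BE RECONFIRMED"
    if state is FieldState.GATHERED:
        return "FIELDS SUCCESSFULLY COLLECTED"
    # Assumption: detected without a reconfirmation step is not yet collected.
    return "FIELDS TO BE COLLECTED"


if __name__ == "__main__":
    print(classify(FieldState.DETECTED, "Reconfirming or Gathering Phone Number"))
```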

Conversation quality scoring

The _evaluateConversationQualitySkill rates ongoing conversation quality on a 0–10 scale and triggers recovery actions when quality is poor.

Quality indicators

The skill evaluates quality based on signs of a poor connection:

  • User says "Hello?" multiple times
  • User sends incoherent short messages
  • User sends short grammatically incorrect sentences
  • User sends short sentences without punctuation
  • User sends messages that cannot be understood properly

Scoring guidelines

| Score range | Interpretation |
| --- | --- |
| 7–10 | Good quality. No action needed. |
| 4–6 | Moderate quality. No action triggered. |
| 0–3 | Poor quality. Recovery action triggered. |

::: ❗❗ IMPORTANT The quality evaluation applies leniency to early-stage conversations. A short or minimal conversation does not automatically receive a low rating unless there is clear evidence of poor quality. Improvements in later turns are weighted more heavily than initial issues. :::

Recovery actions

When a conversation scores 3 or below, the system triggers one of three recovery actions based on configuration:

| Condition | Action |
| --- | --- |
| Phone channel + call transfer enabled + business open | Transfer the call to a human co-worker with message: "Sorry, the line is breaking up... I will transfer the call to my co-worker." |
| Report enabled (no transfer available) | Offer to relay information to the manager: "Sorry, the line is breaking up... I can relay the information to our manager." |
| Fallback (no transfer, no report) | Suggest calling back: "Sorry, the line is breaking up... Do you mind calling back?" |

Once a low score is recorded, the flag agent_gave_low_conversation_quality_score is set to True on the persona, preventing repeated quality evaluations for the same session.
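The recovery logic can be summarized as a priority-ordered decision. In this sketch the `QualityContext` dataclass, the `choose_recovery_action` function, and the action labels are hypothetical. Only the threshold of 3, the three conditions, and the `agent_gave_low_conversation_quality_score` flag come from the description above.

```python
from dataclasses import dataclass

LOW_QUALITY_THRESHOLD = 3  # scores of 3 or below trigger recovery


@dataclass
class QualityContext:
    channel: str                  # "phone" or "chat"
    call_transfer_enabled: bool
    business_open: bool
    report_enabled: bool
    # Mirrors the persona flag agent_gave_low_conversation_quality_score.
    agent_gave_low_conversation_quality_score: bool = False


def choose_recovery_action(score, ctx):
    """Return a recovery action label, or None if no action applies (sketch)."""
    if ctx.agent_gave_low_conversation_quality_score:
        return None  # already handled once in this session
    if score > LOW_QUALITY_THRESHOLD:
        return None  # moderate or good quality
    ctx.agent_gave_low_conversation_quality_score = True
    if ctx.channel == "phone" and ctx.call_transfer_enabled and ctx.business_open:
        return "transfer_call"
    if ctx.report_enabled:
        return "offer_report"
    return "suggest_callback"


if __name__ == "__main__":
    ctx = QualityContext("phone", call_transfer_enabled=True,
                         business_open=False, report_enabled=True)
    print(choose_recovery_action(2, ctx))
```

Setting the flag before returning means a second low score in the same session produces no further action in this model.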

Observer and V2V integration

When the conversation is on a voice-to-voice channel, the generated thoughts are injected directly into the V2V model as context. This is the "backward prompting" mechanism.

After generating thoughts, _genThoughtsSkill checks if the session uses V2V and injects the thoughts. The thoughts are wrapped in a <CurrentTaskAnalysis> block with a warning that the instructions may be slightly outdated (by one or more turns), and should be used to align the ConvoAgent's current scope rather than as absolute commands.
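The wrapping step might look like the sketch below. The `wrap_thoughts_for_v2v` function and the exact warning wording are assumptions made for illustration. Only the `<CurrentTaskAnalysis>` tag and the intent of the warning come from the description above.

```python
# Hypothetical wording; the real warning text is not given in this article.
STALENESS_WARNING = (
    "Note: these instructions may be outdated by one or more turns. "
    "Use them to align your current scope, not as absolute commands."
)


def wrap_thoughts_for_v2v(thoughts):
    """Wrap generated thoughts for injection into the voice model (sketch)."""
    return (
        "<CurrentTaskAnalysis>\n"
        f"{STALENESS_WARNING}\n\n"
        f"{thoughts.strip()}\n"
        "</CurrentTaskAnalysis>"
    )


if __name__ == "__main__":
    print(wrap_thoughts_for_v2v("## KEY DIRECTIVES:\n- 1.3: Ask for the phone number."))
```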

Waiting mode

The CAThoughtsFlow supports a waiting mode that pauses thought generation while the system waits for an external action to complete (e.g., checking booking availability).

| Event | Skill | Effect |
| --- | --- | --- |
| convoagent_activate_waiting_mode | WaitingModeActivateSkill | Sets waiting_mode state to "activated". Thought generation is skipped. |
| urgent_message | WaitingModeDisableSkill | Sets waiting_mode state to "disabled". Thought generation resumes. |
| waiting_mode_fallback | WaitingModeDisableSkill | Timer-based fallback that disables waiting mode if the external action takes too long. |
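The table above describes a two-state toggle. A minimal model is sketched below. The `WaitingMode` class and its method names are hypothetical, and the assumption that other events leave the state unchanged is mine. The event names and state values are taken from the table.

```python
class WaitingMode:
    """Two-state model of waiting mode driven by the events above (sketch)."""

    def __init__(self):
        self.state = "disabled"

    def handle_event(self, event):
        if event == "convoagent_activate_waiting_mode":
            self.state = "activated"
        elif event in ("urgent_message", "waiting_mode_fallback"):
            self.state = "disabled"
        # Any other event leaves the state unchanged (assumed).

    def should_generate_thoughts(self):
        return self.state != "activated"


if __name__ == "__main__":
    mode = WaitingMode()
    mode.handle_event("convoagent_activate_waiting_mode")
    print(mode.state, mode.should_generate_thoughts())
    mode.handle_event("waiting_mode_fallback")
    print(mode.state, mode.should_generate_thoughts())
```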

Example workflow

Here is an end-to-end example of how the Observer processes a restaurant reservation conversation:

  1. User says: "I would like to reserve a table for 4 people this Friday at 7 PM."

  2. CAObserverFlow runs:

    • _extractUserInformationDuringConversationSkill detects no phone number, no email, language is English.
    • _buildUserInformationSkill stores: user.language: English, all other fields empty.
    • Quality evaluation runs (phone channel) — scores 9 (coherent conversation).
  3. CAThoughtsFlow runs:

    • _getCompletePromptSkill assembles supervisor context with scenarios, procedures, and intent type map.
    • _genThoughtsSkill generates:
## FIELDS SUCCESSFULLY COLLECTED:

**User Language:** English
**Party Size:** 4 (gathered, reconfirmation is not required)
**Preferred Date:** This Friday (gathered, reconfirmation is not required)
**Preferred Time:** 7 PM (gathered, reconfirmation is not required)

## FIELDS TO BE COLLECTED:

**Phone Number:** not known
**Full Name:** not known

## KEY DIRECTIVES:

## Scenario 1: "Make a Regular Table Booking"

- 1.3: Follow the **Reconfirming or Gathering Phone Number** procedure.
    Since the phone number is not known, ask the user for their phone number.

- 1.4: Follow the **Gathering Full Name** procedure.

- 1.5: **CRITICAL STEP!!!** Say the special **code-phrase**:
    "Let me check availability for Friday at 7 PM"
  4. ConvoAgent uses these directives to ask for the phone number next, rather than jumping ahead or missing the step.

  5. On V2V channels, the thoughts are simultaneously pushed into the voice model via v2v_add_context, ensuring the spoken response follows the same directives.


Changelog

The Observer system: initial publication

Published developer reference covering the Observer's parallel reasoning architecture, CAObserverFlow and CAThoughtsFlow skills, user information extraction, thought generation pipeline with KEY DIRECTIVES format, field tracking states, conversation quality scoring with recovery actions, V2V integration, and waiting mode.