The Observer system

The Observer is a parallel reasoning system that runs alongside the ConvoAgent after each conversational turn. It acts as a "coach" that analyzes the conversation, extracts user information, evaluates quality, and generates structured directives (thoughts) to prevent hallucinations and missed steps. This article is the definitive developer reference for the Observer's architecture, skills, thought generation pipeline, field tracking, and quality scoring.

How the Observer works

After every conversational turn, the platform fires a broadcast_analyze_conversation event. Two flows subscribe to this event in parallel:

CAObserverFlow — Extracts user information (phone, email, language, opt-in preferences) and evaluates conversation quality.
CAThoughtsFlow — Generates structured "thoughts" (KEY DIRECTIVES) that guide the ConvoAgent's next response.

Together, these flows ensure the ConvoAgent always knows where it is in the conversation, what information has been collected, and what to do next.

User message
    │
    ▼
ConvoAgent generates response
    │
    ▼
broadcast_analyze_conversation event
    │
    ├──► CAObserverFlow (queue mode)
    │       ├── Extract user information
    │       ├── Evaluate conversation quality
    │       └── Determine working hours status
    │
    └──► CAThoughtsFlow (interrupt mode)
            ├── Assemble supervisor context
            ├── Generate KEY DIRECTIVES
            └── Inject thoughts into V2V (if voice)

:::
🗒️ NOTE
CAObserverFlow subscribes to broadcast_analyze_conversation in queue mode (waits for current skill to finish), while CAThoughtsFlow subscribes in interrupt mode (interrupts previous thought generation with updated context). This ensures thoughts always reflect the latest conversation state.
:::
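
The difference between the two subscription modes can be illustrated with a small simulation. This is a minimal sketch, not the platform's implementation: the `Subscriber` class, its `deliver` method, and the `simulate` function are hypothetical names, and `asyncio` tasks stand in for skill execution. It fires two events back to back and records which turns each subscriber finishes analyzing.

```python
import asyncio


class Subscriber:
    """Hypothetical stand-in for a flow subscribed to the broadcast event."""

    def __init__(self, mode: str):
        self.mode = mode          # "queue" or "interrupt"
        self.completed = []       # turns whose analysis ran to completion
        self.task = None          # most recently scheduled run

    async def _run(self, turn: int, previous):
        if previous is not None:
            if self.mode == "interrupt":
                previous.cancel()  # drop the stale run; no-op if already done
            # Wait until the previous run has finished or been cancelled.
            await asyncio.gather(previous, return_exceptions=True)
        await asyncio.sleep(0.01)  # placeholder for the real analysis work
        self.completed.append(turn)

    def deliver(self, turn: int) -> None:
        self.task = asyncio.create_task(self._run(turn, self.task))


async def simulate():
    observer = Subscriber("queue")       # plays the role of CAObserverFlow
    thoughts = Subscriber("interrupt")   # plays the role of CAThoughtsFlow
    for turn in (1, 2):                  # two events fired back to back
        observer.deliver(turn)
        thoughts.deliver(turn)
    await asyncio.gather(observer.task, thoughts.task)
    return observer.completed, thoughts.completed


observer_done, thoughts_done = asyncio.run(simulate())
print(observer_done, thoughts_done)
```

In this simulation the queue-mode subscriber completes both turns in order, while the interrupt-mode subscriber abandons turn 1 and completes only turn 2, which matches the behaviour described in the note above.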

Observer flow architecture

The CAObserverFlow contains 10 skills that handle conversation analysis and user information extraction.

User information extraction

The _extractUserInformationDuringConversationSkill uses structured JSON generation to extract the following fields from each conversation turn:

Field	Description
`user_name`	Full name as spoken or typed by the user.
`user_email`	Email address.
`user_preferred_language`	Language detected from the user's last message. Defaults to the project's primary language.
`user_phone_number_with_country_code`	Full phone number including country code.
`user_phone_number_country_code`	Country code extracted via greedy matching against 200+ international codes.
`user_phone_number_without_country_code`	Local phone number without the country code prefix.
`sms_opt_in`	Defaults to `true`. Set to `false` only if the user explicitly declines SMS.
`email_opt_in`	Defaults to `true`. Set to `false` only if the user explicitly declines email.
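
As an illustration of the extraction output, the sketch below models the fields as a dataclass and shows one way greedy (longest-prefix) country code matching could work. The class name `ExtractedUserInformation`, the `split_phone` helper, the `"English"` default, and the five-entry `COUNTRY_CODES` set are assumptions made for the example; the actual skill produces these fields through structured JSON generation and matches against a much larger code list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Tiny illustrative subset; the real list is described as 200+ codes.
COUNTRY_CODES = {"1", "44", "49", "351", "380"}


@dataclass
class ExtractedUserInformation:
    user_name: Optional[str] = None
    user_email: Optional[str] = None
    user_preferred_language: str = "English"  # assumed project primary language
    user_phone_number_with_country_code: Optional[str] = None
    user_phone_number_country_code: Optional[str] = None
    user_phone_number_without_country_code: Optional[str] = None
    sms_opt_in: bool = True    # false only after an explicit decline
    email_opt_in: bool = True  # false only after an explicit decline


def split_phone(full_number: str) -> Tuple[Optional[str], str]:
    """Split a number into (country_code, local_number) by longest prefix."""
    digits = "".join(ch for ch in full_number if ch.isdigit())
    for length in (3, 2, 1):  # greedy: try the longest candidate first
        prefix = digits[:length]
        if prefix in COUNTRY_CODES and len(digits) > length:
            return prefix, digits[length:]
    return None, digits  # no known code matched


code, local = split_phone("+351 912 345 678")
info = ExtractedUserInformation(
    user_phone_number_with_country_code=code + local,
    user_phone_number_country_code=code,
    user_phone_number_without_country_code=local,
)
```

Trying the longest prefix first avoids misreading a three-digit code such as 351 as a shorter code that happens to share its leading digits.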

Thought generation

The CAThoughtsFlow is the core reasoning engine. It assembles a comprehensive supervisor context and generates structured directives that tell the ConvoAgent exactly what to do next.

Supervisor context assembly

The _getCompletePromptSkill assembles the full supervisor prompt from multiple data sources. Each section of the prompt is injected by a dedicated skill:

Section	Data source
`BusinessContext`	Static business context + data injection
`ExplicitConstraints`	Channel-specific rules (phone vs chat), anti-jailbreak, language consistency
`WorkingSchedule`	Working days, hours, exceptions from customer attributes
`IntentTypeMap`	Compiled intent type map (working hours vs non-working hours variant)
`ConvoAgentScenariosAndProcedures`	Compiled scenarios and procedures
`AvailabilityForTheUserRequestedDateTime`	Booking slots availability (if booking enabled)
`ActionsStates`	Booking and calendar action states
`AdditionalInformation`	RAG retrieval context
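
A rough sketch of how such sections might be combined into a single prompt follows. The function name `assemble_supervisor_prompt`, the XML-style tag wrapping, and the choice to omit empty sections are assumptions for illustration; only the section names come from the table above.

```python
from typing import Dict

# Section names from the table above, in the order listed there.
SECTION_ORDER = [
    "BusinessContext",
    "ExplicitConstraints",
    "WorkingSchedule",
    "IntentTypeMap",
    "ConvoAgentScenariosAndProcedures",
    "AvailabilityForTheUserRequestedDateTime",
    "ActionsStates",
    "AdditionalInformation",
]


def assemble_supervisor_prompt(sections: Dict[str, str]) -> str:
    """Wrap each non-empty section in a tag and join them in a fixed order."""
    parts = []
    for name in SECTION_ORDER:
        body = sections.get(name, "").strip()
        if not body:
            continue  # assumed: e.g. no availability section if booking is off
        parts.append(f"<{name}>\n{body}\n</{name}>")
    return "\n\n".join(parts)


prompt = assemble_supervisor_prompt({
    "BusinessContext": "Family restaurant, 40 seats.",
    "WorkingSchedule": "Mon-Sat 11:00-22:00",
})
```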

Thought generation skill

The _genThoughtsSkill is the core of the Observer system. It takes the assembled supervisor context, conversation history, and previous thoughts, then generates a structured CurrentTaskAnalysis block.

The output follows a strict format with these sections:

FIELDS SUCCESSFULLY COLLECTED

Lists fields the agent was expected to collect that have been gathered and reconfirmed (if necessary). Always includes the user language.

**User Language:** English
**First Name:** David (gathered, reconfirmation is not required)
**Last Name:** Yang (gathered, reconfirmation is not required)
**Phone Number:** 12312312323 (reconfirmed)

FIELDS TO BE COLLECTED

Lists fields the agent was expected to collect but has not yet collected — possibly skipped or missed.

**Email:** not known

FIELDS TO BE RECONFIRMED

Lists fields whose value is known (including values detected from <UserInformation>) but still requires explicit reconfirmation per the procedure.

**Phone number:** 12312312323 (to be confirmed according to **Step 1.1:
    Reconfirming or Gathering Phone Number** procedure).

KEY DIRECTIVES

The most critical section. Contains the next three sequential steps the ConvoAgent must take, including scenario names, step numbers, procedure references, and code-phrases.

## Scenario 1: "Make a Regular Table Booking"

- 1.5: According to **Reconfirming or Gathering Preferred Date and Time**
    procedure you must say the special **code-phrase**: "Let me check
    availability for [requested day] at [requested time]"

- 1.6: According to **Reconfirming or Gathering Email Address** procedure,
    since it's a phone channel and conversation does not have a valid email
    string, you must tell following **code-phrase**: ...

- 1.7: ...

Mandatory language translation

The thought generation includes a critical global override rule: all code-phrases, quoted phrases, and template phrases must be fully translated into the user's detected language. If the user speaks Spanish, a code-phrase like "Let me check availability" must appear in the KEY DIRECTIVES as "Déjame verificar la disponibilidad." Emitting phrases in the wrong language is classified as a "catastrophic failure" in the system prompt.

Thought generation instruction rules

The _genThoughtsSkill enforces 11 rules for generating KEY DIRECTIVES. Beyond basic step tracking, the most important rules are:

Rule	Description
Three-step window	Always generate directives for the next three sequential uncompleted steps. If fewer than 3 remain, list all remaining.
Skip completed steps	If a step's field has already been collected, skip it and advance to the next step instead of including it in directives.
Procedure expansion	When a step references a procedure, study the procedure content and highlight the specific sub-step within it that the agent should perform next.
Branching scenarios	When a step describes branching, include a brief summary of the two most important branches (prioritize `[L]` lead scenarios from the IntentTypeMap). Include the first step and code-phrase of each referenced scenario.
Markers	Lines starting with `#MARKER:` in the conversation override normal scenarios. If a marker indicates an action was already done (e.g., "call was transferred"), skip that action. For `<CallEndedCase>` markers showing a new call, restart at Scenario 0.
Uninterruptible phases	Steps marked as Uninterruptible Scenario Phase mean the conversation topic cannot change. Include an example response for when the user deviates.
Notice blocks	Scenarios may include `> [!TIP]`, `> [!NOTE]`, `> [!IMPORTANT]` blocks. Include important information from these blocks in directives.
Background actions	Steps with background actions (e.g., "Tell: Give me a moment to check availability...") should stop directive generation at that step until the action result is visible.
SMS/Email section conditions	Steps with conditional branching based on `<SMSInformationAllowedToBeSent>` must resolve the condition and include only the matching branch.
Channel conditions	Steps with `conversation.channel` conditions (phone vs chat) must resolve to the current channel and include only the applicable branch.
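
The first two rules and the background-action rule can be sketched as a simple selection over a scenario's steps. This is only an approximation under stated assumptions: in the real system the selection is performed by the language model following the prompt rules, and the `Step` type, the `next_directive_steps` function, and the sample scenario are hypothetical. The sketch also treats a step as completed only when its field has been collected.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Step:
    number: str
    field: Optional[str] = None      # field this step collects, if any
    background_action: bool = False  # e.g. an availability check


def next_directive_steps(steps: List[Step], collected: Set[str],
                         window: int = 3) -> List[Step]:
    """Pick up to `window` uncompleted steps, stopping at a background action."""
    selected = []
    for step in steps:
        if step.field is not None and step.field in collected:
            continue  # skip steps whose field is already collected
        selected.append(step)
        if step.background_action or len(selected) == window:
            break  # wait for the action result, or the window is full
    return selected


scenario = [
    Step("1.1", field="party_size"),
    Step("1.2", field="phone_number"),
    Step("1.3", field="full_name"),
    Step("1.4", background_action=True),
    Step("1.5", field="email"),
]
directives = next_directive_steps(scenario, collected={"party_size"})
```

With only the party size collected, the selection covers steps 1.2 through 1.4 and stops there, because step 1.4 both fills the three-step window and starts a background action.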

Prompt caching

The complete supervisor prompt is compiled once at the start of each session and cached for subsequent turns, avoiding redundant re-assembly within the same conversation. A fresh prompt is generated when a new session begins or if the cache is empty.
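
A minimal sketch of this caching behaviour, assuming a per-session in-memory store: the `SupervisorPromptCache` class and the `build_prompt` callable are hypothetical names, and the real platform may keep the cached prompt elsewhere (for example, in session state).

```python
from typing import Callable, Dict


class SupervisorPromptCache:
    """Build the supervisor prompt once per session and reuse it afterwards."""

    def __init__(self, build_prompt: Callable[[str], str]):
        self._build_prompt = build_prompt
        self._cache: Dict[str, str] = {}

    def get(self, session_id: str) -> str:
        prompt = self._cache.get(session_id)
        if not prompt:  # new session, or the cached value is empty
            prompt = self._build_prompt(session_id)
            self._cache[session_id] = prompt
        return prompt


build_calls = []


def build_prompt(session_id: str) -> str:
    build_calls.append(session_id)  # track how often assembly actually runs
    return f"supervisor prompt for {session_id}"


cache = SupervisorPromptCache(build_prompt)
first = cache.get("session-a")
second = cache.get("session-a")  # served from the cache
other = cache.get("session-b")   # new session triggers a fresh build
```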

Field tracking states

Every field the agent needs to collect progresses through a state machine:

State	Definition
not known	The field has an unknown, empty, or not-provided value.
detected	The value was taken from the `<UserInformation>` block, which collects information implicitly (e.g., caller ID phone number).
gathered	The value was explicitly collected from the `<Conversation>` block according to the corresponding scenario step.
reconfirmed	The value was additionally and explicitly reconfirmed in the conversation according to the corresponding procedure.

A field is considered successfully collected when:

It has been gathered and does not require reconfirmation, OR
It has been gathered and reconfirmed (when the procedure title contains the Reconfirm keyword)

A field appears in FIELDS TO BE RECONFIRMED when:

A value is known (detected or gathered) but the corresponding step's procedure name contains "Reconfirm" and the user has not yet explicitly confirmed the value
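
The rules above can be expressed as a small classification function. This is a sketch under assumptions: `FieldState`, `requires_reconfirmation`, and `output_section` are illustrative names, the keyword check is a plain case-sensitive substring test, and the handling of a detected value whose procedure does not require reconfirmation is a guess, since the rules above do not cover that case.

```python
from enum import Enum


class FieldState(Enum):
    NOT_KNOWN = "not known"
    DETECTED = "detected"
    GATHERED = "gathered"
    RECONFIRMED = "reconfirmed"


def requires_reconfirmation(procedure_title: str) -> bool:
    # Matches titles such as "Reconfirming or Gathering Phone Number".
    return "Reconfirm" in procedure_title


def output_section(state: FieldState, procedure_title: str) -> str:
    """Map a field's state to the section of the analysis it belongs in."""
    if state is FieldState.NOT_KNOWN:
        return "FIELDS TO BE COLLECTED"
    if state is FieldState.RECONFIRMED:
        return "FIELDS SUCCESSFULLY COLLECTED"
    if requires_reconfirmation(procedure_title):
        return "FIELDS TO BE RECONFIRMED"  # detected or gathered, unconfirmed
    if state is FieldState.GATHERED:
        return "FIELDS SUCCESSFULLY COLLECTED"
    # Assumption: a detected value with no reconfirmation step still has to
    # be gathered explicitly in the conversation.
    return "FIELDS TO BE COLLECTED"


section = output_section(FieldState.DETECTED,
                         "Reconfirming or Gathering Phone Number")
```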

Conversation quality scoring

The _evaluateConversationQualitySkill rates ongoing conversation quality on a 0–10 scale and triggers recovery actions when quality is poor.

Quality indicators

The skill evaluates quality based on signs of a poor connection:

User says "Hello?" multiple times
User sends incoherent short messages
User sends short grammatically incorrect sentences
User sends short sentences without punctuation
User sends messages that cannot be understood properly

Scoring guidelines

Score range	Interpretation
7–10	Good quality. No action needed.
4–6	Moderate quality. No action triggered.
0–3	Poor quality. Recovery action triggered.

:::
❗❗ IMPORTANT
The quality evaluation applies leniency to early-stage conversations. A short or minimal conversation does not automatically receive a low rating unless there is clear evidence of poor quality. Improvements in later turns are weighted more heavily than initial issues.
:::

Recovery actions

When a conversation scores 3 or below, the system triggers one of three recovery actions based on configuration:

Condition	Action
Phone channel + call transfer enabled + business open	Transfer the call to a human co-worker with message: "Sorry, the line is breaking up... I will transfer the call to my co-worker."
Report enabled (no transfer available)	Offer to relay information to the manager: "Sorry, the line is breaking up... I can relay the information to our manager."
Fallback (no transfer, no report)	Suggest calling back: "Sorry, the line is breaking up... Do you mind calling back?"

Once a low score is recorded, the flag agent_gave_low_conversation_quality_score is set to True on the persona, preventing repeated quality evaluations for the same session.
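
The scoring bands and the recovery decision can be summarised in code. The following is a hedged sketch: the function names, the action labels, and the configuration parameters (`transfer_enabled`, `business_open`, `report_enabled`) are hypothetical, and the persona is modelled as a plain dictionary. Only the thresholds, the order of the conditions, and the flag name come from the text above.

```python
from typing import Optional


def interpret_score(score: int) -> str:
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 3:
        return "poor"
    if score <= 6:
        return "moderate"
    return "good"


def choose_recovery_action(score: int, *, channel: str, transfer_enabled: bool,
                           business_open: bool, report_enabled: bool) -> Optional[str]:
    """Return a recovery action label, or None when no action is needed."""
    if interpret_score(score) != "poor":
        return None
    if channel == "phone" and transfer_enabled and business_open:
        return "transfer_call"
    if report_enabled:
        return "relay_to_manager"
    return "suggest_call_back"


def handle_quality_score(persona: dict, score: int, **config) -> Optional[str]:
    """Act on a low score at most once per session, using the persona flag."""
    if persona.get("agent_gave_low_conversation_quality_score"):
        return None  # a low score was already recorded for this session
    action = choose_recovery_action(score, **config)
    if action is not None:
        persona["agent_gave_low_conversation_quality_score"] = True
    return action


persona = {}
config = dict(channel="phone", transfer_enabled=True,
              business_open=False, report_enabled=True)
first_action = handle_quality_score(persona, 2, **config)
repeat_action = handle_quality_score(persona, 1, **config)
```

Here the business is closed, so the transfer branch is skipped and the report branch applies; the second low score produces no further action because the flag is already set.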

Observer and V2V integration

When the conversation is on a voice-to-voice channel, the generated thoughts are injected directly into the V2V model as context. This is the "backward prompting" mechanism.

After generating thoughts, _genThoughtsSkill checks if the session uses V2V and injects the thoughts. The thoughts are wrapped in a <CurrentTaskAnalysis> block with a warning that the instructions may be slightly outdated (by one or more turns), and should be used to align the ConvoAgent's current scope rather than as absolute commands.
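
The wrapping step might look roughly like the sketch below. The warning wording, the `is_v2v` session key, and the `add_context` callable are assumptions for illustration; the callable stands in for whatever mechanism the platform uses to push context into the voice model.

```python
from typing import Callable, List

# Hypothetical wording; the real system prompt text is not reproduced here.
STALENESS_WARNING = (
    "This analysis may be one or more turns out of date. Use it to align "
    "your current scope, not as absolute commands."
)


def wrap_thoughts_for_v2v(thoughts: str) -> str:
    """Wrap generated thoughts in a CurrentTaskAnalysis block with a warning."""
    return (
        "<CurrentTaskAnalysis>\n"
        f"{STALENESS_WARNING}\n\n"
        f"{thoughts.strip()}\n"
        "</CurrentTaskAnalysis>"
    )


def inject_if_v2v(session: dict, thoughts: str,
                  add_context: Callable[[str], None]) -> bool:
    """Inject the wrapped thoughts only when the session uses V2V."""
    if not session.get("is_v2v"):
        return False
    add_context(wrap_thoughts_for_v2v(thoughts))
    return True


sent: List[str] = []
injected = inject_if_v2v({"is_v2v": True}, "## KEY DIRECTIVES:\n- 1.3: ...",
                         sent.append)
skipped = inject_if_v2v({"is_v2v": False}, "## KEY DIRECTIVES:", sent.append)
```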

Waiting mode

The CAThoughtsFlow supports a waiting mode that pauses thought generation while the system waits for an external action to complete (e.g., checking booking availability).

Event	Skill	Effect
`convoagent_activate_waiting_mode`	`WaitingModeActivateSkill`	Sets `waiting_mode` state to `"activated"`. Thought generation is skipped.
`urgent_message`	`WaitingModeDisableSkill`	Sets `waiting_mode` state to `"disabled"`. Thought generation resumes.
`waiting_mode_fallback`	`WaitingModeDisableSkill`	Timer-based fallback that disables waiting mode if the external action takes too long.
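
The event handling in the table can be sketched as a two-state toggle. The `WaitingMode` class and its methods are hypothetical, and the initial `"disabled"` state is an assumption; the event names and state values are taken from the table above.

```python
class WaitingMode:
    """Tracks whether thought generation should be skipped."""

    ACTIVATE_EVENTS = {"convoagent_activate_waiting_mode"}
    DISABLE_EVENTS = {"urgent_message", "waiting_mode_fallback"}

    def __init__(self) -> None:
        self.state = "disabled"  # assumed initial state

    def handle_event(self, event: str) -> None:
        if event in self.ACTIVATE_EVENTS:
            self.state = "activated"
        elif event in self.DISABLE_EVENTS:
            self.state = "disabled"
        # Other events leave the waiting mode unchanged.

    def should_generate_thoughts(self) -> bool:
        return self.state != "activated"


mode = WaitingMode()
mode.handle_event("convoagent_activate_waiting_mode")
paused = not mode.should_generate_thoughts()
mode.handle_event("waiting_mode_fallback")  # timer fired before the result
resumed = mode.should_generate_thoughts()
```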

Example workflow

Here is an end-to-end example of how the Observer processes a restaurant reservation conversation:

User says: "I would like to reserve a table for 4 people this Friday at 7 PM."
CAObserverFlow runs:
- _extractUserInformationDuringConversationSkill detects no phone number and no email; the detected language is English.
- _buildUserInformationSkill stores user.language: English; the name, email, and phone fields remain empty.
- Quality evaluation runs (phone channel) — scores 9 (coherent conversation).
CAThoughtsFlow runs:
- _getCompletePromptSkill assembles supervisor context with scenarios, procedures, and intent type map.
- _genThoughtsSkill generates:

## FIELDS SUCCESSFULLY COLLECTED:

**User Language:** English
**Party Size:** 4 (gathered, reconfirmation is not required)
**Preferred Date:** This Friday (gathered, reconfirmation is not required)
**Preferred Time:** 7 PM (gathered, reconfirmation is not required)

## FIELDS TO BE COLLECTED:

**Phone Number:** not known
**Full Name:** not known

## KEY DIRECTIVES:

## Scenario 1: "Make a Regular Table Booking"

- 1.3: Follow the **Reconfirming or Gathering Phone Number** procedure.
    Since the phone number is not known, ask the user for their phone number.

- 1.4: Follow the **Gathering Full Name** procedure.

- 1.5: **CRITICAL STEP!!!** Say the special **code-phrase**:
    "Let me check availability for Friday at 7 PM"

ConvoAgent uses these directives to ask for the phone number next, rather than jumping ahead or missing the step.
On V2V channels, the same thoughts are also pushed into the voice model via v2v_add_context, so the spoken response follows the same directives.