The Observer system
The Observer is a parallel reasoning system that runs alongside the ConvoAgent after each conversational turn. It acts as a "coach" that analyzes the conversation, extracts user information, evaluates quality, and generates structured directives (thoughts) to prevent hallucinations and missed steps. This article is the definitive developer reference for the Observer's architecture, skills, thought generation pipeline, field tracking, and quality scoring.
How the Observer works
After every conversational turn, the platform fires a broadcast_analyze_conversation event. Two flows subscribe to this event in parallel:
- CAObserverFlow — Extracts user information (phone, email, language, opt-in preferences) and evaluates conversation quality.
- CAThoughtsFlow — Generates structured "thoughts" (KEY DIRECTIVES) that guide the ConvoAgent's next response.
Together, these flows ensure the ConvoAgent always knows where it is in the conversation, what information has been collected, and what to do next.
User message
  │
  ▼
ConvoAgent generates response
  │
  ▼
broadcast_analyze_conversation event
  │
  ├──► CAObserverFlow (queue mode)
  │      ├── Extract user information
  │      ├── Evaluate conversation quality
  │      └── Determine working hours status
  │
  └──► CAThoughtsFlow (interrupt mode)
         ├── Assemble supervisor context
         ├── Generate KEY DIRECTIVES
         └── Inject thoughts into V2V (if voice)
:::
🗒️ NOTE
CAObserverFlow subscribes to broadcast_analyze_conversation in queue mode (waits for current skill to finish), while CAThoughtsFlow subscribes in interrupt mode (interrupts previous thought generation with updated context). This ensures thoughts always reflect the latest conversation state.
:::
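The difference between the two subscription modes can be illustrated with a small asyncio sketch. This is not the platform's dispatcher: the `Subscriber` class, its `deliver` method, and the simulated handlers are hypothetical names introduced only to show the behavior described in the note, where a queue-mode subscriber lets the running skill finish and an interrupt-mode subscriber cancels it in favor of the newer event.

```python
import asyncio


class Subscriber:
    """Hypothetical subscriber; mode is "queue" or "interrupt"."""

    def __init__(self, mode, handler):
        self.mode = mode
        self.handler = handler
        self._task = None

    async def deliver(self, payload):
        previous = self._task
        if previous is not None and not previous.done():
            if self.mode == "interrupt":
                previous.cancel()  # drop the stale run
            # Queue mode waits for completion; interrupt mode waits for the cancel.
            await asyncio.gather(previous, return_exceptions=True)
        self._task = asyncio.create_task(self.handler(payload))

    async def wait(self):
        if self._task is not None:
            await asyncio.gather(self._task, return_exceptions=True)


async def demo():
    completed = []

    def make_handler(name):
        async def handler(turn):
            await asyncio.sleep(0.05)  # simulated skill work
            completed.append((name, turn))
        return handler

    observer = Subscriber("queue", make_handler("observer"))
    thoughts = Subscriber("interrupt", make_handler("thoughts"))

    # Two events fired in quick succession, each delivered to both flows.
    for turn in (1, 2):
        await asyncio.gather(observer.deliver(turn), thoughts.deliver(turn))
    await asyncio.gather(observer.wait(), thoughts.wait())
    return completed


if __name__ == "__main__":
    print(sorted(asyncio.run(demo())))
```

In this sketch the observer handler completes for both turns, while the thoughts handler completes only for turn 2 because its turn 1 run is cancelled when the second event arrives.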
Observer flow architecture
The CAObserverFlow contains 10 skills that handle conversation analysis and user information extraction.
User information extraction
The _extractUserInformationDuringConversationSkill uses structured JSON generation to extract the following fields from each conversation turn:
| Field | Description |
|---|---|
| user_name | Full name as spoken or typed by the user. |
| user_email | Email address. |
| user_preferred_language | Language detected from the user's last message. Defaults to the project's primary language. |
| user_phone_number_with_country_code | Full phone number including country code. |
| user_phone_number_country_code | Country code extracted via greedy matching against 200+ international codes. |
| user_phone_number_without_country_code | Local phone number without the country code prefix. |
| sms_opt_in | Defaults to true. Set to false only if the user explicitly declines SMS. |
| email_opt_in | Defaults to true. Set to false only if the user explicitly declines email. |
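As a rough illustration of the extraction result, the fields above could be modeled as a record with the documented defaults. The sketch below also shows one possible reading of "greedy matching" for country codes, namely longest-prefix-first. The `ExtractedUserInfo` class, the `split_phone` helper, the tiny `COUNTRY_CODES` subset, and the `"English"` default language are all assumptions made for illustration and are not taken from the platform.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Tiny illustrative subset; the real list is described as 200+ codes.
COUNTRY_CODES = {"1", "44", "49", "351", "380"}


@dataclass
class ExtractedUserInfo:
    user_name: Optional[str] = None
    user_email: Optional[str] = None
    user_preferred_language: str = "English"  # assumed project primary language
    user_phone_number_with_country_code: Optional[str] = None
    user_phone_number_country_code: Optional[str] = None
    user_phone_number_without_country_code: Optional[str] = None
    sms_opt_in: bool = True    # false only on explicit decline
    email_opt_in: bool = True  # false only on explicit decline


def split_phone(number: str) -> Tuple[Optional[str], str]:
    """Split a phone number into (country_code, local_number).

    Tries the longest candidate prefix first (3 digits, then 2, then 1).
    """
    digits = "".join(ch for ch in number if ch.isdigit())
    for length in (3, 2, 1):
        prefix = digits[:length]
        if prefix in COUNTRY_CODES:
            return prefix, digits[length:]
    return None, digits


if __name__ == "__main__":
    code, local = split_phone("+44 20 7946 0958")
    info = ExtractedUserInfo(
        user_phone_number_with_country_code=code + local,
        user_phone_number_country_code=code,
        user_phone_number_without_country_code=local,
    )
    print(info)
```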
Thought generation
The CAThoughtsFlow is the core reasoning engine. It assembles a comprehensive supervisor context and generates structured directives that tell the ConvoAgent exactly what to do next.
Supervisor context assembly
The _getCompletePromptSkill assembles the full supervisor prompt from multiple data sources. Each section is injected by a dedicated skill:
| Section | Data source |
|---|---|
| BusinessContext | Static business context + data injection |
| ExplicitConstraints | Channel-specific rules (phone vs chat), anti-jailbreak, language consistency |
| WorkingSchedule | Working days, hours, exceptions from customer attributes |
| IntentTypeMap | Compiled intent type map (working hours vs non-working hours variant) |
| ConvoAgentScenariosAndProcedures | Compiled scenarios and procedures |
| AvailabilityForTheUserRequestedDateTime | Booking slots availability (if booking enabled) |
| ActionsStates | Booking and calendar action states |
| AdditionalInformation | RAG retrieval context |
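A minimal sketch of how such sections might be stitched into one prompt is shown below. It assumes each section is wrapped in a tag named after the section and that sections appear in the table's order. Both are assumptions, as is the `build_supervisor_prompt` helper itself; the source does not specify the exact prompt layout.

```python
from typing import Dict

# Section names from the table above; the ordering is an assumption.
SECTION_ORDER = [
    "BusinessContext",
    "ExplicitConstraints",
    "WorkingSchedule",
    "IntentTypeMap",
    "ConvoAgentScenariosAndProcedures",
    "AvailabilityForTheUserRequestedDateTime",
    "ActionsStates",
    "AdditionalInformation",
]


def build_supervisor_prompt(sections: Dict[str, str]) -> str:
    """Join the non-empty sections into one prompt, each in its own tag."""
    parts = []
    for name in SECTION_ORDER:
        body = sections.get(name, "").strip()
        if not body:
            continue  # e.g. no availability section when booking is disabled
        parts.append(f"<{name}>\n{body}\n</{name}>")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_supervisor_prompt({
        "WorkingSchedule": "Mon-Fri 09:00-18:00",
        "BusinessContext": "Restaurant in the city centre.",
    }))
```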
Thought generation skill
The _genThoughtsSkill is the core of the Observer system. It takes the assembled supervisor context, conversation history, and previous thoughts, then generates a structured CurrentTaskAnalysis block.
The output follows a strict format with these sections:
FIELDS SUCCESSFULLY COLLECTED
Lists fields the agent was expected to collect that have been gathered and reconfirmed (if necessary). Always includes the user language.
**User Language:** English
**First Name:** David (gathered, reconfirmation is not required)
**Last Name:** Yang (gathered, reconfirmation is not required)
**Phone Number:** 12312312323 (reconfirmed)
FIELDS TO BE COLLECTED
Lists fields the agent was expected to collect but has not yet collected — possibly skipped or missed.
**Email:** not known
FIELDS TO BE RECONFIRMED
Lists fields where a value is known (including detected values from <UserInformation>) but requires explicit reconfirmation per the procedure.
**Phone number:** 12312312323 (to be confirmed according to **Step 1.1:
Reconfirming or Gathering Phone Number** procedure).
KEY DIRECTIVES
The most critical section. Contains the next three sequential steps the ConvoAgent must take, including scenario names, step numbers, procedure references, and code-phrases.
## Scenario 1: "Make a Regular Table Booking"
- 1.5: According to **Reconfirming or Gathering Preferred Date and Time**
procedure you must say the special **code-phrase**: "Let me check
availability for [requested day] at [requested time]"
- 1.6: According to **Reconfirming or Gathering Email Address** procedure,
since it's a phone channel and conversation does not have a valid email
string, you must tell following **code-phrase**: ...
- 1.7: ...
Mandatory language translation
The thought generation includes a critical global override rule: all code-phrases, quoted phrases, and template phrases must be fully translated into the user's detected language. If the user speaks Spanish, a code-phrase like "Let me check availability" must appear in the KEY DIRECTIVES as "Déjame verificar la disponibilidad." Emitting phrases in the wrong language is classified as a "catastrophic failure" in the system prompt.
Thought generation instruction rules
The _genThoughtsSkill enforces 11 rules for generating KEY DIRECTIVES. Beyond basic step tracking, the most important rules are:
| Rule | Description |
|---|---|
| Three-step window | Always generate directives for the next three sequential uncompleted steps. If fewer than 3 remain, list all remaining. |
| Skip completed steps | If a step's field has already been collected, skip it and advance to the next step instead of including it in directives. |
| Procedure expansion | When a step references a procedure, study the procedure content and highlight the specific sub-step within it that the agent should perform next. |
| Branching scenarios | When a step describes branching, include a brief summary of the two most important branches (prioritize [L] lead scenarios from the IntentTypeMap). Include the first step and code-phrase of each referenced scenario. |
| Markers | Lines starting with #MARKER: in the conversation override normal scenarios. If a marker indicates an action was already done (e.g., "call was transferred"), skip that action. For <CallEndedCase> markers showing a new call, restart at Scenario 0. |
| Uninterruptible phases | Steps marked as Uninterruptible Scenario Phase mean the conversation topic cannot change. Include an example response for when the user deviates. |
| Notice blocks | Scenarios may include > [!TIP], > [!NOTE], > [!IMPORTANT] blocks. Include important information from these blocks in directives. |
| Background actions | Steps with background actions (e.g., "Tell: Give me a moment to check availability...") should stop directive generation at that step until the action result is visible. |
| SMS/Email section conditions | Steps with conditional branching based on <SMSInformationAllowedToBeSent> must resolve the condition and include only the matching branch. |
| Channel conditions | Steps with conversation.channel conditions (phone vs chat) must resolve to the current channel and include only the applicable branch. |
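Three of these rules (the three-step window, skipping completed steps, and stopping at a background action) are mechanical enough to sketch in code. In the real system these rules are instructions to a language model, not a function, so the following is only an illustration of the intended selection logic. The `Step` type and `next_directive_steps` helper are hypothetical, and the sketch tracks completion only for steps that collect a field.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Step:
    number: str
    field: Optional[str]            # field this step collects, if any
    background_action: bool = False


def next_directive_steps(steps: List[Step], collected: Set[str],
                         window: int = 3) -> List[Step]:
    """Pick the steps that would appear in KEY DIRECTIVES."""
    selected: List[Step] = []
    for step in steps:
        if step.field is not None and step.field in collected:
            continue  # skip completed steps
        selected.append(step)
        # Stop at a background action or once the window is full.
        if step.background_action or len(selected) == window:
            break
    return selected


if __name__ == "__main__":
    scenario = [
        Step("1.1", "party_size"),
        Step("1.2", "date_time"),
        Step("1.3", "phone"),
        Step("1.4", "full_name"),
        Step("1.5", None, background_action=True),  # availability check
        Step("1.6", "email"),
    ]
    chosen = next_directive_steps(scenario, {"party_size", "date_time"})
    print([s.number for s in chosen])
```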
Prompt caching
The complete supervisor prompt is compiled once at the start of each session and cached for subsequent turns, avoiding redundant re-assembly within the same conversation. A fresh prompt is generated when a new session begins or if the cache is empty.
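The caching behavior described here amounts to a per-session memo. The sketch below assumes a hypothetical `SupervisorPromptCache` keyed by session ID with a caller-supplied build function; how the platform actually stores the cached prompt is not stated in this article.

```python
from typing import Callable, Dict


class SupervisorPromptCache:
    """Hypothetical per-session cache for the compiled supervisor prompt."""

    def __init__(self, build: Callable[[], str]):
        self._build = build
        self._by_session: Dict[str, str] = {}

    def get(self, session_id: str) -> str:
        cached = self._by_session.get(session_id)
        if not cached:  # new session or empty cache: compile a fresh prompt
            cached = self._build()
            self._by_session[session_id] = cached
        return cached


if __name__ == "__main__":
    builds = []

    def compile_prompt() -> str:
        builds.append(1)
        return "compiled supervisor prompt"

    cache = SupervisorPromptCache(compile_prompt)
    cache.get("session-a")
    cache.get("session-a")  # served from cache
    cache.get("session-b")  # new session, compiled again
    print(len(builds))
```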
Field tracking states
Every field the agent needs to collect progresses through a state machine:
| State | Definition |
|---|---|
| not known | The field has an unknown, empty, or not-provided value. |
| detected | The value was taken from the <UserInformation> block, which collects information implicitly (e.g., caller ID phone number). |
| gathered | The value was explicitly collected from the <Conversation> block according to the corresponding scenario step. |
| reconfirmed | The value was additionally and explicitly reconfirmed in the conversation according to the corresponding procedure. |
A field is considered successfully collected when:
- It has been gathered and does not require reconfirmation, OR
- It has been gathered and reconfirmed (when the procedure title contains the Reconfirm keyword)
A field appears in FIELDS TO BE RECONFIRMED when:
- A value is known (detected or gathered) but the corresponding step's procedure name contains "Reconfirm" and the user has not yet explicitly confirmed the value
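The rules above can be expressed as a small classification function. This is a sketch under stated assumptions: the `FieldState` enum and `classify` helper are hypothetical, the keyword check is made case-insensitive for convenience, and a detected value whose procedure does not require reconfirmation is treated as still to be collected, which the source does not state explicitly.

```python
from enum import Enum


class FieldState(Enum):
    NOT_KNOWN = "not known"
    DETECTED = "detected"
    GATHERED = "gathered"
    RECONFIRMED = "reconfirmed"


def requires_reconfirmation(procedure_title: str) -> bool:
    # The procedure title carries the "Reconfirm" keyword.
    return "reconfirm" in procedure_title.lower()


def classify(state: FieldState, procedure_title: str) -> str:
    """Return the CurrentTaskAnalysis section a field belongs to."""
    if state is FieldState.NOT_KNOWN:
        return "FIELDS TO BE COLLECTED"
    if state is FieldState.RECONFIRMED:
        return "FIELDS SUCCESSFULLY COLLECTED"
    if requires_reconfirmation(procedure_title):
        return "FIELDS TO BE RECONFIRMED"  # known, but not yet confirmed
    if state is FieldState.GATHERED:
        return "FIELDS SUCCESSFULLY COLLECTED"
    # Detected without a reconfirmation procedure: assumed still to be gathered.
    return "FIELDS TO BE COLLECTED"


if __name__ == "__main__":
    print(classify(FieldState.DETECTED, "Reconfirming or Gathering Phone Number"))
    print(classify(FieldState.GATHERED, "Gathering Full Name"))
```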
Conversation quality scoring
The _evaluateConversationQualitySkill rates ongoing conversation quality on a 0–10 scale and triggers recovery actions when quality is poor.
Quality indicators
The skill evaluates quality based on signs of a poor connection:
- User says "Hello?" multiple times
- User sends incoherent short messages
- User sends short grammatically incorrect sentences
- User sends short sentences without punctuation
- User sends messages that cannot be understood properly
Scoring guidelines
| Score range | Interpretation |
|---|---|
| 7–10 | Good quality. No action needed. |
| 4–6 | Moderate quality. No action triggered. |
| 0–3 | Poor quality. Recovery action triggered. |
:::
❗❗ IMPORTANT
The quality evaluation applies leniency to early-stage conversations. A short or minimal conversation does not automatically receive a low rating unless there is clear evidence of poor quality. Improvements in later turns are weighted more heavily than initial issues.
:::
Recovery actions
When a conversation scores 3 or below, the system triggers one of three recovery actions based on configuration:
| Condition | Action |
|---|---|
| Phone channel + call transfer enabled + business open | Transfer the call to a human co-worker with message: "Sorry, the line is breaking up... I will transfer the call to my co-worker." |
| Report enabled (no transfer available) | Offer to relay information to the manager: "Sorry, the line is breaking up... I can relay the information to our manager." |
| Fallback (no transfer, no report) | Suggest calling back: "Sorry, the line is breaking up... Do you mind calling back?" |
Once a low score is recorded, the flag agent_gave_low_conversation_quality_score is set to True on the persona, preventing repeated quality evaluations for the same session.
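The selection among the three recovery actions can be sketched as a priority check. The `SessionConfig` fields, the action labels, and the `pick_recovery_action` helper below are hypothetical names chosen for illustration. Only the threshold of 3, the order of the conditions, and the once-per-session behavior come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

LOW_QUALITY_THRESHOLD = 3  # scores of 3 or below trigger recovery


@dataclass
class SessionConfig:
    channel: str                 # "phone" or "chat"
    call_transfer_enabled: bool
    business_open: bool
    report_enabled: bool


def pick_recovery_action(score: int, cfg: SessionConfig,
                         already_flagged: bool) -> Optional[str]:
    """Return a recovery action label, or None when no action applies."""
    if already_flagged or score > LOW_QUALITY_THRESHOLD:
        return None
    if cfg.channel == "phone" and cfg.call_transfer_enabled and cfg.business_open:
        return "transfer_call"
    if cfg.report_enabled:
        return "relay_to_manager"
    return "suggest_call_back"


if __name__ == "__main__":
    cfg = SessionConfig("phone", call_transfer_enabled=True,
                        business_open=False, report_enabled=True)
    # Business is closed, so the transfer branch is skipped.
    print(pick_recovery_action(2, cfg, already_flagged=False))
```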
Observer and V2V integration
When the conversation is on a voice-to-voice channel, the generated thoughts are injected directly into the V2V model as context. This is the "backward prompting" mechanism.
After generating thoughts, _genThoughtsSkill checks whether the session uses V2V and, if so, injects the thoughts. The thoughts are wrapped in a <CurrentTaskAnalysis> block with a warning that the instructions may be slightly outdated (by one or more turns) and should be used to align the ConvoAgent's current scope rather than as absolute commands.
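The wrapping step might look roughly like the following. The `wrap_thoughts_for_v2v` helper and the exact warning wording are illustrative assumptions. Only the <CurrentTaskAnalysis> tag and the intent of the warning are taken from the description above.

```python
# Illustrative wording only; the platform's actual warning text is not shown here.
STALENESS_WARNING = (
    "These instructions may be outdated by one or more turns. "
    "Use them to align your current scope, not as absolute commands."
)


def wrap_thoughts_for_v2v(thoughts: str) -> str:
    """Wrap generated thoughts in a CurrentTaskAnalysis block for injection."""
    return (
        "<CurrentTaskAnalysis>\n"
        f"{STALENESS_WARNING}\n\n"
        f"{thoughts.strip()}\n"
        "</CurrentTaskAnalysis>"
    )


if __name__ == "__main__":
    print(wrap_thoughts_for_v2v("## KEY DIRECTIVES:\n- 1.3: Ask for the phone number."))
```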
Waiting mode
The CAThoughtsFlow supports a waiting mode that pauses thought generation while the system waits for an external action to complete (e.g., checking booking availability).
| Event | Skill | Effect |
|---|---|---|
| convoagent_activate_waiting_mode | WaitingModeActivateSkill | Sets waiting_mode state to "activated". Thought generation is skipped. |
| urgent_message | WaitingModeDisableSkill | Sets waiting_mode state to "disabled". Thought generation resumes. |
| waiting_mode_fallback | WaitingModeDisableSkill | Timer-based fallback that disables waiting mode if the external action takes too long. |
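The event table above describes a two-state toggle, which can be sketched as follows. The `WaitingMode` class is hypothetical, and the initial "disabled" state is an assumption. The event names and state values come from the table.

```python
class WaitingMode:
    """Hypothetical model of the waiting_mode state and its events."""

    ACTIVATE_EVENTS = {"convoagent_activate_waiting_mode"}
    DISABLE_EVENTS = {"urgent_message", "waiting_mode_fallback"}

    def __init__(self) -> None:
        self.state = "disabled"  # assumed initial state

    def handle(self, event: str) -> None:
        if event in self.ACTIVATE_EVENTS:
            self.state = "activated"
        elif event in self.DISABLE_EVENTS:
            self.state = "disabled"
        # Any other event leaves the state unchanged.

    def should_generate_thoughts(self) -> bool:
        return self.state != "activated"


if __name__ == "__main__":
    mode = WaitingMode()
    mode.handle("convoagent_activate_waiting_mode")
    print(mode.should_generate_thoughts())  # thought generation is skipped
    mode.handle("waiting_mode_fallback")
    print(mode.should_generate_thoughts())  # thought generation resumes
```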
Example workflow
Here is an end-to-end example of how the Observer processes a restaurant reservation conversation:
- User says: "I would like to reserve a table for 4 people this Friday at 7 PM."
- CAObserverFlow runs:
  - _extractUserInformationDuringConversationSkill detects no phone number, no email, language is English.
  - _buildUserInformationSkill stores: user.language: English, all other fields empty.
  - Quality evaluation runs (phone channel) and scores 9 (coherent conversation).
- CAThoughtsFlow runs:
  - _getCompletePromptSkill assembles supervisor context with scenarios, procedures, and intent type map.
  - _genThoughtsSkill generates:
## FIELDS SUCCESSFULLY COLLECTED:
**User Language:** English
**Party Size:** 4 (gathered, reconfirmation is not required)
**Preferred Date:** This Friday (gathered, reconfirmation is not required)
**Preferred Time:** 7 PM (gathered, reconfirmation is not required)
## FIELDS TO BE COLLECTED:
**Phone Number:** not known
**Full Name:** not known
## KEY DIRECTIVES:
## Scenario 1: "Make a Regular Table Booking"
- 1.3: Follow the **Reconfirming or Gathering Phone Number** procedure.
Since the phone number is not known, ask the user for their phone number.
- 1.4: Follow the **Gathering Full Name** procedure.
- 1.5: **CRITICAL STEP!!!** Say the special **code-phrase**:
"Let me check availability for Friday at 7 PM"
- ConvoAgent uses these directives to ask for the phone number next, rather than jumping ahead or missing the step.
- On V2V channels, the thoughts are simultaneously pushed into the voice model via v2v_add_context, ensuring the spoken response follows the same directives.
Changelog
The Observer system: initial publication
Published developer reference covering the Observer's parallel reasoning architecture, CAObserverFlow and CAThoughtsFlow skills, user information extraction, thought generation pipeline with KEY DIRECTIVES format, field tracking states, conversation quality scoring with recovery actions, V2V integration, and waiting mode.
