Why Does Newo.ai Emphasize “Presence in the Physical World”?
Working with physical-world interfaces such as talking-head kiosks, smart speakers, and robots like Moxie raises several fundamental challenges in identifying users and keeping them separate.
Multiple people (for example, Jennifer, Michael, and Steven) may approach and address a single kiosk one after another or at the same time. The agent must identify each user individually and link each one to their own history.
Solving this problem includes:
- Speaker separation: isolating a specific speaker's voice from the mix of signals in a multi-speaker public space, using methods such as voice IDs, face IDs, and control words. This requires integrating face recognition and voice prints with a user database.
- End-of-speech prediction and detection: deciding when a user has finished speaking, which is a significant challenge in noisy public environments.
- Last-word prediction: anticipating the final words of an utterance so that the Speech-to-Text (STT) result can be submitted early, keeping agent response times under 500 milliseconds.
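The speaker-identification step above can be sketched as matching biometric embeddings against enrolled users and appending each utterance to the matched user's history. This is a minimal sketch under stated assumptions, not Newo.ai's implementation: it assumes separate speaker-verification and face-recognition models already convert audio and camera frames into fixed-length embedding vectors, and `UserRegistry`, `UserProfile`, `log_utterance`, and the 0.8 similarity threshold are hypothetical. Control words are not modelled.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class UserProfile:
    name: str
    voice_print: Optional[list[float]] = None  # hypothetical speaker embedding
    face_print: Optional[list[float]] = None   # hypothetical face embedding
    history: list[str] = field(default_factory=list)


class UserRegistry:
    """Hypothetical in-memory user database keyed by voice and face embeddings."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed minimum similarity to accept a match
        self.users: list[UserProfile] = []

    def _score(self, user: UserProfile, voice, face) -> Optional[float]:
        # Average similarity over the modalities present on both sides.
        scores = []
        if voice is not None and user.voice_print is not None:
            scores.append(cosine(voice, user.voice_print))
        if face is not None and user.face_print is not None:
            scores.append(cosine(face, user.face_print))
        return sum(scores) / len(scores) if scores else None

    def identify(self, voice=None, face=None) -> Optional[UserProfile]:
        # Return the best-scoring enrolled user at or above the threshold, if any.
        best, best_score = None, self.threshold
        for user in self.users:
            score = self._score(user, voice, face)
            if score is not None and score >= best_score:
                best, best_score = user, score
        return best

    def enroll(self, name: str, voice=None, face=None) -> UserProfile:
        user = UserProfile(name, voice, face)
        self.users.append(user)
        return user

    def log_utterance(self, text: str, voice=None, face=None) -> UserProfile:
        user = self.identify(voice, face)
        if user is None:
            # Unknown speaker: enroll a guest so later turns link back to them.
            user = self.enroll(f"guest-{len(self.users) + 1}", voice, face)
        user.history.append(text)
        return user


if __name__ == "__main__":
    kiosk = UserRegistry()
    kiosk.enroll("Jennifer", voice=[0.9, 0.1, 0.0], face=[1.0, 0.0])
    kiosk.enroll("Michael", voice=[0.0, 0.2, 0.9], face=[0.0, 1.0])
    user = kiosk.log_utterance("Where is gate 12?", voice=[0.85, 0.15, 0.05])
    print(user.name, user.history)  # Jennifer ['Where is gate 12?']
```

Averaging over whichever modalities are available lets a user be recognized by voice alone (for a smart speaker) or by face alone (when the room is too noisy), while the threshold keeps strangers from being merged into an existing user's history.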
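End-of-speech detection and early STT submission can be illustrated with a simple energy-based endpointer. This is a minimal sketch that swaps in a plainer technique than the one described above: it does not predict the user's last words. Instead it speculatively submits after a short pause and cancels that submission if the user keeps talking. It assumes per-frame audio energies are already computed, uses an adaptive noise floor to cope with background noise, and `endpoint_events`, `EndpointerConfig`, and all timing values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class EndpointerConfig:
    frame_ms: int = 20          # assumed length of one audio frame
    speech_ratio: float = 3.0   # frame counts as speech if energy > ratio * noise floor
    eager_ms: int = 200         # pause that triggers a speculative (early) STT submit
    final_ms: int = 600         # pause that commits end of speech
    noise_alpha: float = 0.05   # EMA rate for tracking background noise


def endpoint_events(
    energies: Iterable[float], cfg: Optional[EndpointerConfig] = None
) -> Iterator[tuple[int, str]]:
    """Yield (frame_index, event) pairs, where event is EAGER, CANCEL or FINAL."""
    cfg = cfg or EndpointerConfig()
    eager_frames = cfg.eager_ms // cfg.frame_ms
    final_frames = cfg.final_ms // cfg.frame_ms
    noise: Optional[float] = None
    in_speech = False
    eager_sent = False
    silence = 0
    for i, energy in enumerate(energies):
        if noise is None:
            noise = max(energy, 1e-6)  # assume the first frame is background noise
            continue
        if energy > cfg.speech_ratio * noise:
            if eager_sent:
                yield i, "CANCEL"  # user kept talking: discard the early submission
                eager_sent = False
            in_speech = True
            silence = 0
            continue
        # Non-speech frame: let the noise floor follow the room's background level.
        noise = max((1 - cfg.noise_alpha) * noise + cfg.noise_alpha * energy, 1e-6)
        if not in_speech:
            continue
        silence += 1
        if silence == eager_frames and not eager_sent:
            yield i, "EAGER"  # submit the partial STT transcript now
            eager_sent = True
        if silence == final_frames:
            yield i, "FINAL"  # commit: the turn is over
            in_speech, eager_sent, silence = False, False, 0


if __name__ == "__main__":
    # 5 noise frames, 10 speech frames, a 160 ms pause, more speech, then silence.
    frames = [1.0] * 5 + [10.0] * 10 + [1.0] * 8 + [10.0] * 5 + [1.0] * 35
    print(list(endpoint_events(frames)))  # [(37, 'EAGER'), (57, 'FINAL')]
```

If latency is measured from the moment the user stops talking, a 600 ms silence timeout on its own would already exceed a 500 ms response target. Submitting after 200 ms and cancelling when speech resumes trades some wasted agent requests for a faster reply; a learned last-word or end-of-turn model would aim to make that early guess more reliable.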