The latest wave of AI tech has uncanny abilities to interpret and emulate human emotions.
Generative AI systems built on large language models (LLMs) have become significantly better at producing video and audio avatars with impressive realism and liveliness. These computer-generated figures already appear in social media feeds and across the web. As AI technology advances, it will become common to interact with content that mimics human appearance and emotion.
Major technology firms are investing billions of dollars in affective AI systems designed to interact with the public, interpret their emotional state, and respond empathetically to solve real-world problems. A challenge in releasing AI agents and applications that mimic human emotions will be ensuring they don’t trigger the anxiety and stress caused by the uncanny valley phenomenon.
The Uncanny Valley
The eerie gap between what is “nearly human” and “fully human” is what roboticist Masahiro Mori called the Uncanny Valley.
In a 1970 essay on robots, Masahiro Mori hypothesized that as objects look more lifelike, your emotional responses to them become more positive and empathetic. This holds until something appears almost, but not quite, human, with details that closely resemble human qualities; at that point your affinity gives way to strong revulsion. However, once something appears indistinguishably human, climbing out of the uncanny valley, your feelings turn positive again and you begin to empathize with the artificial figure.
I recently walked past an animatronic disembodied head at an airport in Seoul, South Korea (2024).
The robotic art installation I captured in the video above is dubbed “The Giant,” an 8-foot-tall face mounted at the retail entrance of eyewear brand Gentle Monster. The hyper-realistic robot stops travelers in their tracks during layovers, evoking the uncanny valley effect with its lifelike yet emotionless appearance.
We have all experienced the unsettling feelings that stem from the uncanny valley. You’ve encountered uncomfortable humanoid characters in CGI movies and video games, or passed animatronic animals on a theme park ride.
For decades, these off-putting, almost-human encounters were confined to the world of entertainment and media. You were unlikely to cross paths with realistic, simulated humans in your day-to-day life, but that is starting to change with rapid advances in artificial intelligence.
Widespread Impact of Conversational AI
The line between artificial and authentic content is beginning to blur in unsettling ways. The increasing realism of AI-generated video and audio will soon have widespread effects at the societal level.
Psychologists and engineers have teamed up in a field called affective computing, where they train emotional AI models to recognize and simulate human emotional responses. As a result, computers are getting better at detecting and understanding your emotions.
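For a rough sense of what the text side of this looks like in practice, here is a minimal sketch of emotion detection using the open-source Hugging Face transformers library. The specific model name is an assumption for illustration; any publicly available emotion classifier could be swapped in.

```python
# Minimal sketch of text-based emotion detection, assuming the `transformers`
# library and a backend like PyTorch are installed. The model name is an
# assumption; substitute any text classifier trained on emotion labels.
from transformers import pipeline

emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

message = "I can't believe the flight got cancelled again. This is so frustrating."

# top_k=None asks the pipeline for scores across every emotion label,
# not just the single most likely one.
scores = emotion_classifier([message], top_k=None)[0]

# Print the predicted emotions sorted by confidence.
for result in sorted(scores, key=lambda r: r["score"], reverse=True):
    print(f"{result['label']}: {result['score']:.2f}")
```

On a complaint like the one above, a classifier of this kind will typically rank anger or annoyance highest, and that score is the signal a downstream system can use to adapt its tone.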
2024 is a major presidential election year, and generative deepfakes will force media outlets to question the authenticity of content that influences voter perceptions. I’m worried that a certain candidate will soon point to actual interview footage of himself and label it deep-fake news. This will call video evidence into question in a variety of legal settings, as courts will soon need to be forensically equipped to evaluate video source files for production techniques and authenticity.
Online dating will likely be disrupted by realistic, artificial profiles, calling the dynamics of interpersonal relationships and trust into question.
There will also be a great deal of potential good created by generative conversational systems. In eLearning, synthetic professors and instructors could dramatically enhance the educational experience. Imagine learning at your own pace and in your own style, exploring a personalized curriculum one-on-one with an expert. In customer service, AI agents will improve interactions by understanding your intent and sentiment, then responding according to brand policy.
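As a loose sketch of how such a customer-service agent might be wired together, the example below uses OpenAI’s Python SDK to pass a brand policy in as the system prompt and asks the model to return the customer’s intent and sentiment along with a draft reply. The policy text, model choice, and JSON output shape are illustrative assumptions, not a description of any vendor’s actual product.

```python
# Minimal sketch of a policy-aware customer-service agent, assuming the
# `openai` Python SDK is installed and OPENAI_API_KEY is set. The brand
# policy, model choice, and JSON output shape are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

BRAND_POLICY = (
    "You are a support agent for a retail brand. Be warm and concise. "
    "Offer a refund only for orders under $100; otherwise escalate to a human."
)

def handle_message(customer_message: str) -> dict:
    """Classify the customer's intent and sentiment, then draft a reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": BRAND_POLICY},
            {
                "role": "user",
                "content": (
                    "Return a JSON object with keys 'intent', 'sentiment', and "
                    f"'reply' for this customer message: {customer_message}"
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)

print(handle_message("My order arrived broken and I'm really upset. I want my money back."))
```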
Flirting with Emotions: GPT-4 Omni by OpenAI
In May 2024, OpenAI demonstrated an upgraded ChatGPT in a video announcement introducing their GPT-4o model (often called “GPT-4 Omni”). Their impressive Spring Update livestream included a demo that emphasized the new Omni model’s emotional expressiveness and rapid, almost flirty, conversational abilities.
In the video below, GPT-4o runs on a smartphone and interacts like a friend on a FaceTime video call. Omni engages in natural voice conversation, listens for emotional tones, and responds with simulated emotional reactions, such as playfully blushing or greeting the user cheerfully.
This new emotionally intelligent chatbot model can interact with the world through audio, vision, and text.
The mode of interacting with AI is expanding beyond text prompts in a simple chatbot interface to include multimodal input such as AI vision and voice recognition.
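As a small illustration of that shift, the sketch below sends an image alongside a text prompt in a single request using OpenAI’s Python SDK; the image URL and prompt are placeholder assumptions.

```python
# Minimal sketch of multimodal (text + image) input, assuming the `openai`
# Python SDK and an OPENAI_API_KEY. The image URL is a placeholder assumption.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the mood of the person in this photo."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/portrait.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```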
Users joke in the YouTube comments that the movie ’Her’ is now becoming reality.
As LLM interfaces incorporate more dynamic and personalized dialogue, conversations begin to feel more human-like. While emotional awareness makes AI assistants more engaging, it also has the capacity to pose serious psychological risks.
Talking Pictures: Microsoft’s VASA-1
Microsoft’s VASA-1 model (above) is a framework developed to generate realistic talking face videos when given a single static image and a speech audio clip.
Video avatars generated by this research project can mimic a wide range of facial expressions and movements that match the provided audio.
Empathetic AI: Hume Understands Your Voice
Hume AI just raised a $50 million funding round to support its flagship product, an emotionally intelligent voice interface that can be built into any application.
Try it for yourself – speak to Hume AI: https://demo.hume.ai
Hume’s AI displays its emotional calculations on screen, indicating what it’s reading in your voice. Their Empathic Voice Interface (EVI) measures hundreds of dimensions of expression by interpreting vocal tones, speech patterns, and facial expressions.
Tomorrow’s Emotive Tech
Last year’s AI models like GPT-4 and Google’s Gemini are great at understanding text, but they lack the ability to process nonverbal cues embedded in speech patterns or facial expressions. These subtleties are crucial for effective communication.
Affective AI technologies are getting better at reading emotions from voices and facial expressions, and the field is poised to make significant strides in emotional intelligence, which could soon revolutionize human-AI interaction.
As emotionally intelligent chatbots grow more persuasive, it’s crucial to consider the cultural risks of technological manipulation and addiction these systems could introduce.