OpenAI employees and acolytes were all over Twitter in the lead-up to Monday’s spring update from the company, so much so that the breathless hype was practically unavoidable. “The world will change forever,” one tweet promised. OpenAI CEO Sam Altman teased that what was coming felt like “magic” to him. And this was all on top of rumors that the ChatGPT maker is working on a Google Search rival, and that OpenAI is teaming up with Apple for a voice assistant.

In reality, while the company unveiled a plethora of technical announcements during its rather brisk live-streamed event (such as the release of a new desktop version of ChatGPT), one overarching reveal stood out to me. It’s that the new GPT-4o model makes OpenAI’s already impressive chatbot feel and sound so much more, dare I say it, human.

Among other things, ChatGPT can now detect emotion in both the user’s voice and their facial expression, just like a human can. It also makes unprompted jokes, the way a human trying to keep a conversation light would, and it lets you interrupt a response, so you are no longer confined to the stilted my-turn, your-turn dynamic of a conversation with a chatbot.

I’m blown away by GPT-4o.

Realtime + multimodal + desktop app.

You’ll have an AI teammate on your device that’s able to help you with anything you’re working on – and it runs 2x faster and costs 50% less than before.

OpenAI doesn’t make AI models.

They make magic.

— Mckay Wrigley (@mckaywrigley) May 13, 2024

To get a sense of what I mean about OpenAI making ChatGPT feel more human, check out this video the company posted in which the new GPT-4o model interacts via the camera with a cute dog. If you had your eyes closed, you’d think this was a real lady fawning over a cute puppy, when in fact it’s an AI model that’s learned how to express relevant and appropriate emotion, in addition to making the same observations we would when we meet a cute dog for the first time.


“GPT-4o (‘o’ for ‘omni’) is a step towards much more natural human-computer interaction,” OpenAI explains about the update. “It accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation … GPT-4o is especially better at vision and audio understanding compared to existing models.”

That last part really speaks to the magic I was alluding to above. During the event on Monday, for example, ChatGPT read a bedtime story (with plenty of whimsy, emotion, and drama added to the narration). In a conversation, it repeats thoughts back to the user for clarity and adds hmmms and pauses, just like a human.


