GPT-4 is OpenAI’s current model, and powers the paid version of ChatGPT. GPT-4o is an update to that model: it offers GPT-4 level capabilities, according to OpenAI CTO Mira Murati, but is much faster.

Murati said the new version does have a few other improvements over the last model, including a “huge step forward” in ease of use. “This is incredibly important because we are looking at the future of interaction between ourselves and the machines,” she said.

Essentially, this comes down to smoother, faster voice interactions with ChatGPT.

That’s made possible by ditching the separate models previously used for transcription, understanding the text, and converting a response back to speech. Instead, human-to-bot voice conversations are handled natively within the main GPT-4o model. That should mean no lag between speaking and receiving a response; it’s also possible to interrupt the bot mid-reply, and the bot can speak in a more emotive way.

Demo of GPT-4o

In a demo, research scientist Mark Chen asked ChatGPT for advice on how to calm down when public speaking. A chirpy, female-sounding voice responded: “You’re doing a live demo, right now? That is awesome. Just take a deep breath and remember you are the expert here.” 

Murati added that the model includes improvements in understanding text, images and audio too, though not all those capabilities will be immediately available. In a demo, the bot was shown an equation on a piece of paper and verbally walked through how to solve it; later, after being shown a note reading “I [heart] ChatGPT”, the bot awkwardly flirted with Head of Post-Training Barret Zoph, admiring his outfit.

Other uses of GPT-4o

OpenAI suggested plenty of other use cases beyond maths and romance.

“For example, you can now take a picture of a menu in a different language and talk to GPT-4o to translate it, learn about the food’s history and significance, and get recommendations,” the company said in a blog post.

“In the future, improvements will allow for more natural, real-time voice conversation and the ability to converse with ChatGPT via real-time video. For example, you could show ChatGPT a live sports game and ask it to explain the rules to you.”

However, the more advanced Voice Mode functions remain in development and will launch in alpha in a few weeks, with Plus subscribers getting early access.


How much does GPT-4o cost?

GPT-4o’s features will be available first in the paid-for ChatGPT Plus, but OpenAI said it will also roll out the new capabilities to the free edition.

The text and image tools will start to roll out in ChatGPT immediately, though the voice tools will be limited to the paid-for version in that alpha at first. For developers, GPT-4o will also be made available via the API to create AI-powered applications.
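For developers curious what a call to the new model might involve, here is a minimal sketch of the request payload for a chat completions request, assuming the model identifier `gpt-4o`. No network call is made here; actually sending the request would require an OpenAI API key and the company's client library, and exact SDK usage may differ.

```python
# Sketch of a chat completions request body for GPT-4o.
# Assumes the model identifier "gpt-4o"; sending this payload
# requires an API key and an HTTP client or official SDK.
def build_chat_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Assemble the JSON body for a hypothetical chat completions request."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt},
        ],
    }

request = build_chat_request("Translate this menu into English.")
```

The same payload shape would apply whether the prompt is text-only or, as in OpenAI's demos, combined with image or audio inputs.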

Beyond the model upgrade, the free edition of ChatGPT will also get access to the GPT Store, the ability to upload files and analyse images, the web browsing tool for up-to-date information, and the Memory tool that remembers conversations for better context.

Why give it away for free? So we can all see how great it is, according to Murati. “As you can see, this feels so magical,” she said. “It is wonderful, but we also want to remove some of the mysticism from the technology and bring it to you so you can try it for yourself.”

Nicole Kobie

Nicole is a journalist and author who specialises in the future of technology and transport. Her first book is called Green Energy, and she's working on her second, a history of technology. At TechFinitive she frequently writes about innovation and how technology can foster better collaboration.
