News & Analysis

OpenAI’s GPT-4o: Smarter than Gemini?

Of course, we will know only once it rolls out and Google responds at its I/O conclave

Image Credit: livemint.com

Opinion is mostly divided on whether the first impression is actually the best. That’s why we took a few extra hours before coming out with our take on OpenAI’s updated GPT-4 model that powers ChatGPT. Called GPT-4o, the model is claimed to be faster, with improved capabilities across text, vision and audio. 

That’s what OpenAI CTO Mira Murati said during the livestream announcement on Monday to a largely US audience. The company later noted in a blog post that GPT-4o would be free for all users, with paid users (as always) getting up to five times the capacity limits of free users. The demo showed GPT-4o helping with a math problem and even indulging in some flirting.

Thereafter, it was the turn of CEO Sam Altman, who posted on X that the model was “natively multimodal” – in other words, it could generate content or understand commands via voice, text or images. He added that developers can access the GPT-4o API, which is priced at half the price of GPT-4 Turbo and is twice as fast.
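For developers wondering what “accessing the GPT-4o API” actually looks like, here is a minimal sketch using OpenAI’s Python SDK and its standard chat-completions call; the model identifier “gpt-4o” is the natural assumption here, and the exact name and pricing are whatever OpenAI’s documentation specifies at launch.

```python
# Minimal sketch of a text-only GPT-4o call via the OpenAI Python SDK (pip install openai).
# Assumes OPENAI_API_KEY is set in the environment; "gpt-4o" is the assumed model identifier.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise the GPT-4o announcement in one sentence."},
    ],
)
print(response.choices[0].message.content)
```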

What are some of the new features?

Quite obviously, the first and foremost would be the app’s ability to act as a Her-like voice assistant that can respond in real time and observe the world around the user. Users will have noticed that the current voice mode is limited: it responds to one prompt at a time and works only with what it can hear. 

Just to clarify further, OpenAI’s previous model – GPT-4 Turbo – was trained on a combination of images and text, which meant it could analyze images and text and handle tasks such as extracting text from images or describing their content. The latest innovation adds speech into this mix. 

How’s GPT-4o different from GPT-4 Turbo?

In case you aren’t immediately sure how this would make a difference to your lives, here are a few examples. For starters, ChatGPT has always had a voice mode, but it worked by converting the chatbot’s text responses to speech with a separate text-to-speech model, which was cumbersome. GPT-4o turbocharges the entire process, allowing users to interact with ChatGPT as they would with an assistant. 
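To see why the older flow felt cumbersome, here is a rough sketch of the kind of three-model relay that voice mode previously required, stitched together from OpenAI’s speech-to-text, chat and text-to-speech endpoints; the model names (“whisper-1”, “tts-1”, “gpt-4-turbo”) and file names are illustrative assumptions, and GPT-4o’s selling point is precisely that this relay is no longer needed.

```python
# Rough sketch of the pre-GPT-4o voice-mode relay: three separate calls per turn.
# Model names and file names below are illustrative assumptions, not a documented recipe.
from openai import OpenAI

client = OpenAI()

# 1. Transcribe the user's spoken question to text (speech-to-text).
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Send the transcript to the chat model as plain text.
chat = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = chat.choices[0].message.content

# 3. Convert the text answer back into speech (text-to-speech).
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
with open("answer.mp3", "wb") as out:
    out.write(speech.content)
```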

What’s more, users can actually interrupt ChatGPT while it is answering a question and push it towards another query. OpenAI claims that such “real-time responsiveness” lets the model pick out nuances in a user’s voice and generate responses in a range of different emotive styles, including singing. 

GPT-4o also upgrades the chatbot’s visual capabilities: given a picture or a desktop screenshot, it can quickly answer related queries, ranging from what’s going on in a piece of software code to the brand of apparel a person is wearing. This is one feature that could find major use in driving digital retail sales. 
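For a sense of how such a visual query might be posed programmatically, here is a minimal sketch that passes an image alongside a text prompt using OpenAI’s vision-style message format; the screenshot URL is a placeholder and, as above, “gpt-4o” is the assumed model identifier.

```python
# Minimal sketch of a visual query: ask about a screenshot by sending an image URL with the prompt.
# The URL is a placeholder; "gpt-4o" is the assumed model identifier.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is going on in this code screenshot?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```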

Finally, the new model offers broader multilingual support, with enhanced performance in about 50 languages. For now, though, voice won’t be part of the GPT-4o API for all customers; citing the risk of misuse, OpenAI says it will launch voice support only to a small group of trusted partners in the coming weeks. 

Does all of this give ChatGPT a lift over Gemini?

Which brings us to the arms race between Microsoft-backed OpenAI and Google over their generative artificial intelligence (GenAI) prowess. As we write this, Google is readying its annual I/O developers conference, and one needn’t be surprised if the company seeks to add a bit of lustre to Gemini. 

For the moment, we believe that in this highly competitive market OpenAI could be a notch ahead of Google, though GPT-4o appears only to match what Google demonstrated with Gemini last December. Of course, what happened thereafter was a series of mishaps, most notably the controversy over Gemini’s racially skewed image generation.

In fact, some analysts hold the view that progress on the GenAI front has consistently inspired because of the new use cases each round of model development brings forth. However, since the world first saw ChatGPT back in November 2022, progress has been, at best, a gentle canter and not the gallop that was expected. 

Livestream bloopers, Altman’s policy shift

It wasn’t as though the GPT-4o demo was without its glitches, though they did not shock the audience. At best they drew a smirk: at one point the app mistook a smiling man for a wooden surface, and at another it was seen solving an equation it hadn’t yet been shown. Remember Google’s escapade with Gemini? 

And here’s where Altman did well to reflect on OpenAI’s journey in a blog post immediately after the livestream event. He said the company originally planned to create all sorts of benefits for the world itself, but has since shifted to creating AI and allowing other people to use it to build all sorts of amazing things that benefit them. Decentralization, anyone? 

Not to forget that Altman recently told academics and students that hardware would direct the future of AI, with AI-focused chips leading the way. For now, it looks like GPT-4o is just an upgrade that allows others to use it to their advantage in whatever use cases best fit their lines of business. Will Google take a quantum leap?