Towards a multi-agent AI future
A few highlights from the GPT4o announcement today and what I think this means for the future of software
GPT4 Omni was announced today and I wanted to share a few highlights.
I’ve spliced together the 3 most impressive demo videos for your viewing convenience (below), but you can watch all the demos in full on the official page.
We already knew multi-modal input and output was coming (Google’s Gemini was the first large model to demonstrate some of this), but what was surprising is the near-instant response time and the expressiveness of the AI voice, and with it the ability to communicate with other AI voices and agents.
What I found impressive:
Two GPT4o agents singing a song about what they just ‘saw’ in the scene, alternating the lines
GPT4o telling a bedtime story in different voices – the expressiveness of the voice model is impressive and makes the whole interface ‘feel like a movie’
Being a Maths Tutor through screen share – you can see that GPT4o’s ability to constantly listen, be interrupted and guide is now possible thanks to the fast response times
What I think this means:
It’s clear that we’re heading to a multi-agent/AI future, where ‘teams of models’ will interact with each other to help you complete a task. I don’t think this will be a case of all models being powered by one frontier LLM, but it could be. Clearly, they’ll have different abilities and probably be fine-tuned to different tasks.
Having natural ‘always on’ conversations with AI models may become the normal ‘interface’ for a lot of consumer or individual applications. For businesses I’m still not sure this is how it will play out – the product category for this ‘agent ops’ is still evolving. Obviously conversation is not the right interface for a lot of tasks too!
Modes of interaction with software will now ‘involve AI’ wherever and whenever you are (check out the London Demo) – the desktop PC put software in the home, the smartphone put software in your pocket/hand, and AI/LLMs I think will put software in ‘whatever you do’
A few other aspects: the extra speed really has an impact on coding and analysis too (see video here)
Contrarian opinion(?): The American female voice really does feel like it is trying a bit too hard. I saw this tweet below and it reminded me of how much this feels like replicating the AI character in the movie Her (great movie)…which I’m not sure is a good thing.
Either way, we desperately need a sarcastic Aussie persona 🦘