How Does Gemini Compete With OpenAI? Demis Hassabis Explains What the Model Can Teach Us About Coding, Audio, and Images
Spurred by concern over rapid advances from OpenAI and others, the company pushed to develop and launch the project with striking speed.
In those benchmarks (most of which are very close), Gemini’s clearest advantage comes from its ability to understand and interact with video and audio. Multimodality has been part of the plan from the beginning. Google hasn’t trained separate models for images and voice, the way OpenAI created DALL-E and Whisper; it built one multisensory model from the start. “We’ve always been interested in very, very general systems,” Hassabis says. He is interested in mixing all of the modalities so the model can take in as much data as possible about the world, then respond with just as much variety.
Right now, Gemini’s most basic models take text in and produce text out, but more powerful models like Gemini Ultra can also work with images, video, and audio. It is going to get even more general than that, says Hassabis: there are still senses like touch and action to add. Over time, he says, Gemini will gain more senses, become more aware, and become more accurate and grounded in the process. These models still have biases and other problems, of course, and they still hallucinate, but Hassabis says they will get better the more they know.
Benchmarks are just benchmarks, though, and ultimately, the true test of Gemini’s capability will come from everyday users who want to use it to brainstorm ideas, look up information, write code, and much more. Google seems to see coding in particular as a killer app for Gemini; the model powers a new code-generating system called AlphaCode 2, which Google says performs better than 85 percent of coding-competition participants, up from 50 percent for the original AlphaCode. But Pichai says that users will notice an improvement in just about everything the model touches.
Demis Hassabis has been behind some of the biggest claimed leaps in artificial intelligence. He rose to fame in 2016, after his company DeepMind’s AlphaGo bot taught itself to play the board game Go with skill and ingenuity.
OpenAI’s GPT-4 chatbot is multimodal too, but Hassabis suggests there is more than one way to build such a system.
Most other models, Hassabis says, have approximated multimodality by training separate modules and then stitching them together, in what appeared to be a veiled reference to OpenAI’s approach. That’s okay for some tasks, he says, but you can’t have deep, complex reasoning in multimodal space that way.
In September, OpenAI launched an update to its chatbot that allowed it to take in images and audio in addition to text, but the technical basis of GPT-4’s multimodal capabilities has not been disclosed.