The next-gen AI model from the internet giant is almost ready
From Bard to Gemini: What a Vastly Larger Context Window Means for AI
The success of ChatGPT has kicked off a furious artificial intelligence race. Earlier this week, OpenAI announced that it is giving ChatGPT the ability to remember useful information from conversations over long periods of time, and Google renamed Bard to Gemini and announced a paid subscription tier.
As he’s explaining this to me, Pichai notes offhandedly that you can fit the entire Lord of the Rings trilogy into that context window. I ask him whether this has already happened, because it seems too specific: someone at Google is surely just checking to see if Gemini spots any continuity errors, trying to untangle the complicated lineage of Middle-earth, and seeing if maybe AI can finally make sense of Tom Bombadil. “I’m sure it has happened,” Pichai says with a laugh, “or will happen — one of the two.”
The larger context window will also be useful for businesses. “This allows use cases where you can add a lot of personal context and information at the moment of the query,” he says. Think of it, he suggests, as a vastly expanded query window. He imagines filmmakers uploading their entire movie and asking Gemini what reviewers might say, and companies using Gemini to look over masses of financial records. He considers it one of the more significant things the company has accomplished.
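To make that “expanded query window” concrete, here is a minimal sketch using Google’s google-generativeai Python client. The API key placeholder, file path, and exact model name are illustrative assumptions rather than details from the piece; the point is simply that a long-context model lets the whole document ride along with the question, with no chunking or retrieval pipeline.

```python
# A sketch of the "expanded query window" idea using Google's
# google-generativeai Python client (pip install google-generativeai).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: key from Google AI Studio

# Load a large document whole: a screenplay, a book, a pile of records.
with open("entire_screenplay.txt", encoding="utf-8") as f:
    document = f.read()

model = genai.GenerativeModel("gemini-1.5-pro")  # assumed long-context model name

# The entire document is passed as context at the moment of the query.
response = model.generate_content(
    f"Here is a complete screenplay:\n\n{document}\n\n"
    "What might reviewers say about this film?"
)
print(response.text)
```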
Eventually, Pichai says, the 1.0s and 1.5s, the Pros and Ultras, and the corporate battles won’t matter to users; people will just be consuming the experiences, the same way they use a smartphone without paying attention to the processor inside. But at this moment, he says, we’re still in the phase where everyone knows the chip inside their phone, because it matters. “The underlying technology is shifting so fast,” he says. People care.
Demis Hassabis, CEO of Google DeepMind, which developed the new model, compares its vast capacity for input to a person’s working memory, something he explored years ago as a neuroscientist. “The great thing about these core capabilities is that they unlock sort of ancillary things that the model can do,” he says.
In one demo, DeepMind showed the model analyzing a PDF transcript of the Apollo 11 mission. Asked to find humorous moments, it highlighted several, including one where the astronauts said a communications delay was due to a sandwich break. In another, the model answered questions about actions in a movie it had been given. The previous version of Gemini could answer such questions only for much shorter stretches of text or video. Google hopes the new capabilities will let developers build new kinds of apps on top of the model.
Gemini Pro 1.5 is also more capable—at least for its size—as measured by its scores on several popular benchmarks. The new model exploits a technique previously invented by Google researchers to squeeze out more performance without requiring more computing power. The technique, called mixture of experts, activates only the parts of the model best suited to a given task, making it more efficient to train and run.
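The routing idea is easy to sketch in the abstract. Below is a generic toy illustration of mixture-of-experts gating, not Gemini’s architecture; the dimensions, expert count, and top-k value are assumptions chosen for readability.

```python
import numpy as np

# Toy mixture-of-experts layer. Sizes and expert count are illustrative
# assumptions, not Gemini's configuration.
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 16     # real models use dimensions in the thousands
num_experts, top_k = 4, 2     # route each token to its 2 best-suited experts

# Each "expert" is a small feed-forward network: project up, ReLU, project back.
experts = [
    (rng.normal(size=(d_model, d_hidden)), rng.normal(size=(d_hidden, d_model)))
    for _ in range(num_experts)
]
gate_w = rng.normal(size=(d_model, num_experts))  # router ("gate") weights

def moe_layer(x):
    """Run one token vector x through only its top_k experts."""
    logits = x @ gate_w                       # router scores each expert
    chosen = np.argsort(logits)[-top_k:]      # indices of the best-scoring experts
    weights = np.exp(logits[chosen] - logits[chosen].max())
    weights /= weights.sum()                  # softmax over the chosen experts only
    out = np.zeros_like(x)
    for w, i in zip(weights, chosen):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0.0) @ w2)  # weighted expert output
    return out

token = rng.normal(size=d_model)
print(moe_layer(token))  # only 2 of the 4 experts did any work for this token
```

The efficiency win is visible in the loop: only top_k of the num_experts networks run for any given token, so the compute per token stays roughly constant even as the total parameter count grows.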
Despite being significantly smaller than Gemini Ultra, the most powerful model in the company’s arsenal, Gemini Pro 1.5 is still capable of many of the same tasks, according to Google. Hassabis says there is no reason the same technique used to improve Gemini Pro cannot be applied to boost Gemini Ultra.
The frenetic pace of progress in generative AI is at odds with worries about the risks the technology might pose. Google says it has put Gemini Pro 1.5 through extensive testing and that providing limited access offers a way to gather feedback on potential risks. The company says it has also provided researchers at the UK’s AI Safety Institute with access to its most powerful models so that they can test them.