Google introduced Bard to the world earlier this year and has been competing with the likes of OpenAI, the maker of ChatGPT, ever since. Now, Google is rolling out a major update to Bard that upgrades the technology behind it. Google says that Bard is now powered by Gemini, its most capable and general AI model yet. What exactly is it, and how does it work? Here’s a full explainer.
What is Gemini?
Gemini is Google’s own AI model that will power its AI products, including Bard. The first version of the model, Gemini 1.0, comes in three sizes: Ultra, Pro and Nano. “Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalise and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video”, reads Google’s announcement blog post.
Gemini can run efficiently on everything from data centres to mobile devices, and its capabilities are meant to significantly enhance the way developers and enterprise customers build and scale with AI. As for the differences between the three sizes: Gemini Ultra is Google’s largest and most capable model, built for highly complex tasks; Gemini Pro is the best model for scaling across a wide range of tasks; and Gemini Nano is the most efficient model for on-device tasks. Where are these being used? We’ll come to that in a second.
Gemini has been designed to be natively multimodal, pre-trained from the start on different modalities. Then, it is fine-tuned with additional multimodal data to refine its effectiveness further. “This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models — and its capabilities are state of the art in nearly every domain”, said Demis Hassabis, CEO and Co-Founder of Google DeepMind.
How does Gemini fare against ChatGPT?
Google claims that, from natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in large language model (LLM) research and development.
It has also achieved some world-first results. With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which combines 57 subjects such as math, physics, history, law, medicine and ethics to test both world knowledge and problem-solving abilities. By comparison, GPT-4, the AI model that powers ChatGPT, scored 86.4%.
However, according to Google’s own stats, GPT-4 leads on at least one benchmark: on HellaSwag, which measures a model’s commonsense reasoning about everyday tasks, GPT-4 scored 95.3% while Gemini Ultra scored 87.8%.
Gemini Ultra also scored 59.4% (compared to 56.8% for GPT-4) on the new MMMU benchmark, which consists of multimodal tasks spanning different domains requiring deliberate reasoning. In most of the other benchmarks, Gemini Ultra beats GPT-4.
What are Gemini’s capabilities?
Gemini 1.0’s sophisticated multimodal reasoning capabilities can help it analyse complex written and visual information, making it uniquely skilled at uncovering knowledge that can be difficult to discern amid vast amounts of data. It can extract insights from hundreds of thousands of documents through reading, filtering and understanding information.
The Gemini model was also trained to recognise and understand text, images, audio and more simultaneously, so it better grasps nuanced information and can answer questions on complicated topics.
It also has advanced coding skills: it can explain and generate code in popular languages like Python, Java, C++ and Go. Furthermore, Google says that using a specialised version of Gemini, it created a more advanced code-generation system, AlphaCode 2 (the successor to AlphaCode), which excels at competitive programming problems that go beyond coding to involve complex math and theoretical computer science.
Google has also designed Gemini in a manner that makes it scalable, reliable and efficient at the same time. It trained Gemini on its AI-optimized infrastructure using Google’s in-house designed Tensor Processing Units (TPUs) v4 and v5e. On TPUs, Gemini is claimed to run significantly faster than smaller and less-capable models. These custom-designed AI accelerators are also used in other services from Google such as Search, YouTube, Gmail, Google Maps, Google Play and Android.
Alongside Gemini, Google also announced Cloud TPU v5p, a more powerful, efficient and scalable TPU system for training advanced AI models.
Where can you use Gemini by Google?
The Pro version of Gemini is now being used in Bard for more advanced reasoning, planning, understanding and more. This is the biggest upgrade to Bard since it launched. It will be available in English in more than 170 countries and territories, and Google plans to expand to different modalities and support new languages and locations in the near future.
Google also plans to integrate Gemini Ultra in Bard early next year. “We’re currently completing extensive safety checks and will launch a trusted tester program soon before opening Bard Advanced up to more people early next year”, said Google.
Separately, it will make Gemini Ultra available to select customers, developers, partners and safety and responsibility experts for early experimentation and feedback before rolling it out to developers and enterprise customers early next year.
Additionally, Google will use Gemini Nano’s capabilities to enable advanced AI features on its Pixel smartphones. Google announced that Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which powers new features like Summarize in the Recorder app and Smart Reply in Gboard, starting with WhatsApp, with more messaging apps coming next year. These features are rolling out to the Pixel 8 Pro as part of the December feature drop update.
Gemini will be available in more of Google’s products and services in the coming months, such as Search, Ads, Chrome and Duet AI. The company has already begun experimenting with Gemini in Search, where it is making the Search Generative Experience (SGE) faster for users, with a 40% reduction in latency in English in the U.S., alongside improvements in quality.
Beginning December 13, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI.
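As a rough sketch of what that developer access looks like, the snippet below builds a request body for the public Gemini API’s generateContent REST endpoint. The model name `gemini-pro` and the payload shape follow Google’s published API; the prompt text and the API-key handling shown in the comment are illustrative assumptions, and actually sending the request requires a valid key from Google AI Studio.

```python
import json

# Model and endpoint as documented for the Gemini API (v1beta at launch).
MODEL = "gemini-pro"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def build_request(prompt: str) -> dict:
    """Build the JSON body the generateContent endpoint expects:
    a list of 'contents', each holding text 'parts'."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

body = build_request("Explain what a Tensor Processing Unit is in one sentence.")
print(json.dumps(body, indent=2))

# To actually call the API, you would POST this body with your key, e.g.:
#   requests.post(f"{ENDPOINT}?key=YOUR_API_KEY", json=body)
```

The same call is also exposed through Google’s client SDKs and through Vertex AI on Google Cloud; the raw REST form above is just the most self-contained way to see the request structure.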