Google DeepMind has released its latest large language model, Gemma 2, available in 9B and 27B parameter sizes. Built on a redesigned architecture, Gemma 2 delivers class-leading performance and efficiency compared with other models of its size.

Gemma 2: A Standout Model

Google DeepMind has built Gemma 2 on a newly designed architecture aimed at achieving exceptional performance and inference efficiency. Here’s what makes it stand out:

Unrivaled Performance

The 27B Gemma 2 model offers the best performance in its size category, providing a competitive alternative to models more than twice its size. The 9B model also delivers class-leading performance, surpassing Llama 3 8B and other open models of similar scale.

Efficiency and Cost Savings

The 27B Gemma 2 model is designed to run inference efficiently at full precision on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU. This significantly reduces costs while maintaining high performance, making AI deployment more accessible and affordable.

Lightning-Fast Inference Across Hardware

Optimized to run at incredible speeds across various hardware setups, from powerful gaming laptops and high-end desktops to cloud-based configurations, Gemma 2 is highly versatile. Experience Gemma 2 at full precision in Google AI Studio, unlock local performance with the quantized version of Gemma.cpp on CPU, or run it on home computers equipped with NVIDIA RTX or GeForce RTX through Hugging Face Transformers.

Running Gemma 2 Locally with Ollama

To run the latest Gemma 2 9B and 27B models locally using Ollama, follow these steps:

  1. Ensure you have Ollama installed on your computer and upgrade to the latest version (0.1.47 or later).
  2. Once Ollama is installed, enter the following command in the terminal to run the Gemma 2 9B (5.5 GB) or 27B (16 GB) model:
   ollama run gemma2
   # Or
   ollama run gemma2:27b
  3. The command automatically downloads the Gemma 2 9B or 27B model. If your computer has sufficient memory, you can install the non-quantized fp16 version for an even better experience:
   ollama run gemma2:9b-instruct-fp16
   # Or
   ollama run gemma2:27b-instruct-fp16
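Beyond the CLI, a running Ollama instance also serves a local REST API (by default on port 11434). The sketch below is a minimal example, assuming the `gemma2` model has already been pulled with the commands above; it builds the request body for the `/api/generate` endpoint and sends a non-streaming generation call:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint


def build_generate_payload(prompt, model="gemma2"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model="gemma2"):
    """Send a non-streaming generate request and return the model's reply text."""
    data = json.dumps(build_generate_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the Ollama server running, `generate("Why is the sky blue?")` returns Gemma 2's answer as a plain string; pass `model="gemma2:27b"` to target the larger model instead.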

In addition to ollama, you can also experience Gemma 2 through llama.cpp or gemma.cpp.

Unleashing Gemma 2’s Potential

Gemma 2 9B is a powerful model capable of various tasks, including:

Basic Chatting
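As an illustration of basic chatting, here is a short sketch using Ollama's `/api/chat` endpoint, which accepts a list of role-tagged messages so the model sees the whole conversation each turn. The example conversation content is hypothetical; the endpoint and default port are Ollama's standard ones:

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint


def build_chat_payload(messages, model="gemma2"):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {"model": model, "messages": messages, "stream": False}


def chat(messages, model="gemma2"):
    """Send the conversation so far and return the assistant's next message."""
    data = json.dumps(build_chat_payload(messages, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Multi-turn chat: append each reply so the model keeps the full history
history = [{"role": "user", "content": "Hi! Who are you?"}]
# reply = chat(history)  # requires a running Ollama server
# history.append({"role": "assistant", "content": reply})
```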

Text Translation

Original text:

Now we’re officially releasing Gemma 2 to researchers and developers globally. Available in both 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 is higher-performing and more efficient at inference than the first generation, with significant safety advancements built in. In fact, at 27B, it offers competitive alternatives to models more than twice its size, delivering the kind of performance that was only possible with proprietary models as recently as December. And that’s now achievable on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly reducing deployment costs.

Translation:

我们正式向全球的研究人员和开发者发布了 Gemma 2。Gemma 2 现在提供 90 亿 (9B) 和 270 亿 (27B) 参数两种规模,相较于第一代模型,性能更高,推理效率更高,并且内置了显著的安全改进。实际上,在 27B 参数规模下,它可以与参数规模超过其两倍的模型相媲美,并提供与去年 12 月仅限于专有模型可实现的性能。并且,这一切现在可以在单个 NVIDIA H100 算子核心 GPU 或 TPU 主机上实现,大大降低了部署成本。

Integrating Gemma 2 with LangChain and LlamaIndex

You can easily integrate Gemma 2 with popular frameworks like LangChain and LlamaIndex:

LangChain

# Use LangChain's community Ollama wrapper to call the local Gemma 2 model
from langchain_community.llms import Ollama
llm = Ollama(model="gemma2")
llm.invoke("Why is the sky blue?")

LlamaIndex

# LlamaIndex ships its own Ollama integration with the same local setup
from llama_index.llms.ollama import Ollama
llm = Ollama(model="gemma2")
llm.complete("Why is the sky blue?")

For more information, visit the Gemma 2 library on Ollama.

As Google continues to push the boundaries of AI with Gemma 2, developers and researchers worldwide can harness its power to build innovative applications and advance the field of natural language processing. With its exceptional performance, efficiency, and versatility, Gemma 2 is poised to revolutionize the AI landscape.

References

ollama: https://ollama.com/

llama.cpp: https://github.com/ggerganov/llama.cpp

gemma.cpp: https://github.com/google/gemma.cpp
