Recently, I needed a large language model (LLM) for some data-cleaning tasks, so I tested the data analysis and extraction capabilities of several open-source models. During testing, Google's recently open-sourced Gemma 2 caught my attention, so here is a brief overview.
Introduction to Gemma 2
Gemma 2 is the successor to Google's Gemma model, which was built on research and technology from DeepMind's Gemini. It comes in two sizes: 9B and 27B parameters. Like its predecessor, Gemma 2 uses RoPE (Rotary Position Embedding) and supports a context length of 8K tokens. Each size is available as a pre-trained base version and an instruction-tuned version.
Reference: https://github.com/huggingface/blog/blob/main/gemma2.md
Although Gemma 2 shares many similarities with Gemma, there are several notable improvements:
Expanded Training Data
The training data for Gemma 2 is roughly double that of Gemma: about 13 trillion tokens for the 27B model and 8 trillion tokens for the 9B model, consisting primarily of English web pages, code, and mathematical text. The specific composition of the training data has not been open-sourced.
Model and Training Enhancements
- Sliding Window Attention: Layers alternate between sliding-window attention and full quadratic attention, improving generation quality while reducing the memory and time cost of attention (see the sketch after this list).
- Logit Soft Capping: Inspired by Gemini 1.5, logits are smoothly capped to a fixed range to prevent them from growing excessively, improving training stability (also sketched after this list).
- Knowledge Distillation: Larger teacher models are used to train smaller student models. The 9B model uses distillation during pre-training, while the 27B model is trained from scratch.
- Model Merging: Two or more large language models are merged into a new model using techniques like Warp, which involves exponential moving average (EMA), spherical linear interpolation (SLERP), and linear interpolation to initialization (LITI).
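To make the first two mechanisms more concrete, here are two minimal PyTorch sketches. They are illustrative only: the window size and cap value below are placeholders, not the exact constants from Gemma 2's configuration (the report describes separate soft caps for attention and final logits).
import torch

# Sliding-window attention: each position attends only to the previous `window` tokens,
# so attention cost grows with the window size rather than the full sequence length.
def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    idx = torch.arange(seq_len)
    rel = idx[:, None] - idx[None, :]      # how far back each key is from the query
    return (rel >= 0) & (rel < window)     # causal and within the window

# Logit soft capping: squash logits smoothly into (-cap, cap) with tanh
# instead of letting them grow without bound.
def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    return cap * torch.tanh(logits / cap)

print(sliding_window_mask(6, 3).int())
print(soft_cap(torch.tensor([1.0, 30.0, 200.0]), cap=50.0))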
Evaluation and Performance
Gemma 2 demonstrates strong performance across various benchmarks and evaluations:
- In the base model evaluation, Gemma 2 shows competitive results compared to other open-source models.
- On the LMSYS ChatBot Arena, Gemma 2 achieves high scores in both the 9B and 27B versions.
- Human preference evaluations indicate that Gemma 2 is well-received by users.
Using Gemma 2
To perform inference with Gemma 2 using the Transformers library:
- Install the necessary dependencies:
pip install -U accelerate transformers  # Gemma 2 requires a recent transformers release
- Load the tokenizer and model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    device_map="auto",                        # place the model on available GPUs
    torch_dtype=torch.bfloat16,               # half-precision weights to fit in memory
    attn_implementation="flash_attention_2",  # optional; requires the flash-attn package
)
- Perform inference:
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**input_ids, max_new_tokens=256)  # allow a longer completion
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
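Since gemma-2-27b-it is an instruction-tuned checkpoint, you can also build the prompt with the tokenizer's chat template instead of passing raw text. A minimal sketch, reusing the model and tokenizer loaded above:
# Format the request as a chat turn so the prompt matches the instruction-tuning format.
messages = [{"role": "user", "content": "Write me a poem about Machine Learning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))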
Testing Gemma 2’s Language Understanding
To assess Gemma 2's comprehension of Chinese and English, I ran a few simple prompt tests. Here are a couple of examples:
Example 1: Official English Prompt
Write a hello world program
Response:
print("Hello, world!")
This single line of code is all you need to print “Hello, world!” in Python.
Example 2: Chinese Prompt
写一个hello world程序, 以MarkDown形式输出 (Write a hello world program and output it in Markdown)
Response:
print("Hello, world!")
This is the simplest “Hello, world!” program in Python.
Explanation:
- print() is a built-in function in Python that displays text to the console.
- "Hello, world!" is a string literal containing the text to be printed.
While Gemma 2 demonstrates some understanding of Chinese, its Chinese responses are noticeably weaker than its English ones, most likely because the SFT training data consisted primarily of English instructions. Developers who want stronger Chinese capabilities may consider fine-tuning the model on Chinese instruction data.
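As a rough illustration of what that could look like, here is a minimal LoRA fine-tuning sketch using the trl and peft libraries. The dataset file name is hypothetical and exact argument names vary between trl versions, so treat this as a starting point rather than a recipe.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical Chinese instruction dataset; any instruction-style corpus with a
# "messages" (chat) or "text" column would work here.
dataset = load_dataset("json", data_files="chinese_instructions.jsonl", split="train")

trainer = SFTTrainer(
    model="google/gemma-2-9b-it",   # smaller checkpoint keeps tuning affordable
    train_dataset=dataset,
    args=SFTConfig(output_dir="gemma2-zh-sft", per_device_train_batch_size=1),
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()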
Conclusion
Google's Gemma 2 is an impressive open-source large language model that offers strong performance and supports English well and Chinese to some extent. With its expanded training data, model enhancements, and competitive evaluation results, Gemma 2 is well worth trying out for a variety of natural language tasks.
For more details, check out the following resources: