AI
Qwen 2.5
What was improved over Qwen2
- Scaled up pretraining data from 7T to 18T tokens, using existing LLMs to filter, classify and score data quality.
- Generated synthetic data for pretraining in math, code, and knowledge domains.
- Scaled up SFT to 1M+ samples covering long texts, math, coding, and multilingual tasks.
- Translated instructions into different languages to boost multilinguality.
- Combined CoT with rejection sampling to generate high-quality math data (see the sketch after this list).
- Used offline reinforcement learning (DPO) on 150K training pairs focusing on complex tasks, followed by merging with SFT models.
- Applied online reinforcement learning (GRPO) using a 72B reward model trained for truthfulness, helpfulness, and safety, sampling 8 responses per query.
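A minimal sketch of the CoT + rejection sampling step: sample several chain-of-thought solutions and keep only those whose final answer matches the reference. The generate_cot and extract_final_answer helpers are hypothetical stand-ins, not Qwen's actual pipeline.

# Hedged sketch: `generate_cot` and `extract_final_answer` are hypothetical
# helpers, not part of any released Qwen tooling.
def rejection_sample_math(problem, reference_answer,
                          generate_cot, extract_final_answer,
                          n_samples=8):
    kept = []
    for _ in range(n_samples):
        cot = generate_cot(problem)  # one sampled chain-of-thought solution
        if extract_final_answer(cot) == reference_answer:
            kept.append(cot)  # correct final answer -> keep as training data
    return kept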
Insights
- Trained base and instruction-tuned models of 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B.
- Architecture: GQA, SwiGLU, RoPE, QKV bias in attention, and RMSNorm.
- Used Qwen2-Instruct to classify and balance content across different domains.
- Increasing pretraining data from 7T to 18T tokens boosted performance across all tasks.
- Using LLMs to filter training data represents a significant advancement over previous approaches.
- SFT model trained for two epochs with an LR decreasing from 7e-6 to 7e-7 (see the schedule sketch after this list).
- DPO trained for 1 epoch on 150,000 examples with an LR of 7e-7.
- Multi-stage post-training: combining SFT, DPO, merging, and GRPO.
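The 7e-6 to 7e-7 SFT decay can be reproduced with a plain linear schedule in PyTorch. The paper only gives the endpoints, so the linear shape and the step counts below are assumptions:

import torch

model = torch.nn.Linear(8, 8)  # stand-in for the actual model
steps_per_epoch, epochs = 1000, 2  # illustrative numbers

optimizer = torch.optim.AdamW(model.parameters(), lr=7e-6)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer,
    start_factor=1.0,
    end_factor=0.1,  # 7e-6 * 0.1 = 7e-7 at the final step
    total_iters=steps_per_epoch * epochs,
)

for _ in range(steps_per_epoch * epochs):
    optimizer.step()
    scheduler.step()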
ColPali integrated into HF transformers
use it as a Python package:
import torch
from PIL import Image

from colpali_engine.models import ColQwen2, ColQwen2Processor

model_name = "vidore/colqwen2-v0.1"

model = ColQwen2.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",  # or "mps" if on Apple Silicon
).eval()

processor = ColQwen2Processor.from_pretrained(model_name)

# Your inputs
images = [
    Image.new("RGB", (32, 32), color="white"),
    Image.new("RGB", (16, 16), color="black"),
]
queries = [
    "Is attention really all you need?",
    "Are Benjamin, Antoine, Merve, and Jo best friends?",
]

# Process the inputs
batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)

# Forward pass
with torch.no_grad():
    image_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

scores = processor.score_multi_vector(query_embeddings, image_embeddings)
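scores comes back as a (num_queries, num_images) matrix of late-interaction (MaxSim) scores; assuming a torch tensor, the best-matching image per query is one argmax away:

# scores: (num_queries, num_images); higher means a better match
best_image_per_query = scores.argmax(dim=1)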
use it via HF transformers:
import torch
from colpali_engine.utils.torch_utils import get_torch_device
from transformers import ColPaliForRetrieval, ColPaliProcessor

model_name = "vidore/colpali-v1.2-hf"
device = get_torch_device("auto")

print(f"Using device: {device}")

model = ColPaliForRetrieval.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map=device,
).eval()

processor = ColPaliProcessor.from_pretrained(model_name)
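The rest of the flow mirrors the colpali-engine snippet above (reusing its images and queries lists); a sketch assuming the transformers API, where the processor is called directly and the model output exposes .embeddings:

# Process the inputs (processor is called directly in the transformers API)
batch_images = processor(images=images).to(model.device)
batch_queries = processor(text=queries).to(model.device)

# Forward pass
with torch.no_grad():
    image_embeddings = model(**batch_images).embeddings
    query_embeddings = model(**batch_queries).embeddings

# Score the queries against the images
scores = processor.score_retrieval(query_embeddings, image_embeddings)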
Fixing embedding issues with fine-tuning - when word order matters
https://jina.ai/news/text-embeddings-fail-to-capture-word-order-and-how-to-fix-it/
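The gist: off-the-shelf text embeddings score order-flipped sentences as near-identical, and fine-tuning with order-flipped hard negatives pushes them apart. A sketch with sentence-transformers; the model choice, triplet data, and hyperparameters are illustrative, not the article's exact recipe:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Demonstrate the failure: flipped word order, near-identical embeddings
a = model.encode("flight from Berlin to Amsterdam")
b = model.encode("flight from Amsterdam to Berlin")
print(util.cos_sim(a, b))  # typically very high despite the flipped meaning

# Fix: fine-tune with triplets whose hard negatives flip the word order
train_examples = [
    InputExample(texts=[
        "flight from Berlin to Amsterdam",          # anchor
        "a flight departing Berlin for Amsterdam",  # positive
        "flight from Amsterdam to Berlin",          # hard negative
    ]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=1)
model.fit(train_objectives=[(loader, losses.TripletLoss(model))], epochs=1)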
Good use case for agents
Google's Deep Research is an excellent application of agentic capabilities. One example of something it can do pretty well: search for all my podcasts and interviews and create a webpage listing them. That cuts down effort at least 10x compared to doing it manually. The reason it works well is that it's no big deal if it misses a couple of items. And if it got any details wrong, I can easily spot them. The unreliability of LLMs usually gets magnified in agentic workflows (including in Deep Research), so it's really important that errors aren't costly.
Compare this to the usual motivating example for AI agents — automating shopping or flight booking. This is actually the worst-case scenario. If the wrong product shows up at your door even 10% of the time, the agent is useless. And don't forget that online commerce is an adversarial environment — comparison shopping is hard because companies deliberately make it hard. If agents make it easier, brands will fight back. As for flight booking, the time-consuming part is preference elicitation. The reason it is frustrating is that search interfaces don't know all your preferences and constraints (e.g. how to trade off time and money, preferred airlines, constraints on when you want to depart and arrive, and really dozens of other little things). But guess what: the agent doesn't know this either. I really don't think shopping and travel booking agents are going to work, and it's not a matter of improving capabilities.
Over the long term there will be progress in closing the "capability-reliability gap" for agents, but for now, I think successful applications will be ones where (1) the user is in the loop, (2) errors are relatively easy to spot, and (3) errors aren't a deal-breaker if not spotted.