Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
We present Molmo, a new family of state-of-the-art vision-language models (VLMs). Apart from the pre-trained vision encoders and language-only LLMs we start from, the rest of our VLM pipeline – weights, code, data, and evaluations – is open and free from VLM distillation.
Jun 1, 2025