Open Data

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Molmo2 is a new family of open-source video-language models that achieve state-of-the-art performance through novel datasets and training methods, particularly excelling in video grounding tasks without relying on proprietary models.

Jun 3, 2026

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

We present Molmo, a new family of state-of-the-art VLMs. Apart from the pre-trained vision encoders and language-only LLMs we start from, the entire remainder of our VLM pipeline – weights, code, data, and evaluations – is open and free from VLM distillation.

Jun 1, 2025