Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
Molmo2 is a new family of open-source video-language models that achieve state-of-the-art performance through novel datasets and training methods developed without relying on proprietary models, and that excel in particular at video grounding tasks.
Jun 3, 2026