Molmo2 is a new family of open-source video-language models that achieve state-of-the-art performance through novel datasets and training methods, particularly excelling in video grounding tasks without relying on proprietary models.
Jun 3, 2026
VideoNet is a large-scale domain-specific action recognition benchmark and training dataset with 1,000 distinct actions across 37 domains, designed to revitalize action recognition evaluation for modern vision-language models.
Jun 2, 2026