Vision-Language Models

MolmoPoint: Better Pointing for VLMs with Grounding Tokens

MolmoPoint is a new VLM architecture that enables more precise and efficient visual grounding by using special tokens to select directly from the model's internal visual representation instead of generating coordinates as text.

Jun 5, 2026

MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation

MolmoB0T is a suite of general-purpose robotic manipulation policies, trained on 2.5 million simulated trajectories, that achieve state-of-the-art zero-shot transfer to diverse real-world tasks and environments.

Jun 4, 2026

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Molmo2 is a new family of open-source video-language models that achieve state-of-the-art performance through novel datasets and training methods, particularly excelling in video grounding tasks without relying on proprietary models.

Jun 3, 2026

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition

VideoNet is a large-scale domain-specific action recognition benchmark and training dataset with 1,000 distinct actions across 37 domains, designed to revitalize action recognition evaluation for modern vision-language models.

Jun 2, 2026

HinTel-AlignBench: A Framework and Benchmark for Hindi-Telugu with English-Aligned Samples

We present HinTel-AlignBench, a scalable framework and comprehensive benchmark for evaluating Vision-Language Models in Hindi and Telugu with English-aligned samples.

May 31, 2026

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

Spatio-Temporal Token Scoring (STTS) is a simple technique for unified, architecture-wide vision token pruning across both the ViT and the LLM, improving efficiency by 62% with minimal performance loss on video QA tasks.

Mar 18, 2026

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

We present Molmo, a new family of state-of-the-art VLMs. Starting from pre-trained vision encoders and language-only LLMs, the entire remainder of our VLM pipeline – weights, code, data, and evaluations – is open and free from VLM distillation.

Jun 1, 2025