Publications

MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation. ICRA 2026 SDRL Workshop (Best Paper Award).

MolmoPoint: Better Pointing for VLMs with Grounding Tokens. arXiv 2026.

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding. CVPR 2026 (Best Paper Award Nominee).

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition. CVPR 2026 (Highlight).

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning. CVPR 2026.

HinTel-AlignBench: A Framework and Benchmark for Hindi-Telugu with English-Aligned Samples. ACL 2026 AVLR Workshop.

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs. arXiv 2026.

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models. CVPR 2025.

SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning. ICCV 2023.
