Rohun Tripathi

Sr. Research Engineer

About Me

I build multimodal AI systems and lead their model development.

At Ai2, I develop foundation models for vision, robotics, and multimodal reasoning, including Molmo, Molmo2, and MolmoBot, with a focus on large-scale modeling, training, and data curation. Previously, at Amazon Research, I led projects spanning image and video generation, video understanding, and visual perception, taking models from research to production.

I hold a Master’s in Computer Science from Cornell Tech and a B.Tech. in Computer Science from IIT Kanpur.

I enjoy mentoring researchers and engineers and collaborating on ambitious AI projects. I am also passionate about exploring the intersection of AI and plant-based food.

Publications
MolmoPoint: Better Pointing for VLMs with Grounding Tokens. arXiv 2026.

MolmoBot: Large-Scale Simulation Enables Zero-Shot Manipulation. ICRA 2026 SDRL Workshop.

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding. CVPR 2026 (Best Paper Award Nominee).

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition. CVPR 2026 Highlight.

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning. CVPR 2026.

HinTel-AlignBench: A Framework and Benchmark for Hindi-Telugu with English-Aligned Samples. ACL 2026 AVLR Workshop.

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs. arXiv 2026.

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models. CVPR 2025.

SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning. ICCV 2023.

ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors. arXiv 2020.

Automatic Generation and Evaluation of Usable and Secure Audio reCAPTCHA. ACM ASSETS 2019.

Semantic Segmentation with Scarce Data. ICML Workshop 2018.

Enterprise Scale Privacy Aware Occupancy Sensing. IEEE EDGE 2018.