MolmoPoint: Better Pointing for VLMs with Grounding Tokens
MolmoPoint is a new vision-language model (VLM) architecture for more precise and efficient visual grounding: instead of generating coordinates as text, it uses special grounding tokens to select directly from the model's internal visual representation.
Jun 5, 2026
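To make "selecting from the visual representation" concrete, here is a toy sketch of the general idea, not MolmoPoint's actual implementation. It assumes that a grounding token yields a query vector, that the image is represented as a flat, row-major grid of patch feature vectors, and that selection is a plain dot-product argmax. The names `select_patch` and `patch_center` are hypothetical, as are the feature values.

```python
from typing import List, Tuple


def select_patch(query: List[float],
                 patches: List[List[float]],
                 grid_width: int) -> Tuple[int, int]:
    """Return (row, col) of the patch whose feature best matches the query.

    Assumption: matching is a dot product, and patches are stored row-major.
    """
    scores = [sum(q * p for q, p in zip(query, patch)) for patch in patches]
    best = max(range(len(scores)), key=scores.__getitem__)
    return divmod(best, grid_width)


def patch_center(row: int, col: int,
                 grid_height: int, grid_width: int) -> Tuple[float, float]:
    """Normalized (x, y) center of a patch, each in [0, 1]."""
    return ((col + 0.5) / grid_width, (row + 0.5) / grid_height)


# Toy 2x2 grid of 2-dimensional patch features (made-up values).
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
query = [0.0, 2.0]  # hypothetical hidden state of a grounding token

row, col = select_patch(query, patches, grid_width=2)
x, y = patch_center(row, col, grid_height=2, grid_width=2)
print(row, col, x, y)
```

In this sketch the point is read off the grid position of the selected patch, so no coordinate digits have to be generated as text. A real model would presumably use learned projections and some refinement within the patch.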