Rohun Tripathi

Research Engineer

About Me

I research topics in computer vision. My current focus at Ai2 is developing multimodal LLMs with novel capabilities. My other research interests include video generation, scene understanding, 3D vision, and Indic language models.

I received my Master’s in CS from Cornell Tech in June 2018 and my Bachelor’s in CS from the Indian Institute of Technology, Kanpur in June 2015. Previously, I worked at Amazon on audio-driven talking-face generation, text-based image and video generation, action recognition, and object recognition. Before that, I worked at IBM Research on indoor localisation and audio CAPTCHA generation.

Feel free to reach out. I am always interested in research collaborations and am happy to guide self-motivated researchers on a project.

Interests
  • Computer Vision
  • Language Models
  • Food Tech
Education
  • MEng in Computer Science

    Cornell Tech, Cornell University

  • BTech in Computer Science

    Indian Institute of Technology, Kanpur

Publications
(2023). SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning. In ICCV.
(2020). ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors.
(2019). Automatic Generation and Evaluation of Usable and Secure Audio reCAPTCHA. In ASSETS.