Rohun Tripathi

Research Engineer

About Me

I research topics in computer vision. My current focus is on developing multimodal LLMs with novel capabilities at Ai2. My other research interests are video generation, scene understanding, 3D vision, and Indic language models.

I received my Master’s in CS from Cornell Tech in June 2018 and my Bachelor’s in CS from the Indian Institute of Technology, Kanpur in June 2015. Previously, I worked at Amazon on audio-based talking face generation, text-based image and video generation, action recognition, and object recognition. Before that, I worked at IBM Research on indoor localisation and audio CAPTCHA generation.

Feel free to reach out. I am always interested in possible research collaborations and am happy to guide self-motivated researchers on a project.

Interests

Computer Vision
Language Models
Food Tech

Education

MEng in Computer Science
Cornell Tech, Cornell University
BTech in Computer Science
Indian Institute of Technology, Kanpur

Publications

Urwa Muaz, Won-Dong Jang, Rohun Tripathi, Santhosh Mani, Wenbin Ouyang, R. Gadde, Baris Gecer, Sergio Elizondo, Reza Madad, Naveen Nair (2023). SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning. In ICCV.

Akash Gupta, Rohun Tripathi, Won-Dong Jang (2023). MODEFORMER: Modality-Preserving Embedding For Audio-Video Synchronization Using Transformers. In ICASSP.

Rohun Tripathi, Vasu Singla, Mahyar Najibi, Bharat Singh, Abhishek Sharma, Larry Davis (2020). ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors.

Rohun Tripathi, Bharat Singh (2020). RSO: A Gradient Free Sampling Based Approach For Training Deep Neural Networks.

Rohun Tripathi*, Mohit Jain*, Ishita Bhansali, Pratyush Kumar (2019). Automatic Generation and Evaluation of Usable and Secure Audio reCAPTCHA. In ASSETS.
