I research topics in computer vision. My current focus is on developing multimodal LLMs with novel capabilities at Ai2. My other research interests are video generation, scene understanding, 3D vision, and Indic language models.
I received my Master’s in CS from Cornell Tech in June 2018 and my Bachelor’s in CS from the Indian Institute of Technology, Kanpur in June 2015. Previously, I worked at Amazon on audio-based talking-face generation, text-based image and video generation, action recognition, and object recognition. I also worked at IBM Research on indoor localisation and audio CAPTCHA generation.
Feel free to reach out. I am always interested in research collaborations and am happy to guide self-motivated researchers on a project.
MEng in Computer Science
Cornell Tech, Cornell University
BTech in Computer Science
Indian Institute of Technology, Kanpur