
I am a Research Scientist at Meta, where I work on AI/ML systems that enhance how people explore and engage with visual content. I earned my Ph.D. in Computer Science from Columbia University, advised by Brian A. Smith in the Computer-Enabled Abilities Lab, where my research focused on AI-driven accessibility for blind and low-vision users. My work bridges academia and industry, designing systems that empower exploratory, user-driven interaction with visual information in contexts ranging from live sports broadcasts to street navigation to online video. I also interned with Apple's Human-Centered Machine Intelligence team, working on AI-driven accessibility.





AI-Driven Interactive Systems for Agency in Access to Visual Experiences

I design, build, and evaluate AI-driven systems that augment visual experiences. My work centers on fostering exploration and agency, spanning accessibility for blind and low-vision (BLV) users and new approaches to video exploration.

My systems allow BLV users to independently visualize the action in sports broadcasts via 3D spatial audio (CHI 2023, UIST 2023), embed AI within streets to help BLV users navigate safely outdoors (ASSETS 2023, UIST 2024), and leverage multimodal AI to enable exploration of street view imagery with rich contextual understanding (CHI 2026). To guide these designs, I also conduct qualitative studies of user needs. My work introduced the concept of "exploration assistance systems," systems that scaffold the process of exploration (CSCW 2023), and earned an Impact Recognition Award πŸ†.
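As a rough sketch of the core idea behind spatial audio for sports broadcasts, and emphatically not Front Row's actual implementation, the Python snippet below maps a tracked ball position to equal-power stereo gains; the function name and the normalized court coordinate are illustrative assumptions.

    import math

    def pan_gains(x_norm: float) -> tuple[float, float]:
        """Equal-power stereo panning. x_norm = 0.0 is the left edge of the
        court (hard left channel); x_norm = 1.0 is the right edge.
        Hypothetical helper for illustration, not the system's code."""
        theta = x_norm * math.pi / 2
        return math.cos(theta), math.sin(theta)  # (left gain, right gain)

    # Hypothetical normalized ball positions over a rally.
    for x in (0.1, 0.5, 0.9):
        left, right = pan_gains(x)
        print(f"x={x:.1f} -> L={left:.2f}, R={right:.2f}")

Equal-power panning keeps left² + right² constant, so the ball's perceived loudness stays steady as it sweeps across the stereo field.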

At Meta, I leverage AI/ML techniques to transform how users interact with videos. My work focuses on enabling user-driven exploration of long-form videos, allowing users to navigate chapters, discover key moments, and engage with the most relevant content seamlessly.
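Purely as an illustration of what chapter discovery can look like, and not a description of Meta's internal systems, the sketch below flags a chapter boundary wherever a video segment's embedding diverges from its predecessor's; the function, threshold, and toy data are all assumptions.

    import numpy as np

    def chapter_boundaries(seg_embeddings: np.ndarray, min_sim: float = 0.8) -> list[int]:
        """Return indices where a new chapter likely starts: wherever the
        cosine similarity between adjacent segment embeddings drops below
        min_sim. Illustrative toy method, not a production system."""
        e = seg_embeddings / np.linalg.norm(seg_embeddings, axis=1, keepdims=True)
        sims = (e[:-1] * e[1:]).sum(axis=1)  # cosine similarity of adjacent pairs
        return [i + 1 for i, s in enumerate(sims) if s < min_sim]

    # Toy data: three "topics", four near-duplicate segments each.
    rng = np.random.default_rng(0)
    topics = [rng.normal(size=8) for _ in range(3)]
    segs = np.array([t + 0.05 * rng.normal(size=8) for t in topics for _ in range(4)])
    print(chapter_boundaries(segs))  # expected boundaries at the topic changes: [4, 8]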




News


  • [Jan '26] πŸŽ‰ SceneScout accepted at CHI 2026; work done during my Apple internship.

  • [Oct '25] πŸ’Ό Started new role as a Research Scientist at Meta.

  • [May '25] πŸ† Received the PhD Service Award for outstanding contributions to the CS department.

  • [Apr '25] πŸ“„ SceneScout is now on arXiv (Apple internship project).

  • [Oct '24] ✈️ Attending UIST 2024 in Pittsburgh to present StreetNav.

  • [Jun '24] πŸŽ‰ StreetNav accepted at UIST 2024.

  • [May '24] 🍎 Starting internship at Apple in Seattle!

  • [May '24] ✈️ Attending CHI 2024 in Hawaii 🏝️, and serving as a student volunteer.

  • [Apr '24] 🌟 Completed my PhD Thesis Proposal.

  • [Apr '24] πŸ™‹β€β™‚οΈ Excited to serve as Publicity Co-chair for UIST 2024.

  • [Nov '23] πŸŽ™οΈ Gave a talk on StreetNav to NYC government officials at the Vision Zero Research Symposium.

  • [Oct '23] ✈️ Attending UIST 2023 in San Francisco to present Front Row (Talk).

  • [Oct '23] ✈️ Attending ASSETS 2023 in New York to present our poster on street camera-based navigation.

  • [Oct '23] ✈️ Attending CSCW 2023 in Minneapolis to present β€œI Want to Figure Things Out”.





Selected Publications


For an updated list of articles, please visit my Google Scholar profile.


SceneScout's Route Preview mode: an AI agent traverses a user-defined route on a top-down map, from a start point to a bus stop destination, retrieving panoramic street view imagery at points along the path; the web interface presents corresponding step-by-step textual descriptions (short, medium, or long) of landmarks, sidewalk quality, mobility cues, and tactile signals, ending with a detailed destination description.
SceneScout: Towards AI-Driven Access to Street Level Imagery for Blind Users

ACM CHI 2026 (To appear) [Acceptance Rate: 25%]

Gaurav Jain, Leah Findlater, Cole Gleason

Paper

Street intersection as seen from a second floor camera view. A blind pedestrian is detected by the system and is crossing the street. A car is blocking the blind pedestrian’s path, and a mobile app provides them with a warning saying: Caution! Car 3 ft ahead.
StreetNav: Leveraging Street Cameras to Support Precise Outdoor Navigation for Blind Pedestrians

ACM UIST 2024 [Acceptance Rate: 24%]

Gaurav Jain, Basel Hindi, Zihao Zhang, Koushik Srinivasula, Mingyu Xie, Mahshid Ghasemi, Daniel Weiner, Sophie Ana Paris, Xinyi Xu, Michael Malcolm, Mehmet Turkcan, Javad Ghaderi, Zoran Kostic, Gil Zussman, Brian A. Smith

Paper Project Page Talk 30s Video Preview

Blind man sitting in a chair wearing headphones, with a monitor behind him playing a tennis broadcast video.
Front Row: Automatically Generating Immersive Audio Representations of Tennis Broadcasts for Blind Viewers

ACM UIST 2023 [Acceptance Rate: 25.1%]

Gaurav Jain, Basel Hindi, Connor Courtien, Xin Yi Therese Xu, Conrad Wyrick, Michael Malcolm, Brian A. Smith

Paper Project Page Talk 30s Video Preview

Sketch of a large shopping complex with three things highlighted: area shape, layout, and collaboration between individuals.
β€œI Want to Figure Things Out”: Supporting Exploration in Navigation for People with Visual Impairments

ACM CSCW 2023

πŸ† Impact Recognition Award

Gaurav Jain, Yuanyang Teng, Dong Heon Cho, Yunhao Xing, Maryam Aziz, Brian A. Smith

Paper Project Page Blog

A two-part figure showing an overview of the street camera-based navigation system. On the left is a screenshot of the smartphone iOS app; on the right is the second-floor camera view of a street intersection.
Towards Street Camera-based Outdoor Navigation for Blind Pedestrians

ACM ASSETS 2023 (Posters)

Gaurav Jain, Basel Hindi, Mingyu Xie, Zihao Zhang, Koushik Srinivasula, Mahshid Ghasemi, Daniel Weiner, Xinyi Xu, Sophie Ana Paris, Chloe Tedjo, Josh Bassin, Michael Malcolm, Mehmet Turkcan, Javad Ghaderi, Zoran Kostic, Gil Zussman, Brian A. Smith

Paper Demo Video

A blind person watching tennis broadcast on TV with a sighted friend.
Towards Accessible Sports Broadcasts for Blind and Low-Vision Viewers

ACM CHI 2023 (Extended Abstracts) [Acceptance Rate: 34%]

Gaurav Jain, Basel Hindi, Connor Courtien, Xin Yi Therese Xu, Conrad Wyrick, Michael Malcolm, Brian A. Smith

Paper Project Page 30s Video Preview

Overview of the proposed deep learning architecture.
Attention-Net: An Ensemble Sketch Recognition Approach Using Vector Images

IEEE Transactions on Cognitive and Developmental Systems 2022

Gaurav Jain*, Shivang Chopra*, Suransh Chopra*, A. S. Parihar (*equal contribution)

Paper

A 2 by 3 grid of target numerical phantoms used for quantitative evaluation of the proposed method.
Deep Neural Network Based Sinogram Super-resolution and Bandwidth Enhancement for Limited-data Photoacoustic Tomography

IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 2020

Gaurav Jain*, Navchetan Awasthi*, S. K. Kalva, Manojit Pramanik, P. K. Yalavarthy (*equal contribution)

Paper Code

Overview of the proposed architecture showing key-based feature extraction and adaptive weighted graph fusion.
Adaptive Weighted Graph Approach to Generate Multimodal Cancelable Biometric Templates

IEEE Transactions on Information Forensics and Security 2020

G. S. Walia, Gaurav Jain, Nipun Bansal, Kuldeep Singh

Paper

A 3 by 4 grid of human-made sketches with a color bar on the right. The first, second, and third rows show bat (animal), turtle, and ant sketches, respectively.
TransSketchNet: Attention-based Sketch Recognition Using Transformers

ECAI 2020 (Short Paper)

Gaurav Jain*, Shivang Chopra*, Suransh Chopra*, A. S. Parihar (*equal contribution)

Paper Poster


Contact


If you’re interested in my work and wish to discuss anything, feel free to email me (gaurav [at] cs [dot] columbia [dot] edu)!