LIRIS Seminar - Prof. Mubarak SHAH: Fine-Grained Video Retrieval.

We are delighted to welcome Professor Mubarak SHAH to LIRIS on Friday, September 19, 2025. He will present his work on fine-grained video retrieval at 1:30 p.m. in the Gaston Berger Lecture Hall, LyonTech-La Doua Campus.

Summary: The goal of video retrieval is to learn robust representations such that a query's representation can effectively retrieve relevant items from a video gallery. While traditional methods typically return semantically related results, they often fail to ensure temporal alignment or to capture fine-grained temporal nuances. To address these limitations, I will begin by introducing Alignable Video Retrieval (AVR), a novel task that tackles the previously unexplored challenge of identifying temporally alignable videos within large datasets. Next, I will present Composed Video Retrieval (CoVR), which focuses on retrieving a target video given a query video and a modification text describing the desired change. Existing CoVR benchmarks largely focus on appearance variations or coarse-grained events, falling short in evaluating models’ ability to handle subtle, fast-paced temporal changes and complex compositional reasoning. To bridge this gap, we introduce two new datasets, Dense-WebVid-CoVR and TF-CoVR, which capture fine-grained and compositional actions across diverse video segments, enabling more detailed and nuanced retrieval tasks. I will conclude the talk with our recent work on ViLL-E: Video LLM Embeddings for Retrieval. ViLL-E extends VideoLLMs with a joint training framework that supports both generative tasks (e.g., VideoQA) and embedding-based tasks such as video retrieval. This dual objective allows VideoLLMs to produce embeddings for retrieval, functionality that current models lack, without sacrificing generative performance.
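
For readers less familiar with embedding-based retrieval, the minimal Python sketch below illustrates the setup the abstract refers to: the query and every gallery video are mapped to fixed-size embeddings, and gallery items are ranked by cosine similarity to the query. The embed and retrieve functions and the random features are purely illustrative placeholders, not the speaker's models or datasets; a real system would use a learned video or text encoder.

    import numpy as np

    def embed(video_frames: np.ndarray) -> np.ndarray:
        # Placeholder encoder: mean-pool per-frame features and L2-normalize.
        v = video_frames.mean(axis=0)
        return v / (np.linalg.norm(v) + 1e-8)

    def retrieve(query_emb: np.ndarray, gallery_embs: np.ndarray, k: int = 5) -> np.ndarray:
        # Embeddings are unit-norm, so the dot product equals cosine similarity.
        sims = gallery_embs @ query_emb
        # Return indices of the top-k most similar gallery videos.
        return np.argsort(-sims)[:k]

    # Toy example: 100 gallery "videos" of 16 frames x 512-dim features each.
    rng = np.random.default_rng(0)
    gallery = np.stack([embed(rng.normal(size=(16, 512))) for _ in range(100)])
    query = embed(rng.normal(size=(16, 512)))
    print(retrieve(query, gallery, k=5))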

Bio: Dr. Mubarak SHAH, the UCF Trustee Chair Professor, is the founding director of the Center for Research in Computer Vision at the University of Central Florida (UCF). Dr. Shah is a fellow of ACM, IEEE, AAAS, NAI, IAPR, AAIA, and SPIE. He has published extensively on topics related to human activity and action recognition, visual tracking, geo-localization, visual crowd analysis, object detection and categorization, shape from shading, etc. He has served as an ACM and IEEE Distinguished Visitor Program speaker. He is a recipient of the 2022 PAMI Mark Everingham Prize for pioneering human action recognition datasets; the 2019 ACM SIGMM Technical Achievement Award; the 2020 ACM SIGMM Test of Time Honorable Mention Award for his paper “Visual attention detection in video sequences using spatiotemporal cues”; the 2020 International Conference on Pattern Recognition (ICPR) Best Scientific Paper Award; an honorable mention for the ICCV 2005 Where Am I? Challenge Problem; the 2013 NGA Best Research Poster Presentation award; second place in the Grand Challenge at the ACM Multimedia 2013 conference; and runner-up for the Best Paper Award at the ACM Multimedia conferences in 2005 and 2010. At UCF, he has received the Pegasus Professor Award, the University Distinguished Research Award, the Faculty Excellence in Mentoring Doctoral Students Award, the Faculty Excellence in Mentoring Postdoctoral Scholars Award, the Scholarship of Teaching and Learning Award, the Teaching Incentive Program Award, and the Research Incentive Award.