AI & Fundamentals
Challenges and Opportunities with Multimodal LLMs - Vicente Ordóñez-Román, Associate Professor, Rice University


DATE: Mon, April 15, 2024 - 10:00 am

LOCATION: UBC Vancouver Campus, ICCS X836



In this talk I will provide an overview of how the field of computer vision has been impacted by the recent success of Multimodal LLMs, and of some of the challenges and opportunities associated with these models. I will present some of our recent work leveraging Multimodal LLMs, including our SCoRD model, which turns a multimodal LLM into a subject-conditional visual relation prediction and grounding model through enhanced text supervision. SCoRD takes an image and a subject as input and predicts an exhaustive list of the objects interacting with that subject, along with the type of each relationship. I will also discuss some of our other work on supervised visual grounding through Attention Mask Consistency (AMC) and on weakly supervised visual grounding through self-consistent gradient-based model explanations (SelfEQ). Then I will discuss our Autoregressive Visual Entity Recognition (AutoVER) model, which mitigates hallucinations from a multimodal LLM and improves its performance on restricted domains through constrained decoding. AutoVER aims to classify objects such as vehicles in a fine-grained setting, using constrained decoding to avoid predicting nonexistent vehicle types. Finally, I will describe PropTest, a general framework for improving the interpretability and reliability of Multimodal LLMs in more open-ended domains through code generation and property testing. PropTest aims to improve on current solutions that generate code to solve problems in the visual domain by also using an LLM to generate testing code. This talk will use these works to highlight the opportunities, as well as the challenges and possible avenues for future work, in this domain.


Vicente Ordóñez-Román is an Associate Professor in the Department of Computer Science at Rice University and an Amazon Visiting Academic at Amazon Alexa AI. His research interests lie at the intersection of computer vision, natural language processing, and machine learning. He is a recipient of a Best Paper Award at the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2017 and the Best Paper Award (Marr Prize) at the International Conference on Computer Vision (ICCV) 2013. He has also received an NSF CAREER Award, an IBM Faculty Award, a Google Faculty Research Award, and a Facebook Research Award. Vicente obtained his PhD from the University of North Carolina at Chapel Hill and has been a visiting researcher at the Allen Institute for Artificial Intelligence and a visiting professor at Adobe Research.
