Multimodal AI in the Era of Gigantic Pretrained Language Models: Challenges and (Some) Remedies - Boyang Albert Li, Associate Professor, Nanyang Technological University

DATE: Mon, June 26, 2023 - 11:00 am

UBC Vancouver Campus, ICCS X836


Large pretrained language models have brought a series of capabilities, such as in-context learning, chain-of-thought reasoning, and many other prompting tricks, that surprised many experienced AI researchers. At the same time, the sheer size of these models has brought unprecedented challenges in training and deployment. In this talk, I will present recent work that exploits the new capabilities of these models and alleviates some of their shortcomings. First, I will talk about how to obtain new multimodal capabilities, such as visual question answering, by connecting pretrained models using natural language as the medium, without any training. Next, I will present a technique for data-efficient soft prompt tuning, which simplifies model deployment. Finally, I will discuss video-language alignment for movie summaries, a task that involves high-level semantics and could be an interesting challenge in the era of large pretrained language models.


Boyang Albert Li is a Nanyang Associate Professor at the School of Computer Science and Technology, Nanyang Technological University. In 2021, he received the National Research Foundation Fellowship, a prestigious research award of 2.5M Singapore Dollars. Previously, he was a Senior Scientist at Baidu Research USA and a Research Scientist and Group Leader at Disney Research Pittsburgh. He received his Ph.D. degree from Georgia Institute of Technology. His work has been reported by several international media outlets, including the Guardian, New Scientist, US National Public Radio, Engadget, and TechCrunch.


