BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//https://caida.ubc.ca//NONSGML iCalcreator 2.41.92//
CALSCALE:GREGORIAN
METHOD:PUBLISH
UID:64313665-3232-4363-b530-346131366664
X-WR-RELCALID:efc09d74-9c93-479e-a94f-485231ddccde
X-WR-TIMEZONE:America/Vancouver
X-WR-CALNAME:Advancing Multimodal Vision-Language Learning - Aishwarya Agrawal\, Assistant Professor\, University of Montreal
BEGIN:VTIMEZONE
TZID:America/Vancouver
TZUNTIL:20261101T090000Z
BEGIN:STANDARD
TZNAME:PST
DTSTART:20241103T020000
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
RDATE:20251102T020000
END:STANDARD
BEGIN:DAYLIGHT
TZNAME:PDT
DTSTART:20240310T020000
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
RDATE:20250309T020000
RDATE:20260308T020000
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:db10b64a-7c9f-419c-b809-bed1f2743528
DTSTAMP:20260306T065734Z
CLASS:PUBLIC
CREATED:20241212T191414Z
DESCRIPTION:Abstract: Over the last decade\, multimodal vision-language (VL) research has seen impressive progress. We can now automatically caption images in natural language\, answer natural language questions about images\, retrieve images using complex natural language queries and even generate images given natural language descriptions. Despite such tremendous progress\, current VL research faces several challenges that limit the applicability of state-of-art VL systems. Even large VL systems based on multimodal large language models (MLLMs) such as GPT-4V and Gemini struggle with counting objects in…
DTSTART;TZID=America/Vancouver:20241216T134500
DTEND;TZID=America/Vancouver:20241216T144500
LAST-MODIFIED:20241212T191956Z
LOCATION:UBC Vancouver Campus\, ICCS X836
SUMMARY:Advancing Multimodal Vision-Language Learning - Aishwarya Agrawal\, Assistant Professor\, University of Montreal
TRANSP:OPAQUE
URL:https://caida.ubc.ca/index.php/event/advancing-multimodal-vision-language-learning-aishwarya-agrawal-assistant-professor
END:VEVENT
END:VCALENDAR