BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//https://caida.ubc.ca//NONSGML iCalcreator 2.41.92//
CALSCALE:GREGORIAN
METHOD:PUBLISH
UID:31383962-6139-4461-a430-663832376633
X-WR-RELCALID:efc09d74-9c93-479e-a94f-485231ddccde
X-WR-TIMEZONE:America/Vancouver
X-WR-CALNAME:Efficient Inference of Mixture-of-Experts (MoE)-based Large Models with Theoretical Guarantees - Meng Wang\, Professor\, Rensselaer Polytechnic Institute
BEGIN:VTIMEZONE
TZID:America/Vancouver
TZUNTIL:20270314T100000Z
BEGIN:STANDARD
TZNAME:PST
DTSTART:20241103T020000
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
RDATE:20251102T020000
RDATE:20261101T020000
END:STANDARD
BEGIN:DAYLIGHT
TZNAME:PDT
DTSTART:20250309T020000
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
RDATE:20260308T020000
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:b4a3e525-60c2-481c-82df-26e0b732310d
DTSTAMP:20260220T014414Z
CLASS:PUBLIC
CREATED:20250709T194331Z
DESCRIPTION:Abstract: Mixture-of-Experts (MoE) architectures have emerged as a powerful paradigm for scaling large models by routing inputs to specialized subnetworks (experts)\, achieving impressive performance with reduced computation during training. However\, efficient inference of MoE models remains challenging due to memory and computational overhead\, especially when deployed in resource-constrained environments. In this talk\, I will first introduce a provably efficient expert pruning method for fine-tuned MoE models\, which preserves test-time accuracy by pruning experts with minimal change in router…
DTSTART;TZID=America/Vancouver:20250715T140000
DTEND;TZID=America/Vancouver:20250715T150000
LAST-MODIFIED:20250709T195414Z
LOCATION:UBC Vancouver Campus\, Fred Kaiser (KAIS) Building\, Room 2020/2030\, 2332 Main Mall
SUMMARY:Efficient Inference of Mixture-of-Experts (MoE)-based Large Models with Theoretical Guarantees - Meng Wang\, Professor\, Rensselaer Polytechnic Institute
TRANSP:OPAQUE
URL:https://caida.ubc.ca/event/efficient-inference-mixture-experts-moe-based-large-models-theoretical-guarantees-meng-wang
END:VEVENT
END:VCALENDAR