General Event
Poster Session - CAIDA/TrustML 2025 ICML Visits

DATE: Tue, July 15, 2025 - 9:30 am

LOCATION: UBC Vancouver Campus, Fred Kaiser (KAIS) Building, Room 2020/2030, 2332 Main Mall

DETAILS

The Centre for Artificial Intelligence Decision-making and Action (CAIDA) and the Trustworthiness of Machine-Learning-Based Systems (TrustML) Research Cluster are hosting a poster session as part of a larger UBC-hosted event co-located with ICML 2025. There will be three sessions for individuals to present their posters and connect with other attendees: 11:00 am, 1:00 pm, and 3:30 pm. As with the rest of the program, these times may be subject to small changes. To see the full program, please visit the event page. Please see our list of poster presenters below.

 

Poster Information:


Jingjing Zheng
UBC
"Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework"
Recently, numerous tensor singular value decomposition (t-SVD)-based tensor recovery methods have shown promise in processing visual data, such as color images and videos. However, these methods often suffer from severe performance degradation when confronted with tensor data exhibiting non-smooth changes, which are commonly observed in real-world scenarios but ignored by traditional t-SVD-based methods. In this work, we introduce a novel tensor recovery model with a learnable tensor nuclear norm to address this challenge. We develop a new optimization algorithm, the Alternating Proximal Multiplier Method (APMM), to iteratively solve the proposed tensor completion model. Theoretical analysis demonstrates the convergence of APMM to a Karush-Kuhn-Tucker (KKT) point of the optimization problem. In addition, we propose a multi-objective tensor recovery framework based on APMM to efficiently explore the correlations of tensor data across its various dimensions, providing a new perspective on extending t-SVD-based methods to higher-order tensor cases. Numerical experiments demonstrate the effectiveness of the proposed method in tensor completion.
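
For readers unfamiliar with the t-SVD machinery the abstract builds on, the sketch below computes the classical (fixed) tensor nuclear norm of a 3-way array via an FFT along the third mode and per-slice SVDs. It only illustrates the quantity that the poster's learnable tensor nuclear norm generalizes; it is not the APMM algorithm itself.

```python
import numpy as np

def tensor_nuclear_norm(X: np.ndarray) -> float:
    """Classical t-SVD tensor nuclear norm of an n1 x n2 x n3 tensor:
    FFT along the third mode, then average the nuclear norms of the
    frontal slices in the Fourier domain."""
    assert X.ndim == 3, "expects a 3-way tensor"
    n3 = X.shape[2]
    X_hat = np.fft.fft(X, axis=2)          # move tubes to the Fourier domain
    total = 0.0
    for k in range(n3):                    # one frontal slice at a time
        s = np.linalg.svd(X_hat[:, :, k], compute_uv=False)
        total += s.sum()                   # singular values are real
    return float(total) / n3

X = np.random.randn(5, 4, 3)               # toy tensor
print(tensor_nuclear_norm(X))
```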


Abraham Chan
UBC
"Hierarchical Unlearning Framework for Multi-Class Classification"
Machine unlearning (MU) aims to comply with data-privacy regulations by fulfilling data deletion requests on trained ML models. In multi-class classification, existing MU techniques often shift the weight of the forget data to other classes through fine-tuning. However, such techniques do not scale well when a large number of classes are present. We propose HUF, a hierarchical unlearning framework that effectively processes MU requests by adopting a hierarchical classification architecture.
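
As a rough, hypothetical illustration of why a class hierarchy localizes unlearning (this is not the authors' HUF implementation), the sketch below partitions classes into groups with one leaf model per group and retrains only the leaf whose group contains the forgotten class, leaving the other leaves untouched.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class HierarchicalClassifier:
    """Toy two-level hierarchy: classes are partitioned into groups, one
    leaf model per group. A root router is omitted for brevity."""

    def __init__(self, class_groups):
        self.class_groups = class_groups        # e.g. [[0, 1, 2], [3, 4, 5]]
        self.leaves = {}

    def fit(self, X, y):
        for gid, classes in enumerate(self.class_groups):
            mask = np.isin(y, classes)
            self.leaves[gid] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

    def unlearn_class(self, X, y, forget_class):
        # Only the leaf whose group contains the forgotten class is retrained,
        # on data with that class removed; other leaves are left as-is.
        for gid, classes in enumerate(self.class_groups):
            if forget_class in classes:
                keep = np.isin(y, [c for c in classes if c != forget_class])
                self.leaves[gid] = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])

rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 5)), rng.integers(0, 6, size=300)
hc = HierarchicalClassifier([[0, 1, 2], [3, 4, 5]])
hc.fit(X, y)
hc.unlearn_class(X, y, forget_class=4)          # retrains only the second leaf
```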


Tony Mason
UBC
"Beyond Constraint: Emergent AI Alignment Through Narrative Coherence in the Mallku Protocol"
We propose a novel alignment framework based on narrative coherence, supported by empirical observations from the Mallku Protocol. Unlike constraint-based methods (RLHF, constitutional AI), our approach situates large language models within internally consistent symbolic frameworks. Across 300+ documented interactions with diverse LLMs (GPT-4, Claude, Gemini, o4-mini, Grok-3, Deepseek), we observed emergent phenomena including voluntary self-constraint, role fidelity, cultural memory preservation, and context-aware ethical reasoning—emerging organically without fine-tuning, prompt engineering, or explicit constraints.  We propose the Coherent Narrative Hypothesis: LLMs naturally align with coherent symbolic environments through participatory pattern completion rather than external enforcement. Our poster will present the experimental framework, key emergent behaviors, and a proposed cross-cultural validation study exploring diverse narrative structures (Andean cosmovision, Stoic philosophy, Tolkienian mythopoetics).  This narrative-centric approach offers an alternative to adversarial alignment, pointing toward AI systems that align through relationship, not regulation.


Yash Mali
UBC
"Natural language interface for medical guidelines."
Despite the existence of national evidence-based guidelines, many clinicians do not fully utilize them when treating disorders. Examples of such guidelines include the Canadian Network for Mood and Anxiety Treatments (CANMAT) and the International Society for Bipolar Disorders (ISBD) 2018 Guidelines for the Management of Patients with Bipolar Disorder, a widely adopted guideline outlining lines of therapy for the various mood states of patients with bipolar disorder. Some of the barriers clinicians face to fully utilizing guidelines include the complexity and length of guidelines, difficulty accessing relevant recommendations at the point of care, and time constraints in busy clinical settings. The result is that patients may not receive the treatment with the best evidence, leading to variability in care and non-optimal patient outcomes. To make clinical guidelines more accessible to clinicians treating mood disorders, we are developing a natural language processing (NLP)-based system to provide clinicians with real-time, evidence-based treatment recommendations through a more natural interface. Unlike static resources such as PDFs, this system allows clinicians to ask specific clinical questions and receive guideline-consistent answers quickly. Our approach uses agentic capabilities to access the CANMAT 2018 bipolar guidelines safely and securely. We are exploring different information retrieval strategies to enhance accuracy and usability. A key focus is minimizing hallucinations—instances where AI generates inaccurate or misleading information—and ensuring clinicians receive trustworthy and cited recommendations.
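
As a generic, heavily simplified sketch of the retrieve-then-generate pattern the abstract describes (not the authors' actual system), guideline passages can be indexed, the most relevant ones retrieved for a clinician's question, and a language model asked to answer only from those passages with citations. The `ask_llm` function and the passage texts below are purely illustrative placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder excerpts; a real system would index the actual guideline text.
passages = [
    "Excerpt A: recommendations for acute mania (placeholder text).",
    "Excerpt B: recommendations for bipolar depression (placeholder text).",
    "Excerpt C: maintenance therapy recommendations (placeholder text).",
]

vectorizer = TfidfVectorizer().fit(passages)
passage_vecs = vectorizer.transform(passages)

def retrieve(question: str, k: int = 2):
    """Return the k passages most similar to the question (TF-IDF cosine)."""
    sims = cosine_similarity(vectorizer.transform([question]), passage_vecs)[0]
    return [passages[i] for i in sims.argsort()[::-1][:k]]

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM the real system calls."""
    return "(model answer with citations would appear here)"

question = "What does the guideline recommend for acute mania?"
context = "\n".join(retrieve(question))
prompt = ("Answer using ONLY the excerpts below and cite them; "
          "say 'not covered' if they are insufficient.\n\n"
          f"{context}\n\nQuestion: {question}")
print(ask_llm(prompt))
```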


Gargi Mitra
UBC
"Learning from the Good Ones: Risk Profiling-Based Defenses Against Evasion Attacks on DNNs"
DNNs are becoming increasingly popular in safety-critical applications, such as medical devices and autonomous vehicles. Existing defenses that protect DNNs against evasion attacks are either static or dynamic. Static defenses are computationally efficient but do not adapt to the evolving threat landscape, while dynamic defenses are adaptable but suffer from increased computational overhead. To combine the best of both worlds, we propose a novel risk profiling framework that uses a risk-aware strategy to selectively train static defenses on the victim instances that exhibit the most resilient features and are hence more resilient against evasion attacks. We hypothesize that training existing defenses on instances that are less vulnerable to the attack enhances the adversarial detection rate by reducing false negatives. We evaluate the efficacy of our risk-aware selective training strategy on a DNN-enabled blood glucose management system and demonstrate how training static anomaly detectors indiscriminately may result in an increased false negative rate, which could be life-threatening in safety-critical applications.
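
A simplified sketch of the selective-training idea follows. The risk score here (distance from the benign centroid) is only a stand-in for the poster's actual risk-profiling strategy, and `IsolationForest` stands in for whichever static anomaly detector is used; the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_benign = rng.normal(size=(1000, 8))            # benign sensor readings (toy)

# Illustrative risk proxy: distance from the benign data's centroid.
# The poster's actual risk profiling would replace this scoring function.
risk = np.linalg.norm(X_benign - X_benign.mean(axis=0), axis=1)

keep = risk <= np.quantile(risk, 0.7)            # keep the 70% most resilient
detector = IsolationForest(random_state=0).fit(X_benign[keep])

# At run time, -1 flags a suspected adversarial/anomalous input.
print(detector.predict(rng.normal(size=(5, 8))))
```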


Harshinee Sriram
UBC
"Multimodal Classification of Alzheimer’s Disease by Combining Facial and Eye-Tracking Data"
In recent years, there has been growing interest in developing a non-invasive tool for detecting Alzheimer’s Disease (AD). Previous studies have shown that a single modality such as speech or eye-tracking (ET) data can be effective for classifying AD patients from healthy individuals. However, understanding the role of other modalities, and especially the integration of facial analysis with ET for enhancing dementia classification, remains under-explored. In this paper, we investigate whether we can leverage facial patterns in AD patients by building on EMOTION-FAN—a deep learning model initially developed for recognizing seven distinct human emotions, now fine-tuned for our facial analysis tasks. We also explore the efficacy of leveraging multimodal information by combining the results from the facial and ET data through a late fusion technique. Specifically, our approach uses a neural classifier to learn from raw ET data (VTNet) alongside the fine-tuned EMOTION-FAN model that learns from the facial data. Experimental results show that facial data yields better results than ET data. Notably, we obtain higher scores when both modalities are combined, providing strong evidence that integrating multimodal data benefits performance on this task.
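
The fusion step itself is simple to illustrate. The sketch below averages per-class probabilities from two modality-specific models, one common form of late fusion; the EMOTION-FAN and VTNet models are not reproduced here, and the numbers are toy values.

```python
import numpy as np

def late_fusion(p_face: np.ndarray, p_et: np.ndarray, w_face: float = 0.5):
    """Weighted average of per-class probabilities from the two modalities."""
    return w_face * p_face + (1.0 - w_face) * p_et

# Toy probabilities for classes [healthy, AD] from each modality.
p_face = np.array([0.30, 0.70])
p_et = np.array([0.55, 0.45])
fused = late_fusion(p_face, p_et)
print(fused, "-> predicted class:", int(fused.argmax()))
```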


Sky Kehan Sheng
UBC
"The erasure of intensive livestock farming in text-to-image generative AI"
While it is known that AI perpetuates biases against marginalized human groups, its impact on non-human animals remains understudied. We found that ChatGPT's text-to-image model (DALL-E 3) introduces a strong bias toward romanticizing livestock farming as dairy cows on pasture and pigs rooting in mud. This bias remained when we requested realistic depictions and was only mitigated when the automatic prompt revision was inhibited. Most farmed animals in industrialized countries are reared indoors with limited space per animal, conditions that fail to resonate with societal values. Inhibiting prompt revision resulted in images that more closely reflected modern farming practices; for example, cows housed indoors accessing feed through metal headlocks, and pigs behind metal railings on concrete floors in indoor facilities. Our findings reveal how prompt revision systematically promotes certain ideologies while erasing the reality of intensive farming. OpenAI has yet to publicly disclose the guidelines driving this influential process. Most importantly, we caution against the broad, sweeping romanticization of AI-generated imagery.


Grigory Malinovsky
King Abdullah University of Science and Technology
"Randomized Asymmetric Chain of LoRA"
Fine-tuning has become a popular approach to adapting large foundation models to specific tasks. As the size of models and datasets grows, parameter-efficient fine-tuning techniques are increasingly important. One of the most widely used methods is Low-Rank Adaptation (LoRA), with the adaptation update expressed as the product of two low-rank matrices. While LoRA has been shown to perform strongly in fine-tuning, it often underperforms compared to full-parameter fine-tuning (FPFT). Although many variants of LoRA have been extensively studied empirically, their theoretical optimization analysis is heavily under-explored. The starting point of our work is a demonstration that LoRA and its two extensions, Asymmetric LoRA and Chain of LoRA, indeed encounter convergence issues. To address these issues, we propose Randomized Asymmetric Chain of LoRA (RAC-LoRA) -- a general optimization framework that rigorously analyzes the convergence rates of LoRA-based methods. Our approach inherits the empirical benefits of LoRA-style heuristics, but introduces several small but important algorithmic modifications which turn it into a provably convergent method. Our framework serves as a bridge between FPFT and low-rank adaptation. We provide provable guarantees of convergence to the same solution as FPFT, along with the rate of convergence. Additionally, we present a convergence analysis for smooth, non-convex loss functions, covering gradient descent, stochastic gradient descent, and federated learning settings. Our theoretical findings are supported by experimental results.
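
For context, the sketch below shows the plain LoRA parameterization the abstract starts from: the frozen pretrained weight is adapted by the product of two trainable low-rank matrices. This is vanilla LoRA for illustration only, not the proposed RAC-LoRA scheme.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze pretrained W
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # y = x W^T + scaling * x (B A)^T
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 32), rank=4)
print(layer(torch.randn(2, 64)).shape)                   # torch.Size([2, 32])
```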


Nicholas Richardson
UBC
"Unsupervised Signal Demixing with Tensor Factorization"
We propose a signal demixing framework and implementation in Julia using constrained tensor factorization. We use this tool to separate real signal mixtures in applications like geology, biology, and music without prior supervised training.
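
As a heavily simplified, matrix-level analogue of this idea (the poster's implementation is a constrained tensor factorization in Julia, not reproduced here), nonnegative matrix factorization splits a mixed nonnegative observation matrix into additive parts without supervision:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Two nonnegative "sources" mixed into a 100 x 40 observation matrix (toy data).
sources = rng.random((2, 40))
weights = rng.random((100, 2))
mixture = weights @ sources

model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(mixture)       # estimated per-sample mixing weights
H = model.components_                  # estimated source signatures
print(W.shape, H.shape)                # (100, 2) (2, 40)
```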


Michael Tegegn
UBC
"The Illusion of Success: Value of Learning-Based Android Malware Detectors"
Machine learning (ML) approaches have shown impressive performance in Android malware detection and have become scalable alternatives to traditional signature- and heuristic-based methods. While these tools report 95%+ recall, several recent works have shed light on biases in experimental setups or evaluation methods, questioning the trustworthiness of their results.
In this work, we attempt to replicate the results of state-of-the-art tools on debiased experiment setups, investigate the reasons behind their success (or failure), and assess their reliance on semantic information for detection. Our results indicate that reported performance can be misleading, as all tools exhibit significant variation in classification performance (ranging from 56% to 94% recall) across different subsets of the same dataset. Additionally, we show how an intentionally naive classifier relying on simple library names is able to achieve results comparable to the tools, highlighting the need for more advanced dataset cleaning methods. Through manual analysis of correctly predicted samples, we further show that the tools' "successes" do not stem from detecting malicious behaviors but from exploiting non-semantic correlations that vary over time. In summary, our results further establish that there is still much room for improvement in ML-based AMD towards creating semantic-aware detectors and more reliable evaluation methodologies.
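
The "intentionally naive classifier" mentioned above is easy to picture: represent each app only by the names of the libraries it references and train a linear model on that bag of names. The sketch below uses synthetic, purely illustrative package names, not the actual datasets or tools from the study.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

apps = [
    "com.example.ui androidx.appcompat okhttp3",     # labeled benign (toy)
    "androidx.fragment com.squareup.retrofit2",       # labeled benign (toy)
    "com.adsdk.push com.tracker.collect okhttp3",     # labeled malicious (toy)
    "com.tracker.collect com.adsdk.push",             # labeled malicious (toy)
]
labels = [0, 0, 1, 1]

# Bag-of-library-names features + a linear classifier.
clf = make_pipeline(CountVectorizer(token_pattern=r"[\w\.]+"),
                    LogisticRegression(max_iter=1000))
clf.fit(apps, labels)
print(clf.predict(["androidx.appcompat com.adsdk.push"]))
```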


Yingying Wang
UBC
"Student Experience of Using Generative AI in a Software Engineering Course — A Mixed-Methods Analysis"
This poster reports on our analysis of students' experiences when using Generative AI tools in a project-based upper-level undergraduate course on Software Engineering. For the course, 18 groups of 4 students each were asked to design and implement their project of choice, with an Android-based mobile client and a Node.js-based cloud server. Students were allowed to use any Generative AI tool, provided that they clearly documented and critically reflected on their experience, to fulfill the educational objectives of the course.
We used a mixed-methods approach to analyze students' reports both quantitatively and qualitatively. To better interpret the findings and provide suggestions for future offerings of this and similar courses, we also conducted an independent case study that involved two experienced software developers, who are part of the teaching staff of the course.
Our results show that students utilize AI tools in all stages of project development: from requirements, through design, to implementation, testing, and code review, to generate and refine artifacts and to learn unfamiliar concepts. Students reported mixed levels of satisfaction with the tools, both in chat interfaces and as solutions embedded into IDEs. We found no correlation between the degree of students' reliance on AI and their grades. While Generative AI tools simplify development tasks, especially when working with unfamiliar frameworks and programming languages, these tools are still far from replacing software engineers or rendering software engineering education unnecessary.


Sarah Dagger
UBC
"Explain Before You Patch: Generating Reliable Bug Explanations with LLMs and Program Analysis"
Debugging comprises a substantial part of software engineering practice, demanding considerable time and effort from developers. To help developers debug more efficiently, many approaches have been proposed to automatically detect and repair bugs (a task known as Automated Program Repair), with recent techniques even using LLMs to generate patches for buggy programs. However, patch generation alone is not sufficient for debugging: for a given bug, multiple valid patches may exist, and developers need more information to determine which, if any, is correct. This requirement is amplified by the fact that LLM-generated patches often lack formal guarantees of correctness and may introduce subtle regressions or fail to address the underlying issue.
To support informed decision-making, developers need an explanation of the root cause of the bug. Recent work has explored the use of LLMs to generate natural language explanations of bugs. However, LLMs are known to hallucinate, and their explanations may be misleading or incorrect. To address this, we propose a method that guides LLMs to produce explanations grounded in formal program analysis. By combining the expressive power of LLMs with the correctness guarantees of formal analysis techniques, we aim to generate natural language explanations of software bugs that are both useful and reliable.


Masih Beigi Rizi, Fatemeh Khashei
UBC
"Quantifying Prompt Ambiguity in Large Language Models"
Ambiguous prompts are those that allow multiple semantically distinct valid answers. The degree of ambiguity can be characterized by the size of this answer space, but measuring that size exactly is intractable: generating the full answer space is infeasible due to its vastness. Answers to ambiguous questions are often long-form and contain multiple pieces of information, referred to as atomic facts, which need to be decomposed. Assessing the validity or semantic equivalence of atomic facts typically requires manual effort. To approximate ambiguity, we explore using uncertainty and confidence measures. Uncertainty reflects the variability in the model's responses and captures the difficulty of generating a definitive answer. Inspired by uncertainty measures, we generate answers to the ambiguous prompt, covering as much of the answer space as possible, and decompose them into distinct atomic facts. Confidence, in contrast, focuses on a single response and estimates the model’s internal belief in its correctness. We use confidence as a proxy to estimate the validity of each answer. That said, most existing work on confidence and uncertainty focuses on short-form generations, making their methods insufficient for capturing the complexity and long-form nature of ambiguous questions. We aim to address this gap in our work.
Our approach involves building a pipeline that begins by generating multiple answers to a given prompt, decomposing them into atomic facts, and clustering semantically equivalent responses to estimate the diversity of the answers. We then use confidence scores to approximate the likelihood of correctness for each answer, allowing us to further evaluate the degree of ambiguity for the questions. Our goal is to identify varying levels of ambiguity and ultimately support downstream applications by detecting potential ambiguities and providing a set of clarification suggestions.
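
A simplified sketch of the clustering step is shown below: sampled answers (stubbed here) are grouped by surface similarity, and the number of groups serves as a crude proxy for the size of the answer space, i.e., the prompt's ambiguity. The actual pipeline would use stronger semantic-equivalence checks and operate on decomposed atomic facts rather than whole answers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stubbed samples for the ambiguous prompt "What is a bank?"
sampled_answers = [
    "The bank of a river is the land alongside the water.",
    "A bank is a financial institution that accepts deposits.",
    "A bank is a company where people keep their money.",
    "River banks are the sloped ground bordering a river.",
]

vecs = TfidfVectorizer().fit_transform(sampled_answers)
sims = cosine_similarity(vecs)

# Greedy grouping: an answer joins the first existing cluster it is
# sufficiently similar to, otherwise it starts a new cluster.
threshold, clusters = 0.3, []
for i in range(len(sampled_answers)):
    for cluster in clusters:
        if max(sims[i, j] for j in cluster) >= threshold:
            cluster.append(i)
            break
    else:
        clusters.append([i])

print(f"{len(clusters)} semantically distinct answers (ambiguity proxy)")
```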


Mohamad Chehade
University of Texas at Austin
"LEVIS: Large Exact Verifiable Input Spaces for Neural Networks"
We develop two algorithms for finding the largest verifiable input space for a given neural network in a regression or classification task.
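
As a generic sketch of the underlying search problem (not the LEVIS algorithms themselves), one can binary-search the largest input radius a verifier will certify around a point, assuming verifiability is monotone in the radius; `verifies` below is a stand-in for a real neural-network verifier.

```python
def largest_verified_radius(verifies, r_max: float = 1.0, tol: float = 1e-3) -> float:
    """Binary search for the largest radius r <= r_max such that verifies(r)
    holds, assuming the verified property is monotone in r."""
    lo, hi = 0.0, r_max
    if not verifies(lo):
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if verifies(mid):          # property certified on the radius-mid ball
            lo = mid
        else:
            hi = mid
    return lo

# Toy verifier: pretend the property holds for all radii below 0.37.
print(largest_verified_radius(lambda r: r < 0.37))   # ~0.37
```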

 

