Multi-Modal Out-of-Distribution Detection with Large Language Models
Zhixia He, Chen Zhao, Minglai Shao, Dong Li, Qin Tian
Abstract: Vision-language models (VLMs) enhance image-based OOD detection by leveraging textual information to complement visual features, overcoming the limitations of traditional methods. While existing prompts in VLMs include uniform and learnable types, uniform prompts lack category-specific detail, and learnable negatives often capture only coarse non-ID features. To address this, we propose NPSOODD, a graph-based framework supervised by positive and negative prompts. We first use large language models (LLMs) to generate fine-grained, category-specific positive and negative prompts. These are optimized to capture intra-class semantics (positives) and inter-class boundaries (negatives). A Graph Neural Network (GNN) then fuses this semantic information into image features, enhancing an energy-based OOD detector. Experiments on CIFAR-100 and four OOD benchmarks with five LLMs show that NPSOODD outperforms state-of-the-art methods.
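The energy-based scoring step this abstract mentions is standard enough to sketch. The snippet below is an illustrative example, not the authors' code: it computes a negative-free-energy score over class logits (for instance, image-text similarities against class prompts) and thresholds it to flag OOD inputs; the temperature and threshold values are assumptions.

# Illustrative sketch only (not the paper's implementation): energy-based OOD scoring.
import numpy as np

def energy_score(logits, temperature=1.0):
    # Negative free energy -E(x) = T * log sum_c exp(logit_c / T); higher = more ID-like.
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)                        # for numerical stability
    return temperature * (np.log(np.exp(z - m).sum(axis=-1)) + m.squeeze(-1))

def flag_ood(logits, threshold, temperature=1.0):
    # Inputs whose score falls below a validation-chosen threshold are flagged as OOD.
    return energy_score(logits, temperature) < threshold

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 100))            # two images scored against 100 class prompts
print(energy_score(logits), flag_ood(logits, threshold=5.0))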
Adaptive Robust Optimization with Data-Driven Uncertainty for Enhancing Distribution System Resilience
Shuyi Chen, Shixiang Zhu, Ramteen Sioshansi
Abstract: Extreme weather increasingly threatens power systems, revealing the limits of reactive responses and underscoring the need for proactive resilience planning. Existing methods often oversimplify uncertainty and decouple proactive and reactive decisions. We propose a tri-level optimization framework that integrates infrastructure investment, adversarial spatio-temporal disruptions, and adaptive response. Using conformal prediction, we construct distribution-free uncertainty sets that capture complex, data-scarce outage patterns. To solve the nested problem, we derive a bi-level reformulation via strong duality and develop a scalable Benders decomposition algorithm. Experiments on real and synthetic data show our approach outperforms robust and two-stage baselines, achieving lower worst-case losses and more efficient resource use, particularly under tight constraints and large-scale uncertainty.
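A minimal sketch of the split-conformal construction that underlies the distribution-free uncertainty sets this abstract describes; it is not the paper's implementation, and the synthetic calibration data, coverage level, and symmetric-interval form are assumptions.

# Illustrative sketch only: split conformal prediction for a distribution-free interval.
import numpy as np

def conformal_radius(cal_pred, cal_true, alpha=0.1):
    # Radius q such that [pred - q, pred + q] covers a new outcome with
    # probability >= 1 - alpha, assuming exchangeable calibration data.
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))    # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))                         # finite-sample rank
    return np.sort(scores)[min(k, n) - 1]

rng = np.random.default_rng(1)
cal_true = rng.gamma(shape=2.0, scale=3.0, size=200)    # hypothetical observed outage severities
cal_pred = cal_true + rng.normal(scale=1.0, size=200)   # hypothetical forecaster predictions
q = conformal_radius(cal_pred, cal_true, alpha=0.1)
new_pred = 5.0
print(f"uncertainty set for new prediction: [{new_pred - q:.2f}, {new_pred + q:.2f}]")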
Quantifying Semantic Uncertainty in Large Language Models via Chain-of-Thought Reasoning
Tianhao Wang, Ali Riahi Samani, Rathang Rajpal, Naren Ramakrishnan, Feng Chen
Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in generating fluent text and performing complex reasoning tasks. However, understanding when and why their outputs may be unreliable remains a critical challenge. In this work, we propose a novel approach to quantifying semantic uncertainty in LLM outputs by leveraging Chain-of-Thought (CoT) reasoning. Rather than relying on token-level confidence scores or multiple samples, our approach fine-tunes a lightweight LLM to predict binarized semantic entropy exclusively from the CoT reasoning trail and the initial prompt. This enables efficient, interpretable uncertainty estimation without requiring access to model internals or generating ensembles. We evaluate our method across three QA datasets—TriviaQA, SQuAD, and BioASQ—using multiple LLM backbones, and benchmark its performance against existing uncertainty estimation techniques. Additionally, we conduct extensive ablation studies to understand precisely why CoT reasoning effectively captures uncertainty. Our experiments demonstrate that CoT-based uncertainty estimation not only correlates well with output reliability but also enables downstream applications such as detecting model inconsistencies, understanding reasoning failures, and improving system-level decision-making strategies. By focusing on semantic uncertainty, our method contributes a practical and generalizable approach for making LLM outputs more transparent, trustworthy, and actionable.
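The binarized semantic-entropy target this abstract describes can be sketched as follows. This is an illustrative example rather than the authors' pipeline: a real system would judge answer equivalence with an entailment model, whereas the stand-in below uses case-insensitive string matching, and the entropy threshold is an assumption.

# Illustrative sketch only: deriving a binarized semantic-entropy label from sampled answers.
import math

def semantic_entropy(answers, equivalent=lambda a, b: a.strip().lower() == b.strip().lower()):
    # Greedily cluster sampled answers into semantic-equivalence classes, then
    # return the entropy of the empirical distribution over clusters.
    clusters = []
    for ans in answers:
        for cluster in clusters:
            if equivalent(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    probs = [len(c) / len(answers) for c in clusters]
    return -sum(p * math.log(p) for p in probs)

def binarize(entropy, threshold=0.5):
    # Label 1 = "semantically uncertain" if entropy exceeds an assumed threshold.
    return int(entropy > threshold)

samples = ["Paris", "paris", "Lyon", "Paris", "Berlin"]   # hypothetical sampled LLM answers
h = semantic_entropy(samples)
print(h, binarize(h))                                     # entropy over clusters {Paris: 3, Lyon: 1, Berlin: 1}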
Drift-Aware Proxy Uncertainty Estimation for Large Language Models in Temporal Streams
Haoliang Wang, Dong Li, Chen Zhao
Abstract: Large Language Models (LLMs) are increasingly deployed in real-world settings where data distributions evolve over time, ranging from dynamic knowledge domains such as finance and politics to rapidly shifting linguistic trends on social media. However, most existing Uncertainty Quantification (UQ) methods evaluate LLMs under static conditions, assuming a fixed distribution of prompts and outputs, and thus fail to account for temporal drift. This work introduces a novel framework for Drift-Aware Proxy Uncertainty (DAPU) estimation that captures temporal uncertainty signals using lightweight, label-free mechanisms. The proposed approach integrates semantic drift detection by measuring time-weighted embedding divergence from recent prompt history, leverages model output entropy to quantify response uncertainty, and incorporates a timestamp-based decay function to reflect the influence of input recency. These components are combined into a unified, time-sensitive uncertainty score that adapts to evolving semantics and contextual shifts in the input distribution. Experiments on temporally split datasets for summarization and question answering demonstrate that DAPU more effectively identifies high-risk generations during periods of concept drift compared to conventional entropy and Monte Carlo dropout baselines. The proposed framework is model-agnostic, scalable, and readily deployable in real-world LLM applications without requiring access to ground-truth labels or internal model modifications.
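A rough sketch of how the three signals named in this abstract (time-weighted embedding drift, output entropy, and a timestamp-based decay) could be combined into one score. The weights, decay rates, and linear combination below are hypothetical choices for illustration, not the DAPU formulation itself.

# Illustrative sketch only: combining drift, entropy, and recency into a proxy uncertainty score.
import numpy as np

def drift_score(prompt_emb, history_embs, history_ages, decay=0.1):
    # Cosine distance between the current prompt embedding and an
    # exponentially time-weighted mean of recent prompt embeddings.
    w = np.exp(-decay * np.asarray(history_ages, dtype=float))
    ref = (w[:, None] * np.asarray(history_embs)).sum(axis=0) / w.sum()
    cos = prompt_emb @ ref / (np.linalg.norm(prompt_emb) * np.linalg.norm(ref) + 1e-12)
    return 1.0 - cos

def output_entropy(token_probs):
    # Mean per-token entropy of the model's output distribution.
    p = np.clip(np.asarray(token_probs, dtype=float), 1e-12, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=-1)))

def proxy_uncertainty(drift, entropy, age_hours, w_drift=0.5, w_entropy=0.5, decay=0.05):
    # Hypothetical combination: a timestamp-based decay modulates a weighted
    # sum of the drift and entropy signals.
    return np.exp(-decay * age_hours) * (w_drift * drift + w_entropy * entropy)

rng = np.random.default_rng(2)
hist, cur = rng.normal(size=(5, 8)), rng.normal(size=8)    # toy prompt embeddings
probs = rng.dirichlet(np.ones(50), size=12)                # 12 output tokens over a 50-way vocabulary
print(proxy_uncertainty(drift_score(cur, hist, [1, 2, 3, 5, 8]), output_entropy(probs), age_hours=4))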