
Findings of the Association for Computational Linguistics - EMNLP 2025
Promptception: How Sensitive Are Large Multimodal Models to Prompts?
Despite the success of Large Multimodal Models (LMMs) in recent years, prompt design for LMMs in Multiple-Choice Question Answering (MCQA) remains poorly understood. We show that even minor variations in prompt phrasing and structure can lead to accuracy deviations of up to 15% for certain prompts and models. This variability poses a challenge for transparent and fair LMM evaluation, as models often report their best-case performance using carefully selected prompts. To address this, we introduce Promptception, a systematic framework for evaluating prompt sensitivity in LMMs. It consists of 61 prompt types, spanning 15 categories and 6 supercategories, each targeting specific aspects of prompt formulation, and is used to evaluate 10 LMMs ranging from lightweight open-source models to GPT-4o and Gemini 1.5 Pro, across 3 MCQA benchmarks: MMStar, MMMU-Pro, and MVBench. Our findings reveal that proprietary models exhibit greater sensitivity to prompt phrasing, reflecting tighter alignment with instruction semantics, while open-source models are steadier but struggle with nuanced and complex phrasing. Based on this analysis, we propose Prompting Principles tailored to proprietary and open-source LMMs, enabling more robust and fair model evaluation.
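To make the notion of an accuracy deviation concrete, here is a minimal sketch of how per-prompt deviation from a baseline prompt could be computed for one model on one MCQA benchmark. It assumes the deviation is measured in absolute percentage points against a single baseline prompt; the abstract does not say whether the reported 15% is absolute or relative. The prompt names, function names, and toy data are all hypothetical.

```python
def accuracy(predictions, answers):
    """Fraction of questions where the predicted option letter matches the key."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def deviations_from_baseline(acc_by_prompt, baseline):
    """Accuracy change, in percentage points, of each prompt relative to the baseline."""
    base = acc_by_prompt[baseline]
    return {
        name: round(100 * (acc - base), 2)
        for name, acc in acc_by_prompt.items()
        if name != baseline
    }


# Toy data (made up): answer key and one model's predictions under three prompts.
answers = ["A", "C", "B", "D", "A"]
preds = {
    "baseline": ["A", "C", "B", "D", "B"],  # 4 of 5 correct
    "terse":    ["A", "C", "B", "A", "B"],  # 3 of 5 correct
    "verbose":  ["A", "C", "B", "D", "A"],  # 5 of 5 correct
}

acc = {name: accuracy(p, answers) for name, p in preds.items()}
dev = deviations_from_baseline(acc, "baseline")
print(dev)
```

The spread of these deviations across prompts is one simple way to summarize how prompt-sensitive a model is.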
Authors
Mohamed Insaf Ismithdeen, Muhammad Uzair Khattak, Salman Khan
Contribution
Insaf led the evaluation pipeline: a distributed inference harness of more than 1,500 runs on 20 NVIDIA A100 GPUs that orchestrated the full prompt grid across 10 models and 3 benchmarks in under two months. Insaf also built the results-aggregation layer that produced the per-category sensitivity findings.
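As a rough illustration of what orchestrating a prompt grid involves, the sketch below enumerates (model, benchmark, prompt) jobs and splits them round-robin across workers. It is not the actual harness. It assumes a full Cartesian product and one worker per GPU, neither of which the text confirms, and the model and prompt identifiers are placeholders; only the benchmark names and the counts come from the abstract.

```python
from itertools import product

BENCHMARKS = ["MMStar", "MMMU-Pro", "MVBench"]     # named in the abstract
MODELS = [f"model_{i}" for i in range(10)]         # placeholders for the 10 LMMs
PROMPTS = [f"prompt_{i:02d}" for i in range(61)]   # placeholders for the 61 prompt types
NUM_WORKERS = 20                                   # assumption: one worker per GPU


def build_grid():
    """Every (model, benchmark, prompt) combination, assuming a full grid."""
    return list(product(MODELS, BENCHMARKS, PROMPTS))


def shard(jobs, num_workers):
    """Round-robin split so worker loads differ by at most one job."""
    return [jobs[w::num_workers] for w in range(num_workers)]


grid = build_grid()
shards = shard(grid, NUM_WORKERS)
print(len(grid), [len(s) for s in shards][:3])
```

Under these assumptions the full grid has 61 × 10 × 3 = 1,830 jobs, which is consistent with the "more than 1,500 runs" figure, though the real run count may differ if some combinations were skipped.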
@misc{ismithdeen2025promptceptionsensitivelargemultimodal,
title = {Promptception: How Sensitive Are Large Multimodal Models to Prompts?},
author = {Mohamed Insaf Ismithdeen and Muhammad Uzair Khattak and Salman Khan},
year = {2025},
eprint = {2509.03986},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2509.03986}
}