Human perception is inherently uncertainty-aware (we naturally adapt our decisions based on how confident we are in our own understanding) and multimodal (we seldom rely on a single source of information). Analogously, we argue that trustworthy computer vision systems should, at the very least, (1) express an appropriate level of uncertainty, so that their mistakes can be reliably identified and understood, and (2) leverage multiple complementary sources of information, so as to be sufficiently well-informed. While state-of-the-art deep neural networks (DNNs) hold great potential across a wide range of image understanding problems, they offer little to no performance guarantees at run-time when fed data that deviates from their training distribution.
Reliably quantifying their predictive uncertainty in complex multimodal computer vision tasks remains an open research problem, yet solving it will be a prerequisite for widespread adoption in safety-critical applications. The aim of this project is therefore to conduct basic research in probabilistic deep learning and computer vision, investigating how uncertainty can be modelled and extracted in multimodal DNNs for image classification and segmentation.
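To make the notion of "extracting uncertainty" concrete, the following is a minimal sketch of one common recipe (not a method specified by this proposal): collect T stochastic forward passes of a classifier (e.g. via MC dropout or a deep ensemble) and decompose the total predictive entropy into an expected-entropy term (data uncertainty) and a mutual-information term (model uncertainty). The random draws here merely stand in for real network outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for T = 20 stochastic forward passes over 3 classes
# (each row plays the role of one MC-dropout softmax output;
# in practice these come from the network, not random draws).
logits = rng.normal(loc=[2.0, 0.5, -1.0], scale=0.4, size=(20, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Predictive distribution: average over the stochastic passes.
p_mean = probs.mean(axis=0)

def entropy(p):
    # Shannon entropy along the last (class) axis.
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

total = entropy(p_mean)              # total predictive uncertainty
aleatoric = entropy(probs).mean()    # expected entropy: data uncertainty
epistemic = total - aleatoric        # mutual information: model uncertainty

print(p_mean, total, aleatoric, epistemic)
```

By concavity of entropy, the mutual-information term is non-negative; it shrinks as the stochastic passes agree with one another, which is exactly the behaviour one wants from a model-uncertainty signal.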
We will adopt approximate Bayesian inference methods to separately capture data (aleatoric) uncertainty and model (epistemic) uncertainty, not only at the final prediction stage but also in the intermediate feature fusion process, so that the contribution of each modality can be weighted adaptively. We will develop novel uncertainty-aware deep fusion methods and study them on real-world computer vision tasks across a broad range of high-stakes domains, including multimodal medical image analysis. Our research will be an important step towards improving the transparency and robustness of modern neural networks and fulfilling their potential as safe, trustworthy decision-support tools.
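As an illustration of uncertainty-weighted fusion, the sketch below shows one simple instantiation under a Gaussian assumption: each modality contributes a mean and a variance, and modalities are combined by inverse-variance (precision) weighting, so an uncertain modality is automatically down-weighted. The modality names and numbers are hypothetical; this is a sketch of the general idea, not the specific fusion method the project will develop.

```python
import numpy as np

def precision_weighted_fusion(means, variances):
    """Fuse per-modality Gaussian estimates by inverse-variance weighting.

    means, variances: arrays of shape (n_modalities, n_features).
    A modality with large variance (high uncertainty) gets a small weight.
    """
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    weights = precisions / precisions.sum(axis=0, keepdims=True)
    fused_mean = (weights * means).sum(axis=0)
    fused_var = 1.0 / precisions.sum(axis=0)  # precisions add under fusion
    return fused_mean, fused_var

# Hypothetical two-modality example: a confident RGB branch and a
# noisy depth branch producing 2-dimensional feature estimates.
rgb_mu, rgb_var = np.array([0.8, 0.1]), np.array([0.01, 0.01])
depth_mu, depth_var = np.array([0.2, 0.9]), np.array([1.0, 1.0])

mu, var = precision_weighted_fusion([rgb_mu, depth_mu], [rgb_var, depth_var])
print(mu, var)  # fused estimate stays close to the confident RGB branch
```

If the per-modality variances are themselves predicted by the network (rather than fixed, as here), this weighting becomes adaptive per input, which is the behaviour described above.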