Mikkel Werling

_______________________________________________________

MSc in Cognitive Science
PhD @ University of Copenhagen

Abstract

Increasing Predictive Performance and Generalizability of AI in Healthcare Using Meta-learning and Federated Learning in an International Collaboration

In recent years, artificial intelligence has shown remarkable results in computer vision, natural language processing, and image generation. But in many domains within health, progress in predictive models has stagnated. Algorithms often show (1) low prediction accuracies and (2) poor generalizability beyond training data. Low prediction accuracies are largely the result of ubiquitous low-resource settings in health and an inability to incorporate data from different sources (e.g., different countries and different data modalities). The problem of generalizability arises mainly because algorithms are trained on data from a single site but rarely benchmarked on external data, leading to overfitting and vulnerability to data shifts.

In this project, we address the problems of low prediction accuracies and poor generalizability in the specific domain of chronic lymphocytic leukemia (CLL), where progress in prognostic models has stagnated.

We increase prediction accuracies by developing a novel meta-learning framework capable of handling multiple data modalities and multiple outcomes. This allows us to include multiple data sources and to combine information from related diseases, primarily multiple myeloma and lymphoma (Figure 1A), drastically reducing the number of samples needed for state-of-the-art performance.

We address the problem of generalizability by spearheading an international collaboration across four countries. By combining federated learning with a model capable of domain adaptation, we overcome the issue of heterogeneity in the data from different countries, thereby producing internationally robust results (Figure 2B). We establish a global benchmark, allowing us to assess the international generalizability of our model.

By providing a proof of concept of the value of learning from multiple diseases, we revolutionize how we think about patient data in health. Using CLL as a litmus test, this project will generate a roadmap for overcoming some of the biggest barriers in health machine learning (hML) and for achieving state-of-the-art performance even in low-resource domains.