This is a talk given by Prof Martin Schmidt as part of the Online Seminar Series Machine Learning NeEDS Mathematical Optimization, https://congreso.us.es/mlneedsmo/, branding the role of Operations Research in Artificial Intelligence with the support of EURO, https://www.euro-online.org.
The minimum sum-of-squares clustering (MSSC) problem is an important problem in data mining and (unsupervised) machine learning with very many applications in, e.g., medicine or social sciences. However, it is known to be NP-hard in all relevant cases and to be notoriously hard to be solved to global optimality in practice. Moreover, in many modern applications the clustering suffers from unstructured measurement errors because the MSSC result then represents a clustering of the erroneous measurements instead of the true but unknown underlying data.
In this talk, we discuss both mentioned issues. First, we present different tailored mixed-integer programming techniques to improve the performance of state-of-the-art MINLP solvers when applied to the problem – among them are cutting planes, propagation techniques, branching rules, or primal heuristics. Our numerical study shows that our techniques significantly improve the performance of the open-source MINLP solver SCIP. Second, we tackle the other issue by applying techniques from robust optimization to hedge the clustering result against unstructured errors in the observed data. To this end, we derive strictly and Gamma-robust counterparts. Since the nominal problem is already NP-hard, global approaches are often not feasible in practice. As a remedy, we develop tailored alternating direction methods by decomposing the search space of the robustified problems to quickly obtain feasible points of good quality. Our numerical results reveal an interesting feature: the less conservative Gamma-approach is clearly outperformed by the strictly robust clustering method. In particular, the strictly robustified clustering method is able to recover clusterings of the original data even if only erroneous measurements are observed.
Link to the talk: https://eu.bbcollab.com/guest/0a4bbb97d3fa49c589ec644496e0e769
The organizers of the Online Seminar Series Machine Learning NeEDS Mathematical Optimization
Emilio Carrizosa, IMUS-Instituto de Matemáticas de la Universidad de Sevilla
Nuria Gómez-Vargas, IMUS-Instituto de Matemáticas de la Universidad de Sevilla
Thomas Halskov, Copenhagen Business School
Dolores Romero Morales, Copenhagen Business School