Learning from multi-modal data: integration, fusion, and data translation
Speaker: Samuel Kaski, Helsinki Institute for Information Technology HIIT, Aalto University and University of Helsinki.
Abstract: Data integration, or data fusion, is one of the most needed operations in data analysis, in fields ranging from genomics to multimodal interfaces. When the goal is to make sense of the data, the different, very high-dimensional data sources give different but complementary information. In a case study in genomics, the sources include gene expression in different diseases and under different treatments, metabolite concentrations, and DNA copy number variation, among others. Given the large number of data sources with mostly unknown connections, it may be more appropriate to talk about data translation than integration, the goal being to find, characterize, and utilize the unknown connections between data sources. In machine learning this task has been called unsupervised multi-view learning. For it we have introduced methods based on Bayesian canonical correlation analysis and, more recently, Group Factor Analysis (GFA), which generalizes factor analysis from analysing relationships between univariate variables to analysing relationships between multiple data sources, each consisting of multivariate observations. I will discuss the methods and present case studies in metabolomics and in analysing the genome-wide effects of drugs.
NB: Refreshments will be served in the lobby before the seminar.