# Available Master's thesis topics in machine learning

## Learning and inference with large Bayesian networks

Most learning and inference tasks with Bayesian networks are NP-hard. Therefore, one often resorts to heuristics that offer no quality guarantees.

Task: Empirically evaluate the quality of large-scale learning or inference algorithms.

Advisor: Pekka Parviainen

## Sum-product networks

Traditionally, probabilistic graphical models use a graph structure to represent dependencies and independencies between random variables. Sum-product networks are a relatively new type of graphical model in which the graph structure represents computations rather than relationships between variables. The benefit of this representation is that inference (computing conditional probabilities) can be done in linear time with respect to the size of the network.
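
The linear-time claim can be illustrated with a small sketch: a toy SPN over two binary variables, evaluated in a single bottom-up pass. The structure and all weights below are made up for illustration.

```python
import math

# A hedged toy sketch: evaluating a sum-product network is one bottom-up
# pass, so computing a probability is linear in the size of the network.
# Nodes are nested tuples; the weights are invented for the example.

def evaluate(node, assignment):
    kind = node[0]
    if kind == "leaf":      # ("leaf", variable, value): an indicator
        _, var, value = node
        return 1.0 if assignment[var] == value else 0.0
    if kind == "sum":       # ("sum", [(weight, child), ...]): a mixture
        return sum(w * evaluate(child, assignment) for w, child in node[1])
    # ("prod", [children]): a factorization over disjoint variables
    return math.prod(evaluate(child, assignment) for child in node[1])

bern = lambda var, p: ("sum", [(p, ("leaf", var, 1)), (1 - p, ("leaf", var, 0))])

# mixture of two fully factorized distributions over X1, X2
spn = ("sum", [(0.6, ("prod", [bern("X1", 0.8), bern("X2", 0.7)])),
               (0.4, ("prod", [bern("X1", 0.1), bern("X2", 0.5)]))])

total = sum(evaluate(spn, {"X1": a, "X2": b}) for a in (0, 1) for b in (0, 1))
print(round(total, 6))   # a valid SPN sums to 1 over all complete assignments
```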

Potential thesis topics in this area: a) Compare the inference speed of sum-product networks and Bayesian networks, and characterize situations where one model is better than the other. b) Learning sum-product networks is done using heuristic algorithms; what is the effect of this approximation in practice?

Advisor: Pekka Parviainen

## Bayesian Bayesian networks

The naming of Bayesian networks is somewhat misleading because there is nothing Bayesian in them per se; a Bayesian network is just a representation of a joint probability distribution. One can, of course, use a Bayesian network while doing Bayesian inference. One can also learn Bayesian networks in a Bayesian way. That is, instead of finding an optimal network, one computes the posterior distribution over networks.

Task: Develop algorithms for Bayesian learning of Bayesian networks (e.g., MCMC, variational inference, EM).

Advisor: Pekka Parviainen

## Large-scale (probabilistic) matrix factorization

The idea behind matrix factorization is to represent a large data matrix as a product of two or more smaller matrices. Such factorizations are often used in, for example, dimensionality reduction and recommender systems. Probabilistic matrix factorization methods can be used to quantify uncertainty in recommendations. However, large-scale (probabilistic) matrix factorization is computationally challenging.
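
As a minimal illustration of the idea (not of the scalable methods the thesis would develop), here is a toy stochastic-gradient factorization of a small, partially observed matrix; the matrix, rank and learning rate are assumptions.

```python
import random

# Minimal sketch of (non-probabilistic) matrix factorization by stochastic
# gradient descent: approximate a partially observed ratings matrix R by
# the product of two low-rank factors. The toy matrix is an assumption.

def factorize(R, k=2, steps=5000, lr=0.05, reg=0.01, seed=0):
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
    observed = [(i, j) for i in range(n) for j in range(m)
                if R[i][j] is not None]
    for _ in range(steps):
        i, j = rng.choice(observed)
        err = R[i][j] - sum(U[i][f] * V[j][f] for f in range(k))
        for f in range(k):
            u, v = U[i][f], V[j][f]
            U[i][f] += lr * (err * v - reg * u)   # SGD step with L2 penalty
            V[j][f] += lr * (err * u - reg * v)
    return U, V

R = [[5, 4, None],
     [4, None, 1],
     [1, 1, 5]]
U, V = factorize(R)
# the unobserved entry (0, 2) now gets a predicted rating:
pred = sum(U[0][f] * V[2][f] for f in range(2))
```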

Potential thesis topics in this area: a) Develop scalable methods for large-scale matrix factorization (non-probabilistic or probabilistic), b) Develop probabilistic methods for implicit feedback (e.g., a recommendation engine where there are no ratings, only knowledge of whether a customer has bought an item).

Advisor: Pekka Parviainen

## Bayesian deep learning

Standard deep neural networks do not quantify uncertainty in predictions. On the other hand, Bayesian methods provide a principled way to handle uncertainty. Combining these approaches leads to Bayesian neural networks. The challenge is that Bayesian neural networks can be cumbersome to use and difficult to learn.
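
One widely used lightweight approximation is Monte Carlo dropout: keep dropout active at prediction time and treat the spread of repeated predictions as uncertainty. A hedged toy sketch, with a fixed, hand-set "network" (the weights are assumptions for the example):

```python
import random
import statistics

# Hedged illustration of Monte Carlo dropout, one common approximation to
# Bayesian neural networks: keep dropout active at prediction time and
# read the spread of repeated predictions as uncertainty. The tiny fixed
# "network" and its weights are assumptions for the sake of the example.

W1 = [0.9, -0.6, 1.2, 0.4]   # input -> hidden weights
W2 = [0.5, 0.8, -0.3, 1.1]   # hidden -> output weights

def predict(x, rng, p_drop=0.5):
    hidden = [max(0.0, w * x) for w in W1]   # ReLU hidden layer
    # inverted dropout: drop units and rescale the survivors
    mask = [0.0 if rng.random() < p_drop else 1.0 / (1 - p_drop)
            for _ in hidden]
    return sum(w * h * m for w, h, m in zip(W2, hidden, mask))

rng = random.Random(42)
samples = [predict(2.0, rng) for _ in range(1000)]
mean = statistics.mean(samples)      # point prediction
spread = statistics.stdev(samples)   # uncertainty estimate
```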

The task is to analyze Bayesian neural networks and different inference algorithms in some simple setting.

Advisor: Pekka Parviainen

## Deep learning for combinatorial problems

Deep learning is usually applied in regression or classification problems. However, there has been some recent work on using deep learning to develop heuristics for combinatorial optimization problems; see, e.g., [1] and [2].

Task: Choose a combinatorial problem (or several related problems) and develop deep learning methods to solve it.

References: [1] Vinyals, Fortunato and Jaitly: Pointer networks. NIPS 2015. [2] Dai, Khalil, Zhang, Dilkina and Song: Learning Combinatorial Optimization Algorithms over Graphs. NIPS 2017.

Advisor: Pekka Parviainen, Ahmad Hemmati

## Automatic hyperparameter selection for isomap

Isomap is a non-linear dimensionality reduction method with two free hyperparameters (number of nearest neighbors and neighborhood radius). Different hyperparameters result in dramatically different embeddings. Previous methods for selecting hyperparameters have focused on choosing a single optimal value. In this project, you will explore the use of persistent homology to find parameter ranges that result in stable embeddings. The project has theoretical and computational aspects.

Advisor: Nello Blaser

## Directed cycle finding

Finding cycles in directed graphs is one of the subroutines in many algorithms for learning the structure of Bayesian networks. In this project, you will use methods from topological data analysis on directed graphs to find cycles more efficiently. Standard tools for finding cycles exist in the case of undirected graphs, and some recent work has focused on finding persistent homology of directed graphs. In this project, you will combine the two approaches to implement a method that finds cycles in directed graphs. You will then compare these methods with standard network methods in the context of Bayesian networks. This is an implementation project.
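
For comparison, the standard non-topological baseline is a depth-first search that reports a cycle when it meets a back edge; a minimal sketch:

```python
# Standard DFS-based cycle detection in a directed graph (white/grey/black
# colouring), a baseline that the topological approaches could be compared to.

def find_cycle(graph):
    WHITE, GREY, BLACK = 0, 1, 2
    vertices = set(graph) | {v for vs in graph.values() for v in vs}
    colour = {v: WHITE for v in vertices}
    parent = {}

    def dfs(u):
        colour[u] = GREY
        for v in graph.get(u, []):
            if colour[v] == GREY:            # back edge: a cycle is closed
                path = [u]
                while path[-1] != v:
                    path.append(parent[path[-1]])
                return path[::-1]            # the cycle from v ... to u
            if colour[v] == WHITE:
                parent[v] = u
                cycle = dfs(v)
                if cycle:
                    return cycle
        colour[u] = BLACK
        return None

    for v in graph:
        if colour[v] == WHITE:
            cycle = dfs(v)
            if cycle:
                return cycle
    return None

g = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(find_cycle(g))   # ['a', 'b', 'c']
```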

Advisor: Nello Blaser

## Notions of stability in machine learning

In topological data analysis, the term stability usually means that the output of an algorithm changes little when the input is perturbed. In computational learning theory, on the other hand, there are numerous definitions of stability, such as hypothesis stability, error stability or uniform stability. In this project, you will relate different definitions of stability to one another, learn about the stability of particular machine learning algorithms, and develop the stability theory for persistent homology from a computational learning theory standpoint. This project is mostly theoretical.

Advisor: Nello Blaser

## Validate persistent homology

Persistent homology is a generalization of hierarchical clustering that finds more structure than just the clusters. Traditionally, hierarchical clustering has been evaluated using resampling methods and by assessing stability properties. In this project, you will generalize these resampling methods to develop novel stability properties that can be used to assess persistent homology. This project has theoretical and computational aspects.

Advisor: Nello Blaser

## Persistent homology benchmarks

Persistent homology is becoming a standard method for analyzing data. In this project, you will generate benchmark data sets for testing different aspects of the persistence pipeline. You will generate benchmarks for different objectives, such as data with a known persistence diagram (where, for example, the bottleneck distance can be minimized) and data with classification and regression targets. Data sets will be sampled from a manifold, with or without noise, or from a general probability distribution. This project is mostly computational.

Advisor: Nello Blaser

## Divisive covers

Divisive covers are a divisive technique for generating filtered simplicial complexes. They originally used a naive way of dividing data into a cover. In this project, you will explore different methods of dividing space, based on principal component analysis, support vector machines and k-means clustering. In addition, you will explore methods of using divisive covers for classification. This project will be mostly computational.

Advisor: Nello Blaser

## Reinforcement learning for sparsification

Reinforcement learning has recently become a way to heuristically solve optimization problems. In this project, you will set up the problem of finding a sparse approximation for persistent homology in the reinforcement learning framework. You will train a neural network to find approximations of simplicial complexes that can be smaller and more precise than traditional approximation techniques. The setup of the reinforcement learning problem requires a deep theoretical understanding, and the problem also has a computational aspect.

Advisor: Nello Blaser

## Topology of encodings

State-of-the-art methods in natural language processing and facial recognition use vector embedding algorithms such as word2vec or Siamese networks. Classically, such vector embeddings are analyzed using cluster analysis or supervised methods. In this project, you will use network analysis and topological methods to analyze vector embeddings in order to find a richer description of them. This project will be applied.

Advisor: Nello Blaser

## Multimodality in Bayesian neural network ensembles

One method to assess uncertainty in neural network predictions is to use dropout or noise generators at prediction time and run every prediction many times. This leads to a distribution of predictions. Informatively summarizing such probability distributions is a non-trivial task and the commonly used means and standard deviations result in the loss of crucial information, especially in the case of multimodal distributions with distinct likely outcomes. In this project, you will analyze such multimodal distributions with mixture models and develop ways to exploit such multimodality to improve training. This project can have theoretical, computational and applied aspects.

Advisor: Nello Blaser

## k-linkage clustering

Agglomerative clustering generally does not deal well with noise points. Single-linkage clustering, for example, suffers from the chaining effect, while outliers have a strong effect on complete-linkage clustering. In this project, you will study a version of agglomerative clustering that can take noise points into account and relate it to typical hierarchical clustering results as well as to density-based methods such as DBSCAN. This project can have theoretical, computational and applied aspects.
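
The chaining effect can be reproduced in a few lines. In this toy sketch (data and threshold are made up), a chain of noise points lets single linkage merge two well-separated groups, while complete linkage keeps them apart:

```python
# Minimal sketch of agglomerative clustering cut at a distance threshold,
# illustrating the chaining effect: a chain of "noise" points lets single
# linkage merge two well-separated groups. Data and threshold are toy choices.

def cluster_at(points, t, linkage):
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > t:                    # no merge below the threshold left
            break
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

single = lambda a, b: min(abs(x - y) for x in a for y in b)
complete = lambda a, b: max(abs(x - y) for x in a for y in b)

clean = [0, 1, 2, 10, 11, 12]            # two groups, clear gap
bridge = [0, 1, 2, 4, 6, 8, 10, 11, 12]  # same groups plus a noise chain

print(len(cluster_at(clean, 2.5, single)))    # 2 clusters
print(len(cluster_at(bridge, 2.5, single)))   # 1: the chain merged everything
print(len(cluster_at(bridge, 2.5, complete))) # 4: complete linkage resists
```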

Advisor: Nello Blaser

## Dimensionality reduction with missing data

Many dimensionality reduction methods assume that complete data is available. In particular, for neighborhood graph algorithms, all distances should be computable. In this project, you will consider the major neighborhood graph algorithms and extend them to the case of missing data. This will require detailed theoretical knowledge of the algorithms in order to define appropriate distance measures that preserve the desired properties. The project will have theoretical and computational aspects.

Advisor: Nello Blaser

## Multitask variational autoencoders

Autoencoders are a type of artificial neural network that learns a data representation, typically for dimensionality reduction. Variational autoencoders are generative models that combine the autoencoder architecture with probabilistic graphical modeling. They may be used to restore damaged data by conditioning the decoder on the remaining data. In this project, you will explore whether joint training of a traditional variational autoencoder and restoring variational autoencoders can make the embedding more stable. The project will be mostly computational, but may have some theoretical aspects.

Advisor: Nello Blaser

## Topology of binary classifiers

A linear classifier separates the underlying space into two connected components. A nearest neighbor classifier, on the other hand, may divide the space into many connected components, and overfitting can result in dividing the space into too many of them. In this project, you will study how many connected components different classifiers produce. You will then devise a regularization technique that penalizes many connected components. This project will have theoretical and computational aspects.
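
The component count itself is easy to estimate empirically. A hedged sketch: label a grid with a 1-nearest-neighbour classifier and count connected regions by flood fill (the training points and grid size are illustrative assumptions):

```python
# Hedged sketch: label a grid with a 1-nearest-neighbour classifier and
# count the connected components of the decision regions with a flood
# fill; one mislabelled point adds an extra "island" component.
# Training points and grid size are illustrative assumptions.

def nn_label(x, y, train):
    return min(train, key=lambda t: (t[0] - x) ** 2 + (t[1] - y) ** 2)[2]

def count_components(train, n=40):
    step = 1.0 / (n - 1)
    grid = [[nn_label(i * step, j * step, train) for j in range(n)]
            for i in range(n)]
    seen = [[False] * n for _ in range(n)]
    components = 0
    for i in range(n):
        for j in range(n):
            if seen[i][j]:
                continue
            components += 1
            stack = [(i, j)]
            seen[i][j] = True
            while stack:                       # flood fill one region
                a, b = stack.pop()
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    u, v = a + da, b + db
                    if 0 <= u < n and 0 <= v < n and not seen[u][v] \
                            and grid[u][v] == grid[a][b]:
                        seen[u][v] = True
                        stack.append((u, v))
    return components

clean = [(0.1, 0.4, 0), (0.1, 0.6, 0), (0.3, 0.4, 0), (0.3, 0.6, 0),
         (0.8, 0.5, 1), (0.9, 0.6, 1)]        # (x, y, class)
noisy = clean + [(0.2, 0.5, 1)]               # one mislabelled point
print(count_components(clean), count_components(noisy))  # clean: 2, noisy: 3
```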

Advisor: Nello Blaser

## Topological neural networks

The aim of topological data analysis is to study the geometric and topological properties of data by combining machine learning with methods from algebraic topology. In recent years, several topological neural network layers have been proposed. However, their respective advantages and disadvantages have not been studied in detail. In this project, you will implement various topology layers and determine their respective strengths and weaknesses on numerous standard benchmark data sets. This project is mostly computational.

Advisor: Nello Blaser

## Neural Network Verification

Neural networks have been applied in many areas. However, any method based on generalization may fail, and this is by design. The question is how to deal with such failures. To limit them, one can define rules that a neural network should follow and devise strategies to verify whether the rules are obeyed. The main tasks of this project are to study an algorithm for learning rules formulated in propositional Horn logic, implement the algorithm, and apply it to verify neural networks.
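
Once rules are available, checking them involves standard Horn reasoning: a rule body -> head fires once every body atom has been derived. A minimal forward-chaining sketch with invented rule names:

```python
# Hedged sketch of forward chaining over propositional Horn rules (the
# kind of rules the learning algorithm would produce). The rule names
# and facts below are invented for illustration.

def closure(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived

rules = [({"green_light"}, "may_go"),
         ({"pedestrian", "may_go"}, "must_brake"),
         ({"must_brake"}, "alarm")]

print(sorted(closure({"green_light", "pedestrian"}, rules)))
```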

References:

Queries and Concept Learning by Angluin (Machine Learning 1988)

Exact Learning: On the Boundary between Horn and CNF by Hermo and Ozaki (ACM TOCT 2020)

Formal Verification of Deep Neural Networks by Narodytska (FMCAD 2018)

Advisor: Ana Ozaki

## Knowledge Graph Embeddings

Knowledge graphs can be understood as labelled graphs whose nodes and edges are enriched with meta-knowledge, such as temporal validity, geographic coordinates, and provenance. Recent research in machine learning attempts to complete (or predict) facts in a knowledge graph by embedding entities and relations in low-dimensional vector spaces. The main tasks of this project are to study knowledge graph embeddings, study ways of integrating temporal validity in the geometrical model of a knowledge graph, implement and perform tests with an embedding that represents the temporal evolution of entities using their vector representations.
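
A minimal sketch of the translation-based embedding idea (TransE, from the first reference below): a fact (head, relation, tail) is scored by the distance ||h + r - t||, with lower scores meaning more plausible facts. The two-dimensional vectors here are hand-set for illustration, not trained embeddings:

```python
# TransE-style scoring: a fact (head, relation, tail) is plausible when
# head + relation ≈ tail, i.e. the distance ||h + r - t|| is small.
# The vectors below are hand-set for illustration, not trained embeddings.

def score(h, r, t):
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)) ** 0.5

entity = {"Oslo": [0.9, 0.1], "Norway": [1.0, 1.0], "Bergen": [0.8, 0.2]}
relation = {"capital_of": [0.1, 0.9]}

h, r = entity["Oslo"], relation["capital_of"]
scores = {t: score(h, r, entity[t]) for t in ("Norway", "Bergen")}
print(scores)   # lower score = more plausible tail entity
```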

References:

Translating Embeddings for Modeling Multi-relational Data by Bordes, Usunier, Garcia-Durán (NeurIPS 2013)

Temporally Attributed Description Logics by Ozaki, Krötzsch, Rudolph (Book chapter: Description Logic, Theory Combination, and All That 2019)

Attributed Description Logics: Reasoning on Knowledge Graphs by Krötzsch, Marx, Ozaki, Thost (ISWC 2017)

Advisor: Ana Ozaki

## Decidability and Complexity of Learning

Gödel showed in 1931 that, essentially, there is no consistent and complete set of axioms that is capable of modelling traditional arithmetic operations. Recently, Ben-David et al. defined a general learning model and showed that learnability in this model may not be provable using the standard axioms of mathematics. The main tasks of this project are to study Gödel's incompleteness theorems, the connection between these theorems and the theory of machine learning, and to investigate learnability and complexity classes in the PAC and the exact learning models.

References:

Learnability can be undecidable by Ben-David, Hrubeš, Moran, Shpilka, Yehudayoff (Nature 2019)

On the Complexity of Learning Description Logic Ontologies by Ozaki (RW 2020)

Advisor: Ana Ozaki

## Machine Ethics

Autonomous systems, such as self-driving cars, need to behave according to the environment in which they are embedded. However, ethical and moral behaviour is not universal: the underlying behavioural norms often differ among countries or groups of countries, and a compromise among such differences needs to be considered.

The moral machines experiment (https://www.moralmachine.net/) exposed people to a series of moral dilemmas and asked what an autonomous vehicle should do in each of the given situations. Researchers then tried to find similarities between the answers from the same region.

The main tasks of this project are to study the moral machine experiment, and to study and implement an algorithm for building compromises among different regions (or even people). We have developed a compromise-building algorithm that works on behavioural norms represented as Horn clauses. Assume that each choice example from the moral machines experiment is a behavioural norm represented as a Horn clause. The compromise algorithm is then applied to the choices obtained from different people during the moral machines experiment. One of the goals of this project would be to determine how to (efficiently) compute compromises for groups of countries (e.g., the Nordic countries and Scandinavia).

References:

The Moral Machine experiment by Edmond Awad, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan (Nature 2018)

Advisors: Ana Ozaki, Marija Slavkovik

## Learning Ontologies via Queries

In artificial intelligence, ontologies have been used to represent knowledge about a domain of interest in a machine-processable format. However, designing and maintaining ontologies is an expensive process that often requires the interaction between ontology engineers and domain experts. The main tasks of this project are to study an algorithm for learning ontologies formulated in the ELH description logic, implement the algorithm, and evaluate it using an artificial oracle developed in the literature that simulates the domain expert.

References:

Learning Query Inseparable ELH ontologies by Ozaki, Persia, Mazzullo (AAAI 2020)

ExactLearner: A Tool for Exact Learning of EL Ontologies by Duarte, Konev, Ozaki (KR 2018)

Exact Learning of Lightweight Description Logic Ontologies by Konev, Lutz, Ozaki, Wolter (JMLR 2018)

Advisor: Ana Ozaki

## Mining Ontologies with Formal Concept Analysis

Formal Concept Analysis (FCA) is a method of data analysis that can be used to find implications that hold in a dataset (e.g., Chancellor -> Politician, meaning "a chancellor is a politician"). In FCA, a base for a dataset is a set of implications with minimal cardinality that characterize those implications that hold in the dataset. This notion can be adapted to the context of ontologies, where the base contains rules formulated in an ontology language instead of implications. The main tasks of this project are to study an algorithm for mining ontologies based on FCA, implement the algorithm, and evaluate it using portions of knowledge graphs such as Wikidata as datasets.
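
The basic building block, checking whether an implication holds in a dataset, fits in a few lines; a toy sketch with made-up objects and attributes:

```python
# Checking whether an implication A -> B holds in a formal context:
# every object that has all attributes in A must also have all in B.
# The objects and attributes below are invented for illustration.

def holds(dataset, A, B):
    return all(B <= attrs for attrs in dataset if A <= attrs)

objects = [   # attribute sets of four toy objects
    {"chancellor", "politician", "person"},
    {"politician", "person"},
    {"person"},
    {"chancellor", "politician", "person", "economist"},
]

print(holds(objects, {"chancellor"}, {"politician"}))  # True
print(holds(objects, {"politician"}, {"chancellor"}))  # False
```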

References:

Mining of EL-GCIs by Borchmann and Distel (ICDMW 2011)

Learning Description Logic Ontologies. Five Approaches. Where Do They Stand? by Ozaki (KI 2020)

Advisor: Ana Ozaki

## Simulated underwater environment and deep learning

Using data from the Mareano surveys or the LoVe underwater observatory, create a simulator for underwater benthic (i.e. sea bed) scenes by placing objects randomly (but credibly) on a background. Using the simulated data, train deep learning neural networks to:

a) recognize the presence of specific objects, b) locate specific objects, and c) segment specific objects.

Test the systems on real data and evaluate the results.

Advisor: Ketil Malde

## Simulated 3D environment and fish species identification

Using data from the Deep Vision trawl camera, implement a simulator that mimics the stereoscopic camera setup. Place a background image and fish (either as 3D models, or as 2D "cardboard" images) of various species in random orientations, and generate two images representing the viewpoints of the two cameras. Using simulated data, train a deep learning neural network to reconstruct the 3D scene, including estimating the species, lengths, and weights/volumes of individual fish.

Advisor: Ketil Malde

## Evaluating the effects and interaction of hyperparameters in convolutional neural networks

Neural networks have many hyperparameters, including choice of activation functions, regularization and normalization, gradient descent method, early stopping, cost function, and so on. While best practices exist, the interactions between the different choices can be hard to predict. To study this, train networks on suitable benchmark data, using randomized choices for hyperparameters, and observe parameters like rate of convergence, over- and underfitting, magnitude of gradient, and final accuracy.
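
A minimal sketch of the randomized-configuration part of such a study (the search space below is an illustrative assumption):

```python
import math
import random

# Sketch of randomised hyperparameter exploration: sample configurations
# at random, train one network per configuration, and log the observed
# behaviour. The search space below is an illustrative assumption.

space = {
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer": ["sgd", "adam"],
    "learning_rate": (1e-4, 1e-1),   # sampled log-uniformly
    "dropout": (0.0, 0.6),           # sampled uniformly
}

def sample_config(rng):
    lo, hi = space["learning_rate"]
    return {
        "activation": rng.choice(space["activation"]),
        "optimizer": rng.choice(space["optimizer"]),
        "learning_rate": math.exp(rng.uniform(math.log(lo), math.log(hi))),
        "dropout": rng.uniform(*space["dropout"]),
    }

rng = random.Random(0)
trials = [sample_config(rng) for _ in range(5)]
# each configuration would then be trained while recording convergence
# rate, over/underfitting, gradient magnitudes and final accuracy
```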

Advisor: Ketil Malde

## Machine teaching

Advisor: Jan Arne Telle

## Machine learning approaches toward personalized treatment of leukemia

As new data on multiple omics levels reveal more information on leukemia and the effects of drugs, there are new opportunities to tailor treatment to each individual patient. In an on-going European project we study leukemia, using data both from individual patients and from cell line and mouse model systems, to improve the understanding of genomic clonality and signaling pathway status, aiming to generate data that enables machine learning approaches to predict prognosis and treatment response. The focus of the project will be on setting up an appropriate software system enabling evaluation of alternative feature selection methods and classification approaches. There is an opportunity to work closely with bioinformatics, systems biology and cancer researchers in the above-mentioned European project, including partners in Germany and the Netherlands, and also with the Centre of Excellence CCBIO (Center for Cancer Biomarkers) in Bergen.

Advisor: Inge Jonassen

## Developing Bayesian network inference algorithms for modelling genetic effects on gene expression

Advisor: Tom Michoel

## Deep learning approaches in imaging genetics

Advisor: Tom Michoel

## Towards precision medicine for cancer patient stratification

On average, a drug or a treatment is effective in only about half of patients who take it. This means patients need to try several until they find one that is effective at the cost of side effects associated with every treatment. The ultimate goal of precision medicine is to provide a treatment best suited for every individual. Sequencing technologies have now made genomics data available in abundance to be used towards this goal.

In this project we will specifically focus on cancer. Most cancer patients get a particular treatment based on the cancer type and the stage, though different individuals react differently to a treatment. It is now well established that genetic mutations cause cancer growth and spreading, and importantly, these mutations differ between individual patients. The aim of this project is to use genomic data to enable better stratification of cancer patients and to predict the treatment most likely to work. Specifically, the project will use machine learning approaches to integrate genomic data and build a classifier for stratification of cancer patients.

Advisor: Anagha Joshi

## Unraveling gene regulation from single cell data

Multi-cellularity is achieved by precise control of gene expression during development and differentiation, and aberrations of this process lead to disease. A key regulatory process in gene regulation is at the transcriptional level, where epigenetic and transcriptional regulators control the spatial and temporal expression of the target genes in response to environmental, developmental, and physiological cues obtained from a signalling cascade. The rapid advances in sequencing technology have now made it feasible to study this process by characterizing genome-wide patterns of diverse epigenetic and transcription factors, including at the single-cell level.

Single cell RNA sequencing is highly important, particularly in cancer, as it allows exploration of heterogenous tumor samples; this heterogeneity obstructs therapeutic targeting and leads to poor survival. Despite huge clinical relevance and potential, analysis of single cell RNA-seq data is challenging. In this project, we will develop strategies to infer gene regulatory networks using network inference approaches (both supervised and unsupervised). These will be tested primarily on single cell datasets in the context of cancer.

Advisor: Anagha Joshi

## Analysing Nanopore data

Oxford Nanopore Technologies recently revolutionised next-generation sequencing by developing a portable long-read sequencing device. At its core, these sequencers use protein pores suspended in lipid membranes that allow the passage of DNA or RNA. During the passage, changes in the electrical current over the membrane can be decoded into the nucleotide sequence of the passing molecule. This so-called squiggle data is rich in information about the native DNA/RNA molecule and can be queried for nucleotide modifications, unnatural bases and physical properties of the underlying biomolecule.

In the proposed Master's projects we want to make use of the information retained in Nanopore squiggle data to design specific biological experiments allowing deep insights into RNA and DNA biology. Project A makes use of already recorded data in which a fifth base - inosine - was introduced in a specific context in sequenced DNA. The software used to decode squiggle data is trained to recognise only the canonical bases - A, G, C, T - using recurrent neural network (RNN) analysis. The student would apply RNN analysis to train a 5-base decoder building on established architectures, or will be involved in developing HMM tools to identify inosine in data after decoding with standard RNN models. The successful analysis will then be applied to find inosine bases integrated into natural DNA based on a specific experimental design.

Project B will screen for physical properties of native DNA. Oxford Nanopore sequencing allows for the probing of native DNA molecules without prior copying in vitro, which makes it possible to screen for properties of the native DNA as well. Depending on sequence context, DNA can form different double-helical structures. Next to A- and B-DNA, Z-DNA is thought to be a rarely occurring DNA structure that forms a left-handed zigzag double-helix. We propose that Z-DNA has distinct physical properties that affect the processivity of the proteins involved in Nanopore sequencing; therefore, changes in the squiggle data will record the occurrence of Z-DNA in native genomic sequences. The student will apply machine learning approaches to investigate the occurrence of Z-DNA in genomic sequencing experiments and will establish squiggle data properties that differentiate Z-DNA from other forms of DNA. This will lead to the development of a stand-alone tool to investigate physical properties of ONT data.

Advisor: Maximilian Krause

## Developing a Stress Granule Classifier

To carry out the multitude of functions 'expected' from a human cell, the cell employs a strategy of division of labour, whereby sub-cellular organelles carry out distinct functions. Thus, we traditionally understand organelles as distinct units, defined both functionally and physically, with a distinct shape and size range. More recently, a new class of organelles has been discovered that is assembled and dissolved on demand and is composed of liquid droplets or 'granules'. Granules show many properties characteristic of liquids, such as flow and wetting, but they can also assume many shapes and indeed also fluctuate in shape. One such liquid organelle is a stress granule (SG).

Stress granules are pro-survival organelles that assemble in response to cellular stress and are important in cancer and neurodegenerative diseases such as Alzheimer's. They are liquid or gel-like and can assume varying sizes and shapes depending on their cellular composition.

In a given experiment we are able to image the entire cell over a time series of 1000 frames; from which we extract a rough estimation of the size and shape of each granule. Our current method is susceptible to noise and a granule may be falsely rejected if the boundary is drawn poorly in a small majority of frames. Ideally, we would also like to identify potentially interesting features, such as voids, in the accepted granules.

We are interested in applying a machine learning approach to develop a descriptor for a 'classic' granule and furthermore classify them into different functional groups based on disease status of the cell. This method would be applied across thousands of granules imaged from control and disease cells. We are a multi-disciplinary group consisting of biologists, computational scientists and physicists.

Advisor: Sushma Grellscheid, Carl Jones

## Machine Learning based Hyperheuristic algorithm

Develop a machine learning based hyper-heuristic algorithm to solve a pickup and delivery problem. A hyper-heuristic is a heuristic that chooses heuristics automatically; it seeks to automate the process of selecting, combining, generating or adapting several simpler heuristics to efficiently solve computational search problems [Handbook of Metaheuristics]. There may be multiple heuristics for solving a problem, each with its own strengths and weaknesses. In this project, we want to use machine learning techniques to learn the strengths and weaknesses of each heuristic while using them in an iterative search for high-quality solutions, and then use them intelligently for the rest of the search. As new information is gathered during the search, the hyper-heuristic algorithm automatically adjusts the heuristics.
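
One common way to realize this adaptive selection (in the spirit of adaptive large neighbourhood search) is to keep a score per heuristic, select by a score-proportional roulette wheel, and update scores from observed improvement. A hedged toy sketch; the heuristics and the one-dimensional minimisation "problem" are illustrative assumptions:

```python
import random

# Hedged sketch of score-based heuristic selection: each heuristic keeps
# a score, selection is a score-proportional roulette wheel, and the
# score is updated from observed improvement. The heuristics and the toy
# one-dimensional minimisation "problem" are illustrative assumptions.

def select(scores, rng):
    pick = rng.uniform(0, sum(scores.values()))
    acc = 0.0
    for name, s in scores.items():
        acc += s
        if pick <= acc:
            return name
    return name                      # guard against rounding at the end

def search(heuristics, start, iters=200, decay=0.8, seed=1):
    rng = random.Random(seed)
    scores = {name: 1.0 for name in heuristics}
    best = start
    for _ in range(iters):
        name = select(scores, rng)
        candidate = heuristics[name](best, rng)
        reward = max(0.0, best - candidate)              # improvement found
        scores[name] = decay * scores[name] + (1 - decay) * (1.0 + reward)
        best = min(best, candidate)
    return best, scores

heuristics = {
    "small_step":  lambda x, rng: x - rng.uniform(0.0, 0.1),   # reliable
    "random_jump": lambda x, rng: x + rng.uniform(-0.2, 2.0),  # rarely helps
}
best, scores = search(heuristics, start=10.0)
```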

Advisor: Ahmad Hemmati

## Machine learning for solving satisfiability problems and applications in cryptanalysis

Advisor: Igor Semaev

## Applying machine learning algorithms in post-quantum cryptanalysis

Post-quantum cryptography is a hot area in recent cryptographic research, aiming to build public-key cryptosystems that remain secure even against quantum computers. One important event in this field is the standardization effort from NIST, namely, the NIST PQC competition.

The research topic is applying machine learning techniques to improve the cryptanalysis of the NIST PQC candidates. The basic problem is to study the distribution of the huge amount of data generated during the cryptanalysis process, which is difficult to represent in an analytic expression.

The student will gain experience in crypto and security research and improve his/her programming skills. The results could be a worthwhile contribution to crypto conferences and/or journals.

Reference: Yu, Yang, and Léo Ducas. "Learning strikes again: the case of the DRS signature scheme." International Conference on the Theory and Application of Cryptology and Information Security (ASIACRYPT). Springer, Cham, 2018.

Advisor: Qian Guo

## Homomorphic Encryption for Machine Learning

Advisor: Chunlei Li

## Deep Learning for Channel Coding in 5G Mobile Communication

Advisor: Chunlei Li

## Explainability/visualization for citizen view

Project background

Advanced machine learning algorithms are powerful in their capability for making accurate predictions, even in situations where the underlying relations between predictors are subtle and non-linear, but they are at the same time hard to interpret and hence often referred to as “black box” algorithms. This makes them on the one hand potentially highly valuable for business applications, but on the other hand difficult to put to use if their reasoning cannot be understood by decision makers or their outcomes cannot be explained to end-users.

These end-users can for instance be citizens directly affected by the algorithm’s decision (e.g. tax based on predicted property value, conditions for loan based on predicted risk profile, etc), or business leaders without technical background who make decisions based partly on the algorithm’s output. Based on this, we coin a non-technical explanation of an algorithm’s decision a “citizen view”, meant to indicate the explanation’s technical level, though not necessarily the end-user’s role.

Project goal

PricewaterhouseCoopers wish to provide an MSc student with a dataset representing a real business case, where the aim is either classification or regression.

The goal is to develop a standardised “citizen view”, through which the output of advanced machine learning algorithms in terms of its input can be explained to the average non-technical person, along with a recommended workflow for achieving this generically for similar problems.

The student should develop both advanced and explainable machine learning regression or classification models. The advanced model(s) can be e.g. a deep neural network or an XGBoost model, where the aim is achieving the highest possible prediction or classification accuracy, while the explainable model can be e.g. a linear regression or a shallow tree model. The student should then analyse the behaviour of the advanced model, with the aim of explaining the behaviour as accurately as possible, in terms of an explainable model. The presentation of the explanation should be two-fold: once in technical terms, using statistical concepts, and once in the “citizen view”. The latter should explain the main drivers behind the algorithm’s decision, including a satisfactory level of quantitative detail while not obfuscating the qualitative behaviour. Furthermore, the explanation should not reveal the model’s underlying data, model parameters or other trade secrets.
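
A minimal sketch of the surrogate idea: fit a simple linear model to the predictions of the advanced model and report the fidelity (an R² against the black-box predictions). The "black box" below is a stand-in function, not a trained model:

```python
import random

# Sketch of a global surrogate explanation: fit a simple linear model not
# to the data but to the predictions of the advanced model, and report
# how much of that model's behaviour it captures ("fidelity", an R^2
# against the black-box predictions). The black box here is a stand-in
# function, not a trained model.

def black_box(x):
    return 3.0 * x + 0.5 * x * x   # mildly non-linear stand-in

rng = random.Random(0)
xs = [rng.uniform(0.0, 2.0) for _ in range(200)]
ys = [black_box(x) for x in xs]

# ordinary least squares for y ≈ a * x + b (closed form)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
b = my - a * mx

ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
fidelity = 1.0 - ss_res / ss_tot
# the surrogate "explains" the black box roughly as: prediction ≈ a*x + b,
# and fidelity states how faithful that simplified account is
```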

Topics

- Advanced machine learning algorithms such as neural networks, XGBoost, CatBoost, ...
- Standard statistical prediction or classification algorithms, such as linear regression or tree models
- SHAP and possibly also LIME algorithms, for analysing and explaining model behaviour
- Visualisation of data, variable relations and possibly principal component analyses.

Advisor: Inga Strümke

## Own topic combining logic and learning

If you want to suggest your own topic combining logic and learning, please contact Ana Ozaki

## Own topic

If you want to suggest your own topic, please contact Pekka Parviainen