# Available Master's thesis topics in machine learning

## Learning and inference with large Bayesian networks

Most learning and inference tasks with Bayesian networks are NP-hard. Therefore, one often resorts to using different heuristics that do not give any quality guarantees.

Task: Evaluate the quality of large-scale learning or inference algorithms empirically.

Advisor: Pekka Parviainen

## Sum-product networks

Traditionally, probabilistic graphical models use a graph structure to represent dependencies and independencies between random variables. Sum-product networks are a relatively new type of graphical model in which the graph structure represents computations rather than the relationships between variables. The benefit of this representation is that inference (computing conditional probabilities) can be done in time linear in the size of the network.
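
As a toy illustration of the linear-time claim, here is a minimal sum-product network over two binary variables, evaluated with a single bottom-up pass; the structure and weights are invented for the example:

```python
# A minimal sum-product network over two binary variables x1, x2.
# Node encodings: ("leaf", var, value), ("prod", children), ("sum", weights, children)

def evaluate(node, evidence):
    """One bottom-up pass; evidence maps var -> 0/1, missing vars are marginalized out."""
    kind = node[0]
    if kind == "leaf":
        _, var, value = node
        if var not in evidence:          # marginalized variable: indicators sum to 1
            return 1.0
        return 1.0 if evidence[var] == value else 0.0
    if kind == "prod":
        out = 1.0
        for child in node[1]:
            out *= evaluate(child, evidence)
        return out
    # sum node: weighted mixture of its children
    weights, children = node[1], node[2]
    return sum(w * evaluate(c, evidence) for w, c in zip(weights, children))

# P(x1, x2) = 0.6 * P_a(x1) P_a(x2) + 0.4 * P_b(x1) P_b(x2)
x1_a = ("sum", [0.9, 0.1], [("leaf", "x1", 1), ("leaf", "x1", 0)])
x2_a = ("sum", [0.2, 0.8], [("leaf", "x2", 1), ("leaf", "x2", 0)])
x1_b = ("sum", [0.3, 0.7], [("leaf", "x1", 1), ("leaf", "x1", 0)])
x2_b = ("sum", [0.5, 0.5], [("leaf", "x2", 1), ("leaf", "x2", 0)])
spn = ("sum", [0.6, 0.4], [("prod", [x1_a, x2_a]), ("prod", [x1_b, x2_b])])

joint = evaluate(spn, {"x1": 1, "x2": 1})    # P(x1=1, x2=1)
marginal = evaluate(spn, {"x1": 1})          # P(x1=1): same pass, no sum over states
```

Conditional probabilities then follow from two such passes, e.g. P(x2=1 | x1=1) = joint / marginal; each pass touches every node exactly once.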

Potential thesis topics in this area: a) Compare inference speed of sum-product networks and Bayesian networks. Characterize situations in which one model is better than the other. b) Learning sum-product networks is done using heuristic algorithms. What is the effect of this approximation in practice?

Advisor: Pekka Parviainen

## Bayesian Bayesian networks

The naming of Bayesian networks is somewhat misleading because there is nothing Bayesian in them per se: a Bayesian network is just a representation of a joint probability distribution. One can, of course, use a Bayesian network when doing Bayesian inference. One can also learn Bayesian networks in a Bayesian way. That is, instead of finding a single optimal network, one computes the posterior distribution over networks.
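
A minimal sketch of what "posterior distribution over networks" means, for two binary variables with a Beta(1,1) parameter prior and a uniform prior over the three possible structures; the data set is invented for illustration:

```python
import math

def log_ml(counts):
    """Log marginal likelihood of binary counts (n0, n1) under a Beta(1,1) prior:
    n0! * n1! / (n0 + n1 + 1)!"""
    n0, n1 = counts
    return math.lgamma(n0 + 1) + math.lgamma(n1 + 1) - math.lgamma(n0 + n1 + 2)

def count(data, var, cond=None):
    """Counts (n0, n1) of variable `var`, optionally restricted to rows
    where variable cond[0] takes value cond[1]."""
    rows = data if cond is None else [r for r in data if r[cond[0]] == cond[1]]
    n1 = sum(r[var] for r in rows)
    return (len(rows) - n1, n1)

# Toy data over (X, Y): X strongly predicts Y
data = [(0, 0)] * 8 + [(0, 1)] * 2 + [(1, 0)] * 2 + [(1, 1)] * 8

def log_score(structure):
    if structure == "empty":
        return log_ml(count(data, 0)) + log_ml(count(data, 1))
    if structure == "x->y":
        return (log_ml(count(data, 0))
                + log_ml(count(data, 1, cond=(0, 0)))
                + log_ml(count(data, 1, cond=(0, 1))))
    # "y->x"
    return (log_ml(count(data, 1))
            + log_ml(count(data, 0, cond=(1, 0)))
            + log_ml(count(data, 0, cond=(1, 1))))

structures = ["empty", "x->y", "y->x"]
scores = [log_score(s) for s in structures]
z = max(scores)
weights = [math.exp(s - z) for s in scores]     # uniform structure prior
posterior = {s: w / sum(weights) for s, w in zip(structures, weights)}
```

Note that "x->y" and "y->x" are Markov equivalent and receive identical posterior mass; exact enumeration like this only works for a handful of variables, which is why MCMC and variational methods are needed.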

Task: Develop algorithms for Bayesian learning of Bayesian networks (e.g., MCMC, variational inference, EM)

Advisor: Pekka Parviainen

## Large-scale (probabilistic) matrix factorization

The idea behind matrix factorization is to represent a large data matrix as a product of two or more smaller matrices. Such factorizations are often used in, for example, dimensionality reduction and recommendation systems. Probabilistic matrix factorization methods can be used to quantify uncertainty in recommendations. However, large-scale (probabilistic) matrix factorization is computationally challenging.
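
A small sketch of the basic (non-probabilistic) setting: factorize a partially observed matrix by stochastic gradient descent on the observed entries only. The sizes, learning rate and regularization here are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 30, 20, 3

# Synthetic low-rank "ratings" matrix; only about half the entries are observed
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_items, k))
R = U_true @ V_true.T
mask = rng.random(R.shape) < 0.5
obs = np.argwhere(mask)               # (i, j) indices of observed entries

# Factor matrices fitted by SGD on observed entries
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
lr, reg = 0.02, 0.01

def rmse():
    return np.sqrt(np.mean((R - U @ V.T)[mask] ** 2))

before = rmse()
for _ in range(50):
    rng.shuffle(obs)                  # shuffle the rows of the index array in place
    for i, j in obs:
        e = R[i, j] - U[i] @ V[j]     # error on one observed entry
        du = e * V[j] - reg * U[i]
        dv = e * U[i] - reg * V[j]
        U[i] += lr * du
        V[j] += lr * dv
after = rmse()
```

Probabilistic variants put priors on U and V and infer posteriors instead of point estimates; the scalability challenge is that the inner loop above visits every observed entry, which is expensive for web-scale matrices.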

Potential thesis topics in this area: a) Develop scalable methods for large-scale matrix factorization (non-probabilistic or probabilistic), b) Develop probabilistic methods for implicit feedback (e.g., a recommendation engine when there are no explicit ratings, only knowledge of whether a customer has bought an item)

Advisor: Pekka Parviainen

## Bayesian deep learning

Standard deep neural networks do not quantify uncertainty in predictions. On the other hand, Bayesian methods provide a principled way to handle uncertainty. Combining these approaches leads to Bayesian neural networks. The challenge is that Bayesian neural networks can be cumbersome to use and difficult to learn.

The task is to analyze Bayesian neural networks and different inference algorithms in some simple setting.

Advisor: Pekka Parviainen

## Deep learning for combinatorial problems

Deep learning is usually applied in regression or classification problems. However, there has been some recent work on using deep learning to develop heuristics for combinatorial optimization problems; see, e.g., [1] and [2].

Task: Choose a combinatorial problem (or several related problems) and develop deep learning methods to solve them.

References: [1] Vinyals, Fortunato and Jaitly: Pointer networks. NIPS 2015. [2] Dai, Khalil, Zhang, Dilkina and Song: Learning Combinatorial Optimization Algorithms over Graphs. NIPS 2017.

Advisor: Pekka Parviainen, Ahmad Hemmati

## Automatic hyperparameter selection for Isomap

Isomap is a non-linear dimensionality reduction method with two free hyperparameters (number of nearest neighbors and neighborhood radius). Different hyperparameters result in dramatically different embeddings. Previous methods for selecting hyperparameters focused on choosing one optimal hyperparameter. In this project, you will explore the use of persistent homology to find parameter ranges that result in stable embeddings. The project has theoretic and computational aspects.
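
For concreteness, a bare-bones Isomap (k-nearest-neighbour graph, geodesic distances, classical MDS) that exposes the neighbourhood hyperparameter; this is an illustrative sketch, not the implementation to be used in the project:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors, n_components=2):
    """Minimal Isomap: kNN graph -> geodesic (shortest-path) distances -> classical MDS."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    graph = np.full((n, n), np.inf)              # inf marks "no edge" for csgraph
    for i in range(n):
        nearest = np.argsort(d[i])[1:n_neighbors + 1]
        graph[i, nearest] = d[i, nearest]
    geo = shortest_path(np.minimum(graph, graph.T), directed=False)
    # classical MDS on the squared geodesic distances
    H = np.eye(n) - 1.0 / n                      # double-centering matrix
    B = -0.5 * H @ (geo ** 2) @ H
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

# Points along a quarter circle: a 1-D manifold embedded in 2-D
t = np.linspace(0, np.pi / 2, 40)
X = np.column_stack([np.cos(t), np.sin(t)])
emb = isomap(X, n_neighbors=3, n_components=1)   # should "unroll" the arc
```

Running this for a range of `n_neighbors` values shows how the embedding changes with the hyperparameter; too small a value disconnects the graph entirely, which is the kind of instability the persistent-homology analysis would quantify.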

Advisor: Nello Blaser

## Directed cycle finding

Finding cycles in directed graphs is one of the subroutines in many algorithms for learning the structure of Bayesian networks. In this project, you will use methods from topological data analysis on directed graphs to find cycles more efficiently. Standard tools for finding cycles exist in the case of undirected graphs, and some recent work has focused on finding persistent homology of directed graphs. In this project, you will combine the two approaches to implement a method that finds cycles in directed graphs. You will then compare these methods with standard network methods in the context of Bayesian networks. This is an implementation project.
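
For reference, the standard depth-first-search baseline that topology-based methods would be compared against can be sketched as follows (the adjacency-list format is an invented convention for the example):

```python
def find_cycle(adj):
    """Return one directed cycle as a list of nodes, or None if the graph is acyclic.
    adj maps each node to a list of its successors."""
    color = {v: "white" for v in adj}
    path = []                                    # current DFS path

    def dfs(v):
        color[v] = "gray"
        path.append(v)
        for w in adj.get(v, []):
            if color[w] == "gray":               # back edge: w is on the current path
                return path[path.index(w):]      # the cycle from w back to v
            if color[w] == "white":
                cycle = dfs(w)
                if cycle:
                    return cycle
        color[v] = "black"
        path.pop()
        return None

    for v in list(adj):
        if color[v] == "white":
            cycle = dfs(v)
            if cycle:
                return cycle
    return None
```

This runs in time linear in the number of edges but finds an arbitrary cycle; the interest of the persistent-homology approach is in characterizing and prioritizing cycles rather than merely detecting one.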

Advisor: Nello Blaser

## Notions of stability in machine learning

In topological data analysis, the term stability usually means that the output of an algorithm changes little when the input is perturbed. In computational learning theory, on the other hand, there are numerous definitions of stability, such as hypothesis stability, error stability or uniform stability. In this project, you will relate different definitions of stability to one another, learn about the stability of particular machine learning algorithms and develop the stability theory for persistent homology from a computational learning theory standpoint. This project is mostly theoretical.

Advisor: Nello Blaser

## Validate persistent homology

Persistent homology is a generalization of hierarchical clustering to find more structure than just the clusters. Traditionally, hierarchical clustering has been evaluated using resampling methods and assessing stability properties. In this project you will generalize these resampling methods to develop novel stability properties that can be used to assess persistent homology. This project has theoretic and computational aspects.

Advisor: Nello Blaser

## Persistent homology benchmarks

Persistent homology is becoming a standard method for analyzing data. In this project, you will generate benchmark data sets for testing different aspects of the persistence pipeline. You will generate benchmarks for different objectives, such as data with a known persistence diagram, where for example the bottleneck distance can be minimized, and data with classification and regression targets. Data sets will be sampled from a manifold with or without noise or from a general probability distribution. This project is mostly computational.

Advisor: Nello Blaser

## Divisive covers

Divisive covers are a divisive technique for generating filtered simplicial complexes. They originally used a naive way of dividing data into a cover. In this project, you will explore different methods of dividing space, based on principal component analysis, support vector machines and k-means clustering. In addition, you will explore methods of using divisive covers for classification. This project will be mostly computational.

Advisor: Nello Blaser

## Reinforcement learning for sparsification

Reinforcement learning has recently become a way to heuristically solve optimization problems. In this project, you will set up the problem of finding a sparse approximation for persistent homology in the reinforcement learning framework. You will train a neural network to find approximations of simplicial complexes that can be smaller and more precise than traditional approximation techniques. The setup of the reinforcement learning problem requires a deep theoretic understanding, and the problem also has a computational aspect.

Advisor: Nello Blaser

## Topology of encodings

State-of-the-art methods for natural language processing and facial recognition use vector embedding algorithms such as word2vec or Siamese networks. Classically, such vector embeddings are analyzed using cluster analysis or supervised methods. In this project, you will use network analysis and topological methods to analyze vector embeddings in order to find a richer description of them. This project will be applied.

Advisor: Nello Blaser

## Multimodality in Bayesian neural network ensembles

One method to assess uncertainty in neural network predictions is to use dropout or noise generators at prediction time and run every prediction many times. This leads to a distribution of predictions. Informatively summarizing such probability distributions is a non-trivial task and the commonly used means and standard deviations result in the loss of crucial information, especially in the case of multimodal distributions with distinct likely outcomes. In this project, you will analyze such multimodal distributions with mixture models and develop ways to exploit such multimodality to improve training. This project can have theoretical, computational and applied aspects.
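
To illustrate the problem, the sketch below draws bimodal "Monte-Carlo predictions" (as a stand-in for repeated dropout predictions) and fits a two-component Gaussian mixture by plain EM: the mean and standard deviation land between the two modes, while the mixture recovers them. All numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
# Repeated predictions from a hypothetical dropout model: two distinct likely outcomes
samples = np.concatenate([rng.normal(-2.0, 0.3, 500), rng.normal(3.0, 0.4, 500)])

mean, std = samples.mean(), samples.std()    # unimodal summary: falls between the modes

def em_gmm(x, n_iter=200):
    """Plain EM for a two-component 1-D Gaussian mixture."""
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample
        dens = pi / (sigma * np.sqrt(2 * np.pi)) * np.exp(
            -0.5 * ((x[:, None] - mu) / sigma) ** 2)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

pi, mu, sigma = em_gmm(samples)
```

The fitted component means sit near the two modes, whereas the overall mean reports a value the model almost never predicts; that is the information loss the project is about.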

Advisor: Nello Blaser

## k-linkage clustering

Agglomerative clustering generally does not deal well with noise points. Single linkage clustering for example suffers from the chaining effect, while outliers have a strong effect on complete linkage clustering. In this project, you will study a version of agglomerative clustering that can take into account noise points and relate it to typical hierarchical clustering results as well as density-based methods, such as DBSCAN. This project can be theoretical, computational and applied.
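
The chaining effect can be demonstrated in a few lines: a sparse bridge of noise points makes single linkage merge two well-separated groups. The data and cut threshold are chosen purely for the example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
a = rng.normal([0, 0], 0.1, size=(30, 2))        # dense group at the origin
b = rng.normal([5, 0], 0.1, size=(30, 2))        # dense group far away
bridge = np.column_stack([np.linspace(0.4, 4.6, 15), np.zeros(15)])  # chain of noise points

# Without the noise bridge, single linkage separates the two groups ...
two_groups = np.vstack([a, b])
n_without = len(set(fcluster(linkage(two_groups, method="single"),
                             t=0.6, criterion="distance")))

# ... but the sparse chain merges everything into one cluster (chaining effect)
chained = np.vstack([a, b, bridge])
n_with = len(set(fcluster(linkage(chained, method="single"),
                          t=0.6, criterion="distance")))
```

A noise-aware variant would have to recognize the bridge points as noise, which is exactly what density-based methods like DBSCAN do and what the project's agglomerative variant should achieve.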

Advisor: Nello Blaser

## Dimensionality reduction with missing data

Many dimensionality reduction methods assume that complete data is available. In particular, for neighborhood graph algorithms, all distances should be computable. In this project, you will consider the major neighborhood graph algorithms and extend them to the case of missing data. This will require detailed theoretical knowledge of the algorithms to define appropriate distance measures that preserve the desired properties. The project will have theoretic and computational aspects.

Advisor: Nello Blaser

## Multitask variational autoencoders

Autoencoders are a type of artificial neural network that learns a data representation, typically for dimensionality reduction. Variational autoencoders are generative models that combine the autoencoder architecture with probabilistic graphical modeling. They may be used to restore damaged data by conditioning the decoder on the remaining data. In this project, you will explore whether joint training of a traditional variational autoencoder and restoring variational autoencoders can make the embedding more stable. The project will be mostly computational, but may have some theoretic aspects.

Advisor: Nello Blaser

## Topology of binary classifiers

A linear classifier separates the underlying space into two connected components. A nearest neighbor classifier, on the other hand, may divide the space into many connected components. Overfitting can result in dividing the space into too many components. In this project, you will study how many connected components different classifiers produce. We will then devise a regularization technique that penalizes many connected components. This project will have theoretical and computational aspects.
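
A small experiment along these lines: evaluate a 1-NN classifier on a grid and count the connected components of its decision regions, with and without a mislabeled point. The hand-placed toy data is invented for the example:

```python
import numpy as np
from scipy.ndimage import label

def n_components(X, y, grid_pts=121):
    """Evaluate a 1-NN classifier on a grid and return the total number of
    connected components over both decision regions."""
    xs = np.linspace(-3, 3, grid_pts)
    gx, gy = np.meshgrid(xs, xs)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    nearest = np.argmin(np.linalg.norm(grid[:, None] - X[None, :], axis=-1), axis=1)
    pred = y[nearest].reshape(grid_pts, grid_pts)
    return label(pred == 0)[1] + label(pred == 1)[1]

# Two well-separated classes
X = np.array([(-1, 0), (-2, 0), (-1, 1), (-1, -1), (-2, 1), (-2, -1),
              (1, 0), (2, 0), (1, 1), (1, -1), (2, 1), (2, -1)], dtype=float)
y = np.array([0] * 6 + [1] * 6)

clean = n_components(X, y)                      # two half-plane-like regions

# One mislabeled point deep inside class 0 creates an extra "island" component
X_noisy = np.vstack([X, [(-1.5, 0.0)]])
y_noisy = np.append(y, 1)
noisy = n_components(X_noisy, y_noisy)
```

The component count is exactly the quantity a topological regularizer would penalize; here the single mislabeled point raises it from two to three.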

Advisor: Nello Blaser

## Topological neural networks

The aim of topological data analysis is to study the geometric and topological properties of data by combining machine learning with methods from algebraic topology. In recent years, several topological neural network layers have been proposed. However, their respective advantages and disadvantages have not been studied in detail. In this project, you will implement various topology layers and determine their respective strengths and weaknesses on numerous standard benchmark data sets. This project is mostly computational.

Advisor: Nello Blaser

## Automating the collection of evidence for medical interventions in low-income countries

In this project, you will use state-of-the-art natural language processing tools, such as latent semantic analysis, TextRank, and recurrent neural networks, to develop a reliable system that scans the medical literature and extracts and ranks the most relevant papers for specific medical interventions. The Bergen Center for Ethics and Priority Setting develops software that helps low-income countries (starting with Ethiopia, Malawi, and Zanzibar) do priority setting when allocating their health budgets. This software requires input on the cost and effects of all health interventions under consideration. Currently, we focus on the 218 interventions that are considered the most important, but in time the plan is to scale up to thousands of interventions. In this project you will develop an AI that scans the medical literature and ranks papers according to relevance for each intervention. Such a tool could have a great impact on the health of people living in countries that cannot afford to have a large number of experts conduct clinical trials or literature reviews.

Advisor: Nello Blaser, Øystein Haaland

## Artificial intelligence to segment fish from a 3D-camera

The student will develop a neural network based on the newest methods to segment fish in 3D pictures taken by an ROV or a stationary camera. So-called semantic segmentation will be used to recognize what is in the image at the pixel level. The data will be used for automatic length estimation.

Advisor: Nello Blaser, Magnus Rogne Myklebost

## Neural Network Verification

Neural networks have been applied in many areas. However, any method based on generalization may fail, and this is by design. The question is how to deal with such failures. To limit them, one can define rules that a neural network should follow and devise strategies to verify whether these rules are obeyed. The main tasks of this project are to study an algorithm for learning rules formulated in propositional Horn logic, implement the algorithm, and apply it to verify neural networks.

References:

Queries and Concept Learning by Angluin (Machine Learning 1988)

Exact Learning: On the Boundary between Horn and CNF by Hermo and Ozaki (ACM TOCT 2020).

Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples by Weiss, Goldberg, Yahav (ICML 2018)

Advisor: Ana Ozaki

## Knowledge Graph Embeddings

Knowledge graphs can be understood as labelled graphs whose nodes and edges are enriched with meta-knowledge, such as temporal validity, geographic coordinates, and provenance. Recent research in machine learning attempts to complete (or predict) facts in a knowledge graph by embedding entities and relations in low-dimensional vector spaces. The main tasks of this project are to study knowledge graph embeddings, study ways of integrating temporal validity in the geometrical model of a knowledge graph, implement and perform tests with an embedding that represents the temporal evolution of entities using their vector representations.
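
As a concrete example of the geometric view, TransE (Bordes et al., reference [1] below) models a triple (h, r, t) as plausible when h + r ≈ t in the embedding space. The sketch below fits a single invented triple by gradient descent on that distance; real training additionally uses negative sampling and margin losses:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
# Hypothetical toy embeddings for one triple, e.g. ("Berlin", "capital_of", "Germany")
h = rng.normal(size=dim)     # head entity
r = rng.normal(size=dim)     # relation, modeled as a translation vector
t = rng.normal(size=dim)     # tail entity

def distance(h, r, t):
    """TransE plausibility: smaller ||h + r - t|| means a more plausible triple."""
    return np.linalg.norm(h + r - t)

before = distance(h, r, t)

# Gradient descent on the squared distance for this single positive triple
lr = 0.05
for _ in range(100):
    g = 2 * (h + r - t)      # gradient w.r.t. h and r; the negative of it w.r.t. t
    h -= lr * g
    r -= lr * g
    t += lr * g

after = distance(h, r, t)
```

Integrating temporal validity, as proposed in the project, would mean making these vectors (or the translation) depend on time.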

References:

Translating Embeddings for Modeling Multi-relational Data by Bordes, Usunier, Garcia-Durán (NeurIPS 2013)

Temporally Attributed Description Logics by Ozaki, Krötzsch, Rudolph (Book chapter: Description Logic, Theory Combination, and All That 2019)

Attributed Description Logics: Reasoning on Knowledge Graphs by Krötzsch, Marx, Ozaki, Thost (ISWC 2017)

Advisor: Ana Ozaki

## Decidability and Complexity of Learning

Gödel showed in 1931 that, essentially, there is no consistent and complete set of axioms that is capable of modelling traditional arithmetic operations. Recently, Ben-David et al. defined a general learning model and showed that learnability in this model may not be provable using the standard axioms of mathematics. The main tasks of this project are to study Gödel's incompleteness theorems, the connection between these theorems and the theory of machine learning, and to investigate learnability and complexity classes in the PAC and the exact learning models.

References:

Learnability can be undecidable by Ben-David, Hrubeš, Moran, Shpilka, Yehudayoff (Nature 2019)

On the Complexity of Learning Description Logic Ontologies by Ozaki (RW 2020)

Advisor: Ana Ozaki

## Machine Ethics

Autonomous systems, such as self-driving cars, need to behave according to the environment in which they are embedded. However, ethical and moral behaviour is not universal and it is often the case that the underlying behaviour norms change among countries or groups of countries and a compromise among such differences needs to be considered.

The moral machines experiment (https://www.moralmachine.net/) exposed people to a series of moral dilemmas and asked what an autonomous vehicle should do in each of the given situations. Researchers then tried to find similarities between the answers from the same region.

The main tasks of this project are to study the moral machine experiment and to study and implement an algorithm for building compromises among different regions (or even people). We have developed a compromise-building algorithm that works on behavioural norms represented as Horn clauses. Assume that each choice example from the moral machines experiment is a behavioural norm represented as a Horn clause. The compromise algorithm is applied to these choices obtained from different people during the moral machines experiment. One of the goals of this project would be to determine how to (efficiently) compute compromises for groups of countries (e.g., the Nordic Countries and Scandinavia).

References:

The Moral Machine experiment by Edmond Awad, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan (Nature 2018)

Advisors: Ana Ozaki, Marija Slavkovik

## Prediction of hyper-congestion in transport systems

This project is in collaboration with colleagues at TØI (Institute of Transport Economics).

They have real-world data for training a neural network (e.g., traffic counts and average speed measures). The main tasks in this project are to analyse traffic data, perform simulation runs with the tool MATSim (to systematically scale up population and road capacity), train a neural network on each data set separately and on the combined data sets (both real-world and simulation data), and define and check whether the trained model satisfies basic constraints regarding the traffic flow.

Advisors: Ana Ozaki, Aino Ukkonen

## Mining Ontologies with Formal Concept Analysis

Formal Concept Analysis (FCA) is a method of data analysis that can be used to find implications that hold in a dataset (e.g., Chancellor -> Politician, meaning "a chancellor is a politician"). In FCA, a base for a dataset is a set of implications with minimal cardinality that characterize those implications that hold in the dataset. This notion can be adapted to the context of ontologies, where the base contains rules formulated in an ontology language instead of implications. The main tasks of this project are to study an algorithm for mining ontologies based on FCA, implement the algorithm, and evaluate it using portions of knowledge graphs such as Wikidata as datasets.
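
In code, checking whether a single implication holds in a (toy) formal context is straightforward; computing a minimal base is the hard part the project addresses. The context below is invented for illustration:

```python
def holds(dataset, premise, conclusion):
    """Check whether the implication premise -> conclusion holds in the dataset:
    every object that has all premise attributes also has all conclusion attributes."""
    return all(conclusion <= obj for obj in dataset if premise <= obj)

# Toy formal context: each object is represented by its set of attributes
dataset = [
    {"chancellor", "politician", "person"},
    {"politician", "person"},
    {"person"},
]

ok = holds(dataset, {"chancellor"}, {"politician"})   # "a chancellor is a politician"
bad = holds(dataset, {"politician"}, {"chancellor"})  # fails: one politician is not
```

A base would be a minimum-cardinality set of such implications from which every implication holding in the context follows; in the ontology setting the implications become rules in an ontology language.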

References:

Mining of EL-GCIs by Borchmann and Distel (ICDMW 2011)

Learning Description Logic Ontologies. Five Approaches. Where Do They Stand? by Ozaki (KI 2020)

Advisor: Ana Ozaki

## Simulated underwater environment and deep learning

Using data from the Mareano surveys or the LoVe underwater observatory, create a simulator for underwater benthic (i.e. sea bed) scenes by placing objects randomly (but credibly) on a background. Using the simulated data, train deep learning neural networks to:

a) recognize presence of specific objects b) locate specific objects c) segment specific objects

Test the systems on real data and evaluate the results.

Advisor: Ketil Malde

## Evaluating the effects and interaction of hyperparameters in convolutional neural networks

Neural networks have many hyperparameters, including choice of activation functions, regularization and normalization, gradient descent method, early stopping, cost function, and so on. While best practices exist, the interactions between the different choices can be hard to predict. To study this, train networks on suitable benchmark data, using randomized choices for hyperparameters, and observe parameters like rate of convergence, over- and underfitting, magnitude of gradient, and final accuracy.
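
The randomized-choice setup might be sketched as follows; the hyperparameter names and ranges here are illustrative, not prescriptive:

```python
import random

random.seed(0)

# Hypothetical search space: each entry draws one random value
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),        # log-uniform
    "batch_size":    lambda: random.choice([16, 32, 64, 128]),
    "activation":    lambda: random.choice(["relu", "tanh", "elu"]),
    "weight_decay":  lambda: 10 ** random.uniform(-6, -2),
    "optimizer":     lambda: random.choice(["sgd", "momentum", "adam"]),
}

def sample_config():
    """Draw one random hyperparameter configuration."""
    return {name: draw() for name, draw in search_space.items()}

configs = [sample_config() for _ in range(20)]
# Each config would drive one training run; per run, one would log the rate of
# convergence, gradient magnitudes, over-/underfitting, and final accuracy,
# and then study interactions between the sampled choices.
```

Sampling continuous hyperparameters on a log scale, as above, is the usual convention because plausible values span several orders of magnitude.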

Advisor: Ketil Malde

## Machine learned short-timescale prediction of inflow to rivers and dams

Knowledge about the inflow of water to rivers and dams is of great importance for managing hydroelectric power plants, not only for power generation, but also for flood management and prevention. In the case of the latter, a system for accurately predicting the instantaneous inflow (on a minute scale) based on past, current, and imminent weather would be of great value for dam operators as well as downstream communities. The task is to design, implement and train such a "short time inflow forecast" machine learning stack for (part of) a hydrological network of a hydroelectric power company. The goal is to train and test on real-world data sets.

Advisor: Troels Bojesen

Co-supervisor: Ole Håkon Hovland, Saudefaldene

## A closer look at the initialization of neural network parameters

It is well known that the training of neural networks using stochastic gradient descent algorithms is sensitive to the initialization of the network parameters. Initialization schemes like "Glorot initialization" [1] and "He initialization" [2] attempt to balance between the vanishing and exploding gradient regimes, but do so by making rather strict assumptions on the functional form of the activation functions.

The task here is to explore - theoretically as well as practically - whether more general, flexible, and possibly more powerful neural network parameter initialization schemes can be devised. Can one design a meta-initialization algorithm that adapts the parameter initialization to a given activation function/ network architecture/ optimization algorithm? From which distribution should the initial parameters be drawn if we want to not only achieve good training stability, but also efficiency, speed, and quality of outcome?
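
For reference, the two standard schemes from [1] and [2] reduce to simple variance formulas. The sketch below implements both and checks that He initialization roughly preserves signal magnitude through a deep ReLU stack; the widths and depth are arbitrary:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization [1]: Var(W) = 2 / (fan_in + fan_out)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    """He initialization [2], derived for ReLU: Var(W) = 2 / fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

rng = np.random.default_rng(0)

# Push a random signal through 10 ReLU layers with He-initialized weights;
# the mean squared activation should stay near its input value of 1
x = rng.normal(size=(256, 512))
h = x
for _ in range(10):
    h = np.maximum(h @ he_normal(512, 512, rng), 0.0)   # linear layer + ReLU
```

A meta-initialization scheme of the kind the project asks for would, in effect, derive the analogue of these variance formulas automatically for an arbitrary activation function.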

[1] Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010.

[2] He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE international conference on computer vision. 2015.

Advisor: Troels Bojesen

## Reinforcement learning and turn-based multiplayer games

DeepMind's algorithms (AlphaGo, AlphaZero, etc.) have famously demonstrated superhuman abilities at playing classical two-player board games like Go, Chess, and Shogi. But what about similar games with more than two players?

Task: choose or invent a conceptually simple, but non-trivial game that can be played by more than two players. The game could for instance be Chinese checkers, Gomoku ("Five in a row") with more than two colors, or something else of your own liking. Develop a reinforcement learning algorithm that can beat you and your friends at the game. Is the algorithm able to discover new tactics and novel patterns? How are the game dynamics affected by having more than two players? What happens if the players are able to communicate with each other as the game is played?

Advisor: Troels Bojesen

## Make an AI drawing assistant

Auto-completion and proposing words and even sentences are now commonplace in the world of text. What about the equivalent in visual arts like sketching and drawing? When sketching, have you ever felt that you have to redo a line several times to get it "just right"? Can we teach an AI drawing assistant to make this line for us?

The idea here is to design and implement a simple drawing program with an AI drawing assistant that proposes such "just right" lines based on the (digital) stroke of a mouse or stylus, as well as what is already in the drawing. The learning signal should come from the artist, the user of the program, who gives positive or negative feedback to the assistant based on how well the proposed line fits what he or she had in mind. Over time, the assistant should thus become better and better at helping the artist out with the technical details of the artwork, leaving the artist with more time and mental energy for his/her grander artistic visions. Or doodles.

Advisor: Troels Bojesen

## Machine learning approaches toward personalized treatment of leukemia

As new data on multiple omics levels reveal more information on leukemia and the effects of drugs, there are new opportunities to tailor treatment to each individual patient. In an ongoing European project, we study leukemia using data both from individual patients and from cell-line and mouse model systems to improve the understanding of genomic clonality and signaling pathway status, aiming to generate data that enable machine learning approaches to predict prognosis and treatment response. The focus of the project will be on setting up an appropriate software system enabling the evaluation of alternative feature selection methods and classification approaches. There is an opportunity to work closely with bioinformatics, systems biology and cancer researchers in the above-mentioned European project, including partners in Germany and the Netherlands, and also with the Centre of Excellence CCBIO (Center for Cancer Biomarkers) in Bergen.

Advisor: Inge Jonassen

## Applications of causal inference methods to omics data

Many hard problems in machine learning are directly linked to causality [1]. The graphical causal inference framework developed by Judea Pearl can be traced back to pioneering work by Sewall Wright on path analysis in genetics and has inspired research in artificial intelligence (AI) [1].

The Michoel group has developed the open-source tool Findr [2] which provides efficient implementations of mediation and instrumental variable methods for applications to large sets of omics data (genomics, transcriptomics, etc.). Findr works well on a recent data set for yeast [3].

We encourage students to explore promising connections between the fields of causal inference and machine learning. Feel free to contact us to discuss projects related to causal inference. Possible topics include: a) improving methods based on structural causal models, b) evaluating causal inference methods on data for model organisms, c) comparing methods based on causal models and neural network approaches.

References:

1. Schölkopf B, Causality for Machine Learning, arXiv (2019): https://arxiv.org/abs/1911.10500

2. Wang L and Michoel T. Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data. PLoS Computational Biology 13:e1005703 (2017). https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005703

3. Ludl A and Michoel T. Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. arXiv:2010.07417 https://arxiv.org/abs/2010.07417

Advisor: Adriaan Ludl, Tom Michoel

## Applications of causal inference methods in neuroscience

Here we propose to employ state-of-the-art causal inference and machine learning (ML) methods to study the networks formed by neurons in the brains of living zebrafish. Imaging experiments track the activity of hundreds of neurons during behavioural tasks and in fish treated with drugs. The aim is to understand neuronal firing patterns and rewiring in the brain.

Methods such as Granger Causality and Transfer Entropy can be used to infer the flow of information and the causal connections between neurons from time series of their action potentials recorded during experiments. These methods can also be validated on simulations of neuronal networks [2].
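
A minimal pairwise Granger-causality test on simulated series, comparing the target's prediction error with and without the other series' past; the coupled AR(1) processes below are invented stand-ins for neuronal activity traces:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2000
# Simulated activity: x drives y with a one-step lag, but not vice versa
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.6 * x[t - 1] + rng.normal(0, 1)
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(0, 1)

def granger(target, source, lag=1):
    """Variance-ratio form of pairwise Granger causality with one lag:
    does the source's past reduce the target's prediction error?"""
    tp, tt = target[lag:], target[:-lag]
    sp = source[:-lag]
    # restricted model: the target's own past only
    A = np.column_stack([tt, np.ones_like(tt)])
    res_r = tp - A @ np.linalg.lstsq(A, tp, rcond=None)[0]
    # full model: add the source's past
    B = np.column_stack([tt, sp, np.ones_like(tt)])
    res_f = tp - B @ np.linalg.lstsq(B, tp, rcond=None)[0]
    return np.log(res_r.var() / res_f.var())

xy = granger(y, x)   # x -> y: clearly positive for this simulation
yx = granger(x, y)   # y -> x: near zero
```

Transfer entropy is the information-theoretic generalization of the same idea; validation on simulated networks, as in [2], follows this pattern with known ground-truth connectivity.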

This is an exciting opportunity to combine machine learning and neuroscience on data provided by the lab of Prof. Emre Yaksi at the Kavli Institute for Systems Neuroscience at NTNU, Trondheim.

References:

1. Forè, S et al. Functional properties of habenular neurons are determined by developmental stage and sequential neurogenesis. Science Advances (2020).

2. Ludl A, Soriano J. Impact of Physical Obstacles on the Structural and Effective Connectivity of in silico Neuronal Circuits. Front. Comput. Neurosci., 31 August 2020 https://doi.org/10.3389/fncom.2020.00077

Advisor: Adriaan Ludl, Tom Michoel

## Graph neural networks

In machine learning, the question of how to incorporate the structure of a graph into predictive tasks has received much attention. Recently, with the advent of deep learning, the idea of representation learning on graphs has been introduced. In this setting, the main approach is to map nodes, subgraphs, or the entire graph to points in a low-dimensional vector space [1]. The main goal of such an embedding is to preserve the local structure of the graph around each node, without having to specify in advance what “local” means. Graph Neural Networks (GNNs) address the network embedding problem through a deep autoencoder framework and have been shown to perform better on subsequent machine learning tasks than traditional embedding methods. Furthermore, GNNs can tackle graph analytic tasks such as graph classification and regression in an end-to-end manner, rather than treating network embedding as a separate step, which greatly improves their performance compared to traditional approaches [2].

In this project, we look at different types of machine learning tasks on network datasets, such as recommender systems, point clouds and biological networks, and try to solve typical machine learning tasks such as generation, classification and regression through the use of novel GNN architectures.
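
As a minimal sketch of the underlying message-passing idea, here is one graph-convolution layer in the spirit of the symmetric-normalization GCN; the toy graph, feature dimensions and random weights are invented for illustration:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution layer: H' = ReLU( D^{-1/2} (A + I) D^{-1/2} H W ).
    Each node's new feature vector aggregates its own and its neighbours' features."""
    A_hat = A + np.eye(len(A))                   # add self-loops
    d = A_hat.sum(axis=1)
    norm = A_hat / np.sqrt(np.outer(d, d))       # symmetric degree normalization
    return np.maximum(norm @ H @ W, 0.0)         # aggregate, transform, ReLU

# Toy graph: 4 nodes in a path 0-1-2-3, with 2-D input features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 2))      # initial node features
W = rng.normal(size=(2, 8))      # learnable weights (random here)

H1 = gcn_layer(A, H, W)          # node embeddings after one round of message passing
```

Stacking k such layers lets information propagate k hops, which is one concrete way the notion of "local" structure is left implicit rather than specified in advance.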

[1] William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. CoRR, abs/1709.05584, 2017.

[2] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Graph neural networks: A review of methods and applications. CoRR, abs/1812.08434, 2018.

[3] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. CoRR, abs/1901.00596, 2019.

Advisor: Ramin Hasibi, Tom Michoel

## Towards precision medicine for cancer patient stratification

On average, a drug or a treatment is effective in only about half of the patients who take it. This means patients need to try several treatments, each with associated side effects, until they find one that is effective. The ultimate goal of precision medicine is to provide the treatment best suited for every individual. Sequencing technologies have now made genomics data available in abundance to be used towards this goal.

In this project we will specifically focus on cancer. Most cancer patients get a particular treatment based on the cancer type and stage, though different individuals react differently to a treatment. It is now well established that genetic mutations cause cancer growth and spreading, and, importantly, these mutations differ between individual patients. The aim of this project is to use genomic data to allow better stratification of cancer patients and to predict the treatment most likely to work. Specifically, the project will use machine learning approaches to integrate genomic data and build a classifier for the stratification of cancer patients.

Advisor: Anagha Joshi

## Unraveling gene regulation from single cell data

Multi-cellularity is achieved by precise control of gene expression during development and differentiation, and aberrations of this process lead to disease. A key regulatory process in gene regulation is at the transcriptional level, where epigenetic and transcriptional regulators control the spatial and temporal expression of target genes in response to environmental, developmental, and physiological cues obtained from a signalling cascade. The rapid advances in sequencing technology have now made it feasible to study this process by characterizing the genome-wide patterns of diverse epigenetic marks and transcription factors, down to the single-cell level.

Single-cell RNA sequencing is particularly important in cancer, as it allows exploration of heterogeneous tumor samples; this heterogeneity obstructs therapeutic targeting and leads to poor survival. Despite its huge clinical relevance and potential, the analysis of single-cell RNA-seq data is challenging. In this project, we will develop strategies to infer gene regulatory networks using network inference approaches (both supervised and unsupervised). These will be tested primarily on single-cell datasets in the context of cancer.
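To give a flavour of the unsupervised end of this spectrum (a toy sketch with invented data and an illustrative threshold, not the project's method): the simplest network inference reads candidate edges off a gene-by-cell expression matrix by thresholding pairwise correlations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy expression matrix (genes x cells); gene 1 is made to track gene 0.
n_genes, n_cells = 20, 100
expr = rng.normal(size=(n_genes, n_cells))
expr[1] = expr[0] + 0.1 * rng.normal(size=n_cells)

corr = np.corrcoef(expr)                 # gene-by-gene Pearson correlations
np.fill_diagonal(corr, 0.0)              # ignore self-edges
edges = np.argwhere(np.abs(corr) > 0.8)  # candidate co-regulation edges
print(edges)
```

Correlation-based networks cannot distinguish direct regulation from shared upstream causes; supervised approaches instead learn from known regulator-target pairs, which is part of what makes this project interesting.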

Advisor: Anagha Joshi

## Developing a Stress Granule Classifier

To carry out the multitude of functions 'expected' from a human cell, the cell employs a strategy of division of labour, whereby sub-cellular organelles carry out distinct functions. We therefore traditionally understand organelles as distinct units, defined both functionally and physically, with a distinct shape and size range. More recently, a new class of organelles has been discovered that are assembled and dissolved on demand and are composed of liquid droplets, or 'granules'. Granules show many properties characteristic of liquids, such as flow and wetting, but they can also assume many shapes and indeed fluctuate in shape. One such liquid organelle is the stress granule (SG).

Stress granules are pro-survival organelles that assemble in response to cellular stress and are important in cancer and in neurodegenerative diseases such as Alzheimer's. They are liquid or gel-like and can assume varying sizes and shapes depending on their cellular composition.

In a given experiment we are able to image the entire cell over a time series of 1000 frames, from which we extract a rough estimate of the size and shape of each granule. Our current method is susceptible to noise, and a granule may be falsely rejected if its boundary is drawn poorly in a small number of frames. Ideally, we would also like to identify potentially interesting features, such as voids, in the accepted granules.

We are interested in applying a machine learning approach to develop a descriptor for a 'classic' granule and, furthermore, to classify granules into different functional groups based on the disease status of the cell. This method would be applied across thousands of granules imaged from control and disease cells. We are a multi-disciplinary group consisting of biologists, computational scientists, and physicists.
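One way such a descriptor-plus-classifier pipeline could look (a hedged sketch: the shape features, synthetic 'granules', and labels below are all invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def features(radii):
    # radii: boundary radii sampled around one granule in one frame.
    area_proxy = np.mean(radii) ** 2
    roughness = np.std(radii) / np.mean(radii)  # 0 for a perfect circle
    return [area_proxy, roughness]

# Synthetic granules: 'control' ones are rounder, 'disease' ones more irregular.
control = [features(1.0 + 0.05 * rng.normal(size=64)) for _ in range(100)]
disease = [features(1.0 + 0.30 * rng.normal(size=64)) for _ in range(100)]
X = np.array(control + disease)
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))
```

In practice the descriptor would be built from the segmented granule boundaries across all 1000 frames, so that per-frame noise averages out instead of causing false rejections.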

Advisor: Sushma Grellscheid, Carl Jones

## Machine learning based hyper-heuristic algorithm

Develop a machine learning based hyper-heuristic algorithm to solve a pickup and delivery problem. A hyper-heuristic is a heuristic that chooses among heuristics automatically: it seeks to automate the process of selecting, combining, generating, or adapting several simpler heuristics to efficiently solve computational search problems [Handbook of Metaheuristics]. There may be multiple heuristics for solving a problem, each with its own strengths and weaknesses. In this project, we want to use machine learning techniques to learn the strengths and weaknesses of each heuristic while using them in an iterative search for high-quality solutions, and then to apply them intelligently for the rest of the search. As new information is gathered during the search, the hyper-heuristic algorithm automatically adjusts its choice of heuristics.
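A minimal sketch of the selection idea (not the project's algorithm): an epsilon-greedy bandit that tracks the average relative improvement each low-level heuristic yields and favours the stronger one. The two heuristics and the one-number "solution" below are placeholders.

```python
import random

random.seed(0)

# Two placeholder low-level heuristics: both shrink the cost of the
# current solution, one much more effectively than the other.
def h_strong(cost): return cost * (1 - random.uniform(0, 0.5))
def h_weak(cost):   return cost * (1 - random.uniform(0, 0.05))
heuristics = [h_strong, h_weak]

avg_gain = [0.0, 0.0]   # learned average relative improvement per heuristic
counts = [0, 0]
cost, epsilon = 100.0, 0.2

for i, h in enumerate(heuristics):        # initialise: try each heuristic once
    new_cost = h(cost)
    avg_gain[i], counts[i], cost = (cost - new_cost) / cost, 1, new_cost

for _ in range(500):
    if random.random() < epsilon:                         # explore
        i = random.randrange(len(heuristics))
    else:                                                 # exploit best so far
        i = max(range(len(heuristics)), key=lambda j: avg_gain[j])
    new_cost = heuristics[i](cost)
    gain = (cost - new_cost) / cost
    counts[i] += 1
    avg_gain[i] += (gain - avg_gain[i]) / counts[i]       # running average
    cost = new_cost

print("final cost:", cost, "selections per heuristic:", counts)
```

In a real pickup-and-delivery setting the solution would be a set of routes, the low-level heuristics would be neighbourhood moves, and an acceptance criterion would decide whether non-improving moves are kept; here every move happens to improve, which keeps the sketch short.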

Advisor: Ahmad Hemmati

## Machine learning for solving satisfiability problems and applications in cryptanalysis

Advisor: Igor Semaev

## Own topic combining logic and learning

If you want to suggest your own topic combining logic and learning, please contact Ana Ozaki.

## Own topic

If you want to suggest your own topic, please contact Pekka Parviainen.