Available Master's thesis topics in machine learning

Learning and inference with large Bayesian networks

Most learning and inference tasks with Bayesian networks are NP-hard. Therefore, one often resorts to using different heuristics that do not give any quality guarantees.

Task: Evaluate the quality of large-scale learning or inference algorithms empirically.

Advisor: Pekka Parviainen

Sum-product networks

Traditionally, probabilistic graphical models use a graph structure to represent dependencies and independencies between random variables. Sum-product networks are a relatively new type of graphical model in which the graph structure models computations rather than the relationships between variables. The benefit of this representation is that inference (computing conditional probabilities) can be done in linear time with respect to the size of the network.
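
To make the linear-time claim concrete, here is a minimal sketch of a sum-product network over two binary variables. The node classes, the `evaluate` function and the example weights are illustrative assumptions, not part of any particular library. Each node is evaluated once in a bottom-up pass; unobserved variables are marginalised by letting their indicator leaves evaluate to 1, so a conditional probability costs two passes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Leaf:
    """Indicator leaf: 1.0 if `var` equals `value` or `var` is unobserved."""
    var: str
    value: int


@dataclass
class Product:
    children: List["Node"]


@dataclass
class Sum:
    # (weight, child) pairs; the weights of one sum node add up to 1.
    weighted_children: List[Tuple[float, "Node"]]


Node = Union[Leaf, Product, Sum]


def evaluate(node: Node, evidence: Dict[str, int],
             cache: Optional[Dict[int, float]] = None) -> float:
    """Bottom-up evaluation; the cache ensures each node is computed once."""
    if cache is None:
        cache = {}
    key = id(node)
    if key in cache:
        return cache[key]
    if isinstance(node, Leaf):
        observed = evidence.get(node.var)
        value = 1.0 if observed is None or observed == node.value else 0.0
    elif isinstance(node, Product):
        value = 1.0
        for child in node.children:
            value *= evaluate(child, evidence, cache)
    else:  # Sum node
        value = sum(w * evaluate(child, evidence, cache)
                    for w, child in node.weighted_children)
    cache[key] = value
    return value


def bernoulli(var: str, p_one: float) -> Sum:
    """Univariate distribution over a binary variable, written as a sum node."""
    return Sum([(p_one, Leaf(var, 1)), (1.0 - p_one, Leaf(var, 0))])


# Hypothetical example: a mixture of two fully factorised distributions.
spn = Sum([
    (0.6, Product([bernoulli("X", 0.9), bernoulli("Y", 0.2)])),
    (0.4, Product([bernoulli("X", 0.1), bernoulli("Y", 0.7)])),
])

p_joint = evaluate(spn, {"X": 1, "Y": 1})  # P(X=1, Y=1)
p_y = evaluate(spn, {"Y": 1})              # P(Y=1), X marginalised out
print(round(p_joint / p_y, 2))             # P(X=1 | Y=1)
```

In a Bayesian network the same query would in general require variable elimination or a similar procedure whose cost depends on the treewidth of the graph, which is the kind of comparison topic a) asks for.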

Potential thesis topics in this area: a) Compare inference speed with sum-product networks and Bayesian networks. Characterize situations when one model is better than the other. b) Learning sum-product networks is done using heuristic algorithms. What is the effect of approximation in practice?

Advisor: Pekka Parviainen

Bayesian Bayesian networks

The naming of Bayesian networks is somewhat misleading because there is nothing Bayesian in them per se; a Bayesian network is just a representation of a joint probability distribution. One can, of course, use a Bayesian network while doing Bayesian inference. One can also learn Bayesian networks in a Bayesian way. That is, instead of finding an optimal network, one computes the posterior distribution over networks.

Task: Develop algorithms for Bayesian learning of Bayesian networks (e.g., MCMC, variational inference, EM)

Advisor: Pekka Parviainen

Large-scale (probabilistic) matrix factorization

The idea behind matrix factorization is to represent a large data matrix as a product of two or more smaller matrices. Such factorizations are often used in, for example, dimensionality reduction and recommendation systems. Probabilistic matrix factorization methods can be used to quantify uncertainty in recommendations. However, large-scale (probabilistic) matrix factorization is computationally challenging.
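
As a small illustration of the basic idea, the following sketch factorizes a partially observed matrix into two low-rank factors with stochastic gradient descent. It is a plain (non-probabilistic) baseline under assumed settings: the function names, the toy ratings, and the values of `rank`, `lr` and `reg` are all hypothetical choices made for this example.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Approximate the observed entries of a matrix by U @ V^T using SGD.

    `observed` maps (row, column) pairs to values; missing entries are ignored.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (i, j), value in entries:
            prediction = sum(U[i][k] * V[j][k] for k in range(rank))
            error = value - prediction
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on the squared error with L2 regularisation.
                U[i][k] += lr * (error * v - reg * u)
                V[j][k] += lr * (error * u - reg * v)
    return U, V


def squared_error(observed, U, V):
    """Sum of squared reconstruction errors over the observed entries."""
    total = 0.0
    for (i, j), value in observed.items():
        prediction = sum(u * v for u, v in zip(U[i], V[j]))
        total += (value - prediction) ** 2
    return total


# Hypothetical 4 x 3 rating matrix with three entries missing.
ratings = {
    (0, 0): 5.0, (0, 1): 3.0,
    (1, 0): 4.0, (1, 2): 1.0,
    (2, 1): 1.0, (2, 2): 5.0,
    (3, 0): 1.0, (3, 1): 2.0, (3, 2): 4.0,
}

U0, V0 = factorize(ratings, 4, 3, epochs=0)  # untrained factors
U, V = factorize(ratings, 4, 3)
print(squared_error(ratings, U0, V0), squared_error(ratings, U, V))
```

The inner loop touches every observed entry in every epoch, which is where the computational challenge for large matrices comes from; a probabilistic variant would additionally place priors on `U` and `V` and infer a posterior rather than a point estimate.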

Potential thesis topics in this area: a) Develop scalable methods for large-scale matrix factorization (non-probabilistic or probabilistic), b) Develop probabilistic methods for implicit feedback (e.g., a recommendation engine when there are no rankings but only knowledge of whether a customer has bought an item)

Advisor: Pekka Parviainen

Bayesian deep learning

Standard deep neural networks do not quantify uncertainty in predictions. On the other hand, Bayesian methods provide a principled way to handle uncertainty. Combining these approaches leads to Bayesian neural networks. The challenge is that Bayesian neural networks can be cumbersome to use and difficult to learn.

The task is to analyze Bayesian neural networks and different inference algorithms in some simple setting.

Advisor: Pekka Parviainen

Deep learning for combinatorial problems

Deep learning is usually applied in regression or classification problems. However, there has been some recent work on using deep learning to develop heuristics for combinatorial optimization problems; see, e.g., [1] and [2].

Task: Choose a combinatorial problem (or several related problems) and develop deep learning methods to solve them.

References: [1] Vinyals, Fortunato and Jaitly: Pointer networks. NIPS 2015. [2] Dai, Khalil, Zhang, Dilkina and Song: Learning Combinatorial Optimization Algorithms over Graphs. NIPS 2017.

Advisors: Pekka Parviainen, Ahmad Hemmati

Estimating the number of modes of an unknown function

Mode seeking considers estimating the number of local maxima of a function f. Sometimes one can find modes by, e.g., looking for points where the derivative of the function is zero. However, often the function is unknown and we only have access to some (possibly noisy) values of the function.

In topological data analysis, we can analyze topological structures using persistent homology. For 1-dimensional signals, this can translate into looking at the birth/death persistence diagram, i.e., the birth and death of connected topological components as we expand the space around each point where we have observed our function. These observations turn out to be closely related to the modes (local maxima) of the function. A recent paper [1] proposed an efficient method for mode seeking.
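
The link between persistence and modes can be sketched with the standard 0-dimensional persistence computation on a sampled signal. This is a generic textbook construction (a superlevel-set filtration with union-find), not the method of [1]; the function names and the example signal are assumptions made for illustration. Each local maximum gives birth to a component, and when two components meet at a valley, the one with the lower peak dies; its persistence is the height difference between peak and valley.

```python
def mode_persistences(values):
    """Persistence of every local maximum of a sampled 1-D signal."""
    n = len(values)
    parent = [-1] * n  # -1 marks samples that have not been added yet
    peak = {}          # component root -> index of its highest sample

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    persistences = []
    # Sweep a threshold downwards: add samples from highest to lowest.
    for i in sorted(range(n), key=lambda k: values[k], reverse=True):
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Keep the component with the higher peak as the survivor.
                if values[peak[ri]] < values[peak[rj]]:
                    ri, rj = rj, ri
                persistence = values[peak[rj]] - values[i]
                if persistence > 0:
                    persistences.append(persistence)
                parent[rj] = ri
    persistences.append(float("inf"))  # the global maximum never dies
    return persistences


def count_modes(values, min_persistence):
    """Number of modes whose persistence exceeds the given threshold."""
    return sum(1 for p in mode_persistences(values) if p > min_persistence)


signal = [0, 3, 1, 4, 2, 2.5, 0]  # hypothetical observations of f
print(sorted(mode_persistences(signal)))
print(count_modes(signal, 1.0), count_modes(signal, 0.25))
```

The answer depends on the persistence threshold, which is one reason a probabilistic estimate of the number of modes is of interest when the observations are noisy.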

In this project, the task is to extend the ideas from [1] to get a probabilistic estimate on the number of modes. To this end, one has to use probabilistic methods such as Gaussian processes.

[1] U. Bauer, A. Munk, H. Sieling, and M. Wardetzky. Persistence barcodes versus Kolmogorov signatures: Detecting modes of one-dimensional signals. Foundations of Computational Mathematics 17:1-33, 2017.

Advisors: Pekka Parviainen, Nello Blaser

Automatic hyperparameter selection for isomap

Isomap is a non-linear dimensionality reduction method with two free hyperparameters (number of nearest neighbors and neighborhood radius). Different hyperparameters result in dramatically different embeddings. Previous methods for selecting hyperparameters focused on choosing one optimal hyperparameter. In this project, you will explore the use of persistent homology to find parameter ranges that result in stable embeddings. The project has theoretic and computational aspects.

Advisor: Nello Blaser

Directed cycle finding

Finding cycles in directed graphs is one of the subroutines in many algorithms for learning the structure of Bayesian networks. In this project, you will use methods from topological data analysis on directed graphs to find cycles more efficiently. Standard tools for finding cycles exist in the case of undirected graphs, and some recent work has focused on finding persistent homology of directed graphs. In this project, you will combine the two approaches to implement a method that finds cycles in directed graphs. You will then compare these methods with standard network methods in the context of Bayesian networks. This is an implementation project.
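
For reference, the kind of standard network method the topological approach would be compared against can be sketched as a depth-first search that reports a directed cycle if one exists. The function name and the adjacency-list representation (a dict from node to list of successors) are assumptions for this example.

```python
def find_cycle(graph):
    """Return one directed cycle as a list of nodes, or None if the graph is acyclic.

    `graph` maps each node to an iterable of its successors.
    """
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    for successors in graph.values():
        for node in successors:
            state.setdefault(node, UNVISITED)
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph.get(node, ()):
            if state[nxt] == ON_PATH:
                # Back edge: the cycle is the part of the path from nxt onwards.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                cycle = visit(nxt)
                if cycle is not None:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None


# A valid Bayesian network structure must be acyclic.
dag = {"A": ["B", "C"], "B": ["C"], "C": []}
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_cycle(dag), find_cycle(cyclic))
```

The recursion is fine for small graphs; for large networks an iterative version with an explicit stack avoids Python's recursion limit.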

Advisor: Nello Blaser

Notions of stability in machine learning

In topological data analysis, the term stability usually means that the output of an algorithm changes little when the input is perturbed. In computational learning theory, on the other hand, there are numerous definitions of stability, such as hypothesis stability, error stability or uniform stability. In this project, you will relate different definitions of stability to one another, learn about the stability of particular machine learning algorithms and develop the stability theory for persistent homology from a computational learning theory standpoint. This project is mostly theoretical.

Advisor: Nello Blaser

Validate persistent homology

Persistent homology is a generalization of hierarchical clustering to find more structure than just the clusters. Traditionally, hierarchical clustering has been evaluated using resampling methods and assessing stability properties. In this project you will generalize these resampling methods to develop novel stability properties that can be used to assess persistent homology. This project has theoretic and computational aspects.

Advisor: Nello Blaser

Persistent homology benchmarks

Persistent homology is becoming a standard method for analyzing data. In this project, you will generate benchmark data sets for testing different aspects of the persistence pipeline. You will generate benchmarks for different objectives, such as data with a known persistence diagram, where for example the bottleneck distance can be minimized, and data with classification and regression targets. Data sets will be sampled from a manifold with or without noise or from a general probability distribution. This project is mostly computational.

Advisor: Nello Blaser

Divisive covers

Divisive covers are a divisive technique for generating filtered simplicial complexes. They originally used a naive way of dividing data into a cover. In this project, you will explore different methods of dividing space, based on principal component analysis, support vector machines and k-means clustering. In addition, you will explore methods of using divisive covers for classification. This project will be mostly computational.

Advisor: Nello Blaser

Binarized Neural Networks

Binarized neural networks (BNNs) have recently attracted a lot of attention in the AI research community as a memory-efficient alternative to classical deep neural network models. In 2018, Narodytska et al. proposed an exact translation of BNNs into propositional logic. Using this translation, various properties such as robustness against adversarial attacks can be proved. The main tasks in this project are to study BNNs and the translation into propositional logic, implement an optimised version of the translation, and perform experiments verifying its correctness.
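
To give a flavour of why BNNs lend themselves to a logical encoding, the sketch below shows a single binarized neuron with weights and inputs in {-1, +1} and checks that its output is equivalent to a cardinality constraint on the number of positions where input and weight agree. This is a simplified illustration under assumed conventions (integer bias, sign(0) = +1); the translation by Narodytska et al. covers full network blocks and is more involved. All names and values are hypothetical.

```python
from itertools import product
from math import ceil


def binarized_neuron(weights, inputs, bias):
    """Sign activation applied to a dot product of +1/-1 vectors plus a bias."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if pre_activation >= 0 else -1


def cardinality_form(weights, inputs, bias):
    """Same neuron, phrased as 'at least k inputs agree with their weight'.

    With a agreements out of n positions the dot product equals 2a - n,
    so 2a - n + bias >= 0 holds exactly when a >= ceil((n - bias) / 2).
    """
    agreements = sum(1 for w, x in zip(weights, inputs) if w == x)
    threshold = ceil((len(weights) - bias) / 2)
    return 1 if agreements >= threshold else -1


# Exhaustively compare both formulations on a small hypothetical neuron.
W = [1, -1, 1, 1, -1]
equivalent = all(
    binarized_neuron(W, x, b) == cardinality_form(W, x, b)
    for b in range(-3, 4)
    for x in product((-1, 1), repeat=len(W))
)
print(equivalent)
```

A constraint of the form "at least k of these literals are true" can be expressed in propositional logic, which is what makes properties of the network amenable to SAT-based verification.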

References:

Binarized neural networks by Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio (NeurIPS-16)

Verifying Properties of Binarized Deep Neural Networks by Nina Narodytska, Shiva Prasad Kasiviswanathan, Leonid Ryzhyk, Mooly Sagiv, Toby Walsh (AAAI-18)

Advisor: Ana Ozaki

Quantum Neural Networks

Quantum computers can solve certain types of problems exponentially faster than classical computers (so-called quantum supremacy). However, it is still mostly unclear how far quantum supremacy goes, i.e., for what types of problems quantum computing outperforms classical computing. As quantum computers become larger (more qubits) and more reliable (lower error rates), we approach the point where they may become relevant for machine learning applications.

One of the proposed methods in this field is the so-called quantum neural network (QNN). Where classical neural networks (CNN) use real-valued weights, activation functions, input and output data, in a QNN all of these are represented by complex quantum states and quantum operations. This allows for a much denser encoding of information, so that a small QNN may be functionally equivalent to a much larger CNN. For a larger QNN, the equivalent CNN would have to be so large that it is completely infeasible.

This leads to the central objective of this project: Under which conditions can a QNN achieve quantum supremacy? How do QNN and CNN compare in terms of learning speed, accuracy, etc. for different classes of problems, and how does their performance scale with size?

In the foreseeable future, quantum computers will be relatively noisy; that is, they will have high error rates. This poses an additional problem: How does noise affect the performance of a QNN? Are there limits to how much noise a QNN can tolerate? How does the effect of noise scale with the size of the QNN?

You can approach this project in two ways:

a) As a theoretical thesis based on mathematical models and learning theory

b) As a practical thesis based on coding and benchmarking prototypes

… or any combination of (a) and (b).

If you are interested, please contact Philip Turk or Ana Ozaki.

References

[1] Beer, K., Bondarenko, D., Farrelly, T. et al. Training deep quantum neural networks. Nat Commun 11, 808 (2020). https://doi.org/10.1038/s41467-020-14454-2

[2] Schuld, M. and Petruccione, F. Supervised Learning with Quantum Computers, Springer, 2018.

Neural Network Verification

Neural networks have been applied in many areas. However, any method based on generalizations may fail and this is by design. The question is how to deal with such failures. To limit them, one can define rules that a neural network should follow and devise strategies to verify whether the rules are obeyed. The main tasks of this project are to study an algorithm for learning rules formulated in propositional Horn logic, implement the algorithm, and apply it to verify neural networks.

References:

Queries and Concept Learning by Angluin (Machine Learning 1988)

Exact Learning: On the Boundary between Horn and CNF by Hermo and Ozaki (ACM TOCT 2020).

Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples by Weiss, Goldberg, Yahav (ICML 2018)

Advisor: Ana Ozaki

Knowledge Graph Embeddings

Knowledge graphs can be understood as labelled graphs whose nodes and edges are enriched with meta-knowledge, such as temporal validity, geographic coordinates, and provenance. Recent research in machine learning attempts to complete (or predict) facts in a knowledge graph by embedding entities and relations in low-dimensional vector spaces. The main tasks of this project are to study knowledge graph embeddings, to study ways of integrating temporal validity into the geometrical model of a knowledge graph, and to implement and test an embedding that represents the temporal evolution of entities using their vector representations.
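
As an illustration of how an embedding can be used to predict facts, the sketch below scores triples in the style of translation-based embeddings (as in the first reference below): a fact (head, relation, tail) is considered plausible when head + relation lies close to tail. The entity and relation vectors here are hand-picked, hypothetical values chosen so the example is easy to follow; in practice they are learned from the graph, and temporal validity is not modelled in this sketch.

```python
import math

# Hypothetical 2-D embeddings; real embeddings are learned, not hand-set.
ENTITIES = {
    "Bergen": (1.0, 0.0),
    "Oslo": (2.0, 0.0),
    "Norway": (1.0, 1.0),
    "Sweden": (3.0, 3.0),
}
RELATIONS = {"located_in": (0.0, 1.0)}


def translation_score(head, relation, tail):
    """Negative Euclidean distance between head + relation and tail."""
    h, r, t = ENTITIES[head], RELATIONS[relation], ENTITIES[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Candidate tails for (head, relation, ?), most plausible first."""
    candidates = [entity for entity in ENTITIES if entity != head]
    return sorted(candidates,
                  key=lambda tail: translation_score(head, relation, tail),
                  reverse=True)


print(rank_tails("Bergen", "located_in"))
```

One conceivable way to bring in time, offered only as a starting point for discussion, is to let the entity vectors depend on a timestamp so that the ranking of candidate facts can change over time.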

References:

Translating Embeddings for Modeling Multi-relational Data by Bordes, Usunier, Garcia-Durán (NeurIPS 2013)

Temporally Attributed Description Logics by Ozaki, Krötzsch, Rudolph (Book chapter: Description Logic, Theory Combination, and All That 2019)

Attributed Description Logics: Reasoning on Knowledge Graphs by Krötzsch, Marx, Ozaki, Thost (ISWC 2017)

Advisor: Ana Ozaki

Knowledge Graph Repair

While Knowledge Graphs are becoming increasingly popular, one persistent issue concerns the quality of data. Sometimes the information described is not only incomplete but also incorrect. One can rely on ontological approaches or machine learning techniques using knowledge graph embeddings to fix incorrect information in such graphs. This project's primary research goal is to investigate the combination of methods from these two approaches. Embeddings that can relate to the taxonomical rules in the Knowledge Graphs are particularly promising.

References:

  • Improved knowledge graph embedding using background taxonomic information by Fatemi, Ravanbakhsh, Poole. (AAAI 2019).
  • Debugging incoherent terminologies by Schlobach, Huang, Cornet, van Harmelen. (JAIR v39 - 2007).
  • KGClean: An Embedding Powered Knowledge Graph Cleaning Framework by Ge, Gao, Weng, Zhang, Miao, Zheng. (arXiv 2020).

Advisor: Ricardo Guimarães

Decidability and Complexity of Learning 

Gödel showed in 1931 that, essentially, there is no consistent and complete set of axioms that is capable of modelling traditional arithmetic operations. Recently, Ben-David et al. defined a general learning model and showed that learnability in this model may not be provable using the standard axioms of mathematics. The main tasks of this project are to study Gödel's incompleteness theorems, the connection between these theorems and the theory of machine learning, and to investigate learnability and complexity classes in the PAC and the exact learning models.

References:

Learnability can be undecidable by Ben-David, Hrubeš, Moran, Shpilka, Yehudayoff (Nature 2019)

On the Complexity of Learning Description Logic Ontologies by Ozaki (RW 2020)

Advisor: Ana Ozaki

Machine Ethics

Autonomous systems, such as self-driving cars, need to behave according to the environment in which they are embedded. However, ethical and moral behaviour is not universal, and the underlying behavioural norms often differ among countries or groups of countries, so a compromise among such differences needs to be considered.

The Moral Machine experiment (https://www.moralmachine.net/) exposed people to a series of moral dilemmas and asked them what an autonomous vehicle should do in each of the given situations. Researchers then tried to find similarities between the answers from the same region.

The main tasks of this project are to study the Moral Machine experiment and to study and implement an algorithm for building compromises among different regions (or even people). We have developed a compromise building algorithm that works on behavioural norms represented as Horn clauses. Assume that each choice example from the Moral Machine experiment is a behavioural norm represented as a Horn clause. The compromise algorithm is applied to these choices obtained from different people during the Moral Machine experiment. One of the goals of this project would be to determine how to (efficiently) compute compromises for groups of countries (e.g., the Nordic Countries and Scandinavia).

References:

The Moral Machine experiment by Edmond Awad, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan (Nature 2018)

 

Advisors: Ana Ozaki, Marija Slavkovik

Reinforcement learning for sparsification

Reinforcement learning has recently become a way to heuristically solve optimization problems. In this project, you will set up the problem of finding a sparse approximation for persistent homology using the reinforcement learning framework. You will train a neural network to find approximations of simplicial complexes that can be smaller and more precise than traditional approximation techniques. The setup of the reinforcement learning problem requires a deep theoretic understanding, and the problem also has a computational aspect.

Advisor: Nello Blaser

Topology of encodings

State-of-the-art methods for natural language processing and facial recognition use vector embedding algorithms such as word2vec or Siamese networks. Classically, such vector embeddings are analyzed using cluster analysis or supervised methods. In this project, you will use network analysis and topological methods to analyze vector embeddings in order to find a richer description of them. This project will be applied.

Advisor: Nello Blaser

Multimodality in Bayesian neural network ensembles

One method to assess uncertainty in neural network predictions is to use dropout or noise generators at prediction time and run every prediction many times. This leads to a distribution of predictions. Informatively summarizing such probability distributions is a non-trivial task and the commonly used means and standard deviations result in the loss of crucial information, especially in the case of multimodal distributions with distinct likely outcomes. In this project, you will analyze such multimodal distributions with mixture models and develop ways to exploit such multimodality to improve training. This project can have theoretical, computational and applied aspects.

Advisor: Nello Blaser

Multitask variational autoencoders

Autoencoders are a type of artificial neural network that learns a data representation, typically for dimensionality reduction. Variational autoencoders are generative models that combine the autoencoder architecture with probabilistic graphical modeling. They may be used to restore damaged data by conditioning the decoder on the remaining data. In this project, you will explore whether joint training of a traditional variational autoencoder and restoring variational autoencoders can make the embedding more stable. The project will be mostly computational, but may have some theoretic aspects.

Advisor: Nello Blaser

Topology of binary classifiers

A linear classifier separates the underlying space into two connected components. A nearest neighbor classifier, on the other hand, may divide the space into many connected components. Overfitting can result in dividing space into too many components. In this project, you will study how many connected components different classifiers produce. We will then devise a regularization technique that penalizes many connected components. This project will have theoretical and computational aspects.

Advisor: Nello Blaser

Detecting small clusters

Standard clustering methods are good at detecting clusters of a certain size and density. Detecting small clusters is difficult, because they lie in low density regions. In this project, you will use methods from anomaly detection coupled with clustering techniques to overcome this challenge. In addition, you will test the new techniques on real-world mass cytometry data. This project will be computational and applied.

Advisor: Nello Blaser

Solar panel improvement

In this project, you will use state-of-the-art machine learning methods to improve solar panel designs. Greve et al. have recently shown that adding nano-particles to solar panels can increase their efficiency. However, there are many possible configurations in which the nano-particles can be added to the solar panels, and time-consuming physics modeling is necessary to optimize the design. You will initially explore a 2-dimensional parameter space of nano-particle configurations and design a machine learning model that predicts the light spectrum the solar panel can capture. You will then use this machine learning model to optimize the parameters. Extensions of the project may involve optimizing a more complex higher-dimensional parameter space. In this project, we will work closely with physicists who will check your results with the physics models.

Advisors: Nello Blaser, Martin Møller Greve

Automating the collection of evidence for medical interventions in low-income countries

In this project, you will use state-of-the-art natural language processing tools, such as latent semantic analysis, TextRank, and recurrent neural networks, to develop a reliable system that scans the medical literature and extracts and ranks the most relevant papers for specific medical interventions. Bergen Center for Ethics and Priority Setting develops software that helps low-income countries (starting with Ethiopia, Malawi, and Zanzibar) set priorities when allocating their health budgets. This software requires input on cost and effects of all health interventions under consideration. Currently, we focus on 218 interventions that are considered the most important, but in time the plan is to scale up to thousands of interventions. In this project, you will develop an AI that scans the medical literature and ranks papers according to relevance for each intervention. Such a tool could have a great impact on the health of people living in countries that cannot afford to have a large number of experts conduct clinical trials or literature reviews.

Advisors: Nello Blaser, Øystein Haaland

Cytometry analysis

We are looking for 2-3 students to join an interdisciplinary project where you will work together with medical doctors to analyse mass cytometry data. This is data on single cells and we are considering both suspension and image data. Potential projects range from applied data analysis to the development of new specialized methods to solve problems that arise in mass cytometry.

Advisors: Nello Blaser, Sonia Gavasso

 

Simulated underwater environment and deep learning

Using data from the Mareano surveys or the LoVe underwater observatory, create a simulator for underwater benthic (i.e. sea bed) scenes by placing objects randomly (but credibly) on a background. Using the simulated data, train deep learning neural networks to:

a) recognize presence of specific objects b) locate specific objects c) segment specific objects

Test the systems on real data and evaluate the results.

Advisor: Ketil Malde

Evaluating the effects and interaction of hyperparameters in convolutional neural networks

Neural networks have many hyperparameters, including choice of activation functions, regularization and normalization, gradient descent method, early stopping, cost function, and so on. While best practices exist, the interactions between the different choices can be hard to predict. To study this, train networks on suitable benchmark data, using randomized choices for hyperparameters, and observe parameters like rate of convergence, over- and underfitting, magnitude of gradient, and final accuracy.

Advisor: Ketil Malde

Online learning in real-time systems

Build a model for the drilling process by using the Virtual simulator OpenLab (https://openlab.app/) for real-time data generation and online learning techniques. The student will also do a short survey of existing online learning techniques and learn how to cope with errors and delays in the data.

Advisor: Rodica Mihai

Building a finite state automaton for the drilling process by using queries and counterexamples

Datasets will be generated by using the Virtual simulator OpenLab (https://openlab.app/). The student will study the datasets and decide upon a good setting to extract a finite state automaton for the drilling process. The student will also do a short survey of existing techniques for extracting finite state automata from process data. A relevant starting point is the approach of Weiss, Goldberg and Yahav (ICML 2018, available on arXiv), which uses exact learning and abstraction to extract a deterministic finite automaton describing the state dynamics of a given trained RNN, with Angluin's L* algorithm as the learner and the trained RNN as the oracle. According to the authors, the technique efficiently extracts accurate automata from trained RNNs, even when the state vectors are large and require fine differentiation.

Advisor: Rodica Mihai

Machine learning approaches toward personalized treatment of leukemia

As new data on multiple omics levels reveal more information on leukemia and the effect of drugs, there are new opportunities to tailor treatment to each individual patient. In an ongoing European project, we study leukemia and use data from individual patients as well as from cell line and mouse model systems to improve the understanding of genomic clonality and signaling pathway status, aiming to generate data that enable machine learning approaches to predict prognosis and treatment response. The focus of the project will be on setting up an appropriate software system enabling evaluation of alternative feature selection methods and classification approaches. There is an opportunity to work closely with bioinformatics, systems biology and cancer researchers in the above-mentioned European project, including partners in Germany and the Netherlands, and also with the Centre of Excellence CCBIO (Center for Cancer Biomarkers) in Bergen.

Advisor: Inge Jonassen

Applications of causal inference methods to omics data

Many hard problems in machine learning are directly linked to causality [1]. The graphical causal inference framework developed by Judea Pearl can be traced back to pioneering work by Sewall Wright on path analysis in genetics and has inspired research in artificial intelligence (AI) [1].

The Michoel group has developed the open-source tool Findr [2] which provides efficient implementations of mediation and instrumental variable methods for applications to large sets of omics data (genomics, transcriptomics, etc.). Findr works well on a recent data set for yeast [3].

We encourage students to explore promising connections between the fields of causal inference and machine learning. Feel free to contact us to discuss projects related to causal inference. Possible topics include: a) improving methods based on structural causal models, b) evaluating causal inference methods on data for model organisms, c) comparing methods based on causal models and neural network approaches.

References:

1. Schölkopf B, Causality for Machine Learning, arXiv (2019): https://arxiv.org/abs/1911.10500

2. Wang L and Michoel T. Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data. PLoS Computational Biology 13:e1005703 (2017). https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005703

3. Ludl A and Michoel T. Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. arXiv:2010.07417 https://arxiv.org/abs/2010.07417

Advisors: Adriaan Ludl, Tom Michoel

Applications of causal inference methods in neuroscience

Here we propose to employ state-of-the-art causal inference and machine learning (ML) methods to study the networks formed by neurons in the brains of living zebrafish. Imaging experiments track the activity of hundreds of neurons during behavioural tasks and in fish treated with drugs. The aim is to understand neuronal firing patterns and rewiring in the brain.

Methods such as Granger Causality and Transfer Entropy can be used to infer the flow of information and the causal connections between neurons from time series of their action potentials recorded during experiments. These methods can also be validated on simulations of neuronal networks [2].

This is an exciting opportunity to combine machine learning and neuroscience on data provided by the lab of Prof. Emre Yaksi at the Kavli Institute for Systems Neuroscience at NTNU, Trondheim.

References:

1. Forè, S et al. Functional properties of habenular neurons are determined by developmental stage and sequential neurogenesis. Science Advances (2020).

2. Ludl A, Soriano J. Impact of Physical Obstacles on the Structural and Effective Connectivity of in silico Neuronal Circuits. Front. Comput. Neurosci., 31 August 2020 https://doi.org/10.3389/fncom.2020.00077

Advisors: Adriaan Ludl, Tom Michoel

Graph neural networks

In machine learning, the question of how to incorporate the structure of a graph into predictive tasks has received much attention. Recently, with the advent of deep learning, the idea of representation learning on graphs has been introduced. In this concept, the main approach is to map nodes, subgraphs, or the entire graph into points in a low-dimensional vector space [1]. In this embedding, the main goal is to preserve the local structure of the graph around each node, without having to specify in advance what “local” means. Graph Neural Networks (GNNs) address the network embedding problem through a deep autoencoder framework, and have been shown to perform better at subsequent machine learning tasks than traditional embedding methods. Furthermore, GNNs can tackle graph analytic tasks such as graph classification and regression in an end-to-end manner, rather than treating network embedding as a separate step, which substantially improves their performance on these tasks compared to traditional approaches [2].

In this project, we look at different types of machine learning tasks on network datasets such as Recommender Systems, Point Clouds and Biological Networks, and try to solve typical machine learning tasks such as generation, classification and regression through the use of novel GNN structures.

[1] William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. CoRR, abs/1709.05584, 2017.

[2] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Graph neural networks: A review of methods and applications. CoRR, abs/1812.08434, 2018.

[3] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. CoRR, abs/1901.00596, 2019.

Advisors: Ramin Hasibi, Tom Michoel

Towards precision medicine for cancer patient stratification

On average, a drug or a treatment is effective in only about half of patients who take it. This means patients need to try several until they find one that is effective at the cost of side effects associated with every treatment. The ultimate goal of precision medicine is to provide a treatment best suited for every individual. Sequencing technologies have now made genomics data available in abundance to be used towards this goal.

In this project, we will specifically focus on cancer. Most cancer patients get a particular treatment based on the cancer type and stage, though different individuals will react differently to a treatment. It is now well established that genetic mutations cause cancer growth and spreading and, importantly, that these mutations differ between individual patients. The aim of this project is to use genomic data to better stratify cancer patients and to predict the treatment most likely to work. Specifically, the project will use a machine learning approach to integrate genomic data and build a classifier for stratification of cancer patients.

Advisor: Anagha Joshi

Unraveling gene regulation from single cell data

Multi-cellularity is achieved by precise control of gene expression during development and differentiation, and aberrations of this process lead to disease. A key regulatory process in gene regulation occurs at the transcriptional level, where epigenetic and transcriptional regulators control the spatial and temporal expression of target genes in response to environmental, developmental, and physiological cues obtained from a signalling cascade. Rapid advances in sequencing technology have now made it feasible to study this process by examining the genome-wide patterns of diverse epigenetic and transcription factors, including at the single cell level.

Single cell RNA sequencing is particularly important in cancer, as it allows exploration of heterogeneous tumor samples; this heterogeneity obstructs therapeutic targeting, which leads to poor survival. Despite its huge clinical relevance and potential, analysis of single cell RNA-seq data is challenging. In this project, we will develop strategies to infer gene regulatory networks using network inference approaches (both supervised and unsupervised). These strategies will primarily be tested on single cell datasets in the context of cancer.

Advisor: Anagha Joshi

Developing a Stress Granule Classifier

To carry out the multitude of functions 'expected' from a human cell, the cell employs a strategy of division of labour, whereby sub-cellular organelles carry out distinct functions. Thus we traditionally understand organelles as distinct units defined both functionally and physically, with a distinct shape and size range. More recently, a new class of organelles has been discovered that are assembled and dissolved on demand and are composed of liquid droplets or 'granules'. Granules show many properties characteristic of liquids, such as flow and wetting, but they can also assume many shapes and fluctuate in shape. One such liquid organelle is a stress granule (SG).

Stress granules are pro-survival organelles that assemble in response to cellular stress and are important in cancer and neurodegenerative diseases like Alzheimer's. They are liquid or gel-like and can assume varying sizes and shapes depending on their cellular composition.

In a given experiment we are able to image the entire cell over a time series of 1000 frames, from which we extract a rough estimate of the size and shape of each granule. Our current method is susceptible to noise and a granule may be falsely rejected if the boundary is drawn poorly in a small majority of frames. Ideally, we would also like to identify potentially interesting features, such as voids, in the accepted granules.

We are interested in applying a machine learning approach to develop a descriptor for a 'classic' granule and furthermore classify them into different functional groups based on disease status of the cell. This method would be applied across thousands of granules imaged from control and disease cells. We are a multi-disciplinary group consisting of biologists, computational scientists and physicists. 

Advisors: Sushma Grellscheid, Carl Jones

Machine Learning based Hyperheuristic algorithm

Develop a machine learning based hyper-heuristic algorithm to solve a pickup and delivery problem. A hyper-heuristic is a heuristic that chooses heuristics automatically. A hyper-heuristic seeks to automate the process of selecting, combining, generating or adapting several simpler heuristics to efficiently solve computational search problems [Handbook of Metaheuristics]. There may be multiple heuristics for solving a problem, each with its own strengths and weaknesses. In this project, we want to use machine learning techniques to learn the strengths and weaknesses of each heuristic while using them in an iterative search for high-quality solutions, and then use them intelligently for the rest of the search. Whenever new information is gathered during the search, the hyper-heuristic algorithm automatically adjusts the heuristics.
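
The idea of learning which heuristic works well during the search can be sketched with a simple epsilon-greedy selection rule. This is only one possible learning mechanism, chosen here for brevity. The toy objective (sorting a permutation), the three low-level heuristics, and all names and parameter values below are assumptions made for illustration; they are not part of the project description and do not model a pickup and delivery problem.

```python
import random


def cost(perm):
    """Toy objective: total displacement of each value from its sorted position."""
    return sum(abs(value - index) for index, value in enumerate(perm))


def swap(perm, rng):
    """Low-level heuristic: exchange two random positions."""
    i, j = rng.sample(range(len(perm)), 2)
    new = perm[:]
    new[i], new[j] = new[j], new[i]
    return new


def reverse_segment(perm, rng):
    """Low-level heuristic: reverse a random contiguous segment."""
    i, j = sorted(rng.sample(range(len(perm)), 2))
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def reinsert(perm, rng):
    """Low-level heuristic: remove one element and insert it elsewhere."""
    new = perm[:]
    item = new.pop(rng.randrange(len(new)))
    new.insert(rng.randrange(len(new) + 1), item)
    return new


def hyper_heuristic(start, heuristics, iterations=2000, epsilon=0.2, seed=0):
    """Epsilon-greedy choice among heuristics, rewarded by cost improvement."""
    rng = random.Random(seed)
    current, current_cost = start[:], cost(start)
    total_reward = {h.__name__: 0.0 for h in heuristics}
    uses = {h.__name__: 0 for h in heuristics}
    for _ in range(iterations):
        # Explore at random until every heuristic has been tried, and
        # afterwards with probability epsilon; otherwise exploit the
        # heuristic with the best average reward so far.
        if rng.random() < epsilon or not all(uses.values()):
            chosen = rng.choice(heuristics)
        else:
            chosen = max(
                heuristics,
                key=lambda h: total_reward[h.__name__] / uses[h.__name__],
            )
        candidate = chosen(current, rng)
        candidate_cost = cost(candidate)
        uses[chosen.__name__] += 1
        total_reward[chosen.__name__] += max(0, current_cost - candidate_cost)
        if candidate_cost <= current_cost:  # accept non-worsening moves only
            current, current_cost = candidate, candidate_cost
    return current, current_cost, uses


start = list(range(10))[::-1]
best, best_cost, uses = hyper_heuristic(start, [swap, reverse_segment, reinsert])
```

The `uses` dictionary records how often each heuristic was selected, which gives a first impression of what the selection rule has learned. Averaging rewards over the whole run is a deliberately simple choice; a rule that weights recent rewards more heavily would adapt better when the usefulness of a heuristic changes during the search.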

Advisor: Ahmad Hemmati

Machine learning for solving satisfiability problems and applications in cryptanalysis

Advisor: Igor Semaev

Own topic combining logic and learning

If you want to suggest your own topic combining logic and learning, please contact Ana Ozaki.

Own topic

If you want to suggest your own topic, please contact Pekka Parviainen.