Center for Data Science
CEDAS event 29-30 August 2022

CEDAS Networking Event 2022

The Center for Data Science (CEDAS) is organizing a networking event, combining scientific sessions on foundational and applied data science with a teaching-related session, discussion groups, and social activities to further facilitate research collaboration among the center’s members (and beyond).

Solstrand fjord view
Stefanie Meyer


The event will take place on 29-30 August 2022 at Solstrand Hotel & Bad in Os. It will be a two-day meeting with one overnight stay.

Main program: The event is composed of two scientific sessions and one teaching-related session. In addition, we have set up discussion groups focusing on different center-relevant subjects (see below).

Social activities: There will be a longer afternoon break on day 1, where participants are free to either join us for a short hike in the vicinity of the hotel (provided that the weather is suitable for it), enjoy the hotel’s leisure facilities, or engage in other networking activities.

Registration: Registration is now closed.

Program Day 1

08:30 Bus leaves from Høyteknologisenteret
09:30 – 09:45 Welcome by center leader Helwig Hauser
09:45 – 11:15

Foundational Data Science (chair: Pekka Parviainen)

11:15 – 11:30 Coffee break
11:30 – 12:30 Discussion groups
12:30 – 13:30 Lunch
13:30 – 15:00

Teaching in Data Science (chairs: Yushu Li & Hans Skaug)

  • Truls Pedersen ("How to program an artificial agent")
  • Inge Jonassen ("Plans for a new siv-ing program in data science at Informatics, UiB")
  • Johan Lyhagen ("Statistics education at Uppsala university: Personal reflections of where we are and where we are going")
  • Magnus Svendsen Nerheim ("Tools and approaches to enhance teaching and learning in Data Science and beyond")
15:00 – 15:10 Group picture
15:10 – 15:30 Coffee break & check in
15:30 – 16:30 Discussion groups
16:30 – 18:30 Indoor/Outdoor activity, free time
19:30 Dinner

Program Day 2

09:00 – 10:30

Data Science and Applications (chair: Tom Michoel)

  • Julia Romanowska ("Using full national history of drug usage to search for disease causes and cures")
  • Ramin Hasibi ("Geometric Machine learning and applications in Biology and Computer Vision")
  • Jonas Andersson ("Mixed Frequency Data in a (S, s) Pricing Model")
10:30 – 11:00 Coffee break & check out
11:00 – 12:00 Discussion groups
12:00 – 13:00 Lunch
13:00 – 13:30 Walk and talk
13:30 – 13:45 Closing remarks and the way forward for CEDAS by Helwig Hauser
14:15 Bus leaves from Solstrand



Jonas Andersson, Mixed Frequency Data in a (S, s) Pricing Model (Data Science and Applications)

In this talk I will discuss a research problem involving economic theory, statistical modelling and, since the model is non-trivial to estimate, computational issues. In empirical analyses, prices are most often observed at a higher frequency than the explanatory variables. In this work, we overcome the mixed frequency issue by specifying a model where producers’ monthly prices are functions of their (latent) monthly marginal costs, which are related to observed annual wage costs. The intermittency of the price changes is accounted for by including a stochastic threshold in the model (a so-called (S, s) model). The presentation is based on joint work with Øivind A. Nilsen and Hans J. Skaug.
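As a rough illustration of the intermittent price changes mentioned in this abstract, the sketch below implements a simplified inaction-band rule: the price stays fixed until it deviates from a target by more than a threshold, and is then reset to the target. This is not the model of the talk. The fixed (non-stochastic) threshold, the observed target series, and the function name `ss_price_path` are all assumptions made for illustration.

```python
def ss_price_path(targets, threshold, initial_price):
    """Simplified (S, s)-style rule with a fixed band (illustrative assumption).

    The price is left unchanged while it stays within `threshold` of the
    current target; once the gap exceeds the threshold, the price is reset
    to the target.
    """
    prices = []
    price = initial_price
    for target in targets:
        if abs(target - price) > threshold:
            price = target  # adjustment: the gap left the inaction band
        prices.append(price)
    return prices


# Hypothetical monthly target prices (made-up numbers).
targets = [100, 101, 103, 106, 107, 104]
path = ss_price_path(targets, threshold=4, initial_price=100)
print(path)  # [100, 100, 100, 106, 106, 106]
```

With these made-up numbers, the price changes only once, in the fourth month, even though the target moves every month.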

Stein Andreas Bethuelsen, A probabilist's look on data science (Foundational Data Science)

In this talk I will elaborate on the role probability theory plays as part of the foundations of data science. In particular, I will focus on the relatively recently developed theory of mixing times and the cutoff phenomenon for Markov chains. If time allows, I will also touch upon some of my own research, where I have addressed how this theory can be extended to a non-Markovian setting. The latter arises naturally, e.g., in the setting of partially observed stochastic processes.
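For readers unfamiliar with mixing times, the following minimal sketch computes how many steps a small Markov chain needs before its distribution is within a given total variation distance of the stationary distribution. The two-state chain, the tolerance of 0.25, and the function names are illustrative assumptions, not material from the talk. The usual definition takes the worst case over all starting states, whereas this helper handles one starting distribution at a time.

```python
def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def tv_distance(p, q):
    """Total variation distance between two distributions on the same states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_time(P, stationary, start, eps=0.25, max_steps=10_000):
    """Smallest t such that the distribution after t steps from `start`
    is within `eps` of `stationary` in total variation distance."""
    dist = list(start)
    for t in range(max_steps + 1):
        if tv_distance(dist, stationary) <= eps:
            return t
        dist = step(dist, P)
    raise ValueError("chain did not mix within max_steps")


# Hypothetical two-state chain with stationary distribution (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]
stationary = [2 / 3, 1 / 3]

print(mixing_time(P, stationary, start=[1.0, 0.0]))  # 1
print(mixing_time(P, stationary, start=[0.0, 1.0]))  # 3
```

Taking the maximum over both starting states gives 3 for this toy chain.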

Laura Garrison, A Visual Data Science Primer (Foundational Data Science)

We are all well aware of the data explosion resulting from continuing and rapid technological advances in hardware and software. Making sense of these abundant and often complex data necessitates advanced techniques to extract and share actionable insights. Visual data science describes the practice of using visual representation to extract knowledge and gain insights from data, often integrating techniques from statistics, mathematics, machine learning, and other advanced analytical methods. It is a powerful approach to understanding data by enabling us to see multifaceted patterns and relationships, from multiple perspectives, that may otherwise be invisible. Visual data science does not end with data discovery; equally important is the communication of findings to other stakeholders. This talk will provide a brief primer on visual data science and discuss selected aspects through a series of basic and applied research use cases.

Ricardo Guimarães, Learning and Reasoning with Knowledge Graphs and Ontologies (Foundational Data Science)

Knowledge Graphs (KGs) are becoming increasingly popular forms of representing data thanks to their flexibility in expressing incomplete data. Meanwhile, ontologies, documents which formally describe domain knowledge, have become the de facto way to share precise definitions in fields such as Biology and Medicine. In this talk, we will see how these two forms of Knowledge Representation are related and the different ways they can be used in reasoning with and learning from data. The talk will focus on selected contemporary challenges in integrating the symbolic and the subsymbolic branches of Artificial Intelligence to overcome their respective limitations.

Ramin Hasibi, Geometric Machine learning and applications in Biology and Computer Vision (Data Science and Applications)

In my talk, I will discuss machine learning, and specifically deep learning, methods applicable to geometric datasets. In geometric deep learning for datasets such as graphs, sets, 3D shapes or point clouds, the underlying structure of the dataset is utilized in the deep learning methods to improve the performance of the machine learning framework. Furthermore, graphs do not obey a fixed structural pattern or size constraints. Therefore, the methods investigated in this work should be invariant to the size, structure and order of the elements in the dataset. This presentation is based on our recent work in the field of biology as well as our collaboration with colleagues at Aalto University of Technology for applications in computer vision and robotics.
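The invariance requirement in this abstract can be illustrated with a very small example: an aggregation such as coordinate-wise summation gives the same result for any ordering and any number of input elements. The sketch below is a generic illustration under that assumption, not one of the methods presented in the talk. The function names and the integer point coordinates are made up.

```python
import random


def sum_pool(points):
    """Order-independent aggregation: coordinate-wise sum over all points."""
    return tuple(sum(coords) for coords in zip(*points))


def first_point(points):
    """Order-dependent counterexample: simply returns the first element."""
    return points[0]


# Hypothetical point cloud with integer coordinates (exact arithmetic).
points = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
shuffled = points[:]
random.shuffle(shuffled)

print(sum_pool(points))                        # (12, 15, 18)
print(sum_pool(points) == sum_pool(shuffled))  # True
```

`first_point`, by contrast, may change when the input is reordered, which is why order-sensitive operations are avoided as aggregators over sets and graph neighbourhoods.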

Inge Jonassen, Plans for a new siv-ing program in data science at Informatics, UiB (Teaching in Data Science)

The Department of Informatics is currently offering a bachelor program in data science. We wish to strengthen our educational offering in this area. It is challenging to fit into a three-year study program all the elements needed for a solid and broad education in data science. We therefore plan a five-year program that we believe will be attractive for potential students and educate candidates in great demand from both the public and the private sector.

Madhumita Kundu, An approximation algorithm for learning Bayesian network (Foundational Data Science)

Bayesian networks are a class of probabilistic graphical models where a directed acyclic graph (DAG) is used to express conditional independencies and dependencies among random variables. This DAG is often learned from observational data. Unfortunately, exact structure learning is NP-hard. Motivated by the hardness of exact learning, we present an approximation algorithm for learning the structure of a Bayesian network.
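To make the role of the DAG concrete, the sketch below shows how a Bayesian network factorizes a joint distribution into one conditional probability per variable given its parents. It does not implement structure learning or the approximation algorithm of the talk. The three-variable network and all probability values are made-up illustration values.

```python
from itertools import product

# Hypothetical DAG: rain -> sprinkler, and (rain, sprinkler) -> wet.
parents = {"rain": (), "sprinkler": ("rain",), "wet": ("rain", "sprinkler")}

# cpt[var][parent_values] = P(var is True | parents take parent_values).
# All numbers are invented for illustration.
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(True,): 0.01, (False,): 0.4},
    "wet": {
        (True, True): 0.99,
        (True, False): 0.8,
        (False, True): 0.9,
        (False, False): 0.0,
    },
}


def joint_probability(assignment):
    """Joint probability as the product of each variable's conditional
    probability given the values of its parents in the DAG."""
    prob = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[p] for p in pars)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob


def total_probability():
    """Sum of the joint over all assignments; should be 1 for a valid network."""
    names = list(parents)
    return sum(
        joint_probability(dict(zip(names, values)))
        for values in product([True, False], repeat=len(names))
    )


p = joint_probability({"rain": True, "sprinkler": False, "wet": True})
print(round(p, 4))                    # 0.1584
print(round(total_probability(), 6))  # 1.0
```

Structure learning, the problem addressed in the talk, asks for the reverse step: recovering a suitable `parents` mapping from data.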

Johan Lyhagen, Statistics education at Uppsala university: Personal reflections of where we are and where we are going (Teaching in Data Science)

Statistics is a very old subject with strong consensus regarding the core identity of the subject: How to make inference about some population of interest. This has many advantages but also disadvantages. One disadvantage is that statisticians tend to discard topics outside what they define as core statistics instead of expanding the subject. This has the implication that statistics teaching, at least at the lower levels, doesn’t change much. In this talk I will present the statistics education at Uppsala University, what we have done to modernize it, and my personal reflections on this process.

Magnus Svendsen Nerheim, Tools and approaches to enhance teaching and learning in Data Science and beyond (Teaching in Data Science)

There is a long tradition of utilizing digital tools in teaching and learning efforts at the University of Bergen. In recent years, there has been an increase in the desire to utilize digital tools to enhance learning. In this talk, I will touch on the current methodology for how a Faculty/Department/Programme board/teacher collegium identifies the right tools (spoiler: first identify the competencies the students should have, then find a tool), as well as the process for getting a tool in place once the faculty has identified the desired learning outcomes (e.g., (digital) competencies) for the students.

Truls Pedersen, How to program an artificial agent (Teaching in Data Science)

In the course AIKI110 Artificial Agents, a compulsory course in the AI bachelor degree, the students program a small robot. In this talk Truls Pedersen will give an overview of the hardware and architecture the students are working with, and what tasks they are asked to implement in their agents.

Julia Romanowska, Using full national history of drug usage to search for disease causes and cures (Data Science and Applications)

Background: Norwegian health registries present a unique opportunity to look into the history of, e.g., drug usage (NorPD, Norwegian Prescription Database), hospital admissions (NPR, Norwegian Patient Registry), or education and emigration status (SSB, Statistics Norway) of each Norwegian citizen. Epidemiologists have been using these registries for many years, but only now is it possible to fully harness this wealth of information, thanks to the development of methodology and technology that can handle such amounts of data. In the DRONE project (Drug Repurposing for NEurological diseases), we develop a workflow that uses all data from NorPD, NPR, and SSB to find drugs that might aid people developing neurological diseases, such as Parkinson’s disease (PD), multiple sclerosis (MS), Alzheimer’s disease (AD), or amyotrophic lateral sclerosis (ALS). The 14-year-long drug history of the entire Norwegian population is examined to find out whether the use of certain drugs or groups of drugs increases or decreases the risk of having a disease.

Challenges: While working on the project, we are constantly facing challenges. First, the amount of data is large: ca. 600 million prescriptions, 4.5 million individuals, ca. 700 different drugs. We have tried various data storage solutions and data manipulation techniques. Second, the research team is diverse, using various methodologies and programming languages. Moreover, the choice of methodology is not trivial.

Opportunities: Our team has expertise in biostatistics, epidemiology, bioinformatics, and machine-learning theory. Networking with CEDAS would give us much-needed technical expertise, as well as establish a platform for future collaborations.


Discussion Groups

To further facilitate the development of CEDAS into an arena that brings together data science researchers and educators across UiB and Bergen in general, we have set aside three hours for discussion around topics that we deem relevant for data science, and/or CEDAS in particular. The discussion sessions will provide the participants with time to analyze and discuss a certain topic (day 1), and then present their findings to everyone (day 2).

On the morning of day 1, the event organizers will assign each of the participants to one of four discussion groups. The selection will be informed by the participant’s preferences regarding discussion topics, but is also influenced by the discussion topic itself in relation to the participant’s field of expertise, the distribution of participants, and by an aim to facilitate cross-disciplinary interaction.

During the first session on day 1, each group should start by assigning a spokesperson and a person taking notes (could also be the same person). The groups are then free to use the time as they see fit for their discussions. The second session on day 1 can be used to continue and finish the discussions around the given topic, as well as to prepare a summary that can be presented during the third session on day 2.

Group/Topic 1 – Ethics, GDPR, RRI, FAIR data management, Open Science, trustworthy AI and more: What can be implemented within, and/or facilitated by, CEDAS?

Group/Topic 2 – Teaching data science: We ask this group to further detail our plans regarding data science courses at UiB, not least based on our CEDAS workshop on data science education on April 1, 2022.

Group/Topic 3 – CEDAS community: We ask this group to discuss options for further facilitating the engagement within the CEDAS community. Concrete ideas – small as well as large – regarding worthwhile initiatives to substantiate the interaction and exchange among CEDAS researchers and educators would be welcome.

Group/Topic 4 – The next substantial CEDAS event: We ask this group to discuss a concept for a new, substantial CEDAS event in 2023 (possibly our second CEDAS conference, following up on our first conference on June 1–2, 2021).