Benefits of open science practices in proteomics and challenges arising from clinical datasets

Juan Antonio Vizcaino
EMBL-European Bioinformatics Institute, UK

High-throughput proteomics approaches are increasingly important in the life sciences and have developed enormously in recent years. In addition to great advances in mass spectrometry (MS) and other parts of the analytical workflow, computational approaches have been major drivers of this progress.

In parallel with these technical developments, open data policies have become widespread in the field, offering many new opportunities for researchers. The PRIDE database (https://www.ebi.ac.uk/pride) at the European Bioinformatics Institute is the world’s largest repository of mass spectrometry (MS)-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium, whose mission is to standardize open data practices in the field. The wide availability of proteomics data in the public domain has triggered many data re-use activities for a variety of purposes. I will highlight some of these ongoing efforts, both in-house and in the community as a whole, including some examples of machine learning approaches applied to public proteomics datasets.

Proteomics is also increasingly used in clinical studies, alone or in conjunction with other omics approaches, e.g., transcriptomics, in personalized medicine. In this context, potential ethical issues can arise from proteomics data, and these have only recently started to be discussed. I will describe the current state of the art and the changes expected in data management practices in the field.

Chairperson: Harald Barsnes, Department of Biomedicine