A new workflow to standardize fossil pollen datasets for ecological research
A new study published in Global Ecology and Biogeography presents a step-by-step guide to compile numerous fossil pollen datasets into a user-specific, standardized and clean compilation – ready for further analysis.
Palaeoecology (‘the ecology of the past’) is important for understanding the history of biodiversity and how the biosphere changes over time. By using palaeoecological data such as fossil pollen (Figure 1), scientists try to understand how ecosystems and vegetation change through time, and how humans have affected the environment throughout history.
Figure 1: From lakes to rates of vegetation change. Fossil pollen records offer our best insights into past rates of vegetation change. A lake, or other suitable environment, is cored to retrieve the layered sediments, which contain pollen grains that accumulated over thousands of years. By identifying and counting the different pollen grains, researchers can then reconstruct the local vegetation composition. Finally, the rate of vegetation change is estimated from the changes in pollen abundances through time.
In the last ten years, the amount of fossil pollen data from all over the world in open-access databases has increased substantially (Figure 2), opening new avenues for research. However, it remains challenging to bring this information together in a form that scientists can use to study changes in biodiversity. Suzette Flantua, a researcher at the Department of Biological Sciences, UiB, and lead first author of the paper “A guide to the processing and standardization of global palaeoecological data for large-scale syntheses using fossil pollen”, has been working on these datasets.
Figure 3: Essential data processing components needed to create a standardized, harmonized, palaeoecological dataset compilation before macro-scale data analysis. Note that each component consists of selecting appropriate datasets and samples based on user-defined criteria guided by the research questions. Such criteria influence the outcome of the analyses obtained from the dataset compilation. Therefore, careful documentation of these criteria is pivotal for data quality control and reproducibility. Vector credits: Dataset compilation and data sources: Design by fullvector/Freepik; Flowers: Design by rawpixel.com/Freepik.
"For several years now we have been compiling thousands of fossil pollen datasets together, and we’ve realized along the way that many critical steps exist that any researcher using these data should be aware of, but there were no clear guidelines anywhere. It was also often unclear how other people processed their data", says Flantua.
To ensure a standardized use of fossil pollen data and minimize the risk of erroneous interpretations, Flantua and her colleagues have created a guide with tips and tricks for compiling fossil pollen data.
"These datasets come from many different environments around the world, they are cored and analyzed by many different researchers, and they represent highly diverse plant assemblages. Before any analysis can be done, such compilation needs to be carefully selected to guarantee good data quality. That is why we have developed a guide on how to standardize these data that are now accessible for many researchers in different fields", says Flantua.
Discipline-friendly guideline to process fossil pollen data
Ondřej Mottl, the lead developer of the software in the guide and co-lead first author of the paper, says that the guide is designed to make data preparation easy and accessible for everyone, regardless of their coding skills.
"We understand that handling all data preparation steps can be technically challenging, which is why we've structured our workflow step-by-step and provided clear signposts at each critical juncture. Our software interacts with users throughout the process, guiding them towards the desired dataset for analysis", says Mottl.
Figure 2: Obtaining fossil pollen records from around the globe. Global studies such as this one require the careful gathering of many records collected by many individual research teams, compiled into global community databases curated by scientific experts. For their study, Mottl, Flantua et al. 2021 used 1181 fossil pollen records covering all continents except Antarctica. Each point (blue) represents one fossil pollen record. Many coring and drilling methods exist to obtain fossil pollen records from lakes, wetlands, bogs, and other environments. Note: Photos [c-h,n,p-r] display sites not included in the study. Photo credits: John W. Williams (a,b,r); Steffen Wolters (c,d,f,g); Thomas Giesecke (e); Henry H. Hooghiemstra (h,l,n,o,q); Geoff Hope (i); Feli Hopf (j); Eric Colhoun (k); Sarah Ivory (m); Luciane Fontana (p).
The fossil pollen dataset guide consists of a workflow called FOSSILPOL, an R package (RFossilpol) and a website. The FOSSILPOL workflow handles most of the processing steps (related to depositional environments, chronologies, filtering and taxonomic harmonization; Figure 3), while requiring input from the user at certain steps. All criteria and configurations are defined in one main configuration file, and several R packages are used throughout the workflow. The final outputs of the workflow include a standardized compilation of taxonomically harmonized fossil pollen data, plots of modelled age-depth curves and a pollen diagram for each record, and several overview figures and maps.
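To illustrate the idea of keeping all selection criteria in one place, here is a minimal sketch in Python. It is not the FOSSILPOL workflow or the RFossilpol package, which are written in R. The criteria names, the threshold values, the `Record` type and the `select_records` function are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical user-defined criteria, gathered in a single place so that
# every decision can be documented and reported alongside the results.
CRITERIA: Dict[str, object] = {
    "allowed_environments": {"lake", "bog"},  # depositional environments to keep
    "max_age_bp": 12000,                      # drop samples older than this (years BP)
    "min_samples": 5,                         # drop records with too few samples left
}


@dataclass
class Record:
    """A toy stand-in for one fossil pollen record (illustrative only)."""
    record_id: str
    environment: str
    sample_ages_bp: List[int]


def select_records(records: List[Record], criteria: Dict[str, object]) -> List[Record]:
    """Return the records (with samples trimmed by age) that meet all criteria."""
    kept = []
    for rec in records:
        # Step 1: filter by depositional environment.
        if rec.environment not in criteria["allowed_environments"]:
            continue
        # Step 2: keep only samples within the age range of interest.
        ages = [a for a in rec.sample_ages_bp if a <= criteria["max_age_bp"]]
        # Step 3: require a minimum number of remaining samples.
        if len(ages) < criteria["min_samples"]:
            continue
        kept.append(Record(rec.record_id, rec.environment, ages))
    return kept


# Made-up example records, not real data.
records = [
    Record("A", "lake", [100, 2000, 4000, 6000, 8000, 13000]),
    Record("B", "marine", [100, 200, 300, 400, 500, 600]),
    Record("C", "bog", [500, 1500, 20000]),
]

selected = select_records(records, CRITERIA)
print([r.record_id for r in selected])
```

Because every threshold lives in `CRITERIA`, changing a research question means editing one dictionary and rerunning the selection, which mirrors the role the article attributes to the main configuration file.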
"It can also be seen as a tool that allows full reproducibility of major data analyses as all decisions throughout the data processing steps are transparent, well documented, and easily reported in studies. In addition, the data compilation is dynamic and will change together with the open access databases as increasingly more data becomes available. This is all thanks to constant data sharing within and between research communities, which we are really excited about", agree Felde and Bhatta.
This paper is a contribution to the Humans on Planet Earth - Long-term impacts on biosphere dynamics (HOPE) project at the Department of Biological Sciences, UiB. The HOPE project is funded by a European Research Council Advanced Grant, and its goal is to address a critical question in Earth system science: what was the impact of prehistoric people on the biosphere and its dynamics?
Suzette G.A. Flantua is currently funded by the Trond Mohn Stiftelse and the University of Bergen start-up grant TMS 2022STG03, an interdisciplinary project that aims to assess the long-term dynamics of alpine systems worldwide using tools from geography, biology, paleoecology, and climatology: https://mountainsinmotion.w.uib.no/