Building a controlled vocabulary for the international lipid biomarker database
Harleena Franklin, E.K. Thomas, J.W. Williams, J.M. Aguilar, I.S. Castañeda, K.H. Freeman, N. McKay and C. Morrill
Past Global Changes Magazine 31(2): 129, 2023
Harleena Franklin1, E.K. Thomas1, J.W. Williams2, J.M. Aguilar3, I.S. Castañeda4, K.H. Freeman5, N. McKay6 and C. Morrill7

Achieving global-scale insights into past climate variations requires the careful assembly and standardization of networks of proxy databases (Kaufman et al. 2020; Konecky et al. 2020; Walter et al. 2023). Moreover, scientific data are now expected to be shared openly and readily online. This expectation was formalized in the FAIR Guiding Principles (Wilkinson et al. 2016), which established a standard framework under which scientific data should be findable, accessible, interoperable, and reusable.
Controlled vocabularies are essential infrastructure for meeting the FAIR principles, thereby enabling global-scale data syntheses and subsequent scientific research. Controlled vocabularies are sets of terms constrained by specific rules that allow for concise and unambiguous usage (Wojcik 2006). Several community-led controlled vocabularies are emerging in paleoclimatology and paleoecology, including the PaST Thesaurus (ncei.noaa.gov/products/paleoclimatology/paleoenvironmental-standard-terms-thesaurus) employed by the NOAA World Data Service for Paleoclimatology (Morrill et al. 2021) and the steward-curated taxonomies used by the Neotoma Paleoecology Database (Williams et al. 2018). As the volume and variety of empirical data in paleoclimate research expand, controlled vocabularies that are developed by experts and shared consistently across paleodatabases become ever more essential.
Lipid biomarkers are common in climate and environmental studies, especially in near-recent times. They are readily analyzed lipids whose homologous series distributions, ratios, and isotope abundances have high utility for the paleoclimate community. Despite this utility, no comprehensive controlled vocabulary exists for lipid biomarkers in paleoclimate and environmental use, although the International Union of Pure and Applied Chemistry (IUPAC) dictionary (goldbook.iupac.org) exists for many compounds. Here, we present a draft controlled vocabulary that encompasses several major classes of lipid biomarkers commonly applied in paleoclimate research. To facilitate interoperability among data resources, the NOAA World Data Service, LiPDverse, and Neotoma have all agreed to adopt this vocabulary. This vocabulary is being developed through an open process, and we welcome community input.
Because the task of cataloging and establishing vocabulary rules for thousands of lipid biomarkers is non-trivial, we have begun with some of the most commonly used lipid biomarkers in paleoclimate research: branched and isoprenoidal glycerol dialkyl glycerol tetraethers, n-alkanoic acids, n-alkanes, alkenones, and long-chain diols. We have developed a list of lipid biomarker names as they are commonly used in the paleoclimate literature and, to avoid ambiguity, have included the IUPAC term for each compound. This list is published as v 0.1.0 on Google Sheets (tinyurl.com/ILBD010) and is available for comment. We are seeking community input by 31 January 2024 to check for completeness and accuracy within these classes. We would also welcome participation by individuals or teams interested in leading development of a list for other classes of lipid biomarkers.
When the community input period is complete, we will update and publish v 1.0.0 of the International Lipid Biomarker Controlled Vocabulary on Zenodo, with updates and future versions possible afterwards. We will also incorporate v 1.0.0 and subsequent versions into the controlled vocabularies maintained by NOAA, LiPDverse, and Neotoma. If other databases are interested in using this controlled vocabulary, please contact us. With a controlled vocabulary in place, the next steps will be to harmonize the vocabulary in lipid biomarker datasets currently in public paleoclimate databases, and to gather and add datasets not yet in these databases. Anyone interested in contributing vocabulary or datasets can contact Harleena Franklin at harleena@buffalo.edu. Documentation of the process being developed here may be useful to experts seeking to develop controlled vocabularies for other proxies.
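As an illustration of what harmonizing dataset vocabulary could involve, the following is a minimal Python sketch, not the project's actual tooling. The entries, preferred-term labels, and variant spellings are hypothetical and are not taken from the v 0.1.0 list; only the IUPAC names (nonacosane for the C29 n-alkane, octacosanoic acid for the C28 n-alkanoic acid) are standard chemistry. The function names `normalize` and `harmonize` are likewise invented for this example.

```python
import re

# Hypothetical vocabulary entries for illustration only; they are NOT
# copied from the published v 0.1.0 list. Each preferred term carries
# its IUPAC name and some variant spellings a dataset might contain.
VOCABULARY = {
    "n-C29 alkane": {
        "iupac": "nonacosane",
        "variants": ["C29 n-alkane", "nC29", "n-C29"],
    },
    "n-C28 alkanoic acid": {
        "iupac": "octacosanoic acid",
        "variants": ["C28 n-alkanoic acid", "C28 fatty acid"],
    },
}


def normalize(name):
    """Lowercase a name and drop spaces, hyphens, and underscores."""
    return re.sub(r"[\s\-_]+", "", name).lower()


# Reverse lookup: every normalized spelling points to one preferred term.
LOOKUP = {}
for preferred, entry in VOCABULARY.items():
    for spelling in [preferred, entry["iupac"], *entry["variants"]]:
        LOOKUP[normalize(spelling)] = preferred


def harmonize(name):
    """Return the preferred term for a name, or None if it is unknown."""
    return LOOKUP.get(normalize(name))


if __name__ == "__main__":
    for raw in ["C29 n-alkane", "Nonacosane", "c28 fatty acid", "mystery lipid"]:
        print(raw, "->", harmonize(raw))
```

Returning `None` for unrecognized names, rather than guessing, keeps unmatched terms visible so they can be reviewed by an expert and, if appropriate, proposed as additions to the vocabulary.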
AFFILIATIONS
1Department of Geology, University at Buffalo, USA
2Department of Geography, University of Wisconsin-Madison, USA
3Department of Chemistry, University at Buffalo, USA
4Department of Earth, Geographic, and Climate Sciences, University of Massachusetts Amherst, USA
5Department of Geosciences, Penn State University, University Park, USA
6School of Earth and Sustainability, Northern Arizona University, Flagstaff, USA
7Climatic Science and Services Division, NOAA's National Centers for Environmental Information, Washington DC, USA
CONTACT
Harleena Franklin: harleena@buffalo.edu
REFERENCES
Kaufman D et al. (2020) Sci Data 7: 115
Konecky BL et al. (2020) Earth Syst Sci Data 12: 2261-2288
Morrill C et al. (2021) Paleoceanogr Paleoclimatol 36
Walter RM et al. (2023) Earth Syst Sci Data 15: 2081-2116
Wilkinson MD et al. (2016) Sci Data 3: 160018