A critical analysis of Adaptive Box-Cox transformation for skewed distributed data management: Metabolomics of Spanish and Argentinian truffles as a case study

Sibono, Leonardo; Grosso, Massimiliano; Casula, Mattia; Manis, Cristina; Caboni, Pierluigi
2025-01-01

Abstract

Background: Metabolic variations observed in metabolomic data are considered a benchmark for detecting biomatrix variability. Identifying target metabolites is therefore crucial for tracking substrate modifications and protecting the substrate from undesired alteration. Unfortunately, this task can be compromised by false positives, often triggered by complicated data distributions. In this work, we investigated the metabolic profile of Spanish and Argentinian truffles using a robust methodology. The issue of skewed data distributions was addressed through a normalization preprocessing step, which enhanced biomarker identification and sample classification.

Results: A parametric test (ANOVA), applied after improving data normality, was used to identify the target metabolites that vary significantly between the two regions of origin, Spain and Argentina. Specifically, the Adaptive Box-Cox transformation was applied so that the data distributions more closely followed a Gaussian distribution, improving the performance of the ANOVA test. Using the Bonferroni-Holm method to correct for multiple testing, we demonstrated the effectiveness of this transformation for the case under investigation. Results were compared with two non-parametric tests (Kruskal-Wallis and permutation test), selected as reference methods, to provide a better understanding of the non-normal distributions often encountered in metabolomic data analysis. Of the 57 metabolites investigated, 17 exhibited notable variability across the two geographical regions. The validity of this methodology was supported by the discrimination of samples belonging to different groups: both univariate and multivariate statistical models were tested through Monte Carlo simulations and yielded consistent results.

Significance: Data analysis outcomes are sensitive to variable distributions. The present study presents an effective tool for increasing data normality, thereby enhancing the statistical power for biomarker discovery and improving the classification performance of models. These results are consistent with current knowledge in the food sciences, enabling their application in advancing research on truffle analysis.
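The workflow summarized in the abstract (transform skewed data toward normality, test each metabolite with ANOVA, then correct the p-values with Holm's procedure) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not define the Adaptive Box-Cox variant, so the standard maximum-likelihood Box-Cox transform from SciPy is swapped in here. The synthetic log-normal data, the group sizes, the 0.05 threshold, and the helper names `holm_adjust` and `boxcox_anova` are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value (k = 0..m-1) is multiplied by (m - k).
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity along the sorted order and cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


def boxcox_anova(group_a, group_b):
    """Standard Box-Cox on the pooled (strictly positive) data, then ANOVA.

    Assumption: lambda is estimated once on the pooled samples by maximum
    likelihood; the paper's adaptive variant may choose it differently.
    """
    pooled = np.concatenate([group_a, group_b])
    transformed, lam = stats.boxcox(pooled)
    t_a = transformed[: len(group_a)]
    t_b = transformed[len(group_a):]
    _, p_value = stats.f_oneway(t_a, t_b)
    return p_value, lam


# Synthetic, right-skewed "metabolite" intensities (illustration only).
rng = np.random.default_rng(0)
n_metabolites, n_per_group = 5, 20
pvals = []
for j in range(n_metabolites):
    shift = 1.0 if j < 2 else 0.0  # only the first two metabolites differ
    a = rng.lognormal(mean=0.0, sigma=0.8, size=n_per_group)
    b = rng.lognormal(mean=shift, sigma=0.8, size=n_per_group)
    p_value, _ = boxcox_anova(a, b)
    pvals.append(p_value)

adjusted = holm_adjust(pvals)
significant = adjusted < 0.05  # assumed significance level
print(np.round(adjusted, 4), significant)
```

In this sketch, the same adjusted p-values could be compared against those from `scipy.stats.kruskal` applied to the untransformed data, mirroring the non-parametric reference comparison described in the abstract.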
2025
English
1345
343704
11
https://www.sciencedirect.com/science/article/pii/S0003267025000984?via=ihub
Anonymous experts
international
scientific
Biomarker discovery
Data preprocessing
Food
Geographical origin
Mass spectrometry
Metabolomics
Goal 3: Good health and well-being
Sibono, Leonardo; Grosso, Massimiliano; Tejedor-Calvo, Eva; Casula, Mattia; Marco-Montori, Pedro; Garcia-Barreda, Sergi; Manis, Cristina; Caboni, Pier ...
1.1 Journal article
info:eu-repo/semantics/article
1 Contribution in Journal::1.1 Journal article
262
8
open
Files in This Item:
File Size Format  
SibonoetalACA_2025.pdf

open access

Description: Main article
Type: publisher's version
Size 2.67 MB
Format Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
