
From Spectra to Systems: Elevating Multiomics with Accurate Metabolite Identification 

Elevating Multi-Omics with Precise Metabolite Identification

The Integral Role of Metabolomics in Multiomics Research 

Untargeted metabolomics is a systems biology approach that profiles, at large scale, the wide array of metabolites present in a biological system. Metabolites are the essential building blocks for growth and repair in living organisms. Because metabolite levels closely track changes in an organism’s observable characteristics (phenotypes), metabolomics offers a comprehensive view of the organism’s physiological status. This makes it an essential tool for understanding how genetic variation, environmental conditions, and lifestyle factors directly impact living organisms [1,2].

Multiomics research merges datasets from several omics layers to gain comprehensive biological insights. Metabolomics is central to this integration because it captures biochemical changes dynamically. Combined with other omics datasets, metabolomics shows directly how molecular changes manifest as observable traits, which helps link genotypic variation to phenotypic outcomes [3,4].

The Importance of Accurate Metabolite Identification in Multiomics Research 

The promise of untargeted metabolomics lies in its ability to detect a broad array of metabolites, ranging from amino acids and lipids to nucleotides and secondary metabolites. Liquid chromatography-mass spectrometry (LC/MS) is one of the most widely adopted techniques in untargeted metabolomics because it can measure a diverse range of compounds. However, LC/MS also brings its own set of challenges. A single chemical entity can produce multiple peaks in an LC/MS measurement because of adducts, isotopes, and fragments. Accurately identifying which peaks correspond to which chemical entities, many of them unknown, and distinguishing them from noise remains an open research challenge [5,6].
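To make the multiple-peak problem concrete, the following minimal sketch computes the m/z values at which one neutral compound would be expected to appear for a few common positive-mode adducts. The adduct list, the rounded mass shifts, and the choice of glucose as the example are illustrative assumptions, not a description of any particular vendor's or Metabolon's peak annotation.

```python
# Approximate mass shifts (Da) for a few common positive-mode adducts.
# Each entry: label -> (number of molecules M, mass shift, charge).
# The selection of adducts is an assumption made for illustration.
ADDUCTS = {
    "[M+H]+": (1, 1.007276, 1),
    "[M+NH4]+": (1, 18.033823, 1),
    "[M+Na]+": (1, 22.989218, 1),
    "[M+H-H2O]+": (1, -17.003289, 1),  # in-source water loss
    "[2M+H]+": (2, 1.007276, 1),       # protonated dimer
}


def expected_mz(monoisotopic_mass):
    """Return the expected m/z of each adduct for one neutral compound."""
    return {
        label: round((n * monoisotopic_mass + shift) / charge, 4)
        for label, (n, shift, charge) in ADDUCTS.items()
    }


GLUCOSE_MASS = 180.06339  # monoisotopic mass of C6H12O6, in Da

peaks = expected_mz(GLUCOSE_MASS)
for label, mz in peaks.items():
    print(f"{label:12s} {mz:.4f}")
```

Even in this simplified view, one compound yields five distinct m/z values. An annotation step has to recognize them as the same entity instead of counting five metabolites.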

The accuracy of metabolite identification strongly influences the depth and reliability of multiomics research. Identification errors compromise every category of computational integration method discussed below, which in turn affects biomarker discovery, the understanding of disease mechanisms, and the identification of therapeutic targets in multiomics studies. At Metabolon, we have compiled a library covering over 5,400 compounds. This coverage greatly improves our ability to achieve Level 1 identification, in which metabolites are matched to known chemical standards based on m/z, retention time, and spectral data. This level of accuracy supports actionable discoveries from multiomics datasets.
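As a rough illustration of matching on m/z, retention time, and spectral data together, the sketch below accepts a library entry only when all three criteria agree. It is not Metabolon's implementation: the function names, the tolerances (`ppm_tol`, `rt_tol`, `min_cosine`), and the toy retention times and fragment intensities are all hypothetical.

```python
import math


def ppm_error(observed_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def cosine(spec_a, spec_b):
    """Cosine similarity of two spectra given as {fragment m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[k] * spec_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_feature(feature, library, ppm_tol=5.0, rt_tol=0.1, min_cosine=0.8):
    """Return library entries that agree on m/z, retention time AND spectrum.

    All three thresholds are hypothetical defaults chosen for this example.
    """
    hits = []
    for entry in library:
        if abs(ppm_error(feature["mz"], entry["mz"])) > ppm_tol:
            continue  # precursor mass does not match
        if abs(feature["rt"] - entry["rt"]) > rt_tol:
            continue  # elutes at a different time than the standard
        score = cosine(feature["ms2"], entry["ms2"])
        if score >= min_cosine:
            hits.append((entry["name"], round(score, 3)))
    return sorted(hits, key=lambda hit: -hit[1])


# Toy library: leucine and isoleucine share the same [M+H]+ m/z, so mass
# alone cannot separate them. Retention times and spectra are made up.
library = [
    {"name": "leucine", "mz": 132.1019, "rt": 2.30,
     "ms2": {86.1: 100.0, 44.1: 10.0}},
    {"name": "isoleucine", "mz": 132.1019, "rt": 2.10,
     "ms2": {86.1: 100.0, 69.1: 20.0}},
]
feature = {"mz": 132.1020, "rt": 2.31, "ms2": {86.1: 100.0, 44.1: 12.0}}

hits = match_feature(feature, library)
print(hits)
```

Requiring all three criteria to pass is what lets the example separate two compounds with identical mass. Dropping the retention-time check would return both candidates.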

Multiomics Pathway Analysis 

Pathway analysis tools that use Reactome [7] and KEGG [8] rely heavily on accurate metabolite identification to map metabolomics data onto biological pathways correctly. Without accurate identification, key metabolites may be assigned to the wrong pathway or missed entirely, giving a flawed picture of the interactions between metabolic pathways and other omics layers. This misrepresentation can skew the interpretation of biological functions and disease mechanisms.
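One common form of pathway analysis is over-representation analysis, which asks whether a list of altered metabolites overlaps a pathway more than chance would predict. The sketch below uses a hypergeometric test on made-up metabolite labels (`m0` to `m19`) to show how two misassigned identities can erase an otherwise significant pathway hit. It does not call Reactome or KEGG, and the set sizes are assumptions chosen for illustration.

```python
from math import comb


def enrichment_p(hits, pathway, background):
    """Hypergeometric P(overlap >= observed) for a metabolite list.

    hits, pathway and background are sets of metabolite identifiers;
    only identifiers present in the background are counted.
    """
    N = len(background)
    K = len(pathway & background)
    n = len(hits & background)
    k = len(hits & pathway & background)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


# Hypothetical background of 20 measurable metabolites, 5 in the pathway.
background = {f"m{i}" for i in range(20)}
pathway = {"m0", "m1", "m2", "m3", "m4"}

# Four altered metabolites, three of which truly belong to the pathway.
correct_ids = {"m0", "m1", "m2", "m10"}
# Same four peaks, but two of them were assigned the wrong identity.
wrong_ids = {"m0", "m11", "m12", "m10"}

p_correct = enrichment_p(correct_ids, pathway, background)
p_wrong = enrichment_p(wrong_ids, pathway, background)
print(f"correct identification: p = {p_correct:.3f}")
print(f"two misidentifications: p = {p_wrong:.3f}")
```

With correct identities the overlap is three of four, and the pathway passes a conventional 0.05 threshold. With two wrong labels the overlap drops to one, and the same data no longer point to the pathway.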

Latent Factor Methods 

Latent factor approaches, such as MOFA [9], DIABLO [10], and MCFA [11], depend on accurate metabolite identification to uncover the underlying factors that drive variation across omics datasets. Inaccurate metabolite identification introduces noise, reducing the methods’ ability to correlate metabolic profiles with genetic or proteomic alterations. This can compromise their utility in identifying potential therapeutic targets.

Clustering-Based Methods 

Clustering methods, including iCluster/iCluster+ [12,13] and moCluster [14], use accurately identified metabolites to find clusters across multiomics datasets. When metabolite identification is inaccurate, the clustering algorithms may group unrelated entities together or separate entities that belong in the same cluster. This misclassification can lead to incorrect assumptions about the relationships between different omics layers.

Network-Based Methods 

Network-based integration methods like SNF [15] and MOGONET [16] construct networks that represent biological relationships among various datasets. Accurate metabolite identification is crucial for these methods to create meaningful connections. Inaccurate identification can lead to the formation of spurious links, distorting the biological network and potentially leading to incorrect conclusions from the study.

Bayesian Methods 

Bayesian approaches, such as BCC [17] and MDI [18], analyze integrated multiomics datasets to identify complex biological relationships. The accuracy of metabolite identification directly affects these methods’ ability to infer probabilistic relationships between datasets. Inaccurate identification can introduce bias, leading to unreliable biomarker discovery and therapeutic target identification.

Without the assurance of high-confidence Level 1 identifications, the potential of multiomics integration tools goes unrealized. This constrains our ability to derive meaningful insights from multiomics studies. Accurate metabolite identification at the highest confidence level is necessary to navigate the complexities of multiple omics measurements.

Metabolon’s vast metabolite library is a critical asset for achieving Level 1 metabolite identification. This extensive coverage not only sets a high standard for metabolomics, but also supports the success of multiomics studies by providing a solid foundation for data integration. Achieving high-confidence metabolite identification strengthens the ties between metabolomics and the broader multiomics research ecosystem, allowing advanced computational methods to fully benefit from the insights that metabolomics brings. 


  1. Suhre K and Gieger C. Genetic variation in metabolic phenotypes: study designs and applications. Nat Rev Genet. 2012;13(11): 759-769. doi: 10.1038/nrg3314
  2. Chen L, Zhernakova DV, Kurilshikov A, et al. Influence of the microbiome, diet and genetics on inter-individual variation in the human plasma metabolome. Nat Med. 2022;28(11): 2333-2343. doi: 10.1038/s41591-022-02014-8
  3. Hasin Y, Seldin M, and Lusis A. Multi-omics approaches to disease. Genome Biol. 2017;18: 1-15.
  4. Karczewski KJ and Snyder MP. Integrative omics for health and disease. Nat Rev Genet. 2018;19(5): 299-310. doi: 10.1038/nrg.2018.4
  5. Blaženović I, Kind T, Ji J, et al. Software tools and approaches for compound identification of LC-MS/MS data in metabolomics. Metabolites. 2018;8(2): 31. doi: 10.3390/metabo8020031
  6. Dunn WB, Erban A, Weber RJM, et al. Mass appeal: metabolite identification in mass spectrometry-focused untargeted metabolomics. Metabolomics. 2013;9: 44-66. doi: 10.1007/s11306-012-0434-4
  7. Fabregat A, Jupe S, Matthews L, et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2018;46(D1): D649-D655. doi: 10.1093/nar/gkx1132
  8. Kanehisa M and Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1): 27-30. doi: 10.1093/nar/28.1.27
  9. Argelaguet R, Arnol D, Bredikhin D, et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 2020;21(1): 111. doi: 10.1186/s13059-020-02015-1
  10. Singh A, Shannon CP, Gautier B, et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics. 2019;35(17): 3055-3062. doi: 10.1093/bioinformatics/bty1054
  11. Brown BC, Wang C, Kasela S, et al. Multiset correlation and factor analysis enables exploration of multi-omics data. Cell Genom. 2023;3(8): 100359. doi: 10.1016/j.xgen.2023.100359
  12. Shen R, Olshen AB, and Ladanyi M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics. 2009;25(22): 2906-2912. doi: 10.1093/bioinformatics/btp543
  13. Mo Q, Wang S, Seshan VE, et al. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc Natl Acad Sci USA. 2013;110(11): 4245-4250. doi: 10.1073/pnas.1208949110
  14. Meng C, Helm D, Frejno M, et al. moCluster: identifying joint patterns across multiple omics data sets. J Proteome Res. 2016;15(3): 755-765. doi: 10.1021/acs.jproteome.5b00824
  15. Wang B, Mezlini AM, Demir F, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11(3): 333-337. doi: 10.1038/nmeth.2810
  16. Wang T, Shao W, Huang Z, et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun. 2021;12(1): 3445. doi: 10.1038/s41467-021-23774-w
  17. Mo Q, Shen R, Guo C, et al. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data. Biostatistics. 2018;19(1): 71-86. doi: 10.1093/biostatistics/kxx017
  18. Kirk P, Griffin JE, Savage RS, et al. Bayesian correlated clustering to integrate multiple datasets. Bioinformatics. 2012;28(24): 3290-3297. doi: 10.1093/bioinformatics/bts595
Joe Wandy, Ph.D.
Joe Wandy is a Principal Bioinformatician at Metabolon. His research interests lie in developing computational methods to pre-process metabolomics data and to integrate metabolomics with other omics modalities. He completed his Ph.D. in Computer Science at the University of Glasgow, United Kingdom. At Metabolon, Joe is developing Metabolon’s Bioinformatics Platform and contributing to the broader understanding of complex biological systems through multiomics integration.

