Why Metabolon?

Metabolon: The Gold Standard for Actionable Metabolomics

The bottleneck for most metabolomics practitioners continues to be accurate metabolite identification from raw data.

Metabolomics service providers commonly claim that their platforms can identify thousands (or even hundreds of thousands) of “biomarkers” or “features” in their deliverables. However, a significant portion of those metabolites are “identified based on minimal or even absent criteria for accurate identity”, resulting in incorrect metabolite identification and subsequent erroneous interpretation of experimental data.

Here we will:

  • Define key terms used by metabolomics service providers (both academic and commercial).
  • Highlight the major differences between these terms (e.g., features, biomarkers, annotations/identifications).
  • Describe the processes required for accurate metabolite identification.
  • Discuss the importance of the established industry-standard annotation confidence levels, which minimize variability in annotation methods across platforms and improve the accuracy of metabolite identification.
  • Demonstrate why highly accurate metabolite identification is essential for drawing confident biological conclusions from metabolomics datasets and for applying those insights in drug discovery.

The Challenge in Metabolomics

Metabolomics is a rapidly advancing field that enables mechanistic, functional, and actionable insights into the complexity of biological systems.1,2 Indeed, several clinical tests for monitoring human health are based on small molecules or metabolites, such as glucose, cholesterol, creatinine, and others. By leveraging the power of metabolomics, researchers can uncover invaluable insights into the underlying mechanisms of disease and develop more precise phenotypic fingerprints to assess individual disease risk.2

Nevertheless, accurate metabolite annotation (i.e., identification), and the use of appropriate bioinformatics tools to support it, remain a significant challenge for many in the field. Converting raw mass spectrometry (MS) data into accurate metabolite identifications is the crucial step on which every downstream scientific question depends. Without identifications based on rigorous, evidence-based criteria, the conclusions drawn from the data cannot be trusted.

Regardless of service provider, several factors make accurate metabolite identification from MS data challenging, including inherent hardware performance limitations, data acquisition speed, and cost. Faster, less expensive methods are likely to yield fewer metabolites, identified with lower accuracy. If the goal of a metabolomics study is meaningful scientific insight, metabolite identification must be accurate enough to support actionable data interpretation.

Understanding the challenges to achieving such accuracy is paramount for researchers seeking to unlock the full potential of metabolomics and drive breakthroughs in disease research. At Metabolon, we have spent over 20 years optimizing metabolite identification, driving greater process efficiencies with automation, and incorporating machine learning into metabolite identification, ensuring the highest quality and most accurate data generation on the market.

Metabolomics Platforms

Analytical platforms such as mass spectrometry and nuclear magnetic resonance (NMR) have emerged as the dominant platforms enabling the robust detection of numerous metabolites. The most comprehensive metabolite coverage is typically accomplished by leveraging a non-targeted approach using high-performance liquid chromatography (HPLC) coupled with high-resolution accurate mass (HRAM) mass spectrometry, commonly referred to as HPLC-MS, or LC-MS. LC-MS is capable of routine detection of thousands of unique chemical compounds, providing broad coverage of biologically relevant metabolites.3

A typical non-targeted LC-MS metabolomics pipeline requires stringent processing of raw data performed over a series of steps:

  1. Feature extraction (includes parent/molecular ion, isotopes, and adduct detection)
  2. Feature retention time peak alignment
  3. Area under the curve (peak area) calculation for each feature
  4. Feature annotation/identification
  5. Statistical analysis
  6. Data interpretation
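Step 3 can be illustrated with a minimal sketch, assuming each feature's extracted-ion chromatogram is already available as paired lists of retention times and intensities. It uses the trapezoidal rule, one common way to approximate peak area; real pipelines also handle baseline subtraction and peak-boundary detection, which are omitted here. The function names are hypothetical.

```python
def trapezoid_area(rt, intensity):
    """Approximate the area under a chromatographic peak (trapezoidal rule).

    rt: scan retention times in minutes, strictly increasing
    intensity: ion intensities recorded at those retention times
    """
    if len(rt) != len(intensity):
        raise ValueError("rt and intensity must have the same length")
    area = 0.0
    for i in range(1, len(rt)):
        width = rt[i] - rt[i - 1]
        area += width * (intensity[i] + intensity[i - 1]) / 2.0
    return area


def integrate_feature(rt, intensity, rt_start, rt_end):
    """Integrate only the scans inside the peak window [rt_start, rt_end]."""
    window = [(t, y) for t, y in zip(rt, intensity) if rt_start <= t <= rt_end]
    if len(window) < 2:
        return 0.0  # a single scan has no width, so no area
    times, values = zip(*window)
    return trapezoid_area(times, values)
```

For a hypothetical symmetric peak sampled at rt = [1.0, 1.1, 1.2, 1.3, 1.4] with intensities [0, 50, 100, 50, 0], trapezoid_area returns approximately 20.0 (up to floating-point rounding).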

A typical dataset contains up to tens of thousands of features (mass-to-charge ratio and retention time, or m/z-RT, pairs). These features correspond to a complex mixture of true molecular ions, unidentified adducts,4,5 in-source fragments, multimers, isotopes, and a vast assortment of isomers that can be differentiated only by retention time.6 Although providers’ strategies diverge in the later processing steps, these non-molecular-ion features are consistently the source of many misidentifications.
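Isotopes show why many features are not separate metabolites: a metabolite's 13C isotopologue co-elutes with it and appears about 1.003355 Da (divided by the charge) higher in m/z. The minimal sketch below flags such features, assuming features are given as (m/z, RT) tuples. The tolerances and the function name are hypothetical choices; real feature-grouping tools also check intensity ratios and peak-shape correlation.

```python
C13_SPACING = 1.003355  # mass difference between 13C and 12C (Da)


def flag_isotope_peaks(features, mz_tol=0.005, rt_tol=0.05, max_charge=2):
    """Return indices of features that look like 13C isotope peaks.

    features: list of (mz, rt) tuples, rt in minutes.
    Feature i is flagged when another feature j elutes within rt_tol of it and
    lies one 13C spacing (divided by charge z) lower in m/z.
    """
    flagged = set()
    for i, (mz_i, rt_i) in enumerate(features):
        for j, (mz_j, rt_j) in enumerate(features):
            if i == j or abs(rt_i - rt_j) > rt_tol:
                continue
            for z in range(1, max_charge + 1):
                if abs((mz_i - mz_j) - C13_SPACING / z) <= mz_tol:
                    flagged.add(i)
    return sorted(flagged)
```

With the hypothetical feature list [(179.0561, 2.10), (180.0595, 2.10), (181.0707, 3.50)], only index 1 is flagged: it co-elutes with 179.0561 and sits one 13C spacing above it, so it is an isotope peak rather than a distinct metabolite. The pairwise loop is quadratic, which is acceptable for a sketch; larger feature lists would normally be sorted by RT first.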

Metabolite Features vs. Annotation

While it is difficult to derive biological insight from unidentified metabolite features, most metabolomics service providers place great emphasis on the number of features they can detect, regardless of whether those features can be identified. However, the number of features detected or reported is a poor measure of value, because many detected features do not correspond to the parent ion (molecular ion) of a known metabolite. Instead, they are artifacts of the methodology, including ionization artifacts such as adducts, in-source fragments, chimeras, and multimers, as well as background noise (Figure 1).

At the end of Step 3, users typically receive a list in which these artifacts are mixed in with all of the other detected features. If they assume that every feature, artifacts included, represents a parent ion, they will attempt to identify each one. Misidentifications arising from this assumption became so prevalent that the metabolomics community established metrics, based on feature characteristics, that define an identification of sufficient rigor to be used universally in metabolite identification.7

Metabolon Library vs Others

Figure 1. An LC-MS spectrum of glucose, showing the parent ion and in-source fragments along with chloride and formate adducts. All of these constitute “features”.

Annotation is the process of assigning a specific identity or name to a detected feature. It is a critical step in metabolomic analysis, enabling researchers to accurately identify the metabolites present in their samples and advance their studies. Note that many features can correspond to a single metabolite (Figure 1).
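The glucose example in Figure 1 can be mimicked with a minimal sketch of adduct grouping. It assumes negative ionization, singly charged ions, and features that already co-elute (RT is ignored for brevity), and it uses the standard monoisotopic mass offsets for the deprotonated, chloride, and formate ions; in-source fragments are not modeled. The m/z values are calculated for glucose (C6H12O6, monoisotopic mass 180.0634 Da), not measured data, and the function names are hypothetical.

```python
# Mass offsets (Da) from the neutral monoisotopic mass M to the observed m/z
# of common singly charged negative-mode ions (standard ESI adduct values).
NEG_ADDUCTS = {
    "[M-H]-": -1.007276,     # deprotonated molecule
    "[M+Cl]-": 34.969402,    # chloride adduct
    "[M+FA-H]-": 44.998201,  # formate adduct
}


def ppm_diff(observed, expected):
    """Absolute mass error in parts per million."""
    return abs(observed - expected) / expected * 1e6


def group_adducts(mzs, ppm_tol=5.0):
    """Group co-eluting features that are different ions of one neutral mass.

    Each feature is tentatively treated as [M-H]-; the implied neutral mass M
    is then used to look for the other adduct ions among the features.
    Returns a list of (M, {mz: adduct_label}) for groups with 2+ members.
    If a feature matched two labels, the later label would overwrite it.
    """
    groups = []
    for mz in mzs:
        neutral = mz - NEG_ADDUCTS["[M-H]-"]
        members = {}
        for other in mzs:
            for label, offset in NEG_ADDUCTS.items():
                if ppm_diff(other, neutral + offset) <= ppm_tol:
                    members[other] = label
        if len(members) > 1:
            groups.append((round(neutral, 4), members))
    return groups


# Calculated (not measured) m/z values for glucose ions
features = [179.0561, 215.0328, 225.0616]
for neutral, members in group_adducts(features):
    print(neutral, members)
# prints: 180.0634 {179.0561: '[M-H]-', 215.0328: '[M+Cl]-', 225.0616: '[M+FA-H]-'}
```

Three features collapse into one neutral mass, which is exactly why feature counts overstate the number of metabolites.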

To distinguish between confidence levels in annotation accuracy, the Metabolomics Standards Initiative (MSI) proposed four identification levels in 2007,7 which were later expanded to five (Figure 2).8 Each level takes a different approach to annotating metabolites, depending on the confidence required by the established standards.


Figure 2. The identification levels that were established to provide confidence in annotations. Level 1 has the highest stringency, requiring the m/z, RT, and MS/MS spectrum of a measured feature to match those of an authentic chemical standard.
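One way to make the levels in Figure 2 concrete is to encode them as rules. The sketch below is a simplified reading of the five-level scheme of Schymanski et al.,8 with Level 1 following the Figure 2 caption (m/z, RT, and MS/MS all matched to an authentic standard). The `Evidence` fields and the rule order are hypothetical simplifications, not a published algorithm.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the evidence gathered for one feature."""
    mz_matches_standard: bool = False     # accurate m/z vs authentic standard
    rt_matches_standard: bool = False     # RT vs standard on the same method
    msms_matches_standard: bool = False   # MS/MS vs standard on the same method
    msms_matches_library: bool = False    # e.g., hit in a public spectral library
    candidate_structures: int = 0         # number of tentative structures proposed
    formula_assigned: bool = False        # unequivocal molecular formula


def confidence_level(ev: Evidence) -> int:
    """Map evidence to a level from 1 (most confident) to 5 (least confident)."""
    if ev.mz_matches_standard and ev.rt_matches_standard and ev.msms_matches_standard:
        return 1  # confirmed structure
    if ev.msms_matches_library:
        return 2  # probable structure
    if ev.candidate_structures > 0:
        return 3  # tentative candidate(s)
    if ev.formula_assigned:
        return 4  # molecular formula only
    return 5      # exact mass of interest only
```

Under these rules, a feature whose m/z matches an authentic standard but whose RT was never checked, and which has only a public-library MS/MS hit, lands at Level 2, which illustrates why RT matching is required for Level 1.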


The Process of Annotation

Annotating metabolite candidates is a time-consuming process: measured masses are matched against authentic standards, whether analyzed in-house or available in external databases, and those matches are then supported with other characteristics, such as retention time (RT). Tandem mass spectrometry (MS2 or MSn) can provide additional structural information to confirm metabolite identity. Extensive, readily accessible molecular structure databases (e.g., PubChem,9 HMDB,10 KEGG,11 ChemSpider12) and fragmentation spectral databases (e.g., METLIN,13 GNPS,14 MassBank,15 and NIST16) can further aid metabolite identification. However, fragmentation spectra are highly specific to the instrument and its settings, so finding exact spectral matches in these existing databases can be a challenge.
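The mass-matching step can be sketched in a few lines, and the sketch also shows why mass alone is insufficient. It assumes singly charged [M-H]- or [M+H]+ ions and a 5 ppm tolerance (a hypothetical setting). The small mass list holds monoisotopic masses computed from elemental formulas, not entries from any real database, and the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton (Da)

# Neutral monoisotopic masses (Da) computed from elemental formulas.
# Isomers share a formula, so they also share a mass.
MASS_LIST = {
    "glucose (C6H12O6)": 180.06339,
    "fructose (C6H12O6)": 180.06339,
    "galactose (C6H12O6)": 180.06339,
    "3-hydroxybutyrate (C4H8O3)": 104.04734,
    "2-hydroxybutyrate (C4H8O3)": 104.04734,
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_by_mass(mz, ppm_tol=5.0, mode="negative"):
    """Return names whose [M-H]- (negative) or [M+H]+ (positive) m/z is within ppm_tol."""
    offset = -PROTON if mode == "negative" else PROTON
    return [
        name
        for name, mass in MASS_LIST.items()
        if abs(ppm_error(mz, mass + offset)) <= ppm_tol
    ]


print(match_by_mass(179.0561))
# prints: ['glucose (C6H12O6)', 'fructose (C6H12O6)', 'galactose (C6H12O6)']
```

Even with sub-ppm mass accuracy, every hexose in the list matches, so an accurate mass by itself cannot tell these isomers apart.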

To search these databases at scale, software annotation tools (e.g., XCMS,17 GNPS,14 SIRIUS,18 and MS-DIAL19) have been developed and integrated into computational metabolomics workflows. However, major issues arise when using these tools, because most of their algorithms require multiple adjustable thresholds and cut-off parameters whose values can vary significantly among users.20 Consequently, vastly different peak lists can be generated from a single raw dataset depending on the algorithm and parameters used. This is especially detrimental for large datasets, where a lack of reproducibility can have a profound impact on interpretability.21 Additionally, none of these databases uses RT as part of the annotation process, which is a particular problem given the prevalence of structural isomers with the same molecular formula, mass, and fragmentation characteristics. Notable examples include the hydroxybutyrates and the hexoses, among many others that cannot be accurately identified without retention time. Results generated without RT are subject to significantly higher misidentification rates.
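Adding RT to the match is what separates isomers such as the hydroxybutyrates. The minimal sketch below assumes an in-house library in which each entry carries the expected [M-H]- m/z and the RT of an authentic standard run on the same method. The two entries, their RT values, the tolerances, and the function name are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    name: str
    mz: float  # expected [M-H]- m/z
    rt: float  # retention time (min) of the authentic standard on this method


# Hypothetical in-house entries; the RT values are invented for illustration.
IN_HOUSE = [
    LibraryEntry("3-hydroxybutyrate", 103.04006, 1.85),
    LibraryEntry("2-hydroxybutyrate", 103.04006, 2.40),
]


def annotate(mz, rt, library, ppm_tol=5.0, rt_tol=0.1):
    """Keep library entries that match on both accurate mass and RT."""
    mass_hits = [e for e in library if abs(mz - e.mz) / e.mz * 1e6 <= ppm_tol]
    return [e.name for e in mass_hits if abs(rt - e.rt) <= rt_tol]


print(annotate(103.0401, 1.87, IN_HOUSE))  # ['3-hydroxybutyrate']
print(annotate(103.0401, 2.38, IN_HOUSE))  # ['2-hydroxybutyrate']
```

Widening rt_tol to 0.6 min would make the first query match both isomers again, a small example of how adjustable thresholds change the resulting annotations.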

What is the Gold Standard?

Currently, the gold standard for metabolite annotation is to match at least two physicochemical properties (e.g., accurate m/z value, RT, and MS/MS spectrum) of a measured feature against authentic chemical standards measured using the same LC-MS platform, data acquisition parameters, and sample preparation protocols (e.g., chromatographic methods, ionization modes, and collision energies).7,22,23 This type of annotation is designated Level 1 and permits biological interpretation with a high degree of confidence.
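The MS/MS criterion is usually scored with a spectral similarity measure. As one common example (not necessarily the scoring any particular provider uses), the sketch below computes a cosine score between two centroided spectra, pairing peaks greedily within an assumed 0.01 m/z tolerance. The function name, tolerance, and peak lists are hypothetical.

```python
import math


def cosine_score(query, reference, mz_tol=0.01):
    """Cosine similarity between two centroided MS/MS spectra.

    Each spectrum is a list of (mz, intensity) pairs. Peaks within mz_tol are
    paired greedily, largest intensity product first, each peak used at most once.
    Returns a value from 0 (no shared peaks) to 1 (identical relative pattern).
    """
    candidates = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= mz_tol:
                candidates.append((int_q * int_r, i, j))
    candidates.sort(reverse=True)

    used_q, used_r = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i in used_q or j in used_r:
            continue
        used_q.add(i)
        used_r.add(j)
        dot += product

    norm_q = math.sqrt(sum(y * y for _, y in query))
    norm_r = math.sqrt(sum(y * y for _, y in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)
```

Comparing a hypothetical three-peak query [(59.01, 100.0), (71.01, 40.0), (89.02, 60.0)] with a reference that lacks the 89.02 peak, [(59.012, 100.0), (71.008, 40.0)], gives a score of about 0.87. Because such scores depend on instrument and collision energy, a high score is most meaningful when the reference spectrum was acquired on the same platform.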

What Differentiates Metabolon From The Rest?

At Metabolon, we offer the largest Level 1 library in the metabolomics industry. Our proprietary library has been built and curated over 20 years and contains over 5,400 entries. Approximately 85% of these entries (~4,600) are Level 1; the remaining ~15% (~800 entries) are Level 2 because no commercial standard is available to qualify them for Level 1. The combination of 1) the highest level of confidence in annotation and 2) the breadth of our library is unmatched, enabling accurate, actionable insights for our clients’ scientific or clinical inquiries.

Other providers claim that they can scan a large number of small-molecule biomarkers or that they have a curated database of more than 200,000 features. In practice, this may mean you receive a feature list of m/z-RT pairs with no annotations, or annotations produced by algorithms that search well-known databases such as HMDB, GNPS, or KEGG. Still other providers sacrifice compound specificity and resolution for acquisition speed. In these cases, a single spectral measurement can be the sum of multiple compounds, yet some providers will still assign a single annotation to a measurement that is the sum of, for example, five different isomers. Using the glucose example from Figure 1, Metabolon reports the single annotated metabolite, whereas other providers may assign incorrect identities to the individual features (Figure 3).


Figure 3. A comparison of how features and metabolites are reported using an LC-MS spectrum of glucose. Metabolon will report only an annotated metabolite that has been matched with an authentic standard. Other service providers claim all signals are unique compounds and will report many incorrectly.

Although some of these practices may provide sufficient metabolomic resolution for discovery-based work, they have major limitations in reproducibility and should be approached with caution whenever confident biological interpretation is required. Metabolon’s platform provides by far the most accurate metabolite identification directly tied to corresponding standards, ensuring the highest quality of data reported to clients.

What Does This Mean For The Future?

Why are Level 1 annotations important? The FDA recently changed the face of drug development by removing the requirement for preclinical testing on non-relevant animal models. Increasingly, drug developers are building data packages that include biomarkers supporting patient selection and the best indicators of efficacy. Considerable evidence suggests that trials using biomarkers for patient selection have higher overall success rates than trials without biomarkers.24

Metabolon has a strong track record of supporting key decisions at pharmaceutical and biopharma companies from discovery through clinical trials.25 Having executed more than 10,000 projects over the last 20 years, Metabolon is able to leverage its reproducible, rigorous data in a regulatory setting while maintaining a singular focus on understanding each client’s needs and supporting the success of their programs. Past and current customers include pharmaceutical and biotech companies, academic institutions, and applied-market organizations.

In summary, Metabolon has the largest library of Level 1 annotations, built on retention time and the compound resolution essential for meaningful, actionable biological insights. Because it adheres to industry standards, our library minimizes false positives, supports scientific and clinical accuracy in biomarker discovery, and enables precision medicine applications. With Metabolon, you can be confident that your results will include the maximum possible number of Level 1 annotations.

Glossary

Adduct: An adduct is a molecule formed by the combination of two or more molecules, typically through a chemical reaction. In LC-MS, each peak in the raw data may represent an ion, adduct, fragment, or isotope of a metabolite, and one metabolite may be represented by several peaks. Adduct annotation is particularly challenging, as the same mass difference between peaks can arise from adduct formation, fragmentation, or isotope labeling.

Chromatographs: Analytical instruments used to separate metabolites based on their physical and chemical properties so that they can subsequently be detected and identified.

Compound: A molecule present in a biological sample that can be detected and quantified using analytical techniques such as mass spectrometry and NMR spectroscopy.

In-source fragments: Metabolite fragments that are generated during electrospray ionization (ESI) in LC-MS analysis. In-source fragments (ISFs) can pose a challenge in metabolomics MS experiments because one metabolite can produce multiple features, each corresponding to a different fragment. ISFs can lead to false metabolite annotation in untargeted metabolomics, prompting misinterpretation of the underlying biological mechanism.

Isotope: Atoms of the same element that have the same number of protons in the nucleus but different numbers of neutrons and therefore, the same atomic number but a different mass number. They have nearly identical chemical behavior but different physical properties.

Isotopologues: Molecules that differ only in their isotopic composition, for example in the number of 13C atoms they contain.

Library: A collection of spectral data and associated metadata for known metabolites.

Metabolite: A small molecule substrate, intermediate, or product of metabolism that can be detected and quantified using analytical techniques such as mass spectrometry and NMR spectroscopy.

Multimers: Ions formed from two or more copies of the same metabolite (e.g., dimers), which can complicate metabolite identification and quantification. Multimers can form during sample preparation or during the ionization process in mass spectrometry-based metabolomics.

Peak detection: Peak detection is a key step in preprocessing untargeted metabolomics data generated by high-resolution LC-MS. It is the process of identifying individual peaks in the retention time dimension of the LC-MS data. These peaks are referred to as chromatographic peaks to distinguish them from mass peaks (i.e., peaks within a spectrum along the m/z dimension).

Peak alignment: The process of aligning peaks across multiple samples to account for subtle shifts in retention time, a crucial preprocessing step in metabolomics.

Retention time: Retention time is a key parameter in LC-MS-based metabolomics that refers to the time it takes for a metabolite to travel through and elute from a chromatography column. Retention time is used in metabolite annotation and identification.

Retention time correction: A method used in LC-MS-based metabolomics to correct for variations in retention time across different samples. Retention time correction is important because small variations in retention time can lead to incorrect metabolite identification and quantification.

Spectrum: The collection of data obtained from a mass spectrometer or a nuclear magnetic resonance (NMR) spectrometer.
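Retention time correction, as defined above, can take many forms; widely used tools often apply nonlinear methods such as LOESS-based fitting. The minimal sketch below assumes the simplest case: a linear drift estimated by least squares from anchor compounds (for example, spiked internal standards) whose retention times are known in a reference run. The function names and example retention times are hypothetical.

```python
def fit_rt_correction(observed, reference):
    """Least-squares line mapping observed RTs onto reference RTs.

    observed: RTs (min) of anchor compounds in the run being corrected
    reference: RTs (min) of the same anchors in the reference run
    Returns (slope, intercept) such that reference ~ slope * observed + intercept.
    """
    n = len(observed)
    if n != len(reference) or n < 2:
        raise ValueError("need matching lists of at least two anchor compounds")
    mean_x = sum(observed) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    if sxx == 0:
        raise ValueError("anchor retention times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, reference))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correct_rt(rt, slope, intercept):
    """Map an observed RT onto the reference time scale."""
    return slope * rt + intercept
```

If hypothetical anchors elute at 1.10, 3.15, and 5.20 min in a drifted run but at 1.00, 3.00, and 5.00 min in the reference run, a feature observed at 4.175 min is corrected to approximately 4.00 min. A linear model is easy to inspect but cannot capture local, nonlinear shifts, which is why more flexible methods are common in practice.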

References

1. Liu X, Locasale JW. Metabolomics: A Primer. Trends Biochem Sci 2017;42(4):274–284.

2. Zhang A, Sun H, Wang P, Han Y, Wang X. Modern analytical techniques in metabolomics analysis. Analyst 2012;137(2):293–300.

3. Kind T, Tsugawa H, Cajka T et al. Identification of small molecules using accurate mass MS/MS search. Mass Spectrom Rev 2018;37(4):513–532.

4. Alonso A, Marsal S, Julià A. Analytical Methods in Untargeted Metabolomics: State of the Art in 2015. Front Bioeng Biotechnol 2015;3:23.

5. Li Z, Lu Y, Guo Y et al. Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection. Anal Chim Acta 2018;1029:50–57.

6. Graça G, Cai Y, Lau CHE et al. Automated Annotation of Untargeted All-Ion Fragmentation LC–MS Metabolomics Data with MetaboAnnotatoR. Anal Chem 2022;94(8):3446–3455.

7. Sumner LW, Amberg A, Barrett D et al. Proposed minimum reporting standards for chemical analysis. Metabolomics 2007;3(3):211–221.

8. Schymanski EL, Jeon J, Gulde R et al. Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence. Environ Sci Technol 2014;48(4):2097–2098.

9. Kim S, Chen J, Cheng T et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res 2019;47(D1):D1102–D1109.

10. Wishart DS, Feunang YD, Marcu A et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res 2018;46(D1):D608–D617.

11. Kanehisa M, Sato Y, Kawashima M et al. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 2016;44(D1):D457–D462.

12. Pence HE, Williams A. ChemSpider: An Online Chemical Information Resource. J Chem Educ 2010;87(11):1123–1124.

13. Xue J, Guijas C, Benton HP et al. METLIN MS2 molecular standards database: a broad chemical and biological resource. Nat Methods 2020;17(10):953–954.

14. Wang M, Carver JJ, Phelan V et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat Biotechnol 2016;34(8):828–837.

15. Horai H, Arita M, Kanaya S et al. MassBank: a public repository for sharing mass spectral data for life sciences. J Mass Spectrom 2010;45(7):703–714.

16. The NIST Mass Spectrometry Data Center. NIST Standard Reference Database 1A. 2014. https://www.nist.gov/system/files/documents/srd/NIST1aVer22Man.pdf; Accessed June 28, 2023.

17. Forsberg EM, Huan T, Rinehart D et al. Data processing, multi-omic pathway mapping, and metabolite activity analysis using XCMS Online. Nat Protoc 2018;13(4):633–651.

18. Dührkop K, Fleischauer M, Ludwig M et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat Methods 2019;16(4):299–302.

19. Tsugawa H, Nakabayashi R, Mori T et al. A cheminformatics approach to characterize metabolomes in stable-isotope-labeled organisms. Nat Methods 2019;16(4):295–298.

20. Domingo-Almenara X, Montenegro-Burke JF, Benton HP et al. Annotation: a computational solution for streamlining metabolomics analysis. Anal Chem 2018;90(1):480–489.

21. Zulfiqar Z, Gadelha L, Steinbeck C et al. MAW: the reproducible Metabolome Annotation Workflow for untargeted tandem mass spectrometry. J Cheminformatics 2023;15(1):32.

22. Broeckling CD, Ganna A, Layer M et al. Enabling Efficient and Confident Annotation of LC−MS Metabolomics Data through MS1 Spectrum and Time Prediction. Anal Chem 2016;88(18):9226–9234.

23. Lu Y, Pang Z, Xia J. Comprehensive investigation of pathway enrichment methods for functional interpretation of LC–MS global metabolomics data. Brief Bioinform 2023;24(1):bbac553.

24. Wong CH, Siah KW, Lo AW. Estimation of clinical trial success rates and related parameters. Biostatistics 2019;20(2):273–286.

25. https://www.metabolon.com/applications/biopharma/
